"tune2fs" is Your Friend!

jimrh · March 11, 2022, 1:14pm

Greetings!

The use-case of a robot, especially an educational robot, involves frequent reboots and potential crashed filesystems.

The ext4 filesystem is designed to be robust and, essentially, self-healing. However, that has its limits. Its self-healing capabilities only go so far and are no substitute for a thorough periodic fsck.

It is especially noteworthy that even the designers and toolsmiths for the ext filesystem recommend a periodic fsck.

The man page for tune2fs says this well:

-c max-mount-counts

Adjust the number of mounts after which the filesystem will be checked by e2fsck(8). If max-mount-counts is 0 or -1, the number of times the filesystem is mounted will be disregarded by e2fsck(8) and the kernel.

Staggering the mount-counts at which filesystems are forcibly checked will avoid all filesystems being checked at one time when using journaled filesystems.

You should strongly consider the consequences of disabling mount-count-dependent checking entirely. Bad disk drives, cables, memory, and kernel bugs could all corrupt a filesystem without marking the filesystem dirty or in error. If you are using journaling on your filesystem, your filesystem will never be marked dirty, so it will not normally be checked. A filesystem error detected by the kernel will still force an fsck on the next reboot, but it may already be too late to prevent data loss at that point.

See also the -i option for time-dependent checking.

and

Note that tune2fs can be safely used on a mounted filesystem.

As a standard thing, I set my filesystems as follows:

pi@GoPiGo:~ $ sudo tune2fs -c 5 -C 6 -i 5 -e remount-ro /dev/mmcblk0p2
tune2fs 1.44.5 (15-Dec-2018)
Setting maximal mount count to 5
Setting current mount count to 6
Setting error behavior to 2
Setting interval between checks to 432000 seconds
pi@GoPiGo:~ $

This does:

(-c) Sets the maximum mount count before a check to five.
(-C) Sets the current mount count to six, which is already past the maximum, forcing an fsck on the next reboot.
(-i) Sets the interval between checks to five days (the 432000 seconds shown above). As long as the robot is rebooted regularly, this gives a thorough fsck at least once a week.
(-e) Sets the error behavior to “remount read only”, so that if the kernel detects an error the filesystem can be removed from the robot and carefully fixed on another system while it is cold.
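
To confirm what was actually written, `tune2fs -l` lists the superblock values, including the "Mount count" and "Maximum mount count" fields. The following is only a sketch and not part of the original post: `mount_check_due` is a hypothetical helper that reads that listing on stdin and applies the mount-count rule, assuming a check becomes due once the mount count reaches the maximum and that a maximum of 0 or -1 disables the rule (as the man page excerpt above says).

```shell
# Hypothetical helper (not part of e2fsprogs): reads `tune2fs -l` output on
# stdin and reports whether the mount-count rule calls for a check.
mount_check_due() {
    awk -F': *' '
        /^Mount count:/         { count = $2 }
        /^Maximum mount count:/ { max = $2 }
        END {
            # Assumption: a maximum of 0 or -1 disables mount-count checking.
            if (max + 0 > 0 && count + 0 >= max + 0)
                print "due"
            else
                print "not due"
        }'
}

# Usage on the robot (device name taken from the post above):
#   sudo tune2fs -l /dev/mmcblk0p2 | mount_check_due
```

Right after the command above, with the count at six and the maximum at five, this should report "due".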

I then reboot to start the process going.

You can do other things like resetting the UUID, adding a label, setting the reserved block count - which should be zero on an external data device like a flash-drive that isn’t a system device - etc.

tune2fs is one of those lesser-known commands that, IMHO, is as important to know as how to read man pages and search the web.

cyclicalobsessive · March 11, 2022, 1:43pm

Is this true on a non-RTC bot?

(I don’t delve into the boot period much, but I remember being frustrated that some log was not using the date set from the networking.)

jimrh · March 11, 2022, 3:08pm

If the clock on the raspberry pi is totally borked, then I don’t know.

If it gets periodic updates from the network, it should work normally, or close to normally.

This has nothing to do with “booting” as such, except that boot is when the mount-count, interval, and “should I fsck this time?” tests are done.

This should work on any Linux setup. (I don’t know about the Mac, but it should work on any Raspberry Pi install, especially if it’s Debian/Ubuntu based.)

Network time updates can take several minutes to occur and, unless you have real-time-clock hardware installed, you depend on the time saved when you last shut down - which can be vastly different from the real time, depending on how long the device has been sitting.

Adafruit makes a wonderful plug-and-play RTC module that just plugs into the first six pins on the top of the GoPiGo board and uses I²C address 0x68.

They also have an article on how to set it up - and it’s relatively trivial.

Since the Raspberry Pi reads the clock at boot up, this will guarantee that the time is reasonable. They have a temperature-compensated, high-accuracy RTC module that I use.

In my case, since I want the pins on top of the GoPiGo board available, I hard-wired it to the bottom of the Raspberry Pi’s PCB and stuck it to the top of the micro-SD card slot’s housing with crazy glue.

With an RTC module installed, the time is always sensible.

P.S.

The RTC modules are dirt cheap, especially the non-temperature-compensated ones.

jimrh · March 11, 2022, 4:12pm

The “c/C” options don’t depend on the clock at all, so even on a system with a totally borked clock - like the earlier Dexter OS releases - the number of mounts will still increment and trigger an fsck.

cyclicalobsessive · March 11, 2022, 4:21pm

The reason I question the “should I fsck due to -interval” is not knowing if it checks before the networking sets the date or after. I would guess that the check is occurring prior to establishing the network connection and therefore the system will probably think it is always Jan 1st 1970.

In this case the -c -C options will be the active trigger.

jimrh · March 11, 2022, 4:32pm

If we assume that you have a network connection and the date is correct after the NTP daemon sets the time, then -i should work.

Why?

Because the real world time will eventually be set, it will eventually exceed that interval, and it will trigger a fsck on the next reboot.
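
The interval test can be sketched as simple arithmetic. This is an illustration, not the real e2fsck logic: `interval_check_due` is a hypothetical helper, it assumes a check is due once the time since the last check reaches the interval, and it ignores edge cases such as a last-check time that lies in the future.

```shell
# Hypothetical helper: $1 = time of last check, $2 = check interval,
# $3 = current clock reading, all in seconds.
interval_check_due() {
    # An interval of 0 means time-dependent checking is switched off.
    if [ "$2" -gt 0 ] && [ $(( $3 - $1 )) -ge "$2" ]; then
        echo "due"
    else
        echo "not due"
    fi
}

# "-i 5" expressed in seconds: 432000, matching the tune2fs output above.
interval=$(( 5 * 24 * 60 * 60 ))

# Last check at t=0; the clock reads three days later, then six days later.
interval_check_due 0 "$interval" $(( 3 * 86400 ))   # prints "not due"
interval_check_due 0 "$interval" $(( 6 * 86400 ))   # prints "due"
```

The point is that the outcome depends only on what the clock reads when the test runs, so once the network has corrected the time, a later boot will see the interval as exceeded.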

There is a file, (I forget where), called hwclock(.conf?) that regulates both an RTC, (if installed), and the Raspbian “fake-hwclock” utility.

“fake-hwclock” does two things:

On boot: It retrieves a saved time value from the last shutdown and applies that as the system time.
On shutdown: It takes the current system time and stores it for future retrieval.

The result of this is that the time is not Jan 1, 1970, but whatever the time was when the system was last shut down.

Example:

Clock is set by network, system is shut down, system is rebooted the next day.
- The clock will start as the previous day until set by the network.
Clock is set by network, system is shut down, system is rebooted some period of time later, (weeks? months?)
- The clock will start at the last set time until the network updates it.
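
The two cases above come down to how far behind the restored clock is at boot. A small worked sketch, with made-up timestamps and a hypothetical helper named `boot_clock_lag_days`, assuming no RTC and that the saved time is exactly the shutdown time:

```shell
# Hypothetical helper: whole days the restored clock lags behind real time.
# $1 = time saved at the last shutdown, $2 = real time at boot (epoch seconds).
boot_clock_lag_days() {
    echo $(( ($2 - $1) / 86400 ))
}

saved=1646956800   # illustrative value: 2022-03-11 00:00:00 UTC

# Rebooted the next day: the clock starts one day behind.
boot_clock_lag_days "$saved" $(( saved + 1 * 86400 ))    # prints 1

# Rebooted three weeks later: the clock starts 21 days behind.
boot_clock_lag_days "$saved" $(( saved + 21 * 86400 ))   # prints 21
```

Until the network corrects the time, anything that compares against the clock sees that lag.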

This behavior can have really strange effects on systems with PAM and/or SELinux security profiles set, as they are time dependent.

If the time is too far off - say Jan. 1 1970 or something like a month or so off - the NTP daemon won’t set it because it is too far away. You can set it manually to correct this.

An inexpensive RTC module avoids all these issues and I always include one. (I bought extras, they’re cheap.)

I always set both -c and -i because I sometimes reboot the 'bot several times a day, and after five boots (which can happen!) the fsck is triggered. If the robot instead sits for a week or two, the next restart triggers an fsck to check for bit-rot.