Before embedded devices, file systems were designed for servers and desktops. Power loss was an infrequent occurrence, so little consideration was given to protecting the data against it. Instead, frequent checks of the file system structures were important, and were often handled at system startup by a utility such as chkdsk (for FAT) or fsck (for Linux file systems). The OS could also request a run of these utilities when an inconsistency was detected, or after power was interrupted.
The method behind these tools is a check of the entire disk: each block is read to determine whether it is allocated for use, then cross-checked against an allocation list stored elsewhere on the media. FAT file systems have little other protection; sections of the media with no matching metadata can only be flagged by creating a CHK file for later analysis by the user. Linux file systems add a journal mechanism to detect which files are affected, and can often repair the damage without user intervention.
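The cross-check described above can be sketched in a few lines. This is a hypothetical illustration, not the actual fsck or chkdsk implementation; the function name and data shapes (`allocation_bitmap` as a set of in-use block numbers, `file_tables` as a map from file to its blocks) are assumptions for the example:

```python
def check_consistency(allocation_bitmap, file_tables):
    """Cross-check blocks marked allocated on the media against the
    blocks actually referenced by file metadata.

    allocation_bitmap: set of block numbers marked in-use on the media
    file_tables: dict mapping filename -> list of block numbers it uses
    Returns (orphaned, unmarked) sets of block numbers.
    """
    referenced = set()
    for blocks in file_tables.values():
        referenced.update(blocks)

    # Allocated on the media but owned by no file: on FAT, these are
    # the sections swept into a CHK file for later user analysis.
    orphaned = allocation_bitmap - referenced

    # Referenced by a file but never marked allocated: a sign of an
    # interrupted metadata write.
    unmarked = referenced - allocation_bitmap

    return orphaned, unmarked
```

For example, a volume whose bitmap marks blocks {1, 2, 3, 4} in use while the only file references blocks [1, 2] would report blocks {3, 4} as orphaned.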
These utilities are necessary because these basic file systems are not atomic in nature: data and metadata are written separately. Datalight's Reliance Nitro file system treats each update as a single operation, so the file system is never in a state that needs correction. Our Dynamic Transaction Point technology lets the designer choose just how atomic the design is, protecting not just a block of data and its metadata but the whole file; from a user's perspective, half a JPEG is pretty much useless.
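The principle behind an atomic update can be illustrated at the application level: write the new contents somewhere else first, then commit with a single atomic operation, so an interruption leaves either the complete old version or the complete new version, never a mix. This is only a minimal sketch of the idea using POSIX rename semantics; Reliance Nitro implements atomicity inside the file system itself, not with this technique:

```python
import os
import tempfile

def atomic_replace(path, data):
    """Replace the contents of `path` atomically: a power loss at any
    point leaves either the old contents or the new, never half of each."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # new data reaches the media first...
        os.replace(tmp, path)      # ...then one atomic rename commits it
    except BaseException:
        os.remove(tmp)
        raise
```

The commit step is the single `os.replace` call; everything before it is invisible to readers of `path`, which is the same all-or-nothing property a transaction point provides for an in-flight file update.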
The repairs that fsck and chkdsk perform are completely unnecessary with the Reliance Nitro file system. At the device design level, this results in quicker boot times for a system that is fully protected against power failure. A file system checker is of course provided, and is useful for detecting failures caused by media corruption.
Taking chkdsk and fsck to the next level of protection would be a tool that repairs some media corruption. If a block of data on the media becomes only partially readable, this tool could read it multiple times (to try to collect the most data) and store the result in a newly allocated block, correcting the file system structures accordingly. User intervention would likely be required to judge whether enough data was recovered to make this effort worthwhile. Stay tuned for more updates on this topic.
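One way such a multiple-read recovery could work is a per-byte majority vote across the attempts, on the assumption that transient read errors land in different places each time. This is purely a hypothetical sketch of the idea, not a description of any shipping tool; `read_fn` stands in for whatever driver call returns the raw block contents:

```python
from collections import Counter

def recover_block(read_fn, block_no, attempts=5):
    """Read a marginal block several times and take a per-byte majority
    vote across the attempts. The caller would then write the result to
    a freshly allocated block and fix up the file system structures."""
    reads = [read_fn(block_no) for _ in range(attempts)]
    # zip(*reads) lines up the same byte offset from every attempt;
    # Counter picks the value seen most often at that offset.
    return bytes(
        Counter(column).most_common(1)[0][0]
        for column in zip(*reads)
    )
```

If three reads of a block return `b"hxllo"`, `b"hello"`, and `b"hello"`, the vote recovers `b"hello"`; whether the recovered data is actually usable is the part that still needs a human judgment call.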
Thom Denholm | January 7, 2013 | Reliability