Doing In-Place OS updates for Embedded Devices
Palm just announced the 1.1 update to webOS, the operating system that runs on the Palm Pre. Apple released the 3.0 update to iPhone OS for the iPhone and iPod touch. Microsoft is expected to launch Windows Mobile 6.5 soon, and users are hoping that they will be able to update their 6.1 devices to 6.5. Google last month updated Android OS to 1.5. These events point to a recent and very fast growing phenomenon: embedded devices are becoming more and more like PCs, and users expect to be able to update a device long after it has been released. This was not always the case. OEMs used to refrain from updating embedded devices except to fix high-severity bugs. There are several reasons for this:
1. Updating embedded devices is more difficult than updating PCs from a distribution standpoint
2. Because of #1, updating devices is also expensive
3. The potential for bricking a device (the device does not boot anymore) due to user error is very high, leading to a high risk of warranty returns
Today’s blog post focuses on #3 because it has very real technical risks and solutions. Before we discuss the risk of bricking a device, let’s talk about two different types of updates:
1. Update to application code: This is usually much simpler and does not typically involve changes to the bootloader or the boot image
2. Update to system code/OS image: Most of the time, when OEMs have to update devices, it is due to some severe error. In our experience it usually involves changing system files. If the entire OS is stored as a single image on disk/flash, then the entire image has to be correctly replaced with the new one
If the update is of type 1, the process is less likely to brick the device. In most cases, even if the update fails, support can help the user start the device in “safe” mode and restore it. Updates of type 2 are by nature riskier because any failure is likely to stop the device from booting, which rules out remote debugging. Note that on some devices the application and system code are stored in a single boot image. In that case, the distinction between the two types is irrelevant for this discussion, and every update carries the type 2 risk.
Here is how devices typically partition the data storage for boot and application data
Note: Some devices may not use the file system for the boot partition and instead directly talk to the block device. In that case, the remainder of this discussion is not applicable.
During an update that involves system code, the boot image has to be replaced with a new one. Typically the update process overwrites the existing image. The problem arises when the update process is interrupted by an unexpected event such as:
- Device battery dies before the update process is completed
- The user pulls out the USB cord connecting the device to host
In these cases, the OS image will get corrupted and the device may not be able to boot back up, leading to a bricked device.
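To make the failure concrete, here is a minimal simulation of an in-place overwrite that is cut off partway through. It is only a sketch: the "storage" is a byte array in memory, and `overwrite_in_place` and its `fail_after` parameter are hypothetical names invented for this illustration, not part of any real updater.

```python
def overwrite_in_place(storage: bytearray, new_image: bytes,
                       chunk: int, fail_after: int) -> None:
    """Copy new_image over storage chunk by chunk.

    Raises after `fail_after` chunks to simulate a dead battery
    or a pulled USB cable in the middle of the update.
    """
    for i, offset in enumerate(range(0, len(new_image), chunk)):
        if i == fail_after:
            raise RuntimeError("simulated power loss")
        storage[offset:offset + chunk] = new_image[offset:offset + chunk]


old_image = b"A" * 64
new_image = b"B" * 64
storage = bytearray(old_image)      # the boot image as it sits on flash

try:
    overwrite_in_place(storage, new_image, chunk=16, fail_after=2)
except RuntimeError:
    pass                            # the device lost power here

result = bytes(storage)
is_old = result == old_image        # False: the old image is partly gone
is_new = result == new_image        # False: the new image is incomplete
```

After the interruption the storage holds half of the new image followed by half of the old one, so neither version can be booted.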
One of the features of the Reliance (and Reliance Nitro) file system is that it never overwrites live data. It always writes to free space on disk or, if there is none, returns a “disk full” error to the application. Reliance also has a special transaction mode called “Application-controlled”, in which it conducts a transaction point only when asked to by the application. Here is how these two constructs help Reliance provide a fail-safe means of in-place updates:
- The OS image is stored on a Reliance partition
- The update application calls the Reliance API to disable all automatic transaction modes. Reliance will now execute a transaction point only when specifically asked to by the update app
- The update app starts “overwriting” the existing OS image. Because Reliance never overwrites live data, the new image is actually written to free space on disk
- If power is interrupted, Reliance discards the partially written new image. The device can still boot the old OS image and restart the update process
- Once the entire new image has been written, the update app asks Reliance to execute a transaction point. Reliance, in one atomic operation, updates its committed state to use the new image. When the device next boots, it uses the new image, and the file system marks the space held by the old image as free
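The steps above can be sketched with a toy model. This is not the Reliance API; `ToyStore` and all of its methods are hypothetical names, and the model only mimics the two behaviours described here: uncommitted writes go to free space, and the committed state changes only at a transaction point.

```python
class ToyStore:
    """Toy model of a file system that never overwrites committed data."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._committed = {}   # name -> bytes, the state seen after a reboot
        self._working = {}     # name -> bytes, written but not yet committed

    def _used(self) -> int:
        # Committed and pending data both occupy space until the next
        # transaction point, because live data is never overwritten.
        return (sum(len(v) for v in self._committed.values())
                + sum(len(v) for v in self._working.values()))

    def write(self, name: str, data: bytes) -> None:
        pending = self._working.get(name, b"")
        if self._used() - len(pending) + len(data) > self.capacity:
            raise OSError("disk full")
        self._working[name] = data

    def transaction_point(self) -> None:
        # Modelled as a single step: the committed state switches over
        # and the space held by replaced data becomes free.
        self._committed.update(self._working)
        self._working.clear()

    def power_loss(self) -> None:
        # Anything not yet committed is discarded.
        self._working.clear()

    def read_committed(self, name: str) -> bytes:
        return self._committed[name]


store = ToyStore(capacity=32)
store.write("os.img", b"old-image-v1")
store.transaction_point()

store.write("os.img", b"new-image-v2")   # lands in free space
store.power_loss()                       # interrupted before the commit
after_failure = store.read_committed("os.img")

store.write("os.img", b"new-image-v2")   # restart the update
store.transaction_point()
after_commit = store.read_committed("os.img")
```

In this model the device sees either the complete old image or the complete new image, never a mixture, which is the property that makes the update safe to retry.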
Using Reliance for the boot partition can thus help provide a safe in-place update process. It also brings the benefit of Reliance's extremely fast mount times, which can help shorten device boot time.
The obvious caveat is that there has to be enough free space for the new OS image alongside the old one. With storage being cheap (compared to device cost) and capacities always increasing, this becomes less and less of an issue. OEMs should strongly consider this approach (whether they use Reliance or not) to ensure that the device update process goes smoothly for end users.
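For readers who are not using Reliance, a rough analogue of the same idea on a POSIX-style file system is to check free space, write the new image to a temporary file next to the old one, flush it to storage, and then rename it over the old image. The sketch below shows that pattern. It is a different technique from Reliance's transaction points, `install_image` is a hypothetical helper name, and whether the rename survives power loss depends on the underlying file system (a production updater would typically also sync the containing directory).

```python
import os
import shutil
import tempfile


def install_image(new_image: bytes, target_path: str) -> None:
    """Replace target_path with new_image without overwriting it in place."""
    directory = os.path.dirname(os.path.abspath(target_path))

    # The old and new images must coexist until the final switch.
    if shutil.disk_usage(directory).free < len(new_image):
        raise OSError("not enough free space for the new image")

    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".update-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_image)
            tmp.flush()
            os.fsync(tmp.fileno())      # push the data to storage first
        os.replace(tmp_path, target_path)   # switch to the new image
    except BaseException:
        # On any failure the old image is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as workdir:
    image_path = os.path.join(workdir, "os.img")
    with open(image_path, "wb") as f:
        f.write(b"old-image")

    install_image(b"new-image", image_path)

    with open(image_path, "rb") as f:
        installed = f.read()
    leftovers = [n for n in os.listdir(workdir) if n.startswith(".update-")]
```

The temporary file is created in the same directory as the target so that the final rename stays within one file system.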