4 Best Practices for IoT OTA Updates
Don't want to brick your devices? Read on.
Every embedded device needs device firmware update (DFU) capability. Why?
Inevitably, you will ship bugs. You can’t anticipate every user’s input, and you can’t anticipate every way end users will use your device. But that’s only half of it: devices are increasingly complex, and even the best QA teams won’t catch every issue. Consider NASA, which is no stranger to firmware updates for its Mars rovers, whether to fix an unintended crash bug or to recover from a crash after a software update. No one disputes NASA’s extremely rigorous process and well-documented testing regimens, but even NASA relies on firmware updates.
Another factor accelerating the need for DFU planning is the growing amount of third-party software in firmware. Third-party code is likely central to your device, and that brings its own considerations. SDKs, protocol implementations, and third-party libraries change at an ever-increasing rate, so device developers need to make sure their code can keep up. Maintaining compatibility or keeping up with SSL certificates will almost certainly require future updates. And bugs found in third-party software are often particularly serious.
One example comes from the popular Cypress PSoC family of chips, whose Bluetooth Low Energy stack included a broken random number generator. The risk was that an attacker could intercept traffic between the Cypress chip and the device it was pairing with. The ability to fix a security flaw like this, or other critical bugs, with an OTA update is essential. Establishing a solid infrastructure for firmware updates is a way to future-proof your device.
Lastly, a robust firmware update system enables teams to ship better products and reach customers faster. By embracing agile development workflows, device development teams can ship a minimum viable product (MVP) and iterate. I’m seeing more and more products enter mass production unfinished, with teams planning to complete them in what is called a Day-0 update: firmware is frozen in an incomplete state so the product can ship, and the device is updated once it’s in customers’ hands. The benefit to device developers is clear. With traditional general manufacturing (GM) freezes, you basically lose out on three months of development; with Day-0 updates, you can keep improving algorithms and features and still ship sooner.
It’s no surprise that device firmware update is one of the more sensitive subsystems in firmware: done poorly, it carries a real risk of bricking devices. Below are four best practices for designing safe over-the-air (OTA) updates.
DFU Needs to Be Separate From the Application
I imagine most hardware developers have learned the hard way, once or a few times, that this separation is essential. For me, it was an uninitialized variable in our code that happened to read 0 until some shuffling of our application code made it read 1. That bug prevented a large number of devices from updating off the existing version. App code needs to change regularly, while DFU code should be as stable as possible. By keeping the DFU code separate from the app code, you ensure that no matter what you change in your app, your firmware update path stays the same. In practice, you can put the update logic in your bootloader so that it checks for an update when the device starts. If one is available, the update runs before the application loads; if not, the application loads as expected.
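To make the boot-time check concrete, here is a minimal C sketch. It assumes a hypothetical two-slot layout: the application stages a downloaded image in a download slot and sets an `update_pending` flag, and the bootloader installs that image before jumping to the app. The names (`image_t`, `bootloader_main`, `jump_to_app`, `image_is_valid`) are illustrative, not from any particular SDK, and flash is simulated with RAM so the logic can run on a host.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical image layout; on a real MCU each slot is a flash region. */
#define SLOT_SIZE 64u

typedef struct {
    uint32_t version;            /* image version */
    uint32_t length;             /* payload length in bytes */
    uint8_t  payload[SLOT_SIZE];
} image_t;

static image_t app_slot;         /* the image the device runs */
static image_t download_slot;    /* the image the app downloaded over the air */
static bool    update_pending;   /* set by the app once a download completes
                                    (persistent storage on a real device) */

/* Stand-in for jumping to the application's reset handler. */
static uint32_t booted_version;
static void jump_to_app(const image_t *img) { booted_version = img->version; }

/* Hypothetical integrity check; a real bootloader would verify a CRC or signature. */
static bool image_is_valid(const image_t *img) {
    return img->length > 0 && img->length <= SLOT_SIZE;
}

/* Runs on every reset, before any application code. */
void bootloader_main(void) {
    if (update_pending && image_is_valid(&download_slot)) {
        /* Install the staged image into the application slot. */
        memcpy(&app_slot, &download_slot, sizeof app_slot);
    }
    update_pending = false;      /* a bad download is skipped, not retried */
    jump_to_app(&app_slot);
}
```

Clearing the flag even when validation fails means a corrupt download is simply skipped and the existing application boots.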
DFU Should Be Updatable
Again, bugs in code are inevitable, and that includes the DFU code itself. Jack Ganssle’s observation about software engineering is helpful here: the elite, or top 1%, “inject just about 11 bugs, in requirements gathering through coding, per thousand lines of code,” while the remaining 99% average about 120 bugs per KLOC. Apply those numbers to firmware and the need to make the DFU code itself updatable becomes obvious. To do this while keeping DFU code separate from app code, as discussed above, we need to create a third program, a boot updater, whose only job is to update the bootloader. When we push a bootloader update, the boot updater installs it before the normal application starts; otherwise, the bootloader loads the application as usual.
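A minimal sketch of that split, again in C with flash simulated in RAM. The names (`bootloader_select_target`, `boot_updater_main`, `bl_update_staged`) are hypothetical. The bootloader only decides which program to start, and the boot updater only copies a staged bootloader image into the bootloader slot.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BL_SIZE 32u

typedef struct {
    uint32_t version;
    uint8_t  code[BL_SIZE];
} bl_image_t;

static bl_image_t bootloader_slot;    /* the bootloader that runs at reset */
static bl_image_t staged_bootloader;  /* new bootloader downloaded by the app */
static bool       bl_update_staged;   /* set by the app after the download */

typedef enum { RUN_BOOT_UPDATER, RUN_APPLICATION } boot_target_t;

/* Bootloader: its only decision here is which program to start next. */
boot_target_t bootloader_select_target(void) {
    return bl_update_staged ? RUN_BOOT_UPDATER : RUN_APPLICATION;
}

/* Boot updater: its only job is to replace the bootloader. */
void boot_updater_main(void) {
    memcpy(&bootloader_slot, &staged_bootloader, sizeof bootloader_slot);
    bl_update_staged = false;
    /* A real device would now reset so the new bootloader runs. */
}
```

This sketch does no validation in the boot updater; a real one would verify the staged image (for example, with a CRC or signature) before overwriting the bootloader.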
Failure Mid-DFU Should Be Recoverable
One constant concern in device development and operations is what happens if power fails in the middle of an update. To mitigate this, we need a way to fall back to our DFU updater. I recommend splitting the bootloader in two. The first is a primary, immutable bootloader whose only job is to load other programs; you can’t update it, but it is extremely simple. The second, a secondary bootloader, is more complex: it can both update and load programs. If an update fails or is interrupted by a loss of power, we fall back to the primary bootloader, which detects an invalid program where the secondary bootloader should be and loads the boot updater instead, so we can repair the secondary bootloader ourselves. This architecture is a bit more complex, but not difficult. With it, we have an architecture that is robust to a large number of failures, keeps the DFU code separate from the app, and is itself updatable.
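Here is a sketch of the primary bootloader’s decision, assuming the secondary bootloader is stored with a CRC-32 that is programmed only after the whole image is written. An interrupted write leaves the CRC field in its erased state (all 0xFF bytes), and any corruption makes the CRC mismatch; either way, the primary bootloader loads the boot updater instead. The slot layout, the CRC scheme, and the function names are assumptions for illustration; `write_stage2` simulates a power failure by stopping after `bytes_written` bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STAGE2_SIZE 48u
#define ERASED_WORD 0xFFFFFFFFu

/* Bitwise CRC-32 (common IEEE 802.3 variant, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return ~crc;
}

/* Hypothetical flash slot holding the secondary bootloader. */
typedef struct {
    uint32_t crc;                /* programmed last; stays erased if the write was cut short */
    uint8_t  code[STAGE2_SIZE];
} stage2_slot_t;

typedef enum { LOAD_SECONDARY_BOOTLOADER, LOAD_BOOT_UPDATER } primary_choice_t;

/* Primary bootloader: immutable and tiny. It never writes flash; it only
 * decides which program to load next. */
primary_choice_t primary_bootloader_choose(const stage2_slot_t *slot) {
    if (slot->crc == ERASED_WORD)
        return LOAD_BOOT_UPDATER;                          /* write never finished */
    if (crc32_ieee(slot->code, STAGE2_SIZE) != slot->crc)
        return LOAD_BOOT_UPDATER;                          /* image is corrupt */
    return LOAD_SECONDARY_BOOTLOADER;
}

/* Simulated update of the secondary bootloader. Power "fails" after
 * bytes_written bytes (must be <= STAGE2_SIZE); the CRC is written only
 * once the whole image is in place. */
void write_stage2(stage2_slot_t *slot, const uint8_t *image, size_t bytes_written) {
    memset(slot, 0xFF, sizeof *slot);                      /* sector erase: all bits read 1 */
    memcpy(slot->code, image, bytes_written);
    if (bytes_written == STAGE2_SIZE)
        slot->crc = crc32_ieee(slot->code, STAGE2_SIZE);
}
```

Writing the CRC last is what turns “power failed mid-write” into a condition the primary bootloader can detect without any extra state.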
Version Your Non-Volatile Data
Every firmware image relies on some non-volatile data, whether it’s a few structures serialized into Flash or data kept in NVRAM that stays powered. That non-volatile data will sometimes need to change from one version to the next. If we don’t plan for this ahead of time as part of the firmware update process, we have no good way to migrate that data, and changing the data structure becomes very difficult. I recommend adding three fields to every non-volatile data structure:
A version field - 8 bits is more than enough, and a simple integer version will work fine.
A commit-bit - when set to 1, the data is not yet committed; when set to 0, it is. Always verify that a data structure is committed before you read it.
An erase-bit - a bit that lets you mark an old version of your data structure as invalid without erasing a full sector of your Flash.
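One way to lay out those three fields, sketched in C. It assumes NOR-style flash, where erased bits read as 1 and programming can only clear bits to 0 until the sector is erased again; that is what makes the commit-bit polarity above work without an erase. The erase-bit polarity (1 = live, 0 = invalidated) is an assumption of this sketch, as are the header layout and function names; `flash_program_byte` only models flash behavior on a host.

```c
#include <stdbool.h>
#include <stdint.h>

#define NV_FLAG_UNCOMMITTED 0x01u  /* commit-bit: 1 = not committed, 0 = committed */
#define NV_FLAG_LIVE        0x02u  /* erase-bit: 1 = live, 0 = marked invalid (assumed polarity) */

/* Hypothetical header prepended to every non-volatile structure. */
typedef struct {
    uint8_t version;  /* 8-bit schema version */
    uint8_t flags;    /* commit-bit and erase-bit; erased value is 0xFF */
} nv_header_t;

/* Models a flash write: bits can only go from 1 to 0. */
static void flash_program_byte(uint8_t *dst, uint8_t value) { *dst &= value; }

/* After the payload is fully programmed, clear the commit-bit. */
void nv_commit(nv_header_t *h) {
    flash_program_byte(&h->flags, (uint8_t)~NV_FLAG_UNCOMMITTED);
}

/* Retire an old copy without erasing its sector: clear the erase-bit. */
void nv_invalidate(nv_header_t *h) {
    flash_program_byte(&h->flags, (uint8_t)~NV_FLAG_LIVE);
}

/* Only committed, live records may be read. */
bool nv_is_readable(const nv_header_t *h) {
    return (h->flags & NV_FLAG_UNCOMMITTED) == 0 && (h->flags & NV_FLAG_LIVE) != 0;
}
```

A write then becomes: program the version and payload into erased space, call `nv_commit`, and only afterwards call `nv_invalidate` on the previous copy, so a power failure at any step leaves at least one readable record.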
With these fields, you can implement a migration function for every version increment (e.g., 1->2, 2->3, 3->4, and so on). If a user has particularly outdated firmware on a device they haven’t used in a year, they can power it on and update it; the migration functions then run in sequence and bring the stored data up to the latest version.
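A sketch of such a migration chain, assuming a hypothetical settings record whose layout changed twice: version 2 added a timeout, and version 3 rescaled brightness from 0-10 to 0-100 and added a units flag. Each function upgrades a record by exactly one version, and `settings_migrate` runs them in order, so a record from any older version reaches the latest one. The record, field names, and default values are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#define SETTINGS_LATEST 3u

/* In-RAM superset of every historical layout (hypothetical). */
typedef struct {
    uint8_t  version;
    uint8_t  brightness;    /* 0-10 in v1 and v2, 0-100 from v3 */
    uint16_t timeout_s;     /* added in v2 */
    uint8_t  units_metric;  /* added in v3 */
} settings_t;

typedef void (*migrate_fn)(settings_t *s);

static void migrate_1_to_2(settings_t *s) { s->timeout_s = 30; }  /* default for the new field */

static void migrate_2_to_3(settings_t *s) {
    s->brightness = (uint8_t)(s->brightness * 10u);  /* rescale 0-10 to 0-100 */
    s->units_metric = 1;
}

/* migrations[n] upgrades a record from version n to n + 1. */
static const migrate_fn migrations[SETTINGS_LATEST] = {
    NULL,             /* there is no version 0 */
    migrate_1_to_2,
    migrate_2_to_3,
};

/* Returns 0 on success, -1 for a version this firmware does not know. */
int settings_migrate(settings_t *s) {
    if (s->version == 0 || s->version > SETTINGS_LATEST)
        return -1;
    while (s->version < SETTINGS_LATEST) {
        migrations[s->version](s);
        s->version++;
    }
    return 0;
}
```

After migrating, the device would write the upgraded record as a new committed copy and only then mark the old one invalid, so a power failure mid-migration still leaves the old data readable.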
These four best practices should help device developers build a firmware architecture that treats DFU as a necessary part of product development. Focus on a firmware update system that is safe, effective, and equipped with the tools developers need to get out of tricky situations, and you also ensure that end users get much more out of their devices.
Opinions expressed by DZone contributors are their own.