Using E-Paper Displays to Indicate Fatal Errors and Compromised Security in Critical IoT Nodes
Contributed By Digi-Key's North American Editors
Internet of Things (IoT) and Industrial IoT (IIoT) nodes are increasingly used in high-security systems where the security and safety of the network as a whole matter more than the functionality of any individual device on it. This means that if an IoT node detects that it has been compromised, or if an unrecoverable firmware error occurs, the safest action may be for the node to power down as soon as is practically possible to protect both the node and the network from potentially dangerous consequences.
However, once the node is powered down, all volatile memory contents are lost. Storing debug data in non-volatile memory such as EEPROM or flash consumes time and power, increasing the risk of further damage. In addition, if the system has been compromised, the power-up sequence may have been compromised as well, in which case data read back on power-up cannot be trusted.
This article explains how e-paper displays (EPDs) can be easily attached to an IoT or IIoT node to display the last known error, providing a visual indication of the reason for the power-down event so that technicians can take the appropriate action. It then provides examples of e-paper displays from Pervasive Displays and Display Visions and discusses how these displays can be interfaced to a microcontroller and configured to provide diagnostics information while drawing little or no power.
High-security IoT and IIoT nodes
The onus is on designers of IoT and IIoT nodes to put in place ever more sophisticated security methods that guarantee proper operation of the host microcontroller. In general, there are three kinds of security threats that must be guarded against:
- A microcontroller firmware malfunction
- Invalid input data from sensors, keypads, serial peripherals, or other external devices
- Action from a malicious actor
A microcontroller firmware malfunction can occur for a number of reasons: coding errors in the installed firmware, invalid computations, or, in extremely rare cases, a hardware malfunction of the microcontroller. Well-written firmware usually detects such malfunctions by scrubbing (validating) the inputs to subroutines and functions. In extreme cases where the firmware locks up or becomes stuck in a loop, a watchdog timeout recovers the firmware either by vectoring to an error control subroutine or by performing a hard reset of the microcontroller.
In the case of invalid input data, such as when an external sensor malfunctions or is tampered with, the result can be out-of-range data that the application code does not properly account for. For example, if the ambient temperature sensor in a human-occupied control room incorrectly registers a blistering 250°F, this might be a sensor malfunction or malicious tampering. A careless firmware programmer may not have coded for a reading that high. The consequences could range from something as trivial as incorrect data logging, to something as serious as allowing an intruder into a secure area, to something as critical as an error in a control algorithm computation that leads to equipment failure or serious personal injury. The potential negative outcomes are many.
Malicious actors are different in that they may deliberately intend to cause the IoT node to malfunction. Security routines might detect a hacking attempt as an intrusion; however, the resulting malfunction might also be disguised as a firmware malfunction or invalid external input data. The 250°F ambient temperature reading, for example, could be a malicious actor probing how the firmware behaves at such a high reading in search of a route to intrusion: doors might be unlocked automatically if the reading is wrongly evaluated as a fire.
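A minimal sketch of the kind of plausibility check that would catch the 250°F reading before it reaches control logic, assuming readings arrive in whole degrees Fahrenheit. The function name and limits (`check_room_temp`, `ROOM_MIN_F`, `ROOM_MAX_F`, `MAX_STEP_F`) are hypothetical and would have to come from the actual installation.

```c
#include <stdint.h>

/* Result of a plausibility check on one temperature sample. */
typedef enum {
    TEMP_OK,
    TEMP_OUT_OF_RANGE,      /* outside what the room can physically reach */
    TEMP_IMPLAUSIBLE_STEP   /* changed faster than the room can heat or cool */
} temp_status_t;

/* Hypothetical limits for a staffed control room, in whole degrees F. */
#define ROOM_MIN_F   32
#define ROOM_MAX_F  120
#define MAX_STEP_F    5   /* largest believable change between two samples */

/* Classifies a new reading against fixed limits and the previous reading. */
temp_status_t check_room_temp(int16_t reading_f, int16_t previous_f)
{
    int32_t step = (int32_t)reading_f - (int32_t)previous_f;

    if (reading_f < ROOM_MIN_F || reading_f > ROOM_MAX_F)
        return TEMP_OUT_OF_RANGE;
    if (step < 0)
        step = -step;
    if (step > MAX_STEP_F)
        return TEMP_IMPLAUSIBLE_STEP;
    return TEMP_OK;
}
```

In this sketch, any result other than `TEMP_OK` would be routed to the node's fault handling instead of being fed directly into control algorithms.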
Reacting to firmware malfunctions
Regardless of the error source, microcontroller firmware for high-security IoT and IIoT nodes must have zero tolerance for faults: any and all faults must be coded for and trapped. Inputs to subroutines and functions must be scrubbed, and all sensor input data validated. Watchdog timers must be programmed with timeouts based on the code's known run time so that they detect locked-up or looping code that is taking too long.
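Hardware watchdog configuration is microcontroller specific, so the following is only a sketch of the timing logic, written as a hypothetical per-task software deadline monitor driven by a periodic timer tick. The names (`deadline_monitor_t`, `monitor_start`, `monitor_done`, `monitor_tick`) are illustrative assumptions, and the monitor is meant to complement the hardware watchdog, not replace it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software deadline monitor for one task (hypothetical). */
typedef struct {
    uint32_t budget_ticks;   /* known run time of the task plus a safety margin */
    uint32_t elapsed_ticks;  /* timer ticks since the task started this pass */
    bool     running;        /* true while the task is inside a pass */
} deadline_monitor_t;

/* Call when the task begins a pass. */
void monitor_start(deadline_monitor_t *m, uint32_t budget_ticks)
{
    m->budget_ticks = budget_ticks;
    m->elapsed_ticks = 0;
    m->running = true;
}

/* Call when the task finishes a pass. */
void monitor_done(deadline_monitor_t *m)
{
    m->running = false;
}

/* Call from a periodic timer interrupt. Returns true once the task has
 * overrun its budget, i.e. it is probably locked up or stuck in a loop. */
bool monitor_tick(deadline_monitor_t *m)
{
    if (!m->running)
        return false;
    if (m->elapsed_ticks < UINT32_MAX)
        m->elapsed_ticks++;
    return m->elapsed_ticks > m->budget_ticks;
}
```

On an overrun, the tick interrupt would transfer control to the error handler; the hardware watchdog remains the backstop if the timer interrupt itself stops running.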
When a firmware malfunction is detected in a high-security IoT or IIoT node, regardless of whether the malfunction is accidental or deliberate, firmware must trap the event as soon as possible. Common actions include attempting to compensate for the malfunction. For a malfunctioning sensor that is consistently out of range, the firmware may have a “limp mode” for that sensor to compensate for the bad data until it can be replaced. A firmware routine returning incorrect results may be re-initialized. Often an error code is sent across the network to notify the network host of the problem.
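A minimal sketch of one possible limp mode, assuming the compensation strategy is to substitute the last in-range reading once a sensor has produced several consecutive out-of-range samples. `sensor_filter`, `sensor_state_t`, and `LIMP_THRESHOLD` are hypothetical names; a fixed safe default might suit some designs better than the last good value.

```c
#include <stdbool.h>
#include <stdint.h>

#define LIMP_THRESHOLD 3u   /* hypothetical: consecutive bad samples before limp mode */

typedef struct {
    int16_t fallback;    /* last validated reading (or a safe default at startup) */
    uint8_t bad_count;   /* consecutive out-of-range readings */
    bool    limp;        /* latched until a technician services the sensor */
} sensor_state_t;

/* Returns the value the control loop should use for this sample.
 * *report_error is set true exactly once, on the sample that enters limp
 * mode, so the node can send an error code to the network host. */
int16_t sensor_filter(sensor_state_t *s, int16_t raw, int16_t lo, int16_t hi,
                      bool *report_error)
{
    *report_error = false;
    if (s->limp)
        return s->fallback;              /* ignore the sensor until it is replaced */
    if (raw >= lo && raw <= hi) {
        s->bad_count = 0;
        s->fallback = raw;               /* remember the last good value */
        return raw;
    }
    if (++s->bad_count >= LIMP_THRESHOLD) {
        s->limp = true;
        *report_error = true;
    }
    return s->fallback;                  /* substitute the last good value */
}
```

Latching limp mode means a sensor that has started producing bad data is not trusted again until it has been serviced, even if it later returns in-range values.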
However, in some high-security IoT or IIoT nodes there is a special category of malfunctions for which there cannot, or should not, be compensation or countermeasures. This can include physical tampering detection, internal checksum failures, some built-in self-test (BIST) failures, and any failure that can be caused by compromised firmware or a hacked system. For these high-security situations, the only option may be to immediately and safely power down the node. The network host will determine that the node has powered down when it fails to respond to network requests. If the node powered down without sending an error report to the host, and if the node ignores network commands to restart, it is an indication that a fatal failure has occurred and that a technician must be dispatched to physically examine the node for the cause.
However, once the node has powered down, all volatile memory and status data are immediately lost. This makes diagnosing the cause of the shutdown very difficult, if not impossible. Optionally, diagnostic data could be stored in non-volatile memory such as EEPROM or flash before the node powers down. The problem is that writing to these types of memory takes time, during which the node must remain active, possibly allowing additional damage to occur.
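The split between recoverable and fatal faults described above can be sketched as a simple policy function, together with an example of an internal checksum check. The fault categories are hypothetical, which faults count as fatal is a per-design policy decision, and the article does not say which checksum is used; standard reflected CRC-32 (polynomial 0xEDB88320, the variant used by Ethernet and zlib) is assumed here purely for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fault categories. */
typedef enum {
    FAULT_SENSOR_RANGE,   /* recoverable: e.g. limp mode */
    FAULT_TASK_OVERRUN,   /* recoverable: e.g. re-initialize the routine */
    FAULT_TAMPER,         /* fatal: physical tampering detected */
    FAULT_CHECKSUM,       /* fatal: internal checksum failure */
    FAULT_BIST_CRITICAL   /* fatal: a BIST failure classed as non-recoverable */
} fault_t;

/* Returns true for faults that must lead to logging and shutdown
 * rather than compensation. */
bool fault_is_fatal(fault_t f)
{
    switch (f) {
    case FAULT_TAMPER:
    case FAULT_CHECKSUM:
    case FAULT_BIST_CRITICAL:
        return true;
    default:
        return false;
    }
}

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320, init and final XOR
 * 0xFFFFFFFF). Slow but table-free, so it needs no extra memory. */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Checks a memory region (e.g. a firmware image) against a stored CRC. */
bool image_intact(const uint8_t *image, size_t len, uint32_t expected_crc)
{
    return crc32_ieee(image, len) == expected_crc;
}
```

A failed `image_intact` check would be reported as `FAULT_CHECKSUM`, which this policy treats as fatal.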
Diagnosing fatal errors with e-paper
EPDs draw very little power and can be used to store and display error and diagnostic information just before powering down the node. Once the node is powered down, the EPD can maintain its display image without any power for days or weeks. The information on the display gives technicians a visual indication of the reason for the shutdown, allowing them to decide whether it is safe to power up the IoT node, or if it should be taken off the network for detailed analysis.
An example of an EPD suitable for displaying diagnostic information is Pervasive Displays’ E2271CS091 EPD module. It interfaces to any compatible microcontroller with an SPI serial interface and has a high-contrast 2.71 inch (in.) display (Figure 1).
Figure 1: The E2271CS091 EPD module has a high-contrast 2.71 in. display with a resolution of 264 x 176 pixels. It has a wide viewing angle and interfaces to any compatible microcontroller with an SPI interface. (Image source: Pervasive Displays)
The E2271CS091 EPD module uses an active matrix thin-film transistor (TFT) display with a native resolution of 264 x 176 pixels at 117 dots per inch (dpi). This gives the display room for a substantial amount of information to assist technicians with diagnostics. The anti-glare screen has a wide viewing angle of nearly 180°, making the display easy to read even in unusual mounting locations. The EPD requires a 3.0 volt power supply.
The host microcontroller sends data to the EPD over an SPI interface on the display's 24-pin ribbon connector. The SPI data communication is one-way only, from the host microcontroller to the EPD. The only feedback from the EPD to the host microcontroller is a "device busy" pin on the ribbon connector. This greatly simplifies the interface and increases confidence in the displayed diagnostic data.
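As a quick consistency check, assuming the 2.71 in. figure is the diagonal of the active area, dividing the pixel count along the diagonal by the diagonal length reproduces the quoted pixel density:

```latex
d = \sqrt{264^2 + 176^2} = \sqrt{100\,672} \approx 317.3 \text{ pixels},
\qquad
\frac{317.3 \text{ pixels}}{2.71 \text{ in.}} \approx 117 \text{ dpi}
```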
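Because BUSY is the only signal coming back from the EPD, the host needs nothing more than a polling loop on one input pin. A minimal sketch follows, assuming the pin can be read through a memory-mapped GPIO input register; the register address, bit position, and BUSY polarity are microcontroller and panel specific, so they are passed in as parameters, and `epd_wait_idle` is a hypothetical name. The wait is deliberately bounded: if the BUSY line is stuck or has been tampered with, an unbounded poll would stop the node from ever completing its shutdown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Polls the EPD BUSY input until the panel is idle or max_polls is used up.
 *   gpio_in         : memory-mapped GPIO input register (MCU specific)
 *   busy_mask       : bit of that register wired to the EPD BUSY pin
 *   busy_level_high : true if the pin reads 1 while the panel is busy
 *                     (check the panel datasheet for the real polarity)
 * Returns true when the panel reports idle, false on timeout.
 * A real driver would typically insert a short delay between polls. */
bool epd_wait_idle(const volatile uint32_t *gpio_in, uint32_t busy_mask,
                   bool busy_level_high, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        bool pin_high = (*gpio_in & busy_mask) != 0u;
        if (pin_high != busy_level_high)
            return true;      /* panel is no longer busy */
    }
    return false;             /* timed out: do not block the shutdown */
}
```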
If an error or hack attempt is detected and it is serious enough to require a shutdown of the node, the error must first be trapped by firmware, a watchdog, or some other method. Control must then be transferred to the error logging routine that sends data to the EPD. This routine should run as the highest priority task to prevent interruption or corruption of data. For maximum reliability, the error logging routine should be entirely self-contained, with no calls to external subroutines or functions. Ideally, it should reside in permanently write-protected flash to guarantee the integrity of the code, even after firmware updates.
Before updating the EPD with the error data, the host microcontroller should first send a soft reset command over the SPI interface to clear the display. It then sends the black-and-white display information as a series of bytes, where each bit represents one pixel on the EPD. Once the sequence is complete, the error logging routine can shut down the microcontroller. How a microcontroller is shut down is architecture and manufacturer dependent. In some cases, for security reasons, the manufacturer may have an undocumented shutdown method that is only available on request. Alternatively, an external circuit may be used to cut power to the microcontroller; however, this increases system complexity, which reduces reliability. Therefore, firmware control of the shutdown is preferred.
To assist in development using the EPD, Pervasive Displays offers the B3000MS034 EPD extension kit (Figure 2). It has an extension board with a connector for the 24-pin EPD display, as well as connectors for other Pervasive Displays EPDs that require 40-pin and 26-pin connectors. The extension board is compatible with Texas Instruments' LaunchPad development and evaluation kits, but can also be used with other development kits. When the 20-pin 90° header is soldered to the extension board, a 20-pin bridging cable can be connected to it to monitor the control signals to the EPD during development.
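In keeping with the recommendation that the error logging routine make no external calls, the text it displays can be built without `printf`-style library functions. The sketch below formats a hypothetical fatal-error record (a fault code and a faulting address; both fields are illustrative assumptions) into a fixed-width line such as `E:002A A:08001234`, with the hex conversion written inline. Turning that text into pixels would additionally need a small bitmap font stored alongside the routine, which is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fatal-error record captured when the fault is trapped. */
typedef struct {
    uint16_t code;      /* application-defined fault code */
    uint32_t address;   /* e.g. faulting program counter, if known */
} fatal_record_t;

/* Writes "E:XXXX A:XXXXXXXX" (uppercase hex) plus a NUL terminator into
 * line[], which must hold at least 18 bytes. Returns the text length (17).
 * No library calls, so the routine does not depend on other code. */
size_t format_fatal_line(const fatal_record_t *rec, char line[18])
{
    static const char hex[] = "0123456789ABCDEF";
    uint32_t v;
    size_t i;

    line[0] = 'E';
    line[1] = ':';
    v = rec->code;
    for (i = 0; i < 4u; i++) {          /* 4 hex digits into line[2..5] */
        line[5u - i] = hex[v & 0xFu];
        v >>= 4;
    }
    line[6] = ' ';
    line[7] = 'A';
    line[8] = ':';
    v = rec->address;
    for (i = 0; i < 8u; i++) {          /* 8 hex digits into line[9..16] */
        line[16u - i] = hex[v & 0xFu];
        v >>= 4;
    }
    line[17] = '\0';
    return 17u;
}
```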
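For the 264 x 176 panel, one bit per pixel gives 264 / 8 = 33 bytes per row and 33 x 176 = 5808 bytes per frame. The sketch below packs pixels into such a frame buffer before it is streamed over SPI. The packing details are assumptions, not taken from the panel documentation: row-major order, the most significant bit of each byte as the leftmost pixel, and a 1 bit meaning black. The actual scan order, bit polarity, command bytes, and whether additional frames must be sent all have to be checked against the EPD datasheet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Geometry of the 264 x 176 panel (row-major packing assumed). */
#define EPD_WIDTH      264u
#define EPD_HEIGHT     176u
#define EPD_ROW_BYTES  (EPD_WIDTH / 8u)              /* 33 bytes per row */
#define EPD_FB_BYTES   (EPD_ROW_BYTES * EPD_HEIGHT)  /* 5808 bytes per frame */

/* Sets every pixel to white (assumed to be bit value 0). A plain loop is
 * used instead of memset so there is no library dependency. */
void fb_clear(uint8_t fb[EPD_FB_BYTES])
{
    for (size_t i = 0; i < EPD_FB_BYTES; i++)
        fb[i] = 0x00u;
}

/* Sets one pixel. Assumes the MSB of each byte is the leftmost pixel and
 * that a 1 bit means black. Out-of-range coordinates are ignored. */
void fb_set_pixel(uint8_t fb[EPD_FB_BYTES], unsigned x, unsigned y, bool black)
{
    if (x >= EPD_WIDTH || y >= EPD_HEIGHT)
        return;
    size_t  idx  = (size_t)y * EPD_ROW_BYTES + x / 8u;
    uint8_t mask = (uint8_t)(0x80u >> (x % 8u));
    if (black)
        fb[idx] |= mask;
    else
        fb[idx] &= (uint8_t)~mask;
}
```

Inside a fully self-contained error logging routine, these helpers would be inlined into the routine itself rather than called.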
Figure 2: The extension kit for the Pervasive Displays E2271CS091 EPD module includes a 24-pin connector on the extension board for the display's 24-pin ribbon cable. It also includes a bridging cable and a 90° 20-pin header connector. (Image source: Pervasive Displays)
Another EPD option is the Display Visions EA EPA20-A (Figure 3).
Figure 3: The Display Visions EA EPA20-A EPD is a 172 x 72 display that can maintain the display status with no power connected. (Image source: Display Visions)
This EPD has a 172 x 72 grayscale display and also uses an SPI interface for communication with a host microcontroller. It is extremely low power: it requires a single 3.3 volt supply and draws only 40 milliwatts (mW) during a display change. The Display Visions EA EPA20-A EPD can also maintain its display when no power is applied.
High-security IoT and IIoT nodes sometimes must power down in response to a fatal firmware error or detected threat. This can result in the loss of all volatile data including the internal status of the host microcontroller. However, status and diagnostic data can be sent to a connected EPD before shutdown and displayed for days or weeks. This provides technicians with the information they need to determine the cause of the shutdown and to take future precautions, if necessary, to safeguard and secure the node and the network.
Disclaimer: The opinions, beliefs, and viewpoints expressed by the various authors and/or forum participants on this website do not necessarily reflect the opinions, beliefs, and viewpoints of Digi-Key Electronics or official policies of Digi-Key Electronics.