Consumers expect personal electronics and other mobile devices to provide both faster response and greater functionality, all while delivering longer battery life. For developers, however, the requirements for real-time response and high performance in many applications have dictated the use of separate processors to serve these conflicting demands. This adds cost, power consumption, and board space, and it complicates both hardware layout and software.
A better approach would be to integrate the required hardware into a single chip. Enter heterogeneous multicore processing (HMP) devices. Containing multiple cores of different types, these processors can offer advantages of performance optimization, reduced power consumption, and improved system security and reliability.
This article describes how developers can use a heterogeneous multicore processor from NXP Semiconductors to meet the demand for these mixed workloads without compromising requirements for low power and reduced design complexity.
Advances in sensor technology and data processing algorithms have created significant opportunities, but they also require developers to manage the conflicting demands of real-time data acquisition and computationally intensive algorithm execution. In the past, developers have typically partitioned these workloads into separate systems.
At the lowest level of the network hierarchy, embedded processors such as those based on the Arm® Cortex®-M4 core would collect data, running optimized code on a real-time operating system (RTOS) or a bare-metal system. At a higher level of the hierarchy, high-performance applications processors such as those based on the Arm Cortex-A7 core would in turn execute data analysis algorithms, running applications code on familiar operating systems such as Linux or Android.
The rise of edge computing systems has moved application code execution closer to the data source. In fact, demand for faster response from more complex analysis algorithms has now pushed applications processing requirements into end devices themselves. Increasingly, consumers expect sophisticated analysis capabilities including artificial intelligence methods to be built into devices such as Internet of Things (IoT) sensors, wearables, and other low-power products.
The role of heterogeneous multicore processing
The emergence of HMP devices that combine embedded and applications processor cores has helped developers handle mixed workloads more efficiently in many applications. HMP devices integrate different cores, each optimized to meet different requirements associated with the target product’s workload. With the NXP i.MX 7ULP (ultra-low-power) family of processors, developers can leverage the performance capabilities of an HMP architecture to meet consumers’ uncompromising demand for high performance and long battery life in next-generation ULP products.
Available in both consumer (MCIMX7U5DVP07SC) and industrial (MCIMX7U5CVP06SC) versions, the i.MX 7ULP processors integrate their heterogeneous cores with graphics processing units, security accelerators, memory controllers, and a full set of peripheral interfaces (Figure 1).
Figure 1: Along with an extensive complement of modules and peripherals, the NXP i.MX 7ULP applications processor family combines an Arm Cortex-M4 core for real-time processing together with an Arm Cortex-A7 core for applications processing. It uses separate power domains for optimization of power and performance. (Image source: NXP)
Designed specifically for power-constrained portable designs, the NXP i.MX 7ULP family addresses emerging requirements by combining an Arm Cortex-A7 core and a Cortex-M4 core, each supplied by a separate power domain. In addition, the use of different power islands enables different modules to be selectively powered down when not required. As described below, sophisticated power management features integrated in i.MX 7ULP devices let developers use these power domains and power islands to tune performance and power consumption to suit their applications.
When designing the i.MX 7ULP family, NXP built power and performance optimization features into the devices starting at the chip design level and throughout the architecture.
At the most fundamental level, the i.MX 7ULP family combines fabrication methods that reduce leakage current with transistor geometries that lower parasitics, thereby reducing dynamic power consumption. Unlike conventional transistor structures (Figure 2, top), i.MX 7ULP devices are fabricated with an ultra-thin buried oxide (Figure 2, middle) that reduces the flow of electrons from source to drain, thus reducing leakage current; a further enhancement allows designers to add forward body bias (FBB) or reverse body bias (RBB) (Figure 2, bottom).
Figure 2: A conventional transistor can exhibit considerable leakage as electrons flow from source to drain (top), but the NXP i.MX 7ULP family is fabricated with an ultra-thin buried oxide that impedes electron flow (middle), and a structure that further speeds or slows electron flow with forward body bias (FBB) or reverse body bias (RBB) (bottom). (Image source: NXP)
When energy efficiency is a top priority, developers can use RBB to reduce the flow of electrons and further reduce leakage current and overall device power consumption at the cost of lower performance. Conversely, developers can use FBB, which enhances electron flow, to boost performance at the cost of increased power consumption due to higher leakage current.
At the chip design level, the i.MX 7ULP family incorporates multiple techniques including dynamic frequency scaling (DFS) and dynamic voltage scaling (DVS), software-based clock gating, and software-based power gating. Besides reducing the power consumption of different peripherals, developers can use these features to selectively shut down blocks of internal memory or place memory in different power-saving modes.
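The payoff from DFS and DVS follows from the standard first-order CMOS relationship, in which dynamic power scales with switched capacitance, the square of the supply voltage, and the clock frequency. The sketch below is only that textbook approximation, not an NXP model; the function name and the example ratios are illustrative assumptions, and leakage power is ignored.

```python
def relative_dynamic_power(voltage_ratio: float, frequency_ratio: float) -> float:
    """Return dynamic power relative to a baseline operating point.

    Uses the first-order CMOS approximation P_dyn ~ C * V^2 * f with the
    switched capacitance C held constant. Leakage power is not modeled.
    """
    if voltage_ratio <= 0 or frequency_ratio <= 0:
        raise ValueError("ratios must be positive")
    return (voltage_ratio ** 2) * frequency_ratio


if __name__ == "__main__":
    # Hypothetical operating point: half the clock rate at 90% of the voltage.
    scaled = relative_dynamic_power(voltage_ratio=0.9, frequency_ratio=0.5)
    print(f"Dynamic power relative to baseline: {scaled:.3f}")
```

Under this approximation, frequency scaling alone saves power only linearly, while even a modest voltage reduction compounds the saving because voltage enters as a square.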
At the architectural level, the ability to tune power and performance is further extended with the use of multiple power domains including the separate power domains, noted earlier, for the Cortex-A7 and Cortex-M4 subsystems.
Each of the processor core power domains includes FBB and RBB drivers, dedicated low-dropout (LDO) regulators, and high-voltage detect (HVD) and low-voltage detect (LVD) monitors designed to signal supply excursions above or below designated thresholds. A separate power-on-reset (PoR) monitor tracks the voltage level in the always-on power domain.
Along with the separate core power domains, individual power domains also control system functions such as always-on hardware, while a battery-supported domain manages power to critical functions including the real-time clock and a secure non-volatile storage module, among others. As with the core power domains, each of these specialized power domains supports a comprehensive set of specialized power savings features (of which there are too many to address individually in a single article).
To take one example, the power domain for always-on functionality includes a Low-Leakage Wake-Up Unit (LLWU) module that lets developers use multiple external pins or internal modules as the wake-up source for special low-leakage power modes described below.
These architectural features are tied together in the device’s integrated power management controller (PMC), which handles these separate power domains and the device’s power islands (Figure 3).
Figure 3: The NXP i.MX 7ULP family integrates a sophisticated control capability that lets developers programmatically configure power domains and power islands to tune power and performance to meet changing application requirements. (Image source: NXP)
In this approach, developers initiate power mode transitions by sending commands through the normal intelligent peripheral subsystem (IPS) bus to a control complex comprising three tightly coupled modules:
- Core Mode Controller (CMC), which supports multiple core functions
- Multicore System Mode Controller (MSMC), comprising System Mode Controller 0 (SMC0) for the Cortex-M4 power domain and SMC1 for the Cortex-A7 power domain, which handles sequencing between different power modes, monitors events used to initiate power-mode transitions, and generally controls power, clock, and memory features associated with power optimization
- Reset Mode Controller (RMC), which handles chip reset functions
Tuning power and performance
For all its power management capabilities, the i.MX 7ULP family presents a familiar programming model for developers. As with other advanced processors, i.MX 7ULP devices achieve different low-power operating states through a series of programmable low-power modes. In fact, i.MX 7ULP processor cores support several software-controlled low-power modes that allow developers to reduce power consumption to the lowest possible level consistent with required functionality.
Using these different low-power modes, developers can set one or both cores and their subsystems in different variations of a normal RUN mode, WAIT mode, and STOP mode.
Normal RUN mode and high-speed HSRUN mode provide high-performance operation to support calculation-intensive portions of an application. In HSRUN mode, the core subsystem operates at its highest frequency. If the application can tolerate lower performance, developers can set the core in Very Low Power Run (VLPR) mode for operation at a maximum frequency of 48 megahertz (MHz) with correspondingly lower power consumption.
In normal WAIT mode, peripherals operate fully but the core is clock gated, waiting in a static Wait-For-Interrupt (WFI) state but ready to wake on receipt of an interrupt. With this mode, developers can let autonomous peripheral operations fill buffers or use direct-memory access (DMA) transactions to fill system memory before issuing an interrupt that takes the core out of its WAIT state. Very Low Power Wait (VLPW) mode allows peripherals to continue operations at a reduced frequency but gates the core clocks.
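A rough way to see the benefit of letting peripherals or DMA fill a buffer while the core sits in WAIT mode is to count how often the core must wake. The sketch below assumes one interrupt per full buffer; the function name, sample rate, and buffer depths are hypothetical values chosen for illustration, not i.MX 7ULP figures.

```python
def core_wakeups_per_second(sample_rate_hz: float, buffer_samples: int) -> float:
    """Interrupts per second if the core is woken once per full buffer."""
    if sample_rate_hz < 0 or buffer_samples < 1:
        raise ValueError("need a non-negative rate and at least one sample per buffer")
    return sample_rate_hz / buffer_samples


if __name__ == "__main__":
    rate_hz = 1000.0  # hypothetical sensor sample rate
    for depth in (1, 64, 256):  # hypothetical buffer depths
        wakeups = core_wakeups_per_second(rate_hz, depth)
        print(f"buffer of {depth:3d} samples: {wakeups:.1f} wake-ups per second")
```

Deeper buffers trade memory and data latency for fewer core wake-ups, so the practical depth depends on how stale the application can let its data become.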
In applications such as wearables or portable devices, the system may face extended periods of inactivity, perhaps interrupted periodically by bursts of activity. In these cases, the ability to save power is critical for battery life. When the application can tolerate a slower wake-up time for the core, the ability to place the device in even deeper sleep states than RUN, WAIT, or very-low-power variations offers a particularly effective option. To support this approach, developers can place each i.MX 7ULP core subsystem in a deeper sleep state that incurs different amounts of wake-up time:
- In STOP state, some peripherals can operate asynchronously but the core remains in a static state with wake-up times of 7 microseconds (μs) for the Cortex-A7 or 7 μs for the Cortex-M4
- In Very Low Power Stop (VLPS) mode, peripheral operations are further limited but the core remains in a static state with wake-up times of 21.5 μs (Cortex-A7) or 9 μs (Cortex-M4)
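For bursty workloads like these, a simple duty-cycle estimate shows why the sleep-state current tends to dominate battery life. The following sketch is a generic average-current calculation; the function names and all current, timing, and battery values are hypothetical placeholders rather than i.MX 7ULP datasheet figures, and the energy spent during wake-up transitions is ignored.

```python
def average_current_ma(active_ma: float, active_s: float,
                       sleep_ma: float, sleep_s: float) -> float:
    """Time-weighted average current over one active/sleep cycle."""
    period_s = active_s + sleep_s
    if period_s <= 0:
        raise ValueError("cycle period must be positive")
    return (active_ma * active_s + sleep_ma * sleep_s) / period_s


def battery_life_hours(capacity_mah: float, average_ma: float) -> float:
    """Idealized runtime: capacity divided by average current draw."""
    if average_ma <= 0:
        raise ValueError("average current must be positive")
    return capacity_mah / average_ma


if __name__ == "__main__":
    # Hypothetical profile: 10 ms active at 10 mA, then 990 ms asleep at 0.1 mA.
    avg = average_current_ma(active_ma=10.0, active_s=0.01,
                             sleep_ma=0.1, sleep_s=0.99)
    hours = battery_life_hours(capacity_mah=200.0, average_ma=avg)
    print(f"average current: {avg:.3f} mA, estimated runtime: {hours:.0f} h")
```

In this made-up profile the sleep interval contributes about half of the average current even though the sleep current is 100 times lower than the active current, which is why a deeper sleep state can pay off when the application tolerates its wake-up time.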
For applications with even more stringent power requirements, developers can set each core in the following special low-leakage modes that shut down more device subsystems:
- Low Leakage Stop (LLS), which clock gates the core, bus, and peripherals, leaving the core in a WFI state, resulting in a wake-up time of 40 μs (Cortex-A7) or 58 μs (Cortex-M4)
- Very Low Leakage Stop (VLLS), which power gates the core’s entire power domain, leading to longer wake-up times of 60 μs for the Cortex-A7 or 375 μs for the Cortex-M4
For even greater power savings, developers can use RBB in some power modes including VLPS and LLS with a corresponding reduction in performance and an incremental increase in wake-up time of about 2 to 4 μs.
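One way to reason about these trade-offs is to pick the deepest sleep state whose wake-up time still fits the application's latency budget. The sketch below is a hypothetical helper, not an NXP API: it hard-codes the wake-up times quoted in this article, and it assumes a worst-case RBB penalty of 4 μs applied only to VLPS and LLS, the two modes named above.

```python
from typing import Optional

# Wake-up times in microseconds as quoted in this article, ordered from
# the shallowest to the deepest sleep state for each core.
WAKEUP_US = {
    "A7": [("STOP", 7.0), ("VLPS", 21.5), ("LLS", 40.0), ("VLLS", 60.0)],
    "M4": [("STOP", 7.0), ("VLPS", 9.0), ("LLS", 58.0), ("VLLS", 375.0)],
}

RBB_MODES = {"VLPS", "LLS"}  # modes the article names as supporting RBB
RBB_PENALTY_US = 4.0         # assumption: upper end of "about 2 to 4 us"


def deepest_mode(core: str, budget_us: float, use_rbb: bool = False) -> Optional[str]:
    """Return the deepest mode that wakes within budget_us, or None if none fits."""
    choice = None
    for mode, wake_us in WAKEUP_US[core]:
        if use_rbb and mode in RBB_MODES:
            wake_us += RBB_PENALTY_US
        if wake_us <= budget_us:
            choice = mode  # later entries are deeper, so keep the last fit
    return choice


if __name__ == "__main__":
    for core in ("A7", "M4"):
        print(core, deepest_mode(core, budget_us=100.0))
```

A production design would take these figures from the device data sheet for the specific part and operating conditions rather than from a fixed table.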
Conversely, when dealing with compute-intensive workloads, developers can run the cores in the special high-speed run (HSRUN) mode. HSRUN mode shifts the Cortex-A7 core from its normal 500 MHz operating frequency to an overdrive mode running at 720 MHz.
With this fine level of control, developers can configure the i.MX 7ULP to meet even extreme power requirements without sacrificing essential functionality. For example, an application may need the lowest possible power consumption but require the real-time functionality of the Cortex-M4 core, as well as use of specific Cortex-A7 subsystem peripherals or memory. In this case, the developer can place the Cortex-A7 subsystem in STOP or VLPS state, accessing its memory or peripherals from the Cortex-M4 as that core performs its real-time operations. For further power savings, developers can use the Cortex-M4 clock to drive the Cortex-A7’s peripherals.
Simple system implementation
To implement a low-power system with the i.MX 7ULP, developers can choose from available software-programmable power modes and configurations to match requirements for power and performance. On the hardware side, system design is even simpler.
For typical applications, developers can simply combine an i.MX 7ULP processor with the companion NXP MC32PF1550A3EPR2 power management IC (PMIC) to complete a design able to handle mixed workloads without compromising limited power budgets (Figure 4).
Figure 4: The NXP MC32PF1550A3EPR2 power management IC provides the complete set of supply sources required by the NXP i.MX 7ULP processor, reducing hardware design to a straightforward combination of these two devices and a few passive components. (Image source: NXP)
Designed specifically to support supply requirements of NXP processors such as the i.MX 7ULP family, the MC32PF1550A3EPR2 integrates three switched-mode buck regulators (SW1, SW2, SW3), three LDO regulators (LDO1, LDO2, LDO3), a memory reference voltage source, a complete single-cell lithium battery charger, and one-time programmable (OTP) memory for device configuration.
With its MCIMX7ULP-EVK evaluation kit, NXP demonstrates the straightforward hardware interface needed to combine the MC32PF1550A3EPR2 PMIC and i.MX 7ULP device. Along with a system-on-module (SOM) board containing the i.MX 7ULP processor and MC32PF1550A3EPR2 PMIC, the kit includes a baseboard with multiple sensors, wireless capability, an audio codec, an SD connector, and multiple other connectors including JTAG and Arduino (Figure 5).
Figure 5: The MCIMX7ULP-EVK evaluation kit combines a system-on-module board containing an i.MX 7ULP processor and MC32PF1550A3EPR2 PMIC with a baseboard containing sensors, connectors, and other components required to speed software development with i.MX 7ULP devices. (Image source: NXP)
While the evaluation kit provides out-of-the-box functionality, NXP also provides developers with downloadable design files, tools, and board support packages for custom software using FreeRTOS for real-time code, and Linux or Android for applications code.
The demand for both more sophisticated functionality and longer battery life in mobile products has traditionally forced developers to accept some level of compromise between power and performance. Also, growing expectations for timely data from more sensors in IoT devices, wearables, and other portable products have forced further compromise between real-time capabilities and applications-level performance.
However, as shown, developers can turn to HMP architectures such as those used in NXP’s i.MX 7ULP processor family to meet stringent ultra-low-power requirements without sacrificing capabilities.