Real Time: Some Notes on Microcontroller Interrupt Latency

By Jim Harrison

Contributed By Electronic Products

Interrupts take a lot out of a high-speed processor, especially one that is heavily pipelined and, capable of issuing more than one instruction per cycle. There could be eight to ten instructions in flight at any one time that either have to be run to completion, or annulled and restarted once normal execution resumes.

The electrical engineer needs to check that the interrupt responds fast enough for the application and, that the overhead of the interrupt does not swamp the main application.

Just how fast can a given MCU perform an interrupt? That is certainly affected by the application, but it seems unreasonably hard to find a number for this item.

When an interrupt occurs, the CPU saves some of its registers and executes the interrupt service routine (ISR), and then returns to the highest-priority task in the ready state. Interrupts are usually maskable and nestable.

Just to be clear, latency is usually specified as the time between the interrupt request and execution of the first instruction in the interrupt service routine. However the "real latency" must include some housekeeping that must be done in the ISR, which can cause confusion.

The value in which an electrical engineer is usually interested is the worst -case interrupt latency. This is a sum of many different smaller delays.

  1. The interrupt request signal needs to be synchronized to the CPU clock. Depending on the synchronization logic, typically up to three CPU cycles can be lost before the interrupt request has reached the CPU core.
  2. The CPU will typically complete the current instruction. This instruction can take a lot of cycles, with divide, push-multiple, or memory-copy instructions requiring most clock cycles taking the most time. There are often additional cycles required for memory access. In an ARM7 system, for example, the instruction STMDB SP!,{R0-R11,LR} (Push parameters and perm.) Registers is typically the worst case instruction. It stores 13 32-bit registers on the stack and requires 15 clock cycles.
  3. The memory system may require additional cycles for wait states.
  4. After completion of the current instruction, the CPU performs a mode switch or pushes registers (typically PC and flag registers) on the stack. In general, modern CPUs (such as ARM) perform a mode switch, which requires less CPU cycles than saving registers.
  5. If your CPU is pipelined, the mode switch has flushed the pipeline and a few more cycles are required to refill it. But we are not done yet. In more complex systems, there can be additional causes for interrupt latencies.

In more complex systems, there can be additional cause for interrupt latencies.

  1. Latencies cause by cache line fill:
    If the memory system has one or multiple caches, these may not contain the required data. Then, not only the required data is loaded from memory, but in many cases a complete line fill needs to be performed, reading multiple words from memory.
  2. Latencies caused by cache write back:
    A cache miss may cause a line to replaced. If this line is marked as dirty, it needs to be written back to main memory, causing an additional delay.
  3. Latencies caused by Memory Management Units (MMU) translation table walks:
    Translation table walks can take a considerable amount of time, especially as they involve potentially slow main memory accesses. In real-time interrupt handlers, translation table walks caused by the Translation Lookaside Buffer (TLB) not containing translations for the handler and/or the data it accesses can increase interrupt latency significantly.
  4. Latencies caused by the application program:
    The application program can cause additional latencies by disabling interrupts.
  5. Latencies caused by interrupt routines:
    If the application has more than one urgent interrupt, they cannot be masked off so another may be requested, lengthening the total time.
  6. Latencies caused by the RTOS:
    A RTOS also needs to temporarily disable the interrupts which can call API-functions. Some RTOSs disable all interrupts, effectively worsening interrupt latencies for all interrupts, some (like embOS from Segger) disable only low-priority interrupts.

ARM7 and ARM Cortex

The ARM7 and ARM Cortex are very different in the interrupt area. By integrating the interrupt controller in the processor, Cortex-M3 processor-based microcontrollers have one interrupt vector entry and interrupt handler per interrupt source. This avoids the need for re-entrant interrupt handlers, which have a negative effect on interrupt latency.

  ARM7TDMI Cortex-M3
Interrupt controller External to processor Integrated nested vectored interrupt controller
Interrupt handlers One fast (nFIQ) and one slow (nIRQ) One handler per interrupt source
RTOS system timer Uses one timer of the microcontroller Uses integrated "SysTick" timer on the processor
System calls SWI instruction (interrupts disabled) SVC instruction (interrupts enabled)
Memory interface Single interface, data read/write takes 3 cycles Separate instruction and data bus interfaces, single cycle data read/write
Pipeline Three-stage Three-stage with branch speculation
Bit manipulation Read, modify, write Single instruction

The Cortex-M3 also accelerates the execution of interrupt handlers with logic to automatically save its general purpose and status registers in the stack when an interrupt arrives. The M3 is made even more efficient, in certain circumstances, by tail-chaining interrupts that arrive at the same time, as shown in Figure 1.

The interrupt latency is up to 12 cycles for the Cortex-M3 processor-based MCU, and the context switch time is <4 μs, while the ARM7 is <7 μs.

Figure 1: Tail-chaining on Cortex-M3 processor speeds up things.


According to Keith Curtis, technical staff engineer at Microchip, the 8-bit PIC-16/PIC-18 MCUs take 12 to 20 clock cycles to get to the ISR — depending on the type of instruction that was in progress at interrupt time. Then, in the ISR, the compiler will add instructions to determine where the interrupt originated and to push some registers. If you are using assembly language, you would put in your own items that need pushing, perhaps none.

Microchip's 32-bit PIC32 MCUs, according to Adrian Aur, applications engineer, will take a maximum of 11 clock cycles to get to the ISR where you will save at least some registers — worst case, all 32 of them need one clock cycle each. If you are responding to INT7, the highest priority (and not interruptible), a set of shadow registers will be used, making response much faster. Then, the RTOS may want to make a thread change, or enable nested interrupts when running at lower priority levels, which will add some latency. Other than that, you should be fine


In 2008, Electronic Products Magazine gave Atmel a Product of the Year Award for the AVR XMEGA microcontroller family. The biggest reason for that was its innovative eight-channel event system which enables inter-peripheral communication without CPU or DMA usage using a bus separate from the data bus. The benefit of this is predictable, low-latency, inter-peripheral signal communication, reduced CPU usage, and the freeing of interrupt resources.

Independent of the CPU and DMA, the response time for the event system will never be more than two clock cycles of the I/O clock (usually 62.5 ns).

The XMEGA uses a Harvard architecture with the program memory separate from data. Program memory is accessed with single level pipelining. While one instruction is being executed, the next is prefetched. Performance is enhanced with the fast-access RISC register file — 32 x 8-bit general-purpose working registers. Within one single clock cycle, XMEGA can feed two arbitrary registers from the register file to the ALU, do a requested operation, and write back the result to an arbitrary register.

The interrupt response time for all the enabled interrupts is a minimum of five CPU clock cycles. During these five clock cycles, the program counter is pushed on the stack. After five clock cycles, the program vector for the interrupt is executed. The jump to the interrupt handler takes three clock cycles.

If an interrupt occurs during execution of a multicycle instruction, this instruction is completed before the interrupt is served. If an interrupt occurs when the device is in sleep mode, the interrupt execution response time is increased by five clock cycles. In addition, the response time is increased by the start-up time from the selected sleep mode.

A return from an interrupt-handling routine takes five clock cycles. During these five clock cycles, the program counter is popped from the stack and the stack pointer is incremented.

Disclaimer: The opinions, beliefs, and viewpoints expressed by the various authors and/or forum participants on this website do not necessarily reflect the opinions, beliefs, and viewpoints of Digi-Key Electronics or official policies of Digi-Key Electronics.

About this author

Jim Harrison

About this publisher

Electronic Products

Electronic Products magazine and serves engineers and engineering managers responsible for designing electronic equipment and systems.