Making ISRs Safe
The Problem
In the early days of embedded systems (50 years ago), we had a general rule to make ISRs as short as possible. Apparently this rule is no longer in vogue. For example, in the HAL code of a leading microcontroller vendor, the SD ISR is 628 bytes long and calls 7 subroutines, which in turn call other subroutines. The latter are generally short, and not all of them are included in a given build, but they add up to about 700 more bytes. The DMA ISR is 418 bytes long and includes some optional callback functions. So the total for the two ISRs is about 1800 bytes. Assuming a 6:1 ratio of 16-bit to 32-bit instructions, this works out to about 800 instructions in the two ISRs.
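The instruction estimate above can be reproduced with a little arithmetic. With six 16-bit (2-byte) instructions for every 32-bit (4-byte) one, each group of 7 instructions occupies 16 bytes. The sketch below assumes that fixed mix, as the estimate does; real code will vary, and the function name is made up for illustration.

```c
#include <assert.h>

/* Estimate a Thumb-2 instruction count from code size, ASSUMING a fixed mix
 * of n16 16-bit (2-byte) instructions for every n32 32-bit (4-byte) ones.
 * The 6:1 mix is an assumption, not a measurement. */
unsigned estimate_instructions(unsigned code_bytes, unsigned n16, unsigned n32)
{
    unsigned group_bytes  = 2u * n16 + 4u * n32;  /* bytes in one group of the mix */
    unsigned group_instrs = n16 + n32;            /* instructions in that group */

    assert(group_bytes > 0u);
    /* instructions = bytes * group_instrs / group_bytes, rounded to nearest */
    return (code_bytes * group_instrs + group_bytes / 2u) / group_bytes;
}
```

For 1800 bytes with a 6:1 mix it returns 788, and for the itemized 1746 bytes (628 + 700 + 418) it returns 764, both in line with the figure of about 800.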
That is a lot of code, and it is likely to contain several exploitable vulnerabilities that a hacker could use to insert malware into a device. Since ISRs run in handler mode (hmode), which is privileged, malware that breaches an ISR can turn off the MPU, leaving the device defenseless. The malware then has access to all data in the device's memory, including crypto keys and proprietary information. It can also shut the device down or cause it to malfunction in a way that causes external damage. Possibly even worse, malware that has invaded a device can move laterally into other equipment connected to the device, often into corporate IT computers, to exfiltrate confidential business or customer information or to operate as ransomware.
This problem is due to a security flaw in the Cortex-M architecture, and probably in all other microcontrollers and microprocessors in use today: ISRs execute in hmode, and they should not. They should not be able to turn off or change the MPU, nor any other vital security hardware. Hopefully this flaw will be fixed in future generations. Unfortunately, for now, we are stuck with it.
The Solution
The SMX RTOS has always supported a design philosophy wherein ISR code is minimized and most interrupt processing is deferred to link service routines (LSRs). These run at a priority between ISRs and tasks and have much lower overhead than tasks. This type of LSR is called a trusted LSR (tLSR) and was discussed in a previous article [1].
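To make the deferral idea concrete, here is a minimal, host-runnable sketch of an LSR queue: an ISR enqueues a function and one parameter and returns at once, and a scheduler later runs the queued entries in FIFO order. All names (lsr_invoke(), lsr_sched(), demo_lsr()) are hypothetical. This is not the SMX API, and it omits the locking a real RTOS needs when nested ISRs invoke LSRs concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL names throughout; this models the idea, not the SMX API. */
typedef void (*lsr_fn)(uint32_t par);

#define LSRQ_SIZE 8u  /* power of two so indices can wrap with a mask */

typedef struct {
    lsr_fn   fn;   /* LSR to run */
    uint32_t par;  /* its one parameter */
} lsrq_entry;

static lsrq_entry lsrq[LSRQ_SIZE];
static volatile unsigned lsrq_head;  /* next entry to run */
static volatile unsigned lsrq_tail;  /* next free slot */

/* Called from an ISR: queue the LSR and return quickly. Returns false if the
 * queue is full. A real RTOS must make this atomic, because a higher-priority
 * ISR can preempt this one and invoke an LSR of its own. */
bool lsr_invoke(lsr_fn fn, uint32_t par)
{
    unsigned next = (lsrq_tail + 1u) & (LSRQ_SIZE - 1u);

    assert(fn != 0);
    if (next == lsrq_head)
        return false;                 /* full: one slot is kept empty */
    lsrq[lsrq_tail].fn  = fn;
    lsrq[lsrq_tail].par = par;
    lsrq_tail = next;
    return true;
}

/* Called once no ISRs are active: run queued LSRs in FIFO order and
 * return how many ran. */
unsigned lsr_sched(void)
{
    unsigned ran = 0u;

    while (lsrq_head != lsrq_tail) {
        lsrq_entry e = lsrq[lsrq_head];
        lsrq_head = (lsrq_head + 1u) & (LSRQ_SIZE - 1u);
        e.fn(e.par);
        ran++;
    }
    return ran;
}

/* Example LSR for illustration: accumulates the parameters it receives. */
static uint32_t demo_sum;
void demo_lsr(uint32_t par) { demo_sum += par; }
```

The ISR's cost is a few stores; everything else happens in lsr_sched(), which runs at a priority between ISRs and tasks.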
Unfortunately, tLSRs run in hmode and thus do not solve the ISR security problem. To solve it, SecureSMX implements safe LSRs (sLSRs), available in two flavors: pmode (privileged mode) LSRs (pLSRs) and umode (unprivileged mode) LSRs (uLSRs). Each sLSR has its own stack and its own memory protection array (MPA), and behaves like a mini task. sLSRs have very small control blocks and typically need only small stacks. pLSRs are intended for use during development because they are easier to debug. Once debugged, a pLSR is converted to a uLSR, which provides the needed security.
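On Armv7-M and Armv8-M cores, whether thread-mode code runs privileged or unprivileged is set by the nPRIV bit (bit 0) of the CONTROL register, while the SPSEL bit (bit 1) selects the process stack. The sketch below assumes that an sLSR is dispatched in thread mode on its own stack, so a pLSR and a uLSR differ only in nPRIV. The control-block layout and function names are hypothetical, not SecureSMX's, and the FPCA bit is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Armv7-M / Armv8-M CONTROL register bits (FPCA, bit 2, ignored here). */
#define CONTROL_nPRIV (1u << 0)  /* 1 = thread mode is unprivileged */
#define CONTROL_SPSEL (1u << 1)  /* 1 = thread mode uses PSP */

/* HYPOTHETICAL sLSR control block; field names are illustrative only. */
typedef struct {
    void      (*entry)(uint32_t par); /* LSR main function */
    uint32_t   *stack;                /* its own (small) stack ... */
    uint32_t    stack_words;
    const void *mpa;                  /* ... and its own MPA */
    bool        umode;                /* false = pLSR, true = uLSR */
} slsr_cb;

/* CONTROL value to use when dispatching this sLSR in thread mode. */
uint32_t slsr_control(const slsr_cb *lsr)
{
    uint32_t ctrl = CONTROL_SPSEL;    /* run on the sLSR's own stack via PSP */

    assert(lsr != NULL);
    if (lsr->umode)
        ctrl |= CONTROL_nPRIV;        /* uLSR: drop privilege */
    return ctrl;
}

/* Converting a debugged pLSR into a uLSR is only a flag change in this model. */
void slsr_make_umode(slsr_cb *lsr)
{
    assert(lsr != NULL);
    lsr->umode = true;
}
```

A scheduler would write this value in handler mode before the exception return, for example with the CMSIS __set_CONTROL() intrinsic.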
The diagram illustrates how a uLSR operates. isrA is triggered by an interrupt. It invokes lsrA (a uLSR), does minimal housekeeping, and then re-enables the interrupt. Thus, isrA has very little code and can easily be made resistant to hacking. Invoking lsrA puts the lsrA handle and one parameter into the LSR queue. When all outstanding ISRs have run, the LSR scheduler gets the lsrA handle and dispatches lsrA with its parameter. Dispatching consists of loading the lsrA MPA into the MPU, loading the PSP[1] register with a pointer to the exception frame in the lsrA stack, and then doing an exception return to lsrA_main(), which runs in umode, as shown.
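The dispatch step relies on the layout of the Cortex-M basic exception frame: eight words (R0-R3, R12, LR, PC, xPSR) from the lowest address up. Here is a sketch of how an LSR stack could be primed so that an exception return lands in lsrA_main() with its parameter in R0. Putting the autostop routine's address in the LR slot is an assumption about what "via the LSR exception frame" means, not a description of SecureSMX's code, and lsr_build_frame() is a hypothetical name. Addresses are modeled as uint32_t so the sketch runs on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cortex-M basic exception frame (no FP state), lowest address first. */
enum { FR_R0, FR_R1, FR_R2, FR_R3, FR_R12, FR_LR, FR_PC, FR_XPSR, FR_WORDS };

#define XPSR_THUMB     0x01000000u  /* T bit; must be set or the core faults */
#define EXC_RETURN_PSP 0xFFFFFFFDu  /* return to thread mode on PSP, no FP */

/* HYPOTHETICAL helper: prime an LSR stack so that an exception return with
 * EXC_RETURN_PSP enters entry(par). exit_fn goes in LR, ASSUMED here to be
 * the autostop routine, so running off the end of entry() branches to it.
 * entry and exit_fn are 32-bit target addresses (Thumb bit set); stack_top
 * must be 8-byte aligned on the target. Returns the value to load into PSP. */
uint32_t *lsr_build_frame(uint32_t *stack_top, uint32_t entry,
                          uint32_t par, uint32_t exit_fn)
{
    uint32_t *sp;

    assert(stack_top != NULL);
    sp = stack_top - FR_WORDS;        /* frame occupies the top 8 words */
    sp[FR_R0]   = par;                /* first argument of entry() */
    sp[FR_R1]   = 0u;
    sp[FR_R2]   = 0u;
    sp[FR_R3]   = 0u;
    sp[FR_R12]  = 0u;
    sp[FR_LR]   = exit_fn;            /* where entry() returns to */
    sp[FR_PC]   = entry & ~1u;        /* stacked PC should have bit 0 clear */
    sp[FR_XPSR] = XPSR_THUMB;
    return sp;
}
```

On the target, the scheduler would then load the returned pointer into PSP (for example with CMSIS __set_PSP()) and leave the handler with EXC_RETURN 0xFFFFFFFD, assuming no floating-point context is stacked.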
Now the original isrA code that was moved into lsrA runs in umode in partition A and uses some of the same MPU regions and subroutines as taskA. In fact, its MPU regions come from the same MPA template as those of taskA and any other tasks in partition A. This commonality helps to produce a more secure design. If a hacker breaks into lsrA code, he is no better off than if he had broken into taskA code: either way, his malware is sandboxed.
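The sandboxing argument can be modeled in a few lines: if lsrA and taskA take their regions from the same MPA template, any access the MPU denies to taskA it also denies to lsrA. The region layout, addresses, and names below are hypothetical, and real MPU regions carry attribute bits and size and alignment rules that this model leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL region descriptor; real MPU regions also carry memory
 * attributes and must obey size and alignment rules not modeled here. */
typedef struct {
    uint32_t base;
    uint32_t size;
    bool     writable;
} mpu_region;

#define MPA_REGIONS 3u

/* HYPOTHETICAL partition A template; addresses are made up. */
static const mpu_region partA_template[MPA_REGIONS] = {
    { 0x08010000u, 0x4000u, false },  /* partition A code, incl. shared subroutines */
    { 0x20004000u, 0x1000u, true  },  /* partition A data */
    { 0x40012C00u, 0x0400u, true  },  /* the peripheral partition A drives */
};

/* Both taskA and lsrA take their MPA from the same template. */
typedef struct {
    const char       *name;
    const mpu_region *mpa;
} partition_member;

const partition_member taskA = { "taskA", partA_template };
const partition_member lsrA  = { "lsrA",  partA_template };

/* Would the MPU allow this access? Anything outside every region faults. */
bool mpa_allows(const partition_member *m, uint32_t addr, bool write)
{
    assert(m != NULL && m->mpa != NULL);
    for (size_t i = 0; i < MPA_REGIONS; i++) {
        const mpu_region *r = &m->mpa[i];
        /* unsigned wraparound makes addr < base fail the range check */
        if ((uint32_t)(addr - r->base) < r->size)
            return r->writable || !write;
    }
    return false;
}
```

An address outside partition A, such as memory holding crypto keys, is refused to lsrA exactly as it is to taskA.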
When it is done, lsrA autostops by running through the final } of lsrA_main(). This triggers uSchedAutoStopLSR() via the LSR exception frame, and uSchedAutoStopLSR() triggers an SVC exception, causing the SVC handler (SVCH()) to run. SVCH() calls SchedAutoStopLSR() via the jump table. The latter does some cleanup, then pends the PendSV exception, whose handler is PSVH(), and returns to SVCH(). SVCH() then tail-chains to PSVH(), which returns to the LSR scheduler to dispatch the next LSR, if any. Most of this code runs with interrupts enabled, so ISRs can run and new LSRs can be invoked in the meantime; these can be a mixture of tLSRs and sLSRs. When all ISRs and LSRs have run, control returns to tasks.
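The autostop path hinges on two Cortex-M mechanisms: an SVC handler that dispatches through a jump table of privileged services, and PendSV, which is pended by writing 1 to the PENDSVSET bit (bit 28) of the ICSR register at 0xE000ED04 and then tail-chains once the SVC handler returns, provided nothing of higher priority is pending. Below is a host-runnable model of that sequence with ICSR as an ordinary variable. The service numbers and function names are hypothetical stand-ins for SVCH(), SchedAutoStopLSR(), and PSVH(); the umode side, which would simply execute an SVC instruction, is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SCB->ICSR (0xE000ED04 on hardware) as a plain variable. */
static volatile uint32_t ICSR;
#define ICSR_PENDSVSET (1u << 28)     /* write 1 to pend PendSV */

static unsigned lsrs_stopped;         /* counts completed autostops */

/* HYPOTHETICAL stand-in for SchedAutoStopLSR(): clean up, then pend PendSV
 * so the LSR scheduler runs after the SVC handler returns. */
static void sched_autostop_lsr(void)
{
    lsrs_stopped++;                   /* placeholder for the real cleanup */
    ICSR = ICSR_PENDSVSET;            /* direct write, as on hardware */
}

/* SVC numbers index a jump table of privileged services. */
enum { SVC_AUTOSTOP_LSR, SVC_COUNT };
static void (*const svc_table[SVC_COUNT])(void) = {
    [SVC_AUTOSTOP_LSR] = sched_autostop_lsr,
};

/* Model of SVCH(): reject unknown numbers, otherwise dispatch. On hardware
 * the number would be read from the SVC instruction at stacked PC minus 2. */
int svc_handler(unsigned svc_num)
{
    if (svc_num >= SVC_COUNT)
        return -1;
    svc_table[svc_num]();
    return 0;
}

/* Model of PSVH(): runs only if pended. Hardware clears the pend when the
 * exception is taken; the real handler would then dispatch the next LSR. */
int pendsv_run(void)
{
    if ((ICSR & ICSR_PENDSVSET) == 0u)
        return 0;
    ICSR = 0u;
    return 1;
}
```

Rejecting out-of-range SVC numbers matters here: the SVC handler runs privileged, so the jump table is the only door a umode LSR has into it.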
Where the Rubber Meets the Road
If the HAL ISRs mentioned above are typical of how other HAL ISRs are written, it will probably be necessary to rewrite them for security. In the SD ISR, the flags set by the interrupt are tested directly in 33 places in the code before being reset! For the uLSR to run, the ISR must save these flags in a global and then reset them (otherwise the interrupt would fire again as soon as it is re-enabled), and the moved code must test the global instead of the interrupt flags. I am studying whether this is possible merely by changing the test macro. However, there appear to be other problems, and it may be necessary to rewrite the SD and DMA ISRs in order to move their code into uLSRs.
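The flag-saving approach might look like the sketch below: the ISR snapshots the status register into a global and clears exactly the flags it saw, and the flag-test macro is redirected to the saved copy so that the tests moved into the uLSR need not change. Register names, bit positions, and the SD_GET_FLAG() macro are hypothetical and do not correspond to any particular vendor's HAL; the registers are modeled as variables so the code runs on a host.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL SD status (STA) and write-1-to-clear (ICR) registers, modeled
 * as variables so this runs on a host. Bit positions are illustrative. */
static volatile uint32_t SD_STA;
static volatile uint32_t SD_ICR;
#define SD_FLAG_DCRCFAIL (1u << 1)
#define SD_FLAG_DATAEND  (1u << 8)

/* Models the hardware: writing 1s to ICR clears those bits in STA. */
static void sd_write_icr(uint32_t mask)
{
    SD_ICR = mask;
    SD_STA &= ~mask;
}

/* Flags captured by the ISR for later use by the uLSR. */
static volatile uint32_t sd_saved_flags;

/* Flag-test macro redirected from SD_STA to the saved copy, so the many
 * flag tests moved into the uLSR can stay as they are. */
#define SD_GET_FLAG(flag) ((sd_saved_flags & (flag)) != 0u)

/* Minimal ISR: snapshot the flags, then clear exactly those flags so one set
 * after the read is not lost. It would then invoke the uLSR and re-enable
 * the interrupt (not modeled). */
void sd_isr(void)
{
    uint32_t sta = SD_STA;

    sd_saved_flags |= sta;
    sd_write_icr(sta);
}

/* Moved-out processing, now able to run in umode: tests the saved flags.
 * Returns -1 on CRC failure, 1 on data end, 0 otherwise. Clearing the saved
 * copy must be made atomic with respect to sd_isr() on real hardware. */
int sd_lsr(void)
{
    int result = 0;

    if (SD_GET_FLAG(SD_FLAG_DCRCFAIL))
        result = -1;
    else if (SD_GET_FLAG(SD_FLAG_DATAEND))
        result = 1;
    sd_saved_flags = 0u;
    return result;
}
```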
Measured overhead is 330 clocks for a tLSR and 930 clocks for a uLSR, so a uLSR costs nearly three times as much as a tLSR. In the cases considered above, the interrupt signals the end of an operation, so the extra uLSR overhead is probably not a problem. However, if the interrupt drives the data transfer itself, the overhead would not be acceptable. In that case, the data transfer code should remain in the ISR, and the post-transfer processing could be moved into a uLSR. In any case, the ISR code must be written carefully to minimize vulnerabilities.
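Whether 930 clocks per interrupt is tolerable depends on the interrupt rate: the fraction of CPU time spent on LSR overhead is simply rate * clocks / clock frequency. The clock frequency is not stated here, so the 100 MHz used in the usage note below is an arbitrary assumption, and the function name is illustrative.

```c
#include <assert.h>

/* Fraction of CPU time spent on per-interrupt LSR overhead:
 * load = interrupt_rate_hz * overhead_clocks / cpu_hz.
 * Counts only the LSR overhead, not the ISR or the LSR's own work. */
double lsr_overhead_load(double interrupt_rate_hz, double overhead_clocks,
                         double cpu_hz)
{
    assert(cpu_hz > 0.0);
    return interrupt_rate_hz * overhead_clocks / cpu_hz;
}
```

At an assumed 100 MHz, an end-of-operation interrupt arriving 1000 times per second costs 0.93% of the CPU as a uLSR, whereas an interrupt for every word of a transfer at 100 kHz would cost 93% as a uLSR versus 33% as a tLSR.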
Conclusion
Long ISRs, coupled with operation in pmode, create major vulnerabilities for embedded and IoT devices. Safe LSRs offer a potential solution to this problem, provided that ISR code can be moved into them and that their overhead is not too great. Safe LSR code can be found in xsched.c, xlsr.c, and xarmm_iar.s at www.github.com/Micro-Digital/SecureSMX.
Reference:
1. “LSRs Improve Interrupt Handling”, Ralph Moore, LinkedIn, Nov 2025.
[1] Process Stack Pointer. Up to this point, the code has been using the Main Stack Pointer (MSP).