The “Important” Things of the Internet
“Digitization” and the evolution of powerful digital platforms make it possible to run a large number of functions of different natures, with different requirements, on the same hardware. This coexistence leads to a significant increase in the complexity of embedded applications.
The emergence of the IoT accelerates the opening of software architectures and reinforces the need for more stringent security (cybersecurity) and safety (functional safety) in products, to guarantee the protection of people and assets.
Thanks to determinism and spatial and temporal segregation, a new generation of Operating Systems offers development approaches that reconcile openness and flexibility with cybersecurity and functional safety, providing the tools to transform closed, isolated products into safe and secure application platforms (Smart, Safe and Secure).
The Internet of Things
The emergence of the IoT is indisputably in progress. Many connected devices are appearing on the market: the wristband that allows athletes to analyze their performance in real time, the connected thermostat that incorporates weather forecasts, the connected toothbrush that analyzes the frequency and quality of brushing, etc.
In all these examples, connecting the device to the Internet carries limited risk, and the consequences of a possible failure are not dangerous. Moreover, in these examples, human beings remain inside the loop "object/Internet/human being/object".
The Internet Of “Important Things”
But when you consider cars, aircraft, trains or electrical distribution products (protection relays, contactors, variable speed drives), their openness to the Cloud is not without risk, because they interact directly with their environment (without human intervention). These “Things” play an important role in the safety of their users and their environment. Disturbances coming from the Cloud, whether accidental or malicious, introduce a risk that must be mastered.
Problems ranging from malicious intrusion to a simple bug (infinite loop, memory overflow, etc.) in the connectivity (or in a function using this connectivity) can have catastrophic consequences for the main functions of the product, which in most cases are closely related to safety.
Important Things = Things that are important for safety
A recent study by Deloitte showed that even though the IoT seems to have emerged from the consumer market, 60% of its value will come from industrial applications. The majority of objects on the Internet of Things may well be "Important Things”. In this context, it is clear that the IoT will either respect security and safety, or it will not exist.
The two facets of “Important Things”
Connecting "Important Things" to the Cloud is sensitive, because these products support security features such as the emergency stop of a variable speed drive in a lift, the thermal protection of a motor starter, the protection function in a protection relay of a glass oven, etc. The confidence customers have in such security should not be degraded by connecting these devices to the Cloud.
Schneider Electric aims to transform all its products into digital, connected products enabling high-value services. Connected products will make it possible to track and understand their usage; they will be upgradable throughout their life cycle, remotely where possible; they will report on their internal health (self-diagnosis); they will raise an alarm in case of failure or risk of failure and provide the information needed to deliver optimal predictive maintenance services; etc.
Two facets will have to coexist within the same product:
- The facet of traditional products, as we know them today: safety products supporting certified functions, developed using sometimes complex and expensive methods.
- The facet of connected products, with a large number of functions connected to the Internet, flexible and scalable, quickly adaptable to new needs and specificities of a market, customer or location.
These two facets have completely contradictory requirements:
- Safety functions: they must be isolated so as not to be affected by the failure of other functions; some will be certified and their quality must be proven. Their evolution is slow, with each change representing a major cost.
- Open (connected) functions: they do not have to meet the same performance, quality, functional safety and cybersecurity requirements. They must be flexible, modified frequently and developed using software components available on the market (which are not subject to our internal quality control processes).
These two facets with completely opposite requirements will coexist within the same device.
Coexistence of “traditional” and “IoT” facets in the same product
How should products that support safety functions AND are connected to the Internet be developed? How can safety functions and “Internet-connected functions” coexist within the same product? How can safety functions be certified when they coexist with open functions that might suffer unpredictable disruptions from outside?
The classical approach
The "classic" response involves physical separation. Two different, completely separate hardware platforms, each host the software implementation of one of the two facets, without any shared resource. However, this solution has some limitations:
- In terms of BOM costs, energy consumption, use of space, heat emission, etc.
- In terms of integration and communication, because in general functions from both facets need to share common resources and data.
- In terms of evolution, as the static distribution of functions will be difficult to adapt to new requirements.
- In terms of development costs, because all developments on the same execution platform will be subject to the requirements of the most demanding function.
The deterministic approach
Our work on safety Operating Systems has led us to identify an interesting alternative for implementing software functions of different natures and criticality levels on the same execution platform. This new approach significantly simplifies software architectures, increases maintainability and reduces hardware resources and energy consumption. It is based on two properties: determinism on the one hand, and spatial and temporal separation on the other. With these two properties, it is possible to guarantee that any failure (execution failure, memory corruption) occurring in a process (a task) will not impact other processes (other tasks). This allows the coexistence of critical (safety-related) and non-critical functions on the same hardware platform, even if part of the critical functions must be certified.
The real-time execution kernel is the key element for mastering both time-to-market [2] and the coexistence of critical and non-critical functions on the same hardware platform, because it allows hardware resources to be shared safely and effectively [4]. As noted by Edward A. Lee [3], a professor at the University of California, Berkeley, and Wolfgang Pree [1], a professor at the University of Salzburg, classical real-time operating systems are inadequate because of their indeterminism, the non-portability of the programs written for them, and the lack of control they give over temporal behavior. These professors therefore recommend a development model for real-time systems that combines explicit timing constraints with parallelism.
KRONO-SAFE, a French software company created in 2011, designed and developed ASTERIOS, a real-time kernel and its configuration tool chain. This technology, initially designed by the CEA to meet the safety and certification requirements of nuclear power plant control systems, has the properties required to master the complexity of critical embedded software: determinism and space and time partitioning.
Determinism
In development processes based on classic, so-called "asynchronous" OSes, the temporal behavior of the software is described at the integration phase through concepts such as priorities, interrupts, timers, watchdogs and many other mechanisms that vary from one OS to another. It is surprising that temporal behavior, the predominant dimension of this type of software, to the point that it is called "real-time software", is ultimately addressed as a side effect, very late in the development cycle.
Most engineers who develop embedded systems have a very "electronics-oriented" culture. They design real-time software as a slave component of the electronic system, reacting to events originating from the hardware platform, which remains the master. This approach requires a long and tedious fine-tuning phase and leads to indeterminism and complex temporal behavior (an infinite number of possible scenarios). This complexity grows exponentially with the number of processes running on the platform, so that adjusting, testing, validating, certifying, etc. becomes a nightmare.
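To make this concrete, here is a minimal C sketch of the "asynchronous" style, using POSIX fixed-priority scheduling (the task bodies and the priority values are invented for the illustration). Note that nothing in the source says when each task must run or complete; the temporal behavior emerges from two priority numbers, preemption and the interrupt load.

#include <pthread.h>
#include <sched.h>

static void *control_task(void *arg) { (void)arg; /* cyclic control work */ return NULL; }
static void *logging_task(void *arg) { (void)arg; /* background logging  */ return NULL; }

/* Spawn a thread under fixed-priority scheduling (SCHED_FIFO);
 * requires appropriate privileges at run time. */
static pthread_t spawn_rt(void *(*fn)(void *), int prio)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = prio };
    pthread_t tid;

    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    pthread_attr_setschedparam(&attr, &sp);
    pthread_create(&tid, &attr, fn, NULL);
    return tid;
}

int main(void)
{
    /* The actual temporal behavior depends on these two numbers, on the
     * interrupt load, and on every other thread in the system: nothing
     * here states *when* each task must run or complete. */
    spawn_rt(control_task, 80);
    spawn_rt(logging_task, 10);
    pthread_exit(NULL);    /* keep the process alive for its threads */
}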
Another major drawback of the approach induced by conventional RTOSes lies in the temporal interdependence of processes: the temporal behavior of each task depends on that of the other tasks. As a result, it is difficult to reuse a process from another system without long and tedious fine-tuning. In most cases, any change to a task will affect the behavior of the other tasks, making corrective or evolutionary maintenance very expensive.
Determinism offers a way out of this situation. Even though hardware execution platforms have inherently indeterministic behavior [5] (caches, shared buses, etc.), the benefits of introducing determinism into the description of the execution model (i.e. the software) are colossal. The indeterminism introduced by the hardware is fine-grained, usually imperceptible and insignificant at the time granularity of the software. And if the effects of hardware indeterminism did become apparent, they would be detected and could be handled, with the software taking them into account to preserve safety (switching to a fallback position, for example).
In a deterministic approach, the temporal behavior of the application is described at the same level (in the same design phase) as the functional behavior. The states of the system are not the result of chance linked to hardware-originated events; they are defined and mastered in order to achieve the mission of the application in an optimal manner. With this deterministic approach, the system does not react "as quickly as possible" (that is, "dealing with the most pressing issue first") but processes information from the environment "at the necessary speed" to properly fulfill its mission.
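The sketch below illustrates this inversion in plain C with POSIX timing (the slot table, the task names and the 10 ms frame are invented for the illustration; this is not the ASTERIOS programming model). The temporal behavior is written down explicitly as data at design time, every activation happens at an absolute date, and a slot overrun is detected and answered by a fallback, as discussed above.

#include <time.h>

#define SLOT_NS 10000000L                 /* 10 ms minor frame (illustrative) */
#define NSLOTS  4

typedef void (*task_fn)(void);
static void sample_inputs(void)  { /* ... */ }
static void protection(void)     { /* ... */ }
static void self_diagnosis(void) { /* ... */ }
static void fallback(void)       { /* park outputs in a safe state */ }

/* Static schedule: slot i of the frame always runs the same task. */
static const task_fn schedule[NSLOTS] =
    { sample_inputs, protection, sample_inputs, self_diagnosis };

static long elapsed_ns(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000000000L + (b->tv_nsec - a->tv_nsec);
}

int main(void)
{
    struct timespec start, now, next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    for (unsigned slot = 0;; slot = (slot + 1) % NSLOTS) {
        clock_gettime(CLOCK_MONOTONIC, &start);
        schedule[slot]();                 /* must complete within its slot  */
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (elapsed_ns(&start, &now) > SLOT_NS)
            fallback();                   /* overrun detected: safe state   */
        next.tv_nsec += SLOT_NS;
        if (next.tv_nsec >= 1000000000L) { next.tv_sec++; next.tv_nsec -= 1000000000L; }
        /* Wake at an absolute date: same causes, same effects. */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
}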
In a deterministic system, each subsystem (each task) has a temporal behavior that is independent of the other subsystems. This property makes each subsystem safely portable. In addition, the temporal behavior is independent of the performance of the hardware platform: resource requirements (CPU, memory) can be determined in the design phase (at compilation), and as long as these requirements are met, the behavior is guaranteed. The platform may evolve towards better performance, yet the behavior remains the same. Conversely, if the platform cannot fulfill all the temporal requirements, the compiler (or the pre-compilation analysis) will report it. A deterministic approach can therefore be used to size the target hardware platform optimally and adjust costs tightly.
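As a toy illustration of such a design-time check (this is not the ASTERIOS tool chain, and the budgets are invented), the classic Liu & Layland rate-monotonic bound gives a sufficient condition for a set of periodic tasks to fit a platform:

#include <math.h>
#include <stdio.h>

struct task { double wcet_ms, period_ms; };   /* worst-case exec time, period */

/* Sufficient rate-monotonic schedulability test (Liu & Layland, 1973):
 * the task set fits if total utilization <= n * (2^(1/n) - 1). */
static int fits(const struct task *ts, int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += ts[i].wcet_ms / ts[i].period_ms;
    double bound = n * (pow(2.0, 1.0 / n) - 1.0);
    printf("utilization %.3f, bound %.3f\n", u, bound);
    return u <= bound;
}

int main(void)
{
    /* Invented budgets for three typical functions of a protection relay. */
    struct task set[] = { { 1.0,  10.0 },    /* protection      */
                          { 2.0,  20.0 },    /* measurement     */
                          { 5.0, 100.0 } };  /* self-diagnosis  */
    if (!fits(set, 3))
        fprintf(stderr, "platform too small: revise budgets or hardware\n");
    return 0;
}

If the check fails, the budgets or the hardware must be revised; this is exactly the kind of early feedback described above.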
Determinism makes it possible to validate all system execution scenarios very early in the design process. Its predictability (same causes, same effects) significantly reduces the testing effort.
Overall, the deterministic approach turns the design of real-time systems the right way around: the temporal behavior, the defining dimension of a so-called "real-time" system, is explicitly described rather than produced as a side effect of execution parameters (priority, delay, interrupt level, etc.).
Time and Space Partitioning
Spatial and temporal partitioning ensures that any malfunction appearing during processing, regardless of its origin (software bug, or temporary or systematic failure of the execution hardware), will have no impact on other processes (a sketch of the spatial mechanism follows the list below). This makes the following possible:
- Coexistence of certified and non-certified tasks on the same execution platforms
- Coexistence of “home-made” software components and COTS
- Acceptance of tasks that are exposed to the outside (through the Internet) without impact on the safety-related processing.
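To give an idea of the spatial half of the mechanism, here is a hedged C sketch; the mpu_restrict helper and the partition structure are hypothetical, not a real kernel API. Each partition is confined to its own memory window by a memory protection unit, and a fault in one partition parks that partition without propagating to the others.

#include <stddef.h>
#include <stdint.h>

struct partition {
    uintptr_t ram_base;    /* the only window this partition may write   */
    size_t    ram_size;
    void    (*entry)(void);
    int       faulted;     /* set by the fault handler: partition parked */
};

/* Hypothetical MPU helper: grant write access to one window only. */
extern void mpu_restrict(uintptr_t base, size_t size);

void run_partition(struct partition *p)
{
    if (p->faulted)
        return;                       /* a failed partition stays parked  */
    mpu_restrict(p->ram_base, p->ram_size);
    p->entry();                       /* any out-of-window access traps   */
}

/* Called from the memory-fault exception: confine, never propagate. */
void fault_handler(struct partition *current)
{
    current->faulted = 1;             /* the other partitions keep running */
}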
In other words, time and space partitioning is a way to reconcile the two facets of "Important Things of the Internet", the security product facet and the IoT facet.
Mixed Criticality Platform
The "terminal" products developed by Schneider Electric, as contactors, variable speed drives, protection relays, etc. support 3 types of functions:
A) Certified real-time critical functions, such as the protection functions in a protection relay. These functions are real-time because they must run within specific time constraints to be correct, and they are certified according to standard requirements.
B) Non-certified real-time critical functions, such as consumption measurement and self-diagnostics: high-value applications increasingly expected by our customers.
C) "Best-effort" functions, such as the Human-Machine Interface, whose execution frequency may vary without unacceptably altering their results.
A & B type functions are “Real Time Critical” (or Hard Real Time) functions. The functions must respect specific time constraints, as otherwise they will provide an altered result.
"Best-effort" (C type) functions have no formal time constraint. Their performance may drift over time without major consequences for the system, with service quality being degraded in an acceptable manner.
Type A functions need to be isolated: their execution has to be totally protected from other functions’ failures. Otherwise, the other functions would have to be developed under the same expensive certification process and the development costs would be much higher.
Type A and B functions require determinism to master their time constraints efficiently.
Type C functions require an open architecture and COTS components.
Ideally, type A and B functions would be developed on a deterministic OS like ASTERIOS, while type C functions would be developed in a simpler and cheaper manner on a conventional OS, to take advantage of the many components available on the market.
That is why we worked on the safe coexistence of ASTERIOS and Linux. The principle is simple: the determinism of ASTERIOS provides rigorous management of the hardware resources, which ASTERIOS controls and distributes according to task deadlines. It allocates the hardware resources needed to meet the real-time deadlines of the hard critical functions, and a temporal margin is assigned to the Linux kernel for its own processes.
From the peripherals’ point of view, both OSes run directly on the hardware. ASTERIOS controls the allocation of the CPU and other devices using the TrustZone mechanism available on Arm platforms (Cortex-A and now Cortex-M).
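Conceptually, the handover can be pictured as the deterministic kernel donating the slack of each frame to Linux. In the hedged sketch below, now_ns, run_hard_rt_slots and smc_yield_normal_world are hypothetical primitives (the last one standing in for the Arm SMC world switch), not the ASTERIOS API.

#include <stdint.h>

#define FRAME_NS 10000000ULL              /* 10 ms frame (illustrative) */

/* Hypothetical primitives of the secure-world kernel. */
extern uint64_t now_ns(void);                        /* monotonic clock   */
extern void     run_hard_rt_slots(void);             /* deterministic tasks */
extern void     smc_yield_normal_world(uint64_t ns); /* give CPU to Linux */

/* One frame: serve the deterministic tasks first, then donate the slack.
 * Linux is preempted when its budget expires, so it can never delay the
 * next frame's hard real-time work. */
void frame(void)
{
    uint64_t start = now_ns();
    run_hard_rt_slots();
    uint64_t used = now_ns() - start;
    if (used < FRAME_NS)
        smc_yield_normal_world(FRAME_NS - used);
}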
A prototype has been developed using the Yocto Linux distribution, and the results are positive: the latency between an ASTERIOS task and a Linux task is similar to that of a system call on the same platform (approx. 1 μs on the prototype platform).
Conclusion
IoT-enabled digital services need smart, safe and secure devices supporting both critical and non-critical functions.
In this context, determinism improves our ability to master the growing complexity of embedded software. Embedded software is no longer a slave of the electronics; it is the brain that uses the electronics to acquire information and take action to deliver the expected service.
Many projects drift because of the growing complexity of software. Capitalizing on past developments and reusing software components works fine in slideware; in practice, reusing code in an asynchronous system represents a considerable development effort and often results in an unstable system whose overall temporal behavior is poorly known (and consequently difficult to certify), frequently ending in a rewrite of the code.
The ever-increasing functional richness of Schneider Electric products, and the need to make them more scalable and flexible for new service offerings, prompt us to change the way we design and implement our embedded software.
References
[1] Wolfgang Pree, “Trends in Embedded Software Engineering”, LASER, 2005.
[2] J. Krasner, “RTOS Selection and Its Impact on Enhancing Time-To-Market and On-Time Design Outcomes”, Embedded Market Forecasters, March 2007.
[3] Edward A. Lee, “Computing Needs Time”, Technical Report, 2009.
[4] M. Jan, L. Zaourar and M. Pitel, “Maximizing the execution rate of low-criticality tasks in mixed criticality systems for optimizing the resource usage”, in Proc. of the 1st Intl. Workshop on Mixed Criticality Systems (WMC), pages 43–48, Vancouver, Canada, December 2013.
[5] M. Jan, L. Zaourar and M. Pitel, “Cache-aware static scheduling for hard real-time multicore systems based on communication affinities”.