Container-based Virtualization for Real-Time Industrial Systems – A Systematic Review

Industrial Automation and Control Systems have matured into a stable infrastructure model which has remained fundamentally unchanged, using discrete embedded systems (such as Programmable Logic Controllers) to implement the first line of sensorization, actuation, and process control, with stations and servers providing monitoring, supervision, logging/database, and data sharing capabilities, among others. More recently, with the emergence of the Industry 4.0 paradigm and the need for more flexibility, there has been a steady trend towards virtualizing some of the automation station/server components, first by using virtual machines and, more recently, by using container technology. This trend is pushing for better support for real-time requirements on enabling virtualization technologies such as virtual machines and containers. This paper provides a systematic review of the use of container virtualization in real-time environments such as cyber-physical systems, assessing how existing and emerging technologies can fulfil the associated requirements. Starting by reviewing fundamental concepts related to container technology and real-time requirements, it goes on to present the methodology and results of a systematic study of 37 selected papers covering aspects related to the enforcement of real-time constraints within container hosts and the expected task latency in such environments, as well as an overview of container platforms and orchestration mechanisms for RT systems.


INTRODUCTION
Industrial Automation and Control Systems (IACS) have been facing increasing demands, namely since the emergence of the Industry 4.0 and Smart Factory concepts. The objectives pursued with this new paradigm shift imply a natural evolution of their cyber-physical systems, especially in terms of flexibility and scalability. These improvements add to the increasing relevance of cyber-physical systems, plus the increasing need for improved resilience and security. Coincidentally, the IT domain underwent an evolution with similar drivers, which led to the progressive softwarization, virtualization, and consolidation of multiple systems.
After being adopted in the IT world, hypervisor-based virtualization technologies became a subject of interest for IACS applications, being adopted to consolidate less demanding components with relaxed or no real-time requirements, such as historians and other supervisory control and data acquisition (SCADA) stations.
More recently, container-based virtualization matured enough to become a staple of modern IT infrastructures due to its reduced resource overhead, often replacing or being used within hypervisor-based VM instances. Its unique characteristics may also be of value for IACS, for the following reasons:

• It offers near-native performance, which is of the utmost importance for real-time systems.

• Its isolation and resource control capabilities allow for the simultaneous deployment of multiple containers on the same host. This may be relevant when addressing mixed-criticality systems or when taking cost optimization into consideration.

• Its architectural concept leverages the development of modular software, which simplifies the deployment, management, and reuse of software components, as encouraged by IEC 61499 [43]. Also, multiple replica instances can be created as backups and dynamically deployed in case of failure.

• Being lightweight, containers are potentially more adequate for quicker instantiation and/or migration operations when compared with VMs. This is especially relevant for IACS applications, as the streamlined container overhead may help optimize latency as a result of reduced resource contention and/or the elimination of intermediate abstraction layers (the latter is especially true for bare-metal deployments).
This paper provides a systematic review of the use of container virtualization in real-time environments, namely in cyber-physical systems such as IACS. Its goal is to assess how the technology is being used and if (and how) the current state of container-based virtualization technology can meet the requirements of real-time environments. This will provide the reader with a comprehensive understanding of the current state of research in this field, including the latest trends, approaches, achievements, and research gaps.
The remainder of the paper is organized as follows. Section II overviews the topic of container virtualization, while Section III introduces the concept of real-time systems in the scope of industrial automation and control systems. Section IV describes the methodology adopted in the systematic review process. The following five sections individually address each of the research questions that steered this review: Section V discusses how to ensure real-time constraints within the container host, Section VI discusses the latency that can be expected from container-based RT systems, Section VII identifies which container platforms are mostly used for RT systems and why, Section VIII discusses container orchestration in RT systems, and Section IX identifies the key open challenges in this domain. Finally, Section X concludes the paper.

CONTAINER VIRTUALIZATION
Container virtualization has its roots in the creation of the chroot system call in the late 1970s. This may have been the first step towards process isolation, by means of changing the root directory of a process and its children to a new location in the filesystem. In the early 2000s, a new feature was introduced that allowed a FreeBSD system to be subdivided into independent micro-systems, with a different IP address assigned to each micro-system. This partitioning mechanism became known as FreeBSD Jails, and was later transposed to the Linux OS through a kernel patch.
In 2006, Google introduced "process containers", a feature designed to provide resource isolation, limitation, and accounting capabilities. It would later be renamed "control groups" (cgroups) and merged into the main Linux kernel tree, being combined with another kernel feature named "namespaces" to provide the building blocks for most Linux container implementations. In 2008, the first container system, named Linux Containers (LXC) [57], was announced, with the Docker framework [45] being introduced in 2013.
Since its inception up to this day, container technology has become a staple of modern computing environments, evolving from an abstraction of multiple Linux kernel features to sophisticated frameworks powered by dedicated tools.

Fig. 1. Types of Virtualization

Type 1 and type 2 hypervisor virtualization focus on the virtualization of the entire operating system, with each virtualized environment instance being designated as a virtual machine (VM). Type 1 hypervisors are commonly found in data centers, where they are used to accommodate different users or services on the same machine or cluster. Type 2 hypervisors are mostly used on personal workstations to sandbox distinct working environments or to solve compatibility issues that require the use of different operating systems. As seen in Figure 1, type 2 hypervisors have an extra layer below the hypervisor, since these are installed on top of an OS, contrary to type 1 hypervisors, which are executed directly on top of the host hardware (bare-metal operation).
While hypervisors provide a complete hardware platform abstraction layer, container-based virtualization is supported by a thin layer of kernel-level mechanisms hosting a wrapped package that aggregates code and all the dependencies for its execution. This means that a single operating system environment can host multiple containers, making it less resource-consuming than the previous alternatives [34][20], albeit with less isolation than that provided by VMs. Usually, but not necessarily, this approach follows the principle of one process per container, which leverages the isolation between processes, improves scalability, enables independent upgrades of different software/service components, facilitates their reuse, and simplifies the management of complex software/services. This lightweight virtualization is usually applied in micro-service environments, which are often based on modular distributed architectures. Two examples of such virtualization platforms are Docker [45] and LXC [57].

Anatomy of a Container Framework
A generic container virtualization framework, such as the one depicted in Figure 2, encompasses several components, which are presented and described in more detail next:

• Container – a wrapper that includes all the necessary elements (environment variables, libraries, files, and other dependencies) for an application to be executed, allowing it to run in an isolated environment. It can assume two distinct states: running or not running. When running, it exists as a process being executed by the kernel. When not running, it exists as a container image. There are two types of containers: application containers, which package a single process or application; and system containers, which simulate a full operating system and tend to execute multiple processes simultaneously. Although containers were first conceived to be stateless, they can be stateful as long as some sort of persistent storage is connected to the container.

• Container image – a file or bundle of files comprising everything that an application needs to be executed. It can simply consist of a binary file or it can have a more complex structure, including its own simplified operating system. It can be created from scratch, or incrementally, on top of a pre-existing image (e.g. the Alpine image) by adding the necessary elements, or by re-using a pre-existing instance (which can be pulled from a public or private registry server). An image can have just a single layer, which means it is a base image with no parent layer, or it can have multiple layers, each one representing one or multiple changes made to its parent layer. This means that the parent image/layer always remains unchanged. Considering the OCI specifications [46], the image exists as a tarball archive that contains a tarball file for each layer, plus JSON files with the metadata necessary to run the container.

• Container host – the system where the container is running, i.e., where the container image is instantiated.

• Registry server – a file server that stores container images. Container images are commonly organized by namespaces (not related to kernel namespaces) and, inside those, by repositories, each keeping a collection of different versions of the same image, which may be distinguished using unique tags. By means of an access API, access control rules, and indexes, these file servers allow pushing and pulling container images.

• Container engine – the software that is installed on the container host and is responsible for controlling the life cycle of the containers, making it the central element in this ecosystem. It is usually composed of different tools that take on different tasks. This modularity is essential to leverage innovation, as each of these tools can be changed independently. Taking one of the most used engines (Docker) as an example, we have (Figure 3): the dockerd daemon, responsible for handling user input (commands sent via CLI or REST API); the containerd daemon, which can be considered a high-level container runtime, responsible for pulling and pushing container images from and to the container registry, managing storage and networking, and supervising the running containers; and runc, a low-level container runtime that uses the libcontainer library and is responsible for interacting with low-level kernel features, enabling the actual creation and running of containers.

• Container orchestrator – the software tool that automates much of the operational effort required to launch, schedule, and manage containerized systems in centralized or distributed environments. Especially useful in large-scale systems, this type of tool greatly empowers the use of containers by automatically dealing with everyday actions like container scheduling, deployment, and networking, enabling horizontal scaling, load balancing, and self-healing, among others. Kubernetes is the most popular orchestrator and powers the container management systems of players such as Red Hat OpenShift [38], Google Kubernetes Engine [35], Amazon Elastic Kubernetes Service [3], and Microsoft Azure Container Instances [60].
Summing up, a container is a running instance of a container image that allows code to be executed in an isolated way. It can be seen as a software package that bundles an application's code and all its dependencies. When not being executed, it exists as a file or bundle of files that constitute the container image. When ordered to be executed, the container engine pulls the necessary metadata and files from the registry server, unpacks the container image, and executes other required procedures (such as API calls to the Linux kernel) to run the container with isolation guarantees. At this point, the container is running on top of the Linux kernel and exists as a Linux process.
Isolation is guaranteed by using Linux features such as: kernel namespaces, which act as an abstraction layer for global resources and enable the creation of an isolated workspace for each container (namely by creating individual mount points, network interfaces, user and process identifiers, etc.); cgroups, which allow the control and isolation of system resources by limiting access to them (e.g. CPU, RAM, IOPS, network bandwidth); and SELinux, which ensures the isolation of resources through access control security policies that supervise the interaction between processes and system resources.
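As an illustration of these mechanisms, the short Python sketch below (a Linux-only example relying on the standard procfs layout; the function name is ours) lists the namespaces the current process belongs to, which is the same information container runtimes manipulate when creating an isolated workspace for a container:

```python
import os

def process_namespaces(pid="self"):
    """Return the namespace types and identifiers attached to a process,
    read from procfs (Linux only; returns {} elsewhere)."""
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):  # non-Linux host or procfs unavailable
        return {}
    result = {}
    for name in os.listdir(ns_dir):
        try:
            # each entry is a symlink such as "pid:[4026531836]"
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            pass  # namespace type not accessible; skip it
    return result

if __name__ == "__main__":
    for ns, ident in sorted(process_namespaces().items()):
        print(f"{ns:10s} -> {ident}")
```

Two processes placed in different namespaces of a given type (e.g. `net` or `pid`) will show different identifiers for that entry, which is precisely what a container engine arranges when starting a container.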
The structure needed for its existence in both states has been standardized by the Open Container Initiative (OCI) [29]. Created in 2015, the OCI is a combined effort of the main players in the container industry, with the aim of defining vendor-neutral, portable, and open specifications to leverage interoperability and innovation around this technology. One of the biggest contributors was Docker, which justifies the resemblance between Docker and OCI specifications and structures. OCI has already released three specifications [29]:

• Image Format Specification – defines the requirements for an OCI container image, including the image file on-disk format (at the top level it is a simple TAR archive), the internal layout, and metadata such as entry point, hardware architecture, and operating system, among others.

• Runtime Specification – defines how to run a container image compliant with the OCI image format specification. It includes specifications for the configuration and execution environment, and others regarding the lifecycle of the container.

• Distribution Specification – defines an API protocol for the distribution of content. It can be used for pushing and pulling container images to or from registry servers across platforms, or any other type of content, since it was designed to be agnostic of content types.
In addition to the OCI work, there are other relevant specifications within the container universe, such as:

• Container Runtime Interface (CRI) [51] – a plugin interface developed by the Kubernetes team [28] that allows Kubernetes nodes to use multiple types of container runtimes. It defines the main communication protocol between the Kubelet (the "primary node agent" that runs on each Kubernetes node) and the container runtime, thus improving ecosystem compatibility and allowing the preferred runtime to be executed on each container node.

• Container Network Interface (CNI) [27] – a Cloud Native Computing Foundation [26] project that specifies and includes coding libraries for writing plugins to configure Linux container network interfaces. Moreover, it also addresses the garbage collection of resources once containers are deleted.

• Container Network Model (CNM) [18] – another specification for configuring network interfaces for Linux containers. It was proposed by Docker and, like CNI, it empowers container virtualization by enabling multiple core networking functionalities and configurations.
The growing adoption of container virtualization (especially in the IT world), the emergence of standards that normalize the entire ecosystem, and its lightweight virtualization capabilities with isolation guarantees make it a technology with the potential to be widely adopted by the OT domain as well, namely for real-time systems. Thus, and complementary to the topic of container technology, the next section introduces real-time systems, also presenting the cornerstone concepts supporting them.

REAL-TIME SYSTEMS
By definition, a real-time system is a system that can respond to an event within pre-specified and guaranteed timing constraints. This time limit is called a deadline, and often amounts to a few milliseconds or even microseconds. Such a system is expected to be able to receive data from the surrounding environment, process it and, if needed, trigger some mechanism that influences the environment at that point in time – this capability to respond correctly and in a timely manner is closely related to the system's determinism and predictability. By definition, a deterministic system involves no randomness, which in this case means it should always produce results within the same time frame. Often, these two concepts are merged and simply referred to as determinism.
Real-time systems can be divided into three distinct categories according to the impact of failing to respond within the pre-defined deadline (Figure 4):

• Hard real-time (HRT) – The inability to meet a deadline results in a system failure. Responses following the deadline are automatically devoid of value.

• Firm real-time (FRT) – Deadlines can be infrequently missed without causing a system failure; however, there may be a degradation in the quality of service. Responses following the deadline are automatically devoid of value.

• Soft real-time (SRT) – Deadlines can be infrequently missed without causing a system failure. Responses following the deadline are still considered, even though there may be a degradation in the quality of service.

The time set for the deadlines depends on the system and context in question, and it is the designer's or programmer's responsibility to set it accordingly. There is no "one size fits all" value. However, the literature can give us some guidance. For example, Khan et al. [48], while examining the application of cloud solutions in Industry 4.0, present some data regarding the QoS in such industrial environments: the latency for motion control is placed between 250 µs and 1 ms, for augmented reality at 10 ms, and for condition monitoring at 100 ms. Greifeneder et al. [37] studied the overall response time in industrial networked automation systems, considering distinct architectures with different control systems, namely event-driven systems using interrupts and time-driven systems using clock cycles. Multiple case studies were considered, and the average response times between an input change, the execution of the internal processing algorithms, and the subsequent output activation were between 18 ms and 37 ms. There are also indicative values for the end-to-end latency [23] – as shown in Table 1 – from which the deadlines can be gauged.
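The three categories above can be summarized by how they value a late response. The following Python sketch (a simplified model of the distinction illustrated in Figure 4; the function name and return values are ours) captures this:

```python
def response_value(category, response_time, deadline):
    """Value of a response under the three real-time categories.
    Returns 'full' when the deadline is met; after the deadline,
    HRT and FRT responses are worthless, while SRT responses keep
    a degraded value."""
    if response_time <= deadline:
        return "full"
    if category in ("hard", "firm"):
        return "none"       # late responses are devoid of value
    if category == "soft":
        return "degraded"   # still usable, with reduced quality of service
    raise ValueError(f"unknown category: {category}")
```

Note that the sketch deliberately does not model the frequency of misses: a single miss already constitutes failure in HRT, whereas FRT and SRT tolerate infrequent misses.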
Next, we discuss the two main properties relevant for deterministic systems: latency and jitter.

Latency
Conventional computing is oriented towards throughput optimization, which means that coalescing or deferred-processing mechanisms are often implemented to optimize resource usage, at the cost of reduced predictability or increased execution delays for certain tasks. In fact, when it comes to latency, a real-time system must be concerned with the delays for end-to-end communications and for host-level processing, whose sum defines the maximum tolerance allowed by the system. Moreover, each of these components is also affected by other factors.
For communications, latency is the sum of multiple delays that may arise from different sources – for instance, a packet-switched network has a series of inherent delay sources that must be accounted for, namely:

• Propagation delay – the time required for a packet to travel from the sender to the receiver. It can be computed as a function of distance over the speed at which the signal propagates along the communication channel. In reasonably controlled environments, this value is usually uniform for each communication channel type.

• Transmission delay – the time required to push an entire packet into the communication channel. Its value can be computed as a function of the packet's length over the transmission rate. Normally, the specifications of the hardware being used define the transmission rate.

• Processing delay (not to be confused with the processing latency "subgroup") – the time required for a node to process a packet, check it for errors, and determine its next destination.

• Queuing delay – the time that a packet spends in a queue waiting to be processed or transmitted.

When the packet delivery takes place, the network device raises an interrupt so that the system becomes aware of this event. The real-time process (usually referred to as the RT task) responsible for dealing with this event has first to be scheduled, then processed, and finally a response to the event is produced. Thus, processing delay occurs after the packet is delivered to a real-time system control device. It is possible to identify different sources of delay in the processing latency (Figure 5), among which two stand out:

• Interrupt latency – When the interrupt is raised, the system may not be available to handle it at that exact time due to circumstances that may be locking out interrupts. Also, actions like having the processor save the state of execution, and the interrupt processing itself, add extra delay to the process.

• Dispatch latency – After the interrupt is handled, the RT task becomes ready to run and is scheduled for processing according to the scheduling policies. This dispatch process generates delays caused by context switching, scheduling, and dispatching, among other conflicts that may arise in the process.

Both the interrupt and the dispatch latency depend on the operating system's main purposes which, in turn, define how its kernel is programmed. For example, if the OS goal is to prioritize throughput, the kernel scheduler will probably apply a non-preemptive policy, which will increase the dispatch latency; if the OS wants to deal with mutual exclusion problems and ensure that only one process executes in a critical region at a time, it may increase the maximum time that interrupts can be disabled, and the interrupt latency will increase.
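Following the per-hop model described above, the communication-side delay components can be combined as in the following Python sketch (the function names and the example link parameters are illustrative):

```python
def propagation_delay(distance_m, speed_m_s=2e8):
    """Distance over propagation speed (roughly 2/3 of the speed of
    light in copper or fibre)."""
    return distance_m / speed_m_s

def transmission_delay(packet_bits, rate_bps):
    """Packet length over the link's transmission rate."""
    return packet_bits / rate_bps

def one_hop_latency(distance_m, packet_bits, rate_bps,
                    processing_s=0.0, queuing_s=0.0):
    """Sum of the four per-hop delay sources: propagation,
    transmission, processing, and queuing."""
    return (propagation_delay(distance_m)
            + transmission_delay(packet_bits, rate_bps)
            + processing_s + queuing_s)

# Example: a 1500-byte frame over 100 m of Gigabit Ethernet,
# ignoring processing and queuing delays.
latency_s = one_hop_latency(100, 1500 * 8, 1e9)
```

For the example parameters, transmission delay (12 µs) dominates propagation delay (0.5 µs); in loaded networks, queuing delay is typically the most variable term and therefore the main source of jitter.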
The aforementioned aspects, which are relevant for bare-metal OS deployments, also provide an intuition about the potential penalty induced by the introduction of intermediate abstraction layers, as is the case for hypervisors, whose inherent overhead may vary significantly due to factors such as the hypervisor type (type 1 or bare-metal vs. type 2 or OS-hosted) or the specific virtualization techniques being used, such as full virtualization, paravirtualization, or hardware-assisted virtualization [67]. Depending on the specific techniques being used, hypervisors may need to handle aspects such as virtualized device drivers, interrupt steering mechanisms, or nested memory management (to mention just a few examples), which carry an implicit penalty on host processing times when services are hosted within virtual machines. From this perspective, containers offer the advantage of providing minimal overhead for the execution environment (close to bare-metal OS deployments), offering near-native performance due to reduced resource contention and/or the elimination of intermediate abstraction layers.

Jitter
Since predictability implies producing results always within the same time frame, it is essential to ensure that the latency variation – also referred to as jitter – is as small as possible and within the RT task restrictions. The lower the jitter, the higher the predictability of the system.
In real-time systems, having consistent latency is at least as important as having low latency. As presented in Table 1, the jitter bound is always tighter than the latency bound. A response arriving earlier than expected can result in synchronization problems, while a response arriving later than expected can result in its invalidation.
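As a minimal illustration, jitter can be measured as the peak-to-peak spread of observed latencies (other definitions, such as the standard deviation of latency, are also common):

```python
def jitter(latencies):
    """Peak-to-peak jitter: the spread between the slowest and the
    fastest observed response in a series of latency samples."""
    return max(latencies) - min(latencies)

# Latency samples in microseconds: low latency, low jitter.
samples_us = [102, 98, 101, 99, 100]
print("jitter:", jitter(samples_us), "us")
```

A system averaging 100 µs with 4 µs of jitter is more predictable (and often more useful for RT work) than one averaging 50 µs that occasionally spikes to several milliseconds.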

SYSTEMATIC REVIEW PROCESS
As already mentioned, the systematic review presented in this paper focuses on the use of containers (a broad-scope technology) in the specific domain of real-time systems.
Based on the guidelines for performing systematic literature reviews presented by Kitchenham et al. [49] and other authors [24, 58], the review process defined for this research work entailed three main phases: (A) planning, (B) implementing, and (C) reporting the review. Figure 6 presents a roadmap that outlines the practical application of these principles to the study hereby presented. The first phase corresponds to the definition of the rationale and goals for conducting the review, as well as to the identification of the relevant research questions and the basic review procedures to be adopted. More specifically, this survey has the goal of assessing the applicability of container virtualization technology to real-time environments, addressing the research questions presented in Table 2.
To identify the best-matching keywords to use in the search for relevant or related works, the following methodology was used. First, we performed a lexical analysis of a set of papers identified in previous research that somehow addressed this topic, in order to select the most frequently used words – these are shown in Table 3, which lists the twenty-two most used words for title, abstract, and full text. These three categories are also used to classify each word according to the ones in which it appears, along with the respective number of occurrences (e.g. the word "time" appears in all categories, thus being classified as III, with a per-category frequency of 16 occurrences in titles, 79 in abstracts, and 2852 in full text).
Next, the following well-known online digital libraries were used: ACM [25], IEEE Xplore [44], Science Direct [21], SpringerLink [63], Scopus [22], Wiley [74], and Web of Science [14] – the latter aggregates information from multiple sources, thus enabling access to papers related to the research. By resorting to these libraries, composed of peer-reviewed studies published in well-reputed venues, it was possible to indirectly (if partially) ensure the scientific quality of the papers referenced in this research.
The final set of papers considered in this review was selected considering the criteria presented in Table 4. Only papers published between 2016 and 2022 were considered, due to the strong evolution of container technology in recent years and since the focus was on the current state of the art.
Regarding query formulation, some tests were carried out to understand the best way to build the search query. For this purpose, the IEEE Xplore digital library was used as reference. After a few attempts, the keywords that seemed to produce the best results were selected, as well as the combination of the different elements to search in, leading to the query shown in Listing 1.

Table 2. Research questions addressed by this review

• RQ1 – How can real-time be ensured within the container host? Goal: to identify which techniques can be used to ensure that real-time task requirements are met.

• RQ2 – What is the expected order of magnitude for the real-time task latency when using containers? Goal: to estimate the guaranteed latency that can be achieved by container-based RT systems, in order to assess their compliance with the requirements of IACS.

• RQ3 – Which container platforms are better suited to real-time systems? Goal: to identify which container platforms are most used in real-time systems and why.

• RQ4 – Is container orchestration being used in the real-time systems scope? Goal: to understand if and how the orchestration of containers is used to leverage this technology in the scope of RT systems.

• RQ5 – What are the key open challenges of applying container technology to real-time systems? Goal: to identify the key remaining challenges towards the adoption of container-based RT systems for IACS.

Table 5 shows the results obtained for the conjugation of the different elements considered, when using that search query. It was concluded that searching by title, as well as by abstract, returns a considerably smaller number of results when compared with searching in the paper's full text. Combining any of the first elements with the full text produced a slightly more refined result. As such, those three elements were combined as a way to achieve better results.
Listing 1. Adopted query

(("Document Title": real OR "Document Title": time OR "Document Title": industrial OR "Document Title": automation OR "Document Title": embedded) AND ("Document Title": container* OR "Document Title": virtualization)) AND (("Abstract": real OR "Abstract": time) AND ("Abstract": virtualization OR "Abstract": container*) AND ("Abstract": industrial OR "Abstract": automation OR "Abstract": control)) AND (("Full Text Only": real time AND "Full Text Only": container* AND "Full Text Only": virtualization) AND ("Full Text Only": industrial OR "Full Text Only": control) OR ("Full Text Only": latency OR "Full Text Only": performance OR "Full Text Only": scheduling))

Implementation of the Systematic Review
In this phase we adopted an unbiased search strategy, to identify as many studies related to the research questions as possible, to select those that met the inclusion criteria, and to extract and synthesize relevant data.
To identify relevant research work, the search query was applied to the aforementioned digital libraries: ACM, IEEE Xplore, Science Direct, SpringerLink, Scopus, Wiley, and Web of Science. Some fine-tuning had to be done during the search process, since the search commands and/or options vary slightly across the different libraries.
Despite the efforts to produce the most efficient search query possible, it was necessary to carry out a pre-selection of the papers to download after the search in each of the libraries. This filtering was done based on the title and abstract of each paper. After eliminating papers that were obviously out of scope, the selected papers were retrieved.
Finally, the retrieved papers were subject to a filtering process based on a full-text analysis, with several being discarded due to their shallow technical depth and/or lack of relevant information. The final count amounted to a total of 37 selected papers. Table 6 identifies where the selected works were published, the type of publication, the publisher and/or entity responsible for the event, and the reference for each paper.
The papers constituting the final selection were analysed in detail, with the aforementioned research questions in mind. The results and conclusions drawn from this analysis effort are reported in the next five sections, organised according to those research questions.

HOW TO ENSURE REAL-TIME WITHIN THE CONTAINER HOST?
Standard container virtualization was not designed with real-time environments in mind and does not natively support real-time requirements. When installed on a general-purpose operating system, the tasks being executed inside containers will eventually be processed under a best-effort policy (as they are subject to the kernel scheduler policies), thus not complying with real-time requirements. As such, containerization needs to be complemented with other techniques so it can deliver real-time performance. Our systematic review identified several strategies to achieve this, including methods based on a standalone/single kernel, methods based on RT co-kernels, scheduling-based methods, and other approaches, as discussed next.

Methods Based On Standalone/Single Kernel
This category entails all the methods that tweak the kernel so that it becomes preemptible according to the priority of each thread, such as the PREEMPT_RT Linux kernel patch [30] or Real-Time Linux [31], which incorporates the PREEMPT_RT patch functionality into the mainline Linux kernel (not to be confused with RTLinux). Such preemption capabilities allow software developers to easily create (user-space) real-time applications [68], enabling tasks with higher priority to take precedence over other tasks that may be executing at the time. In some cases (depending on the preemption level), it may even be possible to preempt critical sections such as interrupt handlers. An interrupt handler can be treated as a "normal" thread and its priority modified. This approach allows a real-time task running in user space to take priority even over interrupts, thus guaranteeing more deterministic behaviour. Next, we review the most relevant works based on this method.
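To make this mechanism concrete, the following minimal sketch (function name and the priority value of 80 are illustrative choices, not taken from any of the cited works) shows how a user-space process on Linux can request a SCHED_FIFO real-time priority; under a PREEMPT_RT kernel, such a task then preempts lower-priority work, including most kernel activity. Since granting real-time priority requires CAP_SYS_NICE (or root), the sketch falls back gracefully when the capability is missing:

```python
import os

def try_set_realtime(priority: int = 80) -> bool:
    """Request the SCHED_FIFO policy with the given priority for the
    calling process; returns False when CAP_SYS_NICE is not available."""
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except PermissionError:
        return False

is_rt = try_set_realtime()
policy = os.sched_getscheduler(0)  # SCHED_FIFO only if the request succeeded
print("real-time scheduling active:", is_rt)
```

Inside a container, the same call only succeeds if the container runtime grants the required capability and rtprio limit, which is precisely the role of the Docker flags discussed in several of the works reviewed below.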
Masek et al. [59] evaluated sandboxed software deployments for real-time systems, namely self-driving heavy vehicles. The authors investigated to what extent the execution environment influenced the scheduling precision and input/output performance of a given application, using a real-world application from a self-driving truck for evaluation purposes. Four distinct execution environments were used: (i) a conventional Linux kernel, (ii) a native real-time Linux (preempt_rt) kernel, (iii) a Docker container on a conventional Linux kernel, and (iv) a Docker container on a real-time Linux (preempt_rt) kernel. It was concluded that using Docker containers had a negligible impact on the performance of the system and that, on average, the specified real-time deadlines were not violated. It was also concluded that choosing the correct kernel is of the utmost importance, since swapping kernels translated into a significant variance in scheduling precision and input/output performance. The obtained results were considered in line with referenced research concluding that the processing latency of a task running inside a Docker container could be improved by a factor of 13.9 (from 446 µs to 32 µs) when using a real-time Linux kernel instead of a general-purpose kernel. The system load was another factor pointed out by the authors: the controlled experiment with a more demanding load produced the highest standard deviation, around 7000 µs, versus the uncontrolled, less demanding one, which had a much lower standard deviation (around 60 µs).
Moga et al. [61] investigated the use of operating system virtualization to support industrial automation systems, such as motor drive control applications with cycle times between 250 µs and 1 ms. With this aim in mind, the authors analysed how industrial automation platforms could be consolidated by using Docker containers and also performed an experimental study of container-induced overhead. As a way of guaranteeing real-time performance, the evaluation was undertaken on a container host running a Linux Ubuntu distribution with the real-time kernel patch 3.14.12-rt9, with all containers being run with the --privileged=true flag so they could have privileged access to OS functions and resources. The evaluation took two distinct aspects into consideration: (i) timing accuracy, which addresses the processing latency during the cyclic behaviour of a control application; and (ii) virtual networking performance, which relates to the application overhead imposed by inter-container virtual networking. It was concluded that container technology could meet real-time system requirements, offering near-native performance with the added benefits of increased modularity, flexibility, and portability for real-time systems. However, some concerns were raised regarding the optimal allocation of containers to resources, especially when multiple containers running real-time applications need to access shared resources.
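The "timing accuracy" dimension evaluated in such studies boils down to measuring wake-up jitter in a cyclic task: sleep until the next cycle boundary and record how late the activation actually occurred. The following self-contained sketch illustrates the measurement idea (the 10 ms period and cycle count are arbitrary, and a plain Python sleep is far coarser than the clock_nanosleep-based loops and cyclictest runs used in the cited experiments):

```python
import time

def measure_wakeup_jitter(period_s: float = 0.010, cycles: int = 20) -> list:
    """Run a periodic loop and record, per cycle, how late the task woke
    up relative to its ideal activation instant (in seconds)."""
    latencies = []
    next_activation = time.monotonic() + period_s
    for _ in range(cycles):
        time.sleep(max(0.0, next_activation - time.monotonic()))
        latencies.append(time.monotonic() - next_activation)
        next_activation += period_s  # absolute schedule: drift does not accumulate
    return latencies

jitter = measure_wakeup_jitter()
print(f"max wake-up latency: {max(jitter) * 1e6:.0f} µs")
```

Note the absolute-time schedule: advancing `next_activation` by a fixed period, rather than sleeping a fixed duration, prevents per-cycle lateness from accumulating across the run.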
Goldschmidt et al. [33, 34] presented a container-based architecture for a multi-purpose controller, as a replacement for typical programmable logic controllers and other automation controllers with working cycles between 100 ms and 1 s. This approach was intended to enable more flexible function deployment to support innovation, and to address problems related to the support of legacy systems, in which the control code is often tightly bound to specific hardware. Using a containerized system, it becomes possible to support multiple execution engines running on the same hardware at the same time, as well as to emulate legacy systems. In addition to using the real-time kernel patch (PREEMPT_RT) as a way to guarantee real-time performance, the authors also mention the need to use the --cap-add=sys_nice flag (assuming the Docker platform) to allow tasks executed inside the containers to acquire real-time priority within the host system. Multiple tests under different scenarios were executed: first, using a Docker container running a simple application, and later, running a QEMU PowerPC emulator in an LXC container. According to the provided results, the containerized execution of control applications suffers a negligible and constant overhead, thus meeting real-time requirements even when emulating legacy hardware. This confirms the possibility of replacing legacy hardware components without having to change the control execution environment (e.g. migrating unmodified binary legacy software components to the new hardware). Those results also hint at the possibility of achieving highly automated, flexible function deployment by using the container registry and components deployed through the exposed APIs.
Albanese et al. [2] also relied on the single-kernel approach and used Docker with a Linux-based operating system with the preempt_rt patch applied. To increase the determinism of the system, multiple BIOS functionalities were deactivated (e.g., power saving modes, frequency scaling, CPU hyperthreading and Sub-NUMA clustering). Different cores were explicitly assigned to the kernel and OS jobs (1-2) and to the real-time application instances (2-7), with each core using private L1/L2 caches and sharing the L3 cache. This way, each instance was able to run exclusively on its allocated core(s) with extra isolation guarantees. Unlike most research works, the authors focus on network performance and its ability to cope with real-time requirements in containerized virtual environments. Four different network technologies were evaluated: three Docker-supported software solutions (Host, Bridge and MACvlan), and the Single Root I/O Virtualization (SR-IOV) hardware-assisted solution. All of these were submitted to multiple tests, consisting of a real-time task with a 2 ms deadline for receiving packets transmitted from an external packet generator. Tests with and without a concurrent workload were executed, and network latency and missed packets were measured and registered. The median values ranged between 189 µs (SR-IOV) and 599 µs (MACvlan), with the former presenting very stable results across all the distinct tests. No missed packets were registered except for MACvlan; the authors eventually realised that this may be related to the way the MACvlan Linux driver deals with multicast transmissions, since after switching to unicast transmissions the results improved dramatically. It was concluded that, although software-based solutions like OS-level Bridges and MACvlan are more sensitive to network and system load fluctuation, this can be mitigated by using VLANs with the former and unicast traffic with the latter, thus improving compliance with latency-sensitive applications. SR-IOV hardware-assisted virtualization was shown to be more robust and less prone to load variations while enabling NIC sharing in a straightforward way, which makes it a good choice for containerized virtual environments.
Sollfrank et al. [73] envision a trend in which the automation pyramid is shifting from a static form to a dynamic state, with the virtualization of applications becoming more present in Cyber-Physical Production Systems (CPPS). Acknowledging that such systems are often connected through a network and have distributed control logic (thus being referred to as Distributed Networked Control Systems, or DNCS), the authors note the importance of studying not only the nodes' processing latency but also the propagation delay. Accordingly, the usability of Docker-based containers for industrial time-sensitive applications was analysed taking those two factors into consideration. Three nodes interconnected through an Ethernet switch were used, two of which exchanged UDP packets representing messages used by real-time tasks. To guarantee deterministic processing behaviour on the nodes, a real-time Linux kernel was used (4.4.5-rt17-v7+ SMP PREEMPT). Also, the --cap-add=sys_nice and --ulimit='rtprio=98' flags were used together to assign the real-time task the second-highest priority (the highest priority being assigned to the bash script responsible for starting the UDP-communication application inside the containers). It was observed that the processing delay on the nodes did not exceed 0.5 ms, which is below the threshold defined for the rt-task, thus confirming the system as real-time capable. The mean on-node processing delay was 117 µs when using containers and 74 µs when not. Regarding the network propagation delay, a small difference was identified: with Docker the mean delay was 513 µs, and without Docker it was 508 µs. However, this difference was within the range of the NTP clock synchronization error, where some outliers reached up to 700 µs. The presented conclusions state that the containerized environment is suitable for real-time applications if priorities are well assigned. Also, network delay has to be considered, since standard Ethernet is not deterministic and exhibits some delay outliers that may exceed the applications' processing time. Later, in [72], the same team addressed a relevant case study by considering a multi-threaded application. By observing the latency values of a dual-thread application executed in an isolated Docker container, versus running in a non-isolated, non-containerized environment, they also concluded that multitask applications benefited from the isolation offered by the container environment, since it achieved more deterministic behaviour.
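The round-trip measurements in these works can be reproduced in miniature with plain UDP sockets over the loopback interface, as in the sketch below (the loopback echo stands in for the second physical node of the cited setup, and the sample count is arbitrary):

```python
import socket
import threading
import time

def echo_server(sock, n):
    """Echo n datagrams straight back to their sender."""
    for _ in range(n):
        data, addr = sock.recvfrom(64)
        sock.sendto(data, addr)

N = 50
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))               # pick an ephemeral port
server_addr = server.getsockname()
threading.Thread(target=echo_server, args=(server, N), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)
rtts = []
for i in range(N):
    t0 = time.monotonic()
    client.sendto(i.to_bytes(4, "big"), server_addr)
    client.recvfrom(64)
    rtts.append(time.monotonic() - t0)

print(f"mean round-trip time: {sum(rtts) / N * 1e6:.0f} µs")
```

On a real testbed the endpoints would run on separate hosts or containers, and clock handling matters: one-way delay measurements such as Sollfrank et al.'s depend on NTP synchronization quality, whereas round-trip measurements from a single clock avoid that dependency.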
Hinze et al. [40] also experimented with a single-kernel method, using a Linux Ubuntu distribution with the PREEMPT_RT kernel patch. In this case, the goal was to create an environment, executable locally or in the cloud, that could simulate the handling of deformable materials such as cables or ropes by industrial robots, under real-time constraints. This is all the more challenging considering that the handling of this type of material is characterised by nonlinear, location- and time-dependent equations for the objects' behaviour; thus, a highly accurate and detailed simulation environment, capable of running in a deterministic time frame, is needed to obtain reliable results. The authors also sought to fulfil other non-functional requirements such as maintainability, reusability, flexibility and portability. For this purpose, the Docker container platform (runtime and orchestration) was used together with an RT-capable interface implemented using ZeroMQ. Tests were performed by means of a face-to-face comparison of a non-virtualized system versus a containerized version, under the same environmental conditions (with and without concurrent load). Although the non-virtualized system presented slightly better results in terms of mean latency (3.99 µs vs 4.07 µs) and standard deviation (0.0748 µs vs 1.4956 µs) when no concurrent workload was present, the containerized environment performed considerably better in the presence of parallel workload (mean latency 4.82 µs vs 4.12 µs; standard deviation 58.0151 µs vs 0.4102 µs). Such results hint at determinism improvements when using containers in shared environments. Nevertheless, the authors mention that further studies must be made to further demonstrate the real-time capabilities of the system. It is also concluded that the non-functional requirements mentioned above were achieved, in part due to the isolation and encapsulation offered by the usage of containers.
In [41], Hofer et al. proposed a flexible architecture for real-time control systems capable of exploiting container virtualization and leveraging off-the-shelf technologies in cloud environments. The architecture is divided into three distinct layers. The first handles the monitoring and management of the cloud infrastructure and services, with decisions taken based on the analysis of data aggregated globally. The second layer, named "Control Cluster", is composed of several nodes and is where all the process control and related services are handled. All the services (real-time or best-effort) that interact with on-premises devices are located in this layer, which also hosts the orchestration software responsible for increasing resource utilization without significantly impacting system determinism. As such, the advantages and disadvantages of static and dynamic resource scheduling strategies are studied, using a dynamic orchestrator that allocates resources based on probabilities derived from runtimes, contemporaneity factors, and the probability of exceeding a given worst-case execution time. Based on several tests performed in cloud environments, the authors conclude that the combination of the Ubuntu 16.04 LTS operating system with the PREEMPT_RT real-time patch and the Docker container platform provides the best latency among the low-maintenance solutions analysed. Also, tasks with longer periods suffer less from external noise, and tasks with shorter periods (e.g. a 1 ms control loop for motion control systems) may need a specific scheduler, since the refresh rate limits of commercial schedulers may not be able to accommodate such requirements. Those tests also hint at the possibility of using over 90% of CPU resources without affecting the determinism of the system (depending on the tasks' properties). Finally, it is also concluded that generic shared cloud resources may viably host less critical operations with periods around 100 ms (the worst observed runtime variation being around 126 µs).
The centralized protection and control concept for electric substations presented in [8] (which documents the core contribution of an MSc thesis [7], later published in 2023) constitutes one of the most realistic efforts towards controller equipment virtualization, covering the issues surrounding the consolidation of mixed-criticality workloads (hard-RT and general-purpose) to implement virtual Intelligent Electronic Devices (vIEDs) on commodity x86 hardware, with support for network determinism. For this purpose, the authors adopted a VM-hosted containerized architecture that resorts to the PREEMPT_RT patch together with HugePage support (1 GB) in order to increase Translation Lookaside Buffer (TLB) hits and lock pages in memory, disabling hyperthreading, CPU C-states, power management features and dynamic frequency scaling, enabling the kernel option for omitting scheduling-clock ticks on CPUs that are either idle or have only one runnable task (CONFIG_NO_HZ_FULL=y), and using both core affinity and partitioning. The obtained results, using the cyclictest tool, achieved an average latency of 14 µs in single-instance tests (with a maximum of 20 µs) and a maximum latency below 450 µs under stress (with 20 coexisting instances). In that scenario with 20 coexisting instances, the authors verified that vIEDs running on real-time VMs were able to process IEC 61850 Sampled Value messages without violating the time constraints imposed by the standard, with a maximum latency below 450 µs and average values under 100 µs.
From the reviewed studies, it can be concluded that standalone/single-kernel methods seem to support real-time behaviour, complying with latency requirements in the microsecond range. However, considering the diversity of the studies, targeted architectures, and the different approaches adopted in each, it is difficult to provide a more precise indication of the guaranteed latency ranges. The use of the --cap-add and --ulimit flags with the Docker platform should be highlighted for their relevance in guaranteeing containerized task compliance with real-time requirements.

Methods Based On Real-Time Co-Kernels
A general-purpose Linux kernel (GPLK) allows non-preemptible routines to be executed. This can be problematic when a real-time task is ready to be executed but is kept waiting until the non-preemptible routine finishes. In methods based on RT co-kernels, a second kernel, which runs alongside the main kernel and has higher priority, executes time-critical work such as real-time tasks and interrupt management [68]. In such circumstances, the GPLK is treated as a thread with idle priority, which means it has the lowest priority and is only given the chance to run and execute its own scheduler when the co-kernel becomes idle after executing rt-tasks to completion. If, in the meantime, an interrupt is triggered, it is held until the executing rt-task finishes, using a deferred processing strategy. If there is a real-time handler associated with that interrupt, it is executed by the co-kernel; otherwise, it is handed to the GPLK. One of the major differences of this method, in terms of software development, is the need to use specific APIs/system calls according to the co-kernel approach. However, in some cases, the existence of such a specific API provides the application programmer with more refined control over the execution of the real-time application, possibly making applications more deterministic and predictable. The most frequent open-source co-kernel distributions are Xenomai Cobalt [78] and RTAI [69]. Next, we review the most relevant works based on this method.
Cinque et al. [11-13] applied the concept of real-time containers to mixed-criticality systems to take advantage of many-core architectures, while trying to guarantee naming, temporal, and fault isolation of tasks. The proposed architecture relied on Docker containers on top of a Linux operating system patched with RTAI. The containers were marked with different criticality levels to define their relative importance, a kind of global fixed-priority assignment. Also, a specific library (rt-lib) was implemented to provide a transparent mapping of rt-tasks onto the underlying real-time core, according to the criticality level of the containers, and to expose standard primitives to rt-tasks inside the containers. Some modifications to other libraries were also necessary to achieve naming and temporal isolation. The priorities of rt-tasks were also re-mapped to ensure that rt-tasks running in low-criticality containers could not preempt rt-tasks running in higher-criticality containers. However, the authors conclude that fault isolation was mainly guaranteed by the underlying technology and that it would need refactoring to increase its robustness.
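The priority re-mapping described above can be illustrated with a toy function (entirely hypothetical, not the rt-lib implementation): by reserving a disjoint band of global priorities per criticality level, no task in a lower-criticality container can ever preempt one in a higher-criticality container.

```python
def to_global_priority(criticality: int, local_priority: int,
                       band_size: int = 10) -> int:
    """Map a container's criticality level and a task's container-local
    priority onto one global fixed-priority scale (higher value wins).

    Each criticality level owns a disjoint band of band_size priorities,
    so bands never overlap across levels.
    """
    if not 0 <= local_priority < band_size:
        raise ValueError("local priority outside the container's band")
    return criticality * band_size + local_priority

# The weakest task of a criticality-2 container still outranks the
# strongest task of a criticality-1 container:
assert to_global_priority(2, 0) > to_global_priority(1, 9)
```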
Barletta et al. [5] rely on Xenomai and Docker to create a platform for mixed-criticality systems where real-time containers host real-time tasks and run on top of an operating system with real-time scheduling capabilities, side by side with non-real-time containers running non-real-time tasks. Among other features, the proposed solution presents a hierarchical scheduler (SCHED_DS) based on the grouping of real-time tasks, a resource budget for each group, and two levels of run queues, which suppresses the need for actively monitoring the tasks already running within containers while proactively isolating the CPU. The newly introduced Xenomai SCHED_QUOTA policy is used to impose CPU quota intervals on threads. This policy was customized to achieve a truly hierarchical scheduler able to overcome some limitations identified by the authors. The authors also present a feasibility checker that verifies in advance (in terms of network and CPU) whether a new RT container can be created without negatively affecting already running containers. To achieve network bandwidth guarantees, the proposed solution was integrated with the RTnet [70] real-time networking stack. RTnet wrappers are used within containers to isolate them from disturbances coming from other sources, namely non-real-time traffic. The presented tests show that the proposed solution achieves solid CPU activation latency results (below 10 µs), even under different types of concurrent load (CPU, I/O, HDD, network). Also, considering all the test cases, the maximum latencies did not exceed 6 µs, nor did the standard deviation exceed 1.1 µs. A direct comparison between the Xenomai and PREEMPT_RT approaches is also presented by the authors, concluding that the former presents an improvement of at least 30% over the latter. The outcome of this work, namely regarding parameters such as latency, overhead and isolation, is highly encouraging and seems to show that the combination of Xenomai co-kernel approaches and containers can support RT-compliant environments.

Standalone Kernels combined with RT Co-Kernels
Tasci et al. [76] tried to combine the advantages of standalone kernels and RT co-kernels. This mixed approach still relies on a co-kernel, but also explores the potential of single kernels by applying a real-time patch to the "main" kernel and then using a second kernel for tasks with tighter requirements. The authors proposed an architecture built on this concept that uses containers to modularize RT control applications, with a view towards improving reusability, portability, and flexibility. The core concept relies on distinct modules implemented as Docker containers running on a merge of the PREEMPT_RT kernel patch and the Xenomai Cobalt kernel, which communicate among themselves through a brokerless messaging system based on the ZeroMQ framework. It is assumed that each module is implemented in a different container and that multiple containers are needed during control runtime execution. By combining PREEMPT_RT with the Cobalt kernel, the authors conclude that it is possible to run multiple applications inside containers in a predictable and deterministic way. While the PREEMPT_RT-patched kernel is used for less strict tasks (e.g. tasks that manage and launch real-time tasks), the Cobalt kernel is used for more demanding RT tasks. The outcome is a double-kernel system where both kernels are more deterministic than a general-purpose one and both can execute real-time tasks. However, the authors also mentioned that, at that time, the Cobalt source needed some fixes to be able to support containers. Moreover, merging both patches (Cobalt and PREEMPT_RT) required fixing several file conflicts. The authors' evaluation focused mainly on the round-trip time of messages exchanged between containers, observed to be between 50 µs and 150 µs. Based on this, the authors conclude that applications with cycle times of 500 µs can be successfully executed on this architecture.

Scheduling-based Methods
As mentioned before, running containers are just processes executed by the Linux kernel and, as such, are given a processing time slot by the host OS scheduler at a certain time. Therefore, tweaking the way the scheduler works is one of the most relevant aspects for guaranteeing deterministic behaviour and achieving RT requirements. Although the previously mentioned works also (partially) rely on scheduler optimizations, in this sub-section we specifically discuss the works focused on the scheduling mechanism itself, instead of other kernel changes or patches.
Telschig et al. [77] presented a cross-domain, cross-platform RT container architecture for dependable and reliable distributed embedded applications, enabling their safe dynamic update. The logical execution time paradigm is the basis of the proposed architecture, decoupling functionality from timing, functionality from composition, and domain from platform concerns. Thus, it enables the easier deployment of interacting RT tasks to distributed nodes. The SCHED_DEADLINE scheduling mechanism, which combines the Constant Bandwidth Server (CBS) algorithm to compute scheduling deadlines with the Earliest Deadline First (EDF) algorithm to schedule tasks, is used to achieve temporal isolation and to partially guarantee RT performance. According to the authors, this architecture enables a mixed-criticality approach where tasks ranging from hard real-time to best-effort criticality levels can co-exist. Also, being cross-platform, dependency conflicts can be avoided, allowing the use of legacy components and reducing interoperability issues. A small embedded system combined with LXC containers is described as a working example; however, little empirical data retrieved from this example is presented.
Abeni et al. [1] presented a new hierarchical real-time scheduling framework for Linux that provides temporal scheduling guarantees for multiple co-located containers, while remaining compatible with the most well-known container-based virtualization solutions. The proposed approach resorts to a two-level scheduling hierarchy: at the first level the SCHED_DEADLINE scheduling policy is used, implementing a CBS algorithm, while at the second level a standard fixed-priority scheduler (SCHED_FIFO or SCHED_RR) is used. The former is used to schedule the real-time run queues of the control groups (cgroups), while the latter schedules the real-time tasks inside the cgroups. Experimental results combining LXC containers, the presented scheduling mechanism and an audio pipeline show that the system can achieve better response times and may decrease the required real-time computational bandwidth, thus reducing or even eliminating the occurrence of xruns (buffer under- or overrun events caused by deadline misses).
Likewise, Cucinotta et al. [15, 16] also implement a custom hierarchical CBS scheduling policy to guarantee RT processing performance within containers located in private clouds, thus ensuring stable performance in distributed cloud services. Similarly to the approaches presented above, a SCHED_DEADLINE policy with the CBS algorithm is used to select the control group to be scheduled on each CPU, and the SCHED_FIFO or SCHED_RR policy is used to select the tasks in the scheduled control group. This mechanism is presented as a solution to achieve fine-grained control of the temporal interference between co-located real-time services, while avoiding overheads incompatible with real-time requirements. LXC containers are used in the validation scenario, which includes multiple virtualized network functions (VNFs) deployed as containers across multiple heterogeneous computing nodes and with distinct timing requirements. Each container within a node may have custom scheduling parameter values (Q, P) for the proposed hierarchical CBS (Constant Bandwidth Server) scheduler, which extends the Linux kernel's SCHED_DEADLINE scheduling class; (Q, P) corresponds to the amount of runtime (Q) granted every period (P), meaning that a task is granted Q time units on the CPU(s) every P time units.
A probabilistic model is presented to optimize such values according to a pattern of requests modelled as a Poisson stochastic process. Various scheduling reservation parameter configurations are tested and compared with theoretical expectations. The empirical results confirm that the presented model can deliver high predictability under the correct reservation parameters, even with interfering tasks.
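Each CBS reservation (Q, P) consumes a CPU bandwidth fraction Q/P, so the simplest form of the admission test performed before accepting a new real-time container reduces to checking that the summed fractions stay under a schedulable bound. The sketch below illustrates the idea (the 95% bound and the reservation values are illustrative; real SCHED_DEADLINE admission control is more involved):

```python
from fractions import Fraction

def admit(reservations, new, bound=Fraction(95, 100)):
    """Accept a new (runtime Q, period P) reservation, both in µs, only if
    the total CPU bandwidth, including the newcomer, stays within bound."""
    total = sum((Fraction(q, p) for q, p in reservations), Fraction(0))
    return total + Fraction(*new) <= bound

current = [(2_000, 10_000), (5_000, 20_000)]  # 20% + 25% of one CPU
print(admit(current, (4_000, 10_000)))        # 85% total -> admitted: True
print(admit(current, (6_000, 10_000)))        # 105% total -> rejected: False
```

Exact rational arithmetic avoids the floating-point rounding that could otherwise admit a marginally infeasible reservation set.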
Struhár et al. [75] also resort to a hierarchical scheduling mechanism and, after an evaluation, conclude that real-time containers running on hosts with such a scheduler are able to keep their allocated resources even in the presence of other RT or best-effort containers running heavy processing loads. It is also noted that the runtime jitter stays very low, thus not influencing the real-time containers, while reducing the CPU utilization used by the best-effort containers. Nothing else is mentioned about guaranteeing in-node RT performance, since the authors mainly focus on the orchestration mechanism.
Lee et al. [52] addressed the functionality and QoS issues that may arise when using traditional fieldbuses, such as CAN, in containerized virtual environments. To this end, a lightweight CAN virtualization technology for containerized controllers is presented. Regarding functionality, a driver-level virtualization technology provides the needed abstractions for the virtual CAN interfaces and buses, while maintaining transparency to the OS and other applications. This supports sharing the network interface among multiple virtual controllers in an isolated way, while preserving CAN requirements and low overheads. Regarding QoS, a hierarchical RT scheduler based on a periodic execution model is used to guarantee the accomplishment of hard RT tasks. A simulator enabling the adjustment of the phase offsets of virtual controllers and tasks is also provided. This enables sub-optimal phasing of controllers and tasks by taking into consideration a global clock based on the IEEE 1588 standard and the best moment to execute a task, according to the end-to-end delay. The authors claim the presented solution reduced the worst-case end-to-end delay on a real system by up to 78.7%. Unfortunately, no detailed information is provided about the adopted containerized solution.
As with the previous methods, scheduling-based approaches also seem to provide adequate support for real-time operation, with worst-case controlled latencies in the order of hundreds of milliseconds. Again, the diversity of the studies, configurations, architectures, and approaches adopted in each case makes it difficult to compare different works and infer more detailed conclusions. However, it has been shown that different strategies can be used within a scheduling-based method, such as one- or two-level scheduling hierarchies or distinct scheduling algorithms, to achieve temporal isolation and RT performance even in the presence of concurrent best-effort workloads.

Other Notable Approaches
Several papers did not fit into any of the previous categories, due to factors such as failing to detail what type of strategies were used to guarantee real-time performance. This subsection is devoted to such cases, which still hold relevance for our research question and are discussed next.
Garcia et al. [32] presented a flexible and lightweight container architecture for industrial control, aimed at deploying flexible functions and virtualized control units with legacy support. The intended goal is to be able to adapt and deploy distinct distributed services within the system according to current needs and configuration. Such a mechanism must take into consideration the strong isolation needs of each service and its associated QoS requirements, such as RT capabilities, security, and safety. The presented architecture relies on the Docker engine for the execution of containers hosting a FORTE runtime application, developed according to IEC 61499 function block principles. This application is responsible for message processing and exchange with the cyber-physical systems (robotic arms) using TCP sockets. The Robot Operating System (ROS) is used on the nodes that host the containers, and EtherCAT drivers are embedded in the system. The conjunction of such technologies seems quite interesting in this context and, presumably, the coexistence of these elements ensures RT requirements when executing rt-tasks; however, nothing is explicitly stated about RT guarantees.
Hofer et al. [42] aimed for an Infrastructure-as-a-Service approach, studying the possibility of migrating real-time industrial control applications from dedicated hardware to virtualized servers with shared resources. Although most of the presented work uses a type-1 hypervisor (instead of a containerized environment), the study and its outcomes are still worth mentioning, as it also encompasses a short review of container frameworks, considering LXC/LXD, Docker and Balena. A review of operating systems is also presented, including resinOS, Ubuntu Core, Xenomai 3 and PREEMPT_RT, with the last two being selected for comparison. Exhaustive testing was performed under twelve different host configurations, measuring the resulting latency performance. A hardware comparison was then executed using the most favourable configuration previously identified, comparing CPU latency across multiple hardware options made available by Amazon Web Services (AWS). In this comparison, PREEMPT_RT always outperformed Xenomai 3: the maximum latency achieved was 49 ms, and only 96 occurrences out of 10 million (0.00096%) were registered above the predefined upper limit of 100 µs. The best PREEMPT_RT configuration resulted in a considerably low spread and a peak value of 114 µs. Finally, a Balena container was used for latency tests, also executed on the AWS infrastructure; average values of 11.44 µs (σ = 0.71 µs), with maximum peaks of 11 644 µs, were observed. These results led the authors to conclude that it is feasible to migrate RT industrial control applications to virtualized environments with shared resources.
Simone et al. [17] discuss a lightweight virtualization approach leveraged by hardware-assisted Trusted Execution Environments (TEE), like the ARM TrustZone technology or the Intel Software Guard Extensions (SGX), to guarantee temporal, spatial, and fault isolation, thus increasing the determinism and safety of the system. This type of technology (internal to the CPU perimeter) provides capabilities for securing user-space applications without requiring any call to privileged OS code. Even though it is an interesting approach and the authors briefly analyse it from a containers perspective, they opt to complement the TEE technology with unikernels instead of containers; therefore, most of the presented discussion stays out of the scope of this paper. Nevertheless, this type of technology, used along with containers, could be quite relevant and worth being aware of.

Summary
After analysing the selected works, it can be concluded that, for some authors, guaranteeing real-time performance in containerized environments can be achieved by applying scheduling, priority and/or process preemption tweaks on Linux systems. However, other authors mention the need to complement those tweaks with custom configurations and the use of specific container platform flags to provide appropriate guarantees for containerized RT tasks.
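To make the notion of "container platform flags" concrete, the sketch below assembles a hypothetical `docker run` command line. The helper name and default values are our own illustrative choices, but the flags themselves (`--cpuset-cpus`, `--cpu-rt-period`, `--cpu-rt-runtime`, `--cap-add`, `--ulimit rtprio`) are real Docker options; note that the RT cgroup flags require a host kernel built with `CONFIG_RT_GROUP_SCHED`.

```python
def rt_docker_args(image, cpus="2,3", rt_runtime_us=950_000,
                   rt_period_us=1_000_000, rtprio=99):
    """Assemble a `docker run` argument list for an RT-oriented container.

    Pins the container to dedicated cores, grants SYS_NICE so processes
    inside may request SCHED_FIFO/SCHED_RR priorities, and bounds the CPU
    time the container's RT cgroup may consume per period.
    """
    return [
        "docker", "run", "--rm",
        "--cpuset-cpus", cpus,                  # pin to (ideally isolated) cores
        "--cpu-rt-period", str(rt_period_us),   # RT cgroup period, microseconds
        "--cpu-rt-runtime", str(rt_runtime_us), # RT budget per period, microseconds
        "--cap-add=SYS_NICE",                   # allow RT scheduling classes inside
        "--ulimit", f"rtprio={rtprio}",         # maximum RT priority inside
        image,
    ]
```

Passing such a list to a process launcher (e.g. `subprocess.run(rt_docker_args("rt-app:latest"))`) would start the container with those constraints, assuming the host kernel and Docker daemon support them.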
Despite the diversity of the proposed approaches, it is often difficult to narrow down on the specific aspects of each technique or its implementation details. In fact, authors often fail to mention which classes of real-time systems (soft, firm, or hard) were being considered within the scope of their work. Moreover, several works omitted details describing how RT-compliant behaviour was implemented/obtained for containerized RT tasks. An example of such omissions can be found in [10], where a Docker container platform was used to support an attack-resilient drone control mechanism running on a Raspberry Pi 3 Model B; the paper only mentions that a real-time patch was applied to a Linux 4.4 kernel, without further information or any evidence regarding the support for such a hard real-time system (which is left for future work). In fact, even some studies that provide details about the implementation specifics of containerized RT capabilities do not always describe in detail how the adopted approach helped ensure compliance with RT requirements. Such omissions make it difficult to replicate the work presented in these papers.
Nevertheless, and despite the aforementioned shortcomings, the works discussed here confirm that multiple methods can be used to achieve the determinism and predictability deemed necessary to comply with RT deadlines, and that these can be used independently or in conjunction with each other. As seen in Table 7, most of the works analysed in this survey opted for the PREEMPT_RT approach. This may be due to the following reasons: its wide compatibility across Linux distributions, since it supports every long-term stable version of the mainline Linux kernel since v2.6.11; its quite straightforward installation process; and/or the fact that it demands no extra tweaks or adaptations to the application to be executed (unlike co-kernel approaches, which require the use of their own APIs).
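As a small practical aside, before scheduling RT containers on a host one typically wants to verify that it actually runs a PREEMPT_RT kernel. A minimal sketch (the function name is our own) checks the kernel version banner, as returned by `uname -v`, which contains "PREEMPT_RT" on recent RT kernels, and falls back to the `/sys/kernel/realtime` flag exposed by older RT-patched kernels:

```python
from pathlib import Path

def is_preempt_rt(uname_v: str, sysfs_root: str = "/sys") -> bool:
    """Heuristic check for a fully preemptible (PREEMPT_RT) kernel.

    `uname_v` is the kernel version banner; `sysfs_root` is overridable
    so the fallback path can be tested without a real RT host.
    """
    if "PREEMPT_RT" in uname_v or "PREEMPT RT" in uname_v:
        return True
    flag = Path(sysfs_root) / "kernel" / "realtime"   # legacy RT-patch marker
    return flag.is_file() and flag.read_text().strip() == "1"
```

On a live system one would call it as `is_preempt_rt(os.uname().version)`.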

EXPECTED ORDER OF MAGNITUDE FOR RT TASK LATENCY WHEN USING CONTAINERS
As mentioned before, low latency and real-time requirements are different concepts. By definition, a real-time system can accommodate diversified latency requirements, which can be of the order of magnitude of seconds, minutes or even higher. However, real-time cyber-physical systems often require stable low latency, or even ultra-low latency response times, so that the system can fulfil its intended function (cf. Table 1). When it comes to containerized environments, however, it was not possible to systematically assess which latency requirements can be guaranteed. Two main reasons concur for this:
• The techniques presented and the focus of each work are variable, which means that each work concentrates on different aspects of real-time container virtualization and, therefore, presents a distinct depth of analysis according to its focus.
• The results presented in the analysed papers do not follow any sort of measurement standard. The way of measuring latency differs, as well as the richness of the provided data (e.g. number of measurements performed, maximum value/outliers measured, absence of standard deviation, type of latency measured).
This means that the results presented across different papers cannot be directly compared, nor do the aforementioned asymmetries allow establishing a correlation between latency values and methods used. For example, [73] and [72] were among the very few that presented latency values considering processing latency and network latency, standard deviation and peak values. Nonetheless, these papers do allow the extraction of latency-indicative values that can be used to gauge the order of magnitude that can be expected when using containers.
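Given this lack of a measurement standard, results would already become far more comparable if every evaluation reported, at a minimum, the sample count, mean, standard deviation, a high percentile and the observed maximum of its latency trace. A minimal sketch of such a report (function and field names are our own):

```python
import statistics

def latency_report(samples_us):
    """Summarize a latency trace the way RT evaluations should report it:
    not only the mean, but also the spread and the observed worst case."""
    s = sorted(samples_us)
    n = len(s)
    return {
        "n": n,
        "mean_us": statistics.fmean(s),
        "stdev_us": statistics.stdev(s) if n > 1 else 0.0,
        "p99_us": s[min(n - 1, int(0.99 * n))],  # simple empirical percentile
        "max_us": s[-1],                         # worst case actually observed
    }
```

A trace of 99 samples at 10 µs plus one outlier at 250 µs, for instance, yields a mean of 12.4 µs but a maximum of 250 µs, which is exactly the kind of gap a mean-only report hides.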

Container-induced Overhead
When it comes to comparing the overhead induced by using containers with a non-virtualized environment, some works [34] [33] conclude that the containerized execution of control applications suffers a negligible and almost constant overhead (around 2 µs to 5 µs), with a noticeable increase in determinism, meaning that the latency distribution is narrow. In [59] it is also concluded that the usage of containers has a negligible impact on the system, since no deadline was missed, and the extra cost of using virtualization is considered to be around 20 µs. Liu et al. [56] also state that container-based virtualization does not add a perceptible performance loss in computing or in communication; however, the analysis performed is situated in the millisecond range, and no in-node real-time method is used. In [72] it is stated that there is a global increase of latency in containerized environments, namely, an extra 150 µs in the Ethernet network request-reply with a wider standard deviation (an extra 44 µs), and an extra 144 µs in the on-node processing (with a standard deviation wider by around 46 µs). Sollfrank et al. [73] noticed an additional in-node processing latency of 43 µs when using containers. Li et al. [53] highlight the lightweightness of container virtualization technology, but also state that the container's performance variability overhead can go as high as 500% over the non-virtualized environment. It is also stressed that the performance overhead can vary not only on a feature-to-feature basis but also on a job-to-job basis.
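The figures above all amount to comparing the same latency metric with and without the container layer. One way to make such comparisons explicit is to report both the shift in the mean (the added cost) and the change in spread (a crude proxy for lost determinism); a sketch, under the assumption that both traces measure the same latency type in µs:

```python
import statistics

def container_overhead(native_us, containerized_us):
    """Container-induced overhead as (mean shift, stdev change), both in µs.

    A positive mean shift is the average extra cost of the container layer;
    a positive stdev change indicates a wider, less deterministic distribution.
    """
    d_mean = statistics.fmean(containerized_us) - statistics.fmean(native_us)
    d_stdev = statistics.stdev(containerized_us) - statistics.stdev(native_us)
    return d_mean, d_stdev
```

For example, if the containerized trace is uniformly 2 µs slower than the native one, the mean shift is 2 µs and the stdev change is zero, matching the "almost constant overhead" case reported in [34] [33].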

Overall Latency of Containerized Environments
Regarding the overall latency values measured when using containerized environments, by looking at Table 7 it is possible to observe that some works (e.g. [76]) report comparatively low latency values; these seem to be the best-case scenarios. However, others [77] [41] [33] already place the latency values over 100 ms. Nonetheless, it is important to look at these results with close attention, since those are mostly mean values. One should also consider the standard deviation, to understand the determinism of the system, as well as the peak values, to know the worst-case scenario. If a soft or even a firm RT system can cohabit with some missed deadlines, the same does not happen with hard RT systems. Especially in the latter, peak values, rather than average values, should be taken into account when designing the system, so that RT requirements are met. For example, in [42] an average value of 11.44 µs with a standard deviation of 0.71 µs is reported, but peaks of 11,644 µs were observed. Looking only at the first two values, one could conclude that any system with deadlines around 1 ms and allowing jitter up to 1 µs (motion control, according to Table 1) could easily be supported. However, the observed peaks raise a red flag for the use of the presented solution in a hard RT system whose deadline is far below the observed peak. Nevertheless, it could be possible to use such a solution for more flexible soft or firm real-time systems able to tolerate a few missed deadlines.
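The reasoning applied to the [42] figures can be captured in a small decision helper: a hard RT candidate is judged on the observed maximum, whereas soft/firm RT may be judged on the typical behaviour. The three-sigma threshold for the typical case below is an illustrative choice of ours, not a criterion taken from the surveyed papers:

```python
def rt_feasibility(deadline_us, mean_us, stdev_us, max_us):
    """Classify a measured latency profile against a deadline."""
    if max_us <= deadline_us:
        return "hard-candidate"       # no observed value misses the deadline
    if mean_us + 3 * stdev_us <= deadline_us:
        return "soft/firm-candidate"  # typical case fits, rare peaks miss
    return "unsuitable"
```

Applied to the numbers from [42] against a 1 ms deadline (`rt_feasibility(1000, 11.44, 0.71, 11644)`), the helper returns "soft/firm-candidate": the typical latency fits comfortably, but the 11,644 µs peak rules out a hard RT classification.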
Summing up, the observed latency results seem to be in line with the requirements of several types of cyber-physical systems. When looking at the processing latency, the cost of using a containerized solution seems to be somewhere between 2 µs and 144 µs. Some authors report more stable values with lower jitter, while others report higher variance in the results. Regardless, there seems to be a gap in the literature regarding this type of measurement. A formal procedure rigorously defining how to measure the different types of latency would be a valuable asset, which could improve not only the quality of each study but also the comparative analysis between different studies. Struhár et al. [75] tried to address this gap by studying operating-system-level metrics, as well as metrics to specifically evaluate the timeliness of tasks running in the system, and adapted those to assess the RT performance of containers. Nevertheless, there is still a strong need for formal and validated metrics.

CONTAINER PLATFORMS SUITABLE FOR RT SYSTEMS
In general, the selected studies use one of three distinct container platforms: LXC, Docker and Balena.

LXC
LXC (Linux Containers) [57] is a containerization engine that was first presented in 2008 as a contender to virtual machines. Its goal was to enable a virtualization environment with a lower footprint and less overhead when compared to virtual machines. It provides an API and a set of tools that allow users to create and manage system or application containers (although it is focused on the former), also supporting multiple types of network configurations. System containers tend to be stateful, albeit this is not mandatory, since it basically depends on their intended use. LXC functionalities can be extended with LXD [6], which builds on top of LXC while providing support for distinct network types and better storage configuration. LXD supports the management of multiple instances (not only containers but also virtual machines), also providing functionalities such as running container snapshots, an API designed to leverage third-party tool integration, and simplified over-the-network control, among others.

Docker
Docker [45] was presented in 2013. This framework, which initially enabled the creation and management of containers based on LXC, has evolved considerably since then. With the release of version 0.9, Docker stopped using LXC as the default engine and replaced it with its own libcontainer, a containerization library natively written in Go. This created an abstraction layer that enabled the support of a broader range of isolation techniques, and also allowed a considerable reduction of dependencies (since it allowed Docker to control several functionalities without relying on external packages, such as those related to control groups, namespaces, AppArmor profiles, firewall, and network interfaces). It also opened the door to the use of containers on top of other operating systems and to the OCI standards. Since then, Docker has specialized in application containers. These were originally created to be stateless, ephemeral, portable, and as lightweight as possible. However, just as with system containers, this really depends on the usage and on the trade-off between the aforementioned characteristics and having persistent storage. Like LXC, Docker also supports multiple types of network configuration. It also offers a REST API to simplify management and integration with third-party tools, and provides a large set of tools, such as Docker Swarm, which enables the creation, management and scheduling of Docker engines and containers.

Balena
Balena [4], originally designated Resin.io, is a container-based platform for deploying and managing IoT fleets that started in 2013. Balena's goal was to simplify the development, deployment and management of software for IoT devices, by leveraging the usage of Linux containers and other open technologies. Nowadays, Balena is a full infrastructure ecosystem (balenaCloud) composed of several modules that focus on the needs of IoT fleet owners. One of the most important modules is the container engine (balenaEngine), which is built on top of the Moby Project [66] and is compatible with Docker containers. It is optimized to run at the edge of IoT environments and, for this reason, it was stripped of capabilities present in Docker that were considered unnecessary in such environments (e.g. Docker Swarm, plugin support, cloud logging drivers, overlay networking drivers, non-boltdb backed stores such as consul, zookeeper, etcd). Also, it uses bandwidth more efficiently, has smaller binaries, and uses storage and RAM more conservatively, to be compatible with less powerful embedded devices. Another relevant module is balenaOS, a lightweight operating system based on Yocto Linux that is optimized to run containers on embedded devices, with a special focus on reliability over long periods of operation.

Platform Suitability
The analysed papers showcase Docker as the dominant platform, with a noticeably wider acceptance. However, this does not mean it is the best option for all cases. For instance, LXC can be used to leverage system containers to emulate legacy operating systems (as seen in [33], where it was used to emulate an old PowerPC) or when there is a need to provide independent Linux servers while avoiding the cost of using virtual machines or dedicated hardware.
On the other hand, Docker is more suitable for micro-services, since it focuses on application containers that are more lightweight and simpler to manage and scale. The use of micro-services seems to be the approach that most authors adopt when considering RT environments in IACS. Works like [32] and [34] leverage these capabilities, which are deemed suitable for micro-services, to implement the concepts presented in the IEC 61499 standard, namely the need to work with modular function blocks with granular functionalities and a high potential for reusability.
The aforementioned paper analysis also revealed Docker as an effective option for low-power devices, such as those commonly found at the network edge. An example of this trend is provided by Morabito [62], who studied the use of Docker containers in low-power nodes such as single-board computers. After analysing the CPU, memory, disk I/O, network performance and energy efficiency, among other parameters, it was concluded that there was a negligible impact when using containerized environments on such hardware, even when running multiple concurrent instances. Together with the size of the Docker community and its extensive documentation, this can explain Docker being the most used container platform in this area of research.
The standards discussed in Section 2 strongly contributed to the emergence of new solutions implementing container virtualization. This contributed not only to the evolution of existing platforms, but also to increasing the compatibility among them. Since both the LXC and Docker platforms support images compatible with the OCI standards, it is possible to create an LXC container from a Docker container image. It is also viable to combine both types of containerization. For example, it is possible to provide a complete Linux system using an LXC container and run multiple Docker containers inside it, thus taking advantage of a hypothetical hardware consolidation without the overhead of a virtual machine, while still making the most of Docker's capabilities (such as portability, scalability, and small footprint) to run multiple applications or micro-services.
Although Balena was only mentioned in one paper, its optimization for embedded devices, support for IoT fleet management, and the strong reliability provided by Moby make it a container platform worth considering for IoT environments. In [42] the authors adopted Balena due to its flexibility, ease of use, and the properties coming with stateless containers.
Overall, the conclusions reached by the different authors lead us to believe that, under the correct environment and configurations, most, if not all, of these platforms can deliver deterministic real-time performance.

CONTAINER ORCHESTRATION IN RT CONTEXTS
In the world of IACS, we can find systems with varying degrees of complexity, ranging from simple deployments often involving a few sensors/actuators and a single control device, to complex and large-scale systems made up of tens, hundreds or even thousands of devices. In the latter case, the adoption of containerization can lead to the creation of a considerable number of container instances, making such systems potential candidates for micro-service-based approaches (where a single monolithic service may be partitioned into several micro-services, each running in a distinct container). In such situations, it becomes impracticable to individually manage each container, thus requiring the adoption of container orchestration technologies to address the operational needs of a container fleet often spread across heterogeneous computing nodes. This technology leverages the use of containers by enabling a centralized view of the entire container infrastructure resource pool and by allowing the automation of multiple distinct tasks (cf. chapter 2). Orchestration can also be applied in distinct environments and architectures, such as local, cloud, and edge-cloud. As such, container orchestration encompasses several capabilities, and can be optimized and exploited in different ways, as discussed in some of the reviewed works.

Energy Consumption
In the scope of energy consumption, Kaur et al. [47] tackled the need for scheduling jobs across different IIoT nodes in a containerized network while optimizing energy consumption vs. performance, with minimal interference from coexisting containers. To achieve these goals, a scalable and comprehensive new controller for Kubernetes is presented, whose aim is to map/schedule containers across the available nodes. In this case, orchestration is tackled as a multi-objective optimization problem, formulated with different constraints (i.e. task deadlines, available resources, energy consumption, source of energy, and statistical data). After evaluating the proposed scheduler against existing schedulers using RT Google traces, it is concluded that it improves energy utilization by 14.42%, while also improving the performance/interference ratio by 31.83% and reducing carbon emissions by 47%, when compared to the FCFS scheduler.
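In the same spirit, though far simpler than the controller in [47], deadline-constrained placement with an energy/interference trade-off can be sketched as scoring the feasible nodes and picking the cheapest one. All field names, weights and the scoring formula below are illustrative assumptions of ours:

```python
def score_node(node, task, w_energy=0.5, w_interference=0.5):
    """Score a candidate node for a deadline-constrained container.

    Returns None when the node cannot meet the task's deadline or CPU
    demand; otherwise a weighted cost where lower is better.
    """
    if node["free_cpu"] < task["cpu"] or node["est_latency_ms"] > task["deadline_ms"]:
        return None
    return w_energy * node["energy_cost"] + w_interference * node["interference"]

def place(task, nodes, **weights):
    """Pick the feasible node with the lowest weighted cost, or None."""
    feasible = [(score_node(n, task, **weights), n["name"]) for n in nodes]
    feasible = [(s, name) for s, name in feasible if s is not None]
    return min(feasible)[1] if feasible else None
```

A real scheduler would add the remaining objectives from [47] (energy source, statistical history) and re-evaluate placements over time, but the structure, hard feasibility constraints plus a weighted multi-objective cost, is the same.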
Still in the same scope, Okwuibe et al. [64] researched end-device energy consumption and latency in dynamic environments, when combining containerized environments with other technologies such as software-defined networking, network-function virtualization, and multi-access edge computing 5G services. The authors used Kubernetes as the orchestration tool, and focused on optimizing resource allocation for edge-co-located IIoT services. Those services were dynamically offloaded to other nodes closer to the end device, thus helping to reduce the power consumption and latency of end devices. Although the mean migration time was 4.450 ms, the authors suggest it would be possible to reduce this time by applying machine learning techniques and preemptively moving containers ahead of time. It was also concluded that the use of Docker containers increased the energy consumption by up to 4.5% and added 5 seconds to the initialization time of the case-study application.
Other proposals, such as Lin et al. [55], propose strategies for VM placement and reallocation aimed at optimizing performance and energy conservation in server clusters. While not specifically oriented towards RT workloads, the Peak Efficiency Aware Scheduling (PEAS) approach could be adapted for container environments, considering RT compliance in addition to CPU, memory, disk space and bandwidth constraints, eventually expanding the Computing Resource Unit metric to also cover determinism guarantees.

Orchestration
Orchestration can also help minimize the cost of executing time-critical containerized tasks in a secure manner. An example of this use case is provided by Singh et al. [71], who used Docker Swarm to enforce security, deadline and cost constraints. By using game theory, the authors were able to take into account different parameters, such as the cost of the security mechanisms, the resource costs, and the execution deadline thresholds, to schedule the tasks (or a portion of them) across different machines (working nodes). Although some simplifications were made, such as assuming that the working nodes cannot deny the execution of an assigned fraction, or the nonexistence of any sort of dependency between tasks, the validation outcomes seem encouraging.
Orchestration can also help allocate or deallocate resources at the right moment according to the intended goals, something that might be of utmost importance when dealing with time-critical requirements. For instance, Struhár et al. [75] extended the Kubernetes scheduler, optimizing it for RT container scheduling. The implementation consists of two main components: the RT Scheduler Extender and the RT Manager. The former extends the Kubernetes control plane, providing admission control and scheduling of real-time containers across the available container nodes. The latter lives on the compute nodes, enabling the deployment of real-time containers and the monitoring of in-node performance, which is periodically reported to the master node. Container-level metrics are defined to evaluate the RT performance of tasks being executed on the container nodes: the number of missed deadlines, maximum lateness, and maximum response time within a certain period. The characteristics and requirements of both compute nodes and containers are registered in advance and, if needed, updated. When there is a need to run a new container, the admission control analyses whether the resource and timing requirements of the container can be met and allocated on one of the available nodes without negatively affecting already-running containers. The presented evaluation shows that the implemented system enables mixed-criticality deployments: multiple RT containers and best-effort containers can co-exist on a single compute node without negatively affecting the performance of the real-time containers.
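The admission-control step described above can be approximated with a classic CPU-bandwidth test: each RT container declares a runtime budget per period, and a new container is admitted only if the summed utilization stays within the node's capacity. This is our own simplification of the idea, not the actual algorithm of [75]:

```python
def admit(container, node, utilization_bound=1.0):
    """Utilization-based admission test for a new RT container on a node.

    Each RT container is described by its CPU budget per period
    (runtime_us / period_us). The new container is admitted only if the
    total demand, including it, stays within the node's CPU capacity
    scaled by the chosen utilization bound.
    """
    demand = sum(c["runtime_us"] / c["period_us"] for c in node["rt_containers"])
    demand += container["runtime_us"] / container["period_us"]
    return demand <= utilization_bound * node["cpus"]
```

On a single-CPU node already hosting a container using 50% of the CPU, a new container asking for 40% would be admitted, while one asking for 60% would be rejected. A production admission controller would also account for the best-effort workload and the measured (not only declared) behaviour of running containers.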
Yadav et al. [79] addressed the availability and performance of containerized services by proposing a resource provisioning mechanism for managing dynamic and fluctuating workloads. This proactive workload manager uses a modified PID algorithm (Proportional Integral Derivative, a control-theory-based mechanism), in conjunction with the HAProxy (High Availability Proxy) load balancer and the Docker Swarm orchestration tool, to perform dynamic resource provisioning according to the response time of the system. A response threshold is predefined for the system. The PID algorithm takes average response times as inputs and calculates the optimal number of containers to achieve the desired threshold. In its dynamic variation, HAProxy uses a different algorithm to optimize the handling of requests, depending on the number of existing containers: when fewer containers are deployed, round-robin is used; when the number of containers rises, the least-connection algorithm is applied. In other words, the presented mechanism controls the horizontal scaling (in and out) according to the desired system response time. The evaluation results show clear improvements in the overall response time, with a quick convergence to the desired value. Although the presented mechanism only uses a single parameter (response time), the authors suggest that other parameters, such as CPU and memory utilization, might be used to achieve better performance.
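The core of such a PID-driven autoscaler can be sketched in a few lines: the error between the observed and target response times drives the replica count up or down. The gains and clamping below are illustrative choices, not the values used in [79]:

```python
class ResponseTimePID:
    """PID loop mapping observed response time to a container replica count."""

    def __init__(self, target_ms, kp=0.05, ki=0.01, kd=0.0):
        self.target, self.kp, self.ki, self.kd = target_ms, kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def replicas(self, current, observed_ms, max_replicas=50):
        err = observed_ms - self.target      # positive -> too slow -> scale out
        self.integral += err
        deriv = err - self.prev_err
        self.prev_err = err
        delta = self.kp * err + self.ki * self.integral + self.kd * deriv
        return max(1, min(max_replicas, round(current + delta)))
```

Each control period, the orchestrator would call `replicas()` with the current count and the averaged response time, then scale the service accordingly; the integral term is what gives the quick convergence to the target reported by the authors.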
Yin et al. [80] use fog nodes to provide extra computational resources near the factory floor, in order to meet hard RT requirements. The authors consider that the fog nodes are located near the terminal devices (other aforementioned studies call them edge nodes but assume the same architecture and proximity). Since these nodes have limited computation, storage and network resources, the authors propose a container-based task-scheduling algorithm that takes into consideration the real-time requirements of tasks and the high concurrency of fog nodes to determine which tasks need to be scheduled to a fog node and which tasks can run in the cloud. A resource-reallocation mechanism is also proposed, to maximize the resource utilization of the fog nodes while minimizing task delays (task completion time). This mechanism calculates the resource quota needed by each task in the subsequent period during the execution phase and, if needed, reallocates the CPU resources in the fog node to maximize the data processing per cycle, always taking into consideration the task deadlines. The output of the reallocation mechanism is used as input to the task-scheduling mechanism which, in this way, becomes aware of the available resources on the fog nodes. It is concluded that using the reallocation mechanism reduces the tasks' completion time by 10%, and that using the proposed task scheduling increases the number of tasks processed in the fog nodes by 5%.
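A toy version of such deadline-aware fog/cloud splitting can be sketched as follows: a task is kept on the fog node when its deadline cannot absorb the cloud round-trip, subject to the fog node's remaining capacity. All parameters are illustrative, and the actual mechanism in [80] additionally reallocates per-cycle resource quotas:

```python
def schedule(tasks, fog_capacity, cloud_rtt_ms):
    """Greedy split of tasks between one fog node and the cloud.

    Tasks whose deadline cannot absorb the cloud round-trip plus their
    execution time are placed on the fog node (tightest deadline first,
    while capacity lasts); everything else runs in the cloud.
    """
    fog, cloud = [], []
    for t in sorted(tasks, key=lambda t: t["deadline_ms"]):
        needs_fog = t["deadline_ms"] < cloud_rtt_ms + t["exec_ms"]
        if needs_fog and fog_capacity >= t["cpu"]:
            fog_capacity -= t["cpu"]
            fog.append(t["name"])
        else:
            cloud.append(t["name"])
    return fog, cloud
```

With a 50 ms cloud round-trip, a task with a 10 ms deadline lands on the fog node while a task with a 500 ms deadline is safely sent to the cloud, which is the essence of the split the authors describe.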
In a slightly different approach, Lin et al. [54] proposed a containerized solution for efficient digital twin simulation for smart industrial systems, aiming at creating a system that consumes fewer resources than those commonly used, while still producing trustworthy outputs. To this end, the authors presented a simulation-as-a-service (SimaaS) architecture that is able to deliver large-scale models on demand, enabling the creation and deployment of digital twin instances (as well as all the related services) across heterogeneous infrastructure nodes. As stated, since the simulation must be synchronized with the physical system, the cloud, edge nodes and end devices (which may also be containerized) must be selected in an optimal way. Although the authors do not explicitly mention orchestration (except in one of the schematics), the complete description provided implicitly points to the use of container orchestration, covering the following aspects: large-scale on-demand service; efficient management and collaboration of cloud, edge and end resources to meet strict latency requirements; scheduling and automatic deployment of containers; and dynamism requirements. Based on their testing simulation, the authors empirically confirm a significant improvement in system efficiency, compared to typical heavyweight virtual machine alternatives.
Carvalho et al. [8] (and later in [7]) used OASIS TOSCA (Topology and Orchestration Specification for Cloud Applications) templates [65] for orchestration purposes within the scope of a protection and control architecture for electric power substations using containerized real-time services, providing a portable way to define topologies, connections, dependencies, capabilities, and requirements for the entire infrastructure. For the specific implementation, the authors resorted to the xOpera orchestrator [19], together with Ansible playbooks, to define orchestration actuators corresponding to the TOSCA lifecycle standard interface operations for provisioning and configuration automation.
In brief, container orchestration can be applied in very distinct environments, as well as local, cloud, or edge-cloud architectures. In such scenarios, orchestration capabilities can be instrumental for RT-optimized container scheduling, provisioning and instantiation. As previously mentioned, orchestration capabilities can be used to lower costs by optimizing resource allocation according to real-time task requirements and fog node availability, thus leveraging the combination of cloud and edge infrastructures to determine which tasks need to be scheduled to a fog node and which tasks can run in the cloud. Such capabilities can be applied in the context of use cases such as digital twin management and deployment, or the reinforcement of infrastructure security and resilience in scenarios with specific RT requirements (as is the case for electric power substations in smart grids, for example). As such, the orchestration of containers is one of the most relevant mechanisms for leveraging the use of containers in distributed systems.

Live Migration
Another relevant factor when using orchestration has to do with the ability to perform live container migration. Live migration is the action of moving a container from one server to another without losing the state of the running applications/services (including RAM, CPU, and network state), independently of the server location. This is useful in many situations: to move workloads to other nodes or locations prior to performing server or infrastructural maintenance operations; when there is a need to load-balance a system; or when there is a need to keep the services as close as possible to mobile end-devices which are on the move.
When performing a migration, it is essential to keep downtime to a minimum. This is a critical point, since during this stage the services running inside the containers become inaccessible. If the downtime is higher than the response time required by the industrial application, one or more deadlines will be missed and a hazardous result can occur. Govindaraj et al. [36] proposed a new migration scheme named redundancy migration, and evaluated it against the LXC/LXD stock migration. The authors focused especially on edge computing and started by splitting the concept of downtime into migration time and downtime. The former corresponds to the full duration of the migration, from the migration trigger until the moment the services become available on the destination server. The latter represents the period during which the services are completely halted. According to the authors, downtime is the more relevant metric, since it is used in service level agreements (SLAs). The proposed redundancy migration relies on a migration controller and on a traffic controller that run on both source and destination servers, and has four main phases. First, a buffer and a rerouting of traffic between the client and the container are created. Next, a clone of the container is created on the destination server. The new container then starts to consume the packets kept in the buffer, to catch up to the state of the initial container. Finally, the new container takes over and the initial container can be shut down. Experimental results show a downtime improvement by a factor of 1.8 (from 2.97 s to 1.68 s), at the cost of extra overhead in the migration time by a factor of 1.7 (from 9.00 s to 24.3 s), in comparison to the LXC/LXD stock live migration. In other words, although the downtime is improved, the mechanism that supports it increases the total migration time.
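The distinction between the two metrics can be stated precisely in terms of the event timeline of a migration; the event names below are our own:

```python
def migration_metrics(events):
    """Derive the two metrics distinguished in [36] from a migration timeline.

    `events` maps event names to timestamps (seconds):
      migration time = trigger .. service available at destination
      downtime       = service halted at source .. resumed at destination
    """
    migration_time = events["dst_available"] - events["trigger"]
    downtime = events["dst_resumed"] - events["src_halted"]
    return migration_time, downtime
```

A hypothetical timeline in which the destination becomes available 24.3 s after the trigger, with the service halted for only 1.68 s of that window, reproduces the trade-off reported above: a long migration, but a short downtime, which is what matters for SLAs.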
Krüger et al. [50] used the dynamic orchestration characteristics offered by Docker Swarm to reallocate containerized services, in order to mitigate the failure of power grid services caused by disruptive events in the grid. An RT test platform was created to test the presented solution. The control function of devices such as Intelligent Electronic Devices (IEDs) and Remote Terminal Units (RTUs) was virtualized using Docker containers, as well as multiple grid functions and services such as State Estimation, Data Acquisition, and Coordinated Voltage Control. Constant monitoring is performed and the resulting data is sent to an anomaly detector. When abnormal behaviour is detected, an alarm is raised and sent to the Service Controller (Docker Swarm), which then decides if, and which, services need to be reallocated (it may be a single service or a full chain of services). It is concluded that this mitigation strategy based on container orchestration reduces the disruption of services.

Summing Up
The simple use of containers, without extra technology or tools, can bring added value to an IACS by itself. Nevertheless, containerized solutions are greatly empowered when used in conjunction with orchestration tools. This approach assumes a relevant role in all types of scenarios that demand automated actions triggered by information collected from continuous monitoring of the system infrastructure and/or other sources, especially in distributed architectures such as edge-cloud or IIoT environments. Failing to launch a container at the right time or in the right location may lead to service unavailability, SLA violations, higher power consumption, higher latency, and other problems. As such, when planning and designing a containerized environment, one should always take into account the advantages of using orchestration tools to optimize the system and achieve the best performance, according to the intended goals.

OPEN CHALLENGES
The interest in the use of container virtualization has been growing over the past years, especially in the IT world, prompting its evolution towards increased maturity levels. Part of this popularity is also due to the community that emerged around these technologies, which has steadily grown into a sizeable number of contributors willing to address open questions and contribute to their evolution. The usage of container technology in operational environments with real-time requirements is a different matter, due to the context-specific challenges that are characteristic of such a niche domain. While the previous chapters identified and addressed numerous challenges related to the usage of containers in RT environments (in many cases with quite interesting solutions), several relevant gaps remain to be tackled. Next, we compile some of the more relevant challenges identified by the authors of the surveyed papers.

Container Placement
When deploying a container image or performing a live migration of real-time containers, it is necessary to pre-allocate resources on the destination host according to the requirements of the tasks running inside the containers. Moreover, it is necessary to be aware that other containers may be running on that same host, with their own resource needs and potential concurrency impacts. This process can take valuable time and may lead to overhead or even service downtime. It is also necessary to be aware of the network status and the availability of the destination host.
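As a rough illustration of what such pre-allocation entails, the sketch below runs a classic utilization-based admission test before accepting a container on a host: the sum of C_i/T_i (worst-case execution time over period) of the tasks already on the host plus those of the incoming container must stay within the schedulable bound (1.0 for EDF on a single CPU). This is a simplification, not any surveyed paper's algorithm; real placement logic would also weigh memory, network status, and concurrency effects.

```python
def can_place(host_tasks, new_tasks, util_bound=1.0):
    """Utilization-based admission test for placing an RT container.

    host_tasks: list of (C, T) pairs for tasks already on the host
    new_tasks:  list of (C, T) pairs for the incoming container's tasks
    util_bound: schedulable utilization bound (1.0 for single-CPU EDF)
    """
    total = sum(c / t for c, t in host_tasks) + sum(c / t for c, t in new_tasks)
    return total <= util_bound
```

Running such a test on every candidate node before deployment or migration is one way to rule out hosts that would cause deadline misses, before even considering network conditions.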
Yen et al. [80] alerted to the need to address the process of finding an optimal node on which to place a container, in order to further reduce task-execution time and network traffic. Cucinotta et al. [15] state there is still work to be done regarding resource allocation for dynamic workloads and on how to deal with overload conditions. In line with this observation, Moga et al. [61] point to the case where a real-time task does not respect its designed parameters (due to being compromised or to other anomalies) and argue that a dynamic orchestration algorithm is needed to deal with such situations. Abeni et al. [1] see the need for a more in-depth analysis of the scheduling of parallel RT activities deployed in multi-CPU containers. Govindaraj et al. [36] mention that live migration downtime could be further reduced, and Morabito [62] points out the lack of support in Docker for live container migration between different entities, especially under strict latency requirements. Okwuibe et al. [64] state that an extra layer to handle orchestration introduces extra latency, and Struhar et al. [75] go further, stating that no orchestration system natively considers RT requirements for containerized applications. Also, Moga et al. [61] state the need for middleware able to host containerized micro-services with RT capabilities, comprising communication, runtime, and isolation.

Communication
As already mentioned, a containerized service can consist of multiple real-time tasks being executed in several distinct containers, following a pattern similar to micro-service deployments.This necessarily implies that determinism may be important not only for in-node processing, but also for inter-container communications.
Some authors have questioned whether inter-container communications are able to comply with RT requirements [40] [59], with similar doubts being raised regarding data sharing across containers [34] [33]; such concerns are further aggravated by the fact that containers may or may not be hosted on the same node. As such, Tasci et al. [76] stated that container design cannot assume the availability of shared memory/resources for inter-container communication, thus calling for message system frameworks compliant with real-time requirements. The determinism of standard Ethernet is also questioned by Sollfrank et al. [72] [73], who deem it unsuitable for RT environments. This doubt is also expressed by Albanese et al. [2], who suggest further research on this matter, for instance addressing the use of Time-Sensitive Networking (which, among other standards, includes IEEE 802.1AS for time synchronization).
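When assessing whether a communication path can satisfy RT requirements, worst-case latency and jitter matter more than averages. The sketch below measures round-trip times over loopback UDP as a crude stand-in for an inter-container channel; a real assessment would run across the actual container network, under representative load, and with far more samples.

```python
import socket
import statistics
import time

def measure_rtt(n=200, port=9530):
    """Measure loopback UDP round-trip latency over n ping/echo exchanges.

    Returns worst-case and median latency in microseconds; for RT purposes
    the maximum (and its spread from the median) is the figure of interest.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind(("127.0.0.1", port))
    cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    samples = []
    for _ in range(n):
        t0 = time.perf_counter()
        cli.sendto(b"ping", ("127.0.0.1", port))
        data, addr = srv.recvfrom(64)   # "server" side receives the probe
        srv.sendto(data, addr)          # and echoes it back
        cli.recvfrom(64)                # client observes the reply
        samples.append((time.perf_counter() - t0) * 1e6)
    srv.close()
    cli.close()
    return {"max_us": max(samples), "p50_us": statistics.median(samples)}
```

A large gap between the median and the maximum is precisely the non-determinism the cited authors worry about in standard, non-TSN networking.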

Security and Safety
The specific security and safety concerns of container virtualization have been widely addressed in the IT world in the last few years, with clear improvements. However, in the specific domain of RT applications, only a few works have focused on security and safety aspects.
Cinque et al. [13] [12] raised some questions regarding the isolation of faults in the hosting nodes. There is a need to reinforce the isolation mechanisms that prevent the propagation of faults from the host system to the containers and between containers. If such a problem is not effectively addressed, a single external fault may compromise the entire system.
Chen et al. [10] focused on defending real-time systems against DoS attacks, achieving interesting results. However, they were not able to validate their approaches on hard real-time systems. In our opinion, there is still work to be done in this area, since hard real-time systems cannot afford missed deadlines, making security and safety more challenging to address.

Public Infrastructure
In the IT world, the use of public cloud or edge-cloud infrastructures for containerized solutions is very common, due to factors such as accessibility, convenience, and scalability. However, when deterministic real-time requirements are mandatory, this may not be the best choice. Liu et al. [56] state that current implementations of edge-cloud containerized infrastructures do not fully support the requirements of RT industrial applications, especially in the context of concurrency, and suggest that performance needs to improve by up to 20 times.

Others
Storage transaction speed also appears to be an open issue when using containers in time-restricted systems. While it seems proven that processing-level latency can reach values and determinism compatible with RT systems, according to Li et al. [53] the same does not hold for storage transaction speed, which may represent a bottleneck for systems with I/O needs.
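A simple way to check whether storage is the bottleneck Li et al. describe is to microbenchmark synchronous write transactions on the container's storage path, again reporting worst-case rather than average latency. The sketch below is illustrative only; the block size and sample count are arbitrary choices.

```python
import os
import statistics
import tempfile
import time

def fsync_latency(n=50, size=4096):
    """Time n synchronous write transactions (write one block + fsync).

    Returns worst-case and median latency in milliseconds; the maximum
    is what matters when sizing against RT deadlines with I/O on the path.
    """
    samples = []
    fd, path = tempfile.mkstemp()
    try:
        block = b"\0" * size
        for _ in range(n):
            t0 = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # force the transaction to stable storage
            samples.append((time.perf_counter() - t0) * 1e3)
    finally:
        os.close(fd)
        os.unlink(path)
    return {"max_ms": max(samples), "p50_ms": statistics.median(samples)}
```

Running this inside the container (against its overlay or volume-mounted storage) versus on the bare host gives a first estimate of the virtualization overhead on the I/O path.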
Thinking of the creation of containerized PLC logic, and with the intent of reducing human error, Cervini et al. [9] point to the lack of an automated tool that could ingest logic samples and output containerized PLC variants. Such a tool should also automatically validate the logical equivalence of the physical and containerized PLCs and, eventually, propose optimizations.

CONCLUSIONS
In this paper we presented a systematic literature review on container virtualization applied to real-time environments, with a special focus on industrial and automation control systems.
By looking for answers to the proposed research questions, it was possible to identify not only how containers are being used in this context, but also several relevant aspects, such as: the techniques used to guarantee real-time compliance in the container hosting nodes; the order of magnitude of the in-node processing latency; the container platforms being used and when to choose each; how container orchestration is being applied; and the key remaining challenges of container technology in this domain. It is concluded that containerized solutions are compatible with industrial and automation real-time cyber-physical systems. Several works have shown the capability of running containerized real-time tasks on multiple hardware architectures, with distinct processing capabilities and in the presence of concurrent workloads. It was also shown that low-powered single-board computers can comply with RT requirements, opening the door to executing real-time tasks at the network edge, where hardware resources tend to be more limited than in cloud architectures, or in IIoT systems.
However, some challenges still need to be addressed, especially regarding the deployment and orchestration of containers, and also at the networking level. Some research works have partially addressed those challenges, with encouraging results. Some validate the usage of fully orchestrated containerized solutions, but only in laboratory environments. More research is needed before securely transposing this technology to production environments, especially research that encompasses and validates all the different aspects of container virtualization (runtime, orchestration, networking) as a whole.

Fig. 2. An overview of the fundamental container platform building blocks

Table 1. Industrial Systems End-to-End Latency

Table 4. Study selection criteria

Table 5. Search query statistics

Table 6. Publications

Table 7. Research Summary