Learning Models of Cyber-Physical Systems with Discrete and Continuous Behaviour for Digital Twin Synthesis

Digital twins are used to simulate (cyber-physical) systems and offer great benefits for testing and verification. The importance of constructing digital twins quickly and efficiently increases as devices of greater complexity appear. Furthermore, the more (varied) behaviour of the simulated device a digital twin captures, the more use cases it can serve. In this thesis we investigate methods from automata learning and machine learning to automatically synthesise digital twins from cyber-physical systems, capturing both discrete and continuous behaviour. Our aim is to combine methods from both fields and utilise their respective strengths to build better digital twins from cyber-physical systems in practice. We have already developed an algorithm that learns discrete behavioural models even in the presence of noisy data.


INTRODUCTION
From smart homes to automotive measurement devices, ever more software systems integrate with the physical environment. Such cyber-physical systems (CPSs) open up many opportunities and can be applied in a variety of use cases; however, they are often difficult to engineer and verify. One approach that shows much promise in addressing these problems is the digital twin. There are a variety of definitions for digital twins [19]; we use the term digital twin (DT) to mean a virtual model or simulation of a physical device that behaves like the original device. DTs connect virtual and physical spaces and integrate between them, allowing for real-time simulation, predictive diagnosis, testing and performance optimisation, among other uses. They can be used in almost every phase of a product's lifecycle and, especially for large or complex devices, can allow for much cheaper and more efficient testing and development of a product [21]. Our industry partner AVL List GmbH applies DTs in their DevOps for integration testing of embedded software, verification of measurement devices and test generation. In the future, they also plan to use DTs for online maintenance and anomaly detection.
As more and more devices of greater complexity appear, the importance of constructing DTs efficiently and quickly rises. While it is possible to build DTs manually, doing so comes with problems, chiefly higher costs due to manual labour performed by domain experts and flaws in the models due to insufficient information about certain behaviours or undocumented edge cases. Therefore, automatically learning the models and behaviours of systems, either to generate new DTs or to enhance existing ones, is of considerable interest.
On the one hand, the field of automata learning (AL), a very active field of research, deals with learning automata models that represent formal languages. It enables learning the discrete behaviour and logic of black-box systems based only on observations, and can thus be performed on any system even if no source code, documentation or expert knowledge of said system is available [9,16]. There are many different algorithms for learning discrete models [3,22] and mining specifications [10], such as the famous L* [5]; some of these are covered in our earlier work, where we benchmarked different combinations of learning and testing algorithms [4].
On the other hand, most systems today have a variety of sensors that report a wide range of physical values such as temperature or fill level. These values are continuous, mostly in the form of time series, and can be represented by arbitrarily precise numbers instead of discrete states. While AL is able to learn such behaviour if discretisation is applied [2], for example by abstracting temperature to 'high' or 'low', DTs can benefit from including more precise readings. After all, the more accurately a DT reflects the behaviour of the real system, the more use cases are possible and thus the greater the utility of the DT. The field of machine learning (ML) has grown rapidly in recent years and offers a great variety of methods for learning, generating and predicting continuous values. These methods show great promise for learning even complex continuous behaviour and represent an opportunity for improved DTs. While ML is also able to capture discrete behaviour, it seemingly has a harder time doing so [13], and methods to stabilise this behaviour reduce overall network expressiveness [15].
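The 'high' or 'low' temperature abstraction mentioned above can be illustrated with a minimal sketch. The threshold of 50.0 degrees, the readings and the function name `discretise` are assumptions made purely for illustration and are not taken from our experiments.

```python
from bisect import bisect_right


def discretise(value, thresholds, labels):
    """Map a continuous reading to a symbolic label.

    `thresholds` must be sorted in ascending order and `labels` must
    contain exactly one more entry than `thresholds`.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    return labels[bisect_right(thresholds, value)]


# Hypothetical temperature readings in degrees Celsius.
readings = [18.5, 21.0, 74.2, 80.1, 35.0]

# Assumed cut-off: readings of 50.0 and above count as 'high'.
symbols = [discretise(r, [50.0], ["low", "high"]) for r in readings]
print(symbols)
```

The resulting symbol sequence can be consumed by a discrete learner, but the distinction between, say, 74.2 and 80.1 is lost, which is exactly the precision a DT may want to retain.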
We aim to combine methods from the fields of AL and ML in order to learn both discrete and continuous behaviour from CPSs. Hence, the research questions we want to answer are:
RQ 1: Which AL methods are suitable for learning discrete behaviour of CPSs in practice and how can they be applied?
RQ 2: Which ML methods are suitable for learning continuous behaviour of CPSs in practice and how can they be applied?
RQ 3: How can said methods be combined to learn or improve DTs of CPSs with both discrete and continuous behaviour?
RQ 4: Which practical issues are encountered when learning from real CPSs and how can they be overcome?
The expected contribution of this work is a toolkit or set of methods that can be applied to a wide range of CPSs in practice to learn their discrete and continuous behaviour using AL and ML methods, in order to build or improve DTs. These DTs should then be usable in a variety of ways for industrial use cases [21].
To the best of our knowledge, the research is novel as prior work has not tried to combine AL and ML to model CPSs.

RELATED WORK
Methods exist for learning so-called hybrid automata, of which timed automata are a special case, with both discrete and continuous behaviour [11,12,14,18,20]. These are mostly limited in the types of signals they can learn, or are challenged by noise and other artefacts that appear in practice. One of the main improvements we want to make over these existing methods is to take advantage of versatile ML techniques to better learn a variety of continuous behaviour.
Other related approaches include model-based reinforcement learning [17], mining of signal temporal logic properties [8] or shape expressions [7], and extending discrete models with continuous aspects (time) using linear regression [1]. Some of these methods may provide inspiration for our own research.

DISCRETE BEHAVIOUR
Current State: In the author's master's thesis, automatic learning methods were employed to detect undocumented behaviour of a measurement device that was missing from its manually constructed DT; the DT could then be improved using the learned behavioural model [23]. This showed that the automatic learning of such models is viable for creating and improving DTs, which partially inspired the current doctoral research.
Furthermore, we developed an algorithm based on Partial Max-SAT that is able to learn behavioural models from execution traces while filtering out noise [24]; the corresponding paper was accepted at ICSE 2024. This allows us to learn from CPSs in practice, where message loss or other faults would prevent us from using more standard AL methods.
Work in Progress: While the proposed method was able to learn models of industrial measurement devices, among others, it does not scale to very large systems. In the future, it might be of interest to learn the discrete behaviour of very large systems, in which case the method would have to be re-examined or improved.

CONTINUOUS BEHAVIOUR
Current State: We are currently researching and testing different methods of learning and generating time series data, especially focused on generative models such as variational auto-encoders (VAEs) [6] and conditional variational auto-encoders (CVAEs).

Work in Progress:
We plan on using the capabilities of these methods in a number of different ways:
(1) Generating representative continuous behaviour, i.e., behaviour drawn from the appropriate distribution, for each state in the discrete behavioural model. By learning the underlying data distributions in each discrete state, or between states, this would allow us to extend a DT with realistic behaviour that was not captured by discrete states alone.
(2) Detecting anomalies by testing whether some observed behaviour of the CPS is consistent with the DT. For example, by running the DT online in parallel with the CPS, the behaviours of both can be compared directly to detect mismatches. This could be done by testing whether the observed behaviour of the CPS lies within the learned data distribution of the DT for a given state.
(3) Finding useful state labels with which to differentiate states for discrete learning. Sometimes discretisation or abstraction must be applied in order to obtain usable state labels. If no other obvious separation exists, ML methods may be suited to extracting useful and convenient state labels for learning the discrete behaviour.
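Uses (1) and (2) can be made concrete with a deliberately simplified sketch. It is not the VAE-based approach under investigation: a one-dimensional Gaussian per discrete state stands in for the learned generative model, and the state names, recorded values and the three-standard-deviation rule are assumptions chosen only for illustration.

```python
import random
import statistics


class StateModel:
    """Per-state Gaussian used as a stand-in for a learned generative model."""

    def __init__(self, samples):
        self.mean = statistics.fmean(samples)
        self.std = statistics.stdev(samples)

    def generate(self, rng):
        # Use (1): draw representative continuous behaviour for this state.
        return rng.gauss(self.mean, self.std)

    def is_anomalous(self, value, k=3.0):
        # Use (2): flag observations far outside the learned distribution.
        # The factor k = 3 is an assumed, not a tuned, threshold.
        return abs(value - self.mean) > k * self.std


# Hypothetical temperature recordings, grouped by discrete state.
recordings = {
    "idle": [20.1, 19.8, 20.3, 20.0, 19.9],
    "heating": [64.0, 66.5, 65.2, 65.8, 64.9],
}
models = {state: StateModel(values) for state, values in recordings.items()}

rng = random.Random(0)
synthetic = models["heating"].generate(rng)

print(models["idle"].is_anomalous(20.2))  # consistent with 'idle'
print(models["idle"].is_anomalous(65.0))  # a 'heating' value seen in 'idle'
```

Replacing `StateModel` with a trained VAE would keep the same interface: sampling from the decoder for generation, and a reconstruction or likelihood score in place of the distance test.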

COMBINING DISCRETE WITH CONTINUOUS
Current State: We are currently using the learned discrete behavioural model to explore the CPS and record the continuous behaviour which we want to learn using ML. Using the model allows for the mapping of continuous behaviour along specific execution paths and makes the acquisition of relevant data easier and faster.
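One way such model-guided exploration could work is sketched below, assuming the learned model is available as a deterministic transition table. A breadth-first search yields a shortest input sequence to each discrete state; replaying these sequences on the CPS would let the recorded signals be attributed to known states. The device states, input names and the function `access_sequences` are hypothetical.

```python
from collections import deque


def access_sequences(transitions, initial):
    """Return a shortest input sequence reaching each reachable state."""
    paths = {initial: []}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        # Sorting makes the chosen sequences deterministic.
        for symbol, target in sorted(transitions.get(state, {}).items()):
            if target not in paths:
                paths[target] = paths[state] + [symbol]
                queue.append(target)
    return paths


# Hypothetical learned model of a measurement device.
transitions = {
    "standby": {"start": "measuring"},
    "measuring": {"pause": "paused", "stop": "standby"},
    "paused": {"resume": "measuring", "stop": "standby"},
}

paths = access_sequences(transitions, "standby")
for state, inputs in paths.items():
    print(state, inputs)
```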

Work in Progress:
We consider two ways to combine the discrete with the continuous models:
(1) We might supply the discrete states to a CVAE as a condition. This would allow us to learn a richer ML model and allow the CVAE to select the appropriate continuous behaviour itself.
(2) We might also take the discrete behaviour, e.g., in the form of a finite state machine, and extend it and its states with the continuous behaviour learned by ML, for example, by learning a single VAE for each individual discrete state.
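The two options differ mainly in how the training data is prepared, which the following sketch illustrates without committing to any ML library. For option (1) it assumes the common practice of encoding the condition as a one-hot vector and concatenating it with the signal window; for option (2) it groups windows by state so that one model can be trained per group. All names and values are hypothetical.

```python
def one_hot(state, states):
    """Encode a discrete state as a one-hot vector over `states`."""
    return [1.0 if s == state else 0.0 for s in states]


def conditioned_input(window, state, states):
    """Option (1): signal window followed by the state condition."""
    return list(window) + one_hot(state, states)


def group_by_state(labelled_windows):
    """Option (2): collect one training set per discrete state."""
    groups = {}
    for state, window in labelled_windows:
        groups.setdefault(state, []).append(window)
    return groups


states = ["standby", "measuring", "paused"]

# Hypothetical (state, signal window) pairs recorded from the CPS.
data = [
    ("measuring", [0.2, 0.4, 0.5]),
    ("standby", [0.0, 0.0, 0.1]),
    ("measuring", [0.5, 0.6, 0.6]),
]

x = conditioned_input(data[0][1], data[0][0], states)
per_state = group_by_state(data)
print(x)
print(sorted(per_state))
```

Option (1) trains a single model on all conditioned inputs, whereas option (2) trains separate models and leaves the choice of model to the finite state machine.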

CONCLUSION
This report gives an overview of our current research and work in progress. The first year of our doctoral studies has now passed and we have already made strides in synthesising discrete behavioural models for DTs while dealing with practical issues such as noisy data. Our current research is focused on learning the continuous behaviour of CPSs and on combining discrete and continuous behaviour into more advanced DTs. The research so far has led to the publication of a paper at ICSE 2024.