Towards Automatic Inference of Behavioral Component Models for ROS-Based Robotics Systems

Model-based analysis is a common technique to identify incorrect behavioral composition of complex, safety-critical systems, such as robotics systems. However, creating structural and behavioral models for hundreds of software components manually is often a labor-intensive and error-prone process. I propose an approach to infer behavioral models for components of systems based on the Robot Operating System (ROS), the most popular framework for robotics systems, using a combination of static and dynamic analysis by exploiting assumptions about the usage of the ROS framework. This work is a contribution towards making well-proven and powerful but infrequently used methods of model-based analysis more accessible and economical in practice to make robotics systems more reliable and safer.


INTRODUCTION
Ensuring that robotics systems operate safely and correctly is an important challenge in software engineering. As robots become increasingly integrated into work environments and the daily lives of many people [15,18,26,38], their faults can potentially cause dramatic harm to people [5,17,37]. However, ensuring that robotics systems are safe and operate correctly is hindered by their large size and complexity [1,2,22,27]. Many robotics systems comprise hundreds of thousands or even millions of lines of code [35].
Robotics systems, especially systems written for ROS [31], the most popular robotics framework, are often component-based, i.e., they are implemented as independently deployable run-time units that communicate with each other primarily via messages [3,9,19,31,35]. They can comprise hundreds of software components, each of which can have complex behavior [6,23,35]. Many ROS systems are predominantly composed of reusable component implementations created by external developers [21]. In this context, the main challenge is their correct composition [10,35].
The composition and evolution of software components is error-prone, since components regularly make undocumented assumptions about their environment, such as receiving a set of initialization messages before starting operation. When composed inconsistently, these systems can behave unexpectedly: a component may wait indefinitely, fail to change to the desired state, ignore inputs, lose messages, or publish messages at an unexpectedly high frequency [10,14,36]. In this paper I call these bugs "behavioral architectural composition bugs", because they are caused by inconsistent compositions and impact the software architecture's behavior. Finding and debugging behavioral architectural composition bugs in robotics systems is usually challenging, because components frequently fail silently, and failures can propagate through the system [1,2,16,22].
Software architects commonly use model-based architecture analysis to ensure the safety and correct composition of components [4,8,13,25,28,29]. Model-based analysis is a design-time technique to evaluate whether design options meet desired properties. Systems are modeled as a set of interconnected views, such as component-connector models (describing what components are in the system, what ports they have, and how ports are connected between components), behavioral views (state machines or activity diagrams describing the dynamic reaction a component can have to receiving messages at its ports), and deployment views (mapping component instances to processing units) [11]. Using models of interconnected views of the current system, architects can find inconsistencies or predict the impact of changes on the system's behavior.
However, in practice, due to the complexity of robotics systems, creating models manually is time-consuming and difficult [12,13,39]. This motivates work on automated model recovery to reduce the modeling effort and make formal analysis more accessible in practice.
Architectural recovery techniques, such as ROSDiscover [35], HAROS [32,34], and the tool by Witte et al. [40], can reconstruct component-connector models. However, they do not reconstruct behavioral models. Without behavioral models, model-based analysis cannot reason about dynamic aspects that describe how a component reacts to inputs and how it produces outputs, such as whether a component sends a message in response to receiving an input, whether it sends messages periodically, and which state conditions or inputs determine whether it sends a message. Therefore, these approaches cannot find behavioral architecture composition bugs.
Existing approaches for inferring behavioral models, such as Perfume [30], use dynamic analysis to infer state machines from execution traces. However, these approaches cannot guarantee that the relationships they find are causal, since they observe only correlations within behavior.
To address the challenge of automatically inferring behavioral component models for ROS-based systems, I propose to use a combination of static analysis of the system's C++ source code and dynamic analysis of systematically generated execution scenarios based on the results of the static analysis. In general, inferring behavioral models is undecidable [24]. Even a partial solution is practically challenging, because the analysis needs to infer which subset of arbitrary C++ code gets compiled to be executed as a single component, which subset of this component's code communicates with other components, and under which situations this code for inter-component communication is reachable. Fortunately, the following observations about the ROS ecosystem make this problem tractable for most cases in practice: (1) Component architectures and behaviors are defined via Application Programming Interface (API) calls that have well-understood architectural semantics [33]. This simplifies the inference of architectural models by reducing the search space of architecturally relevant code. (2) The composition and configuration of components to build larger systems are done in separate architecture configuration files (i.e., launch files). Most of these result in "quasi-static" systems, that is, architectures that rarely change after run-time initialization [33], which simplifies the structure of inferred architectural models. (3) Behavioral patterns, such as periodically sending messages, are usually implemented using features provided by the ROS framework. Hence, most instances of those patterns follow a similar implementation template, which simplifies behavioral model inference by reducing the search space.
Based on these observations, I propose the following main contributions: (1) A static analysis approach that infers component behavior models from ROS code by looking for API calls with architecturally relevant behavior and for common implementation templates of behavioral patterns in the ROS ecosystem. This contribution has already been completed. (2) A dynamic analysis approach that completes statically inferred models by designing systematic experiments that observe the missing behavior and add information about run-time quality attributes, such as execution time. (3) A model-based analysis of the resulting models that translates them into executable PlusCal/TLA+ models with a set of commonly requested analyses from the robotics domain. (4) An end-to-end evaluation of the presented approach that measures its effectiveness at finding bugs and its practical usefulness.
The main claim that my proposed work is trying to prove is:

OVERVIEW OF THE APPROACH

Problem Description
To define the semantics of the behavioral models that the presented approach infers, this section introduces the formalism of behavioral component models that will be used throughout the paper. The models are a variant of input-output state machines that describe the externally visible behavior of a component (i.e., the use of its ports).
Input Triggers

An input trigger is a message arriving at an input port p ∈ P_in, a periodic trigger t ∈ T with frequency f ∈ ℝ, or a component event e ∈ E, such as "component started". To keep the model simple, the content of messages is not modeled.

Environment Configuration Env
The environment configuration e ∈ Env includes the hardware platform on which the software is going to be executed (e.g., number of cores, frequencies), which impacts performance, as well as physical properties, such as radiation, temperature, remaining battery power, or wind speed, that might impact quality attributes of the system.
Outputs O ⊆ P_out ∪ {ε}

Outputs are either messages sent at an output port o ∈ P_out or the empty output ε for transitions that only change the state but do not produce an output.
The partial transition function δ : Env × S × I ⇀ O × S × D is represented in pre- and post-condition form, with pre-conditions being predicates on environment configurations e ∈ Env, states s ∈ S, and inputs i ∈ I that define for which inputs and states the transition is triggered, and post-conditions defining an output o ∈ O and the next state s′ ∈ S in terms of s and i, as well as the duration of the transition d ∈ D, modeled as a probabilistic timing distribution.

Unknown Value ⊤
Finally, the formalism needs a special element ⊤ (pronounced "top") that represents an unknown value for cases in which the static analysis is unable to infer the value of an expression (e.g., the frequency of periodic publishing, the values of initial states, or the right-hand side of assignments to state variables). It is included in all data types: for every data type X, ⊤ ∈ X.
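To make the formalism concrete, the following is a minimal Python sketch of such a behavioral component model. All names (`Transition`, `ComponentModel`, `TOP`, the ports and triggers) are illustrative assumptions for this sketch, not part of ROSInfer.

```python
from dataclasses import dataclass

# Sentinel for the unknown value ⊤, treated as a member of every data type.
TOP = object()

@dataclass(frozen=True)
class Transition:
    state: object       # pre-condition: required current state (or TOP if unknown)
    trigger: str        # e.g. "msg:/start", "periodic:10Hz", "event:started"
    output: object      # post-condition: output port, or None for the empty output ε
    next_state: object  # post-condition: successor state (or TOP if not inferred)
    duration_s: object  # transition duration d ∈ D; TOP until measured dynamically

class ComponentModel:
    def __init__(self, initial_state, transitions):
        self.state = initial_state
        self.transitions = transitions

    def step(self, trigger):
        """Fire the first enabled transition; return its output (None if none fires)."""
        for t in self.transitions:
            if t.trigger == trigger and t.state in (self.state, TOP):
                self.state = t.next_state
                return t.output
        return None

# Hypothetical component: waits for /start, then periodically publishes on /scan.
model = ComponentModel("idle", [
    Transition("idle", "msg:/start", None, "active", TOP),
    Transition("active", "periodic:10Hz", "/scan", "active", TOP),
])
model.step("periodic:10Hz")        # no transition enabled: component is still idle
model.step("msg:/start")           # idle -> active, empty output
out = model.step("periodic:10Hz")  # now publishes on /scan
```

The ⊤ sentinel lets a statically inferred model remain usable even when, for example, the duration of a transition is unknown until dynamic analysis fills it in.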

API-Call-Guided Static Recovery
The first step in inferring behavioral models is static analysis. While ROSDiscover can recover structural models, it does not reconstruct component behavior. Therefore, I developed an extension, called ROSInfer, that statically infers reactive, periodic, and state-based behavior of ROS components to create a state machine of architecturally relevant behavior.
As with recovering structural models, we observe that the ROS API is commonly used to implement architecturally relevant behavior. By looking for the API calls that define callbacks for receiving a message, send a message, or sleep for the remaining time of a periodic interval, ROSInfer recovers models of architecturally relevant behavior that can then be used for model-based analysis of the system. ROSInfer reconstructs state machine models by identifying ROS API calls that implement these types of behavior, their argument values, and the control flow between them.
We recover reactive behavior by finding control flow from a subscriber callback to a publish call.This establishes causality between receiving a message and sending another message.
To recover periodic behavior, ROSInfer looks for publish calls within infinite loops that call sleep on a rate object. The frequency passed to the rate constructor lets us recover the target frequency of the periodic behavior.
To cover state-dependent behavior, ROSInfer finds state variables, their initial values, and state transitions. Our heuristics to identify state variables are: (1) the variable is used in control conditions of architecturally relevant behavior (i.e., of functions that send messages, functions that change state variables, and their transitive callers), and (2) the variable has global or component-wide scope, such as member variables of component classes or non-local variables. To infer the initial state of the component (i.e., the initial values of each state variable), ROSInfer searches for the first definitions of the variables, either in their declaration or in the main method. After the state variables are identified, ROSInfer infers transition conditions by combining the control conditions of architecturally relevant behavior using the logical operators and and not, depending on whether the path takes a negation branch (e.g., the else branch of an if-statement).
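The combination of control conditions along a path can be sketched as follows; the function name and the example conditions are hypothetical, and this is a simplification of the actual inference.

```python
def path_condition(branches):
    """Conjoin the branch conditions along one control-flow path into a guard.

    `branches` is a list of (condition, taken) pairs, where `taken` is False
    when the path goes through the negation branch (e.g. the else branch),
    in which case the condition is wrapped in a logical not.
    """
    parts = [cond if taken else f"not ({cond})" for cond, taken in branches]
    return " and ".join(parts) if parts else "true"

# Hypothetical path reaching a publish call: through the then-branch of
# `if (initialized)` and the else-branch of `if (error)`.
guard = path_condition([("initialized", True), ("error", False)])
# guard == "initialized and not (error)"
```

A transition in the inferred state machine would then carry such a guard as its pre-condition on the component's state variables.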
I evaluated ROSInfer on five complex real-world ROS systems with a total of 534 components. For this purpose, we manually created 155 component models from the source code to be used as a ground truth and made them available as a data set for other researchers. ROSInfer can infer causal triggers for 86.8 % of component architectural behaviors in the 534 components. The proportion of inferable values is 91 % for periodic rates, 100 % for reactive triggers, 72 % for state variable initial values, and 84 % for state changes. In a ground-truth comparison, ROSInfer has a recall of 93 % for periodic behavior, 82 % for reactive behavior, 71 % for state variables, and 69 % for state transitions. The precision is 100 % for periodic behavior, 91 % for reactive behavior, 76 % for state variables, and 88 % for state transitions.

Partial-Model-Informed Dynamic Recovery
As the evaluation of our current work has shown, even perfect static analysis still leaves incomplete models in some cases. Furthermore, static analysis cannot infer the execution times of tasks, producing models that cannot be used for most kinds of performance analysis, bottleneck analysis, or analysis of race conditions. Fortunately, since the models are directly derived from the source code, they can also be used to guide the creation of experiments for dynamic analysis that fill in the unknown values of incomplete models, or to identify representative paths through the system that can be used for profiling. This motivates future work on combining static and automated dynamic analysis to infer behavioral component models that contain more information about the components.
I plan to extend ROSInfer with a dynamic analysis that automatically deploys components and systematically sends messages to them, based on the known state machines, to collect timing data and to resolve known unknowns. Since ROSInfer keeps track of the code locations at which known unknowns occur, it is possible to automatically add code instrumentation that collects their values at run time. Furthermore, since the state machine of a component is known, it is possible to control the state of the component by sending the messages that trigger state transitions and to observe the run-time behavior in each state. Thereby, ROSInfer will be able to generate complete behavioral models for ROS components.
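The experiment planning described above can be sketched as a search over the statically inferred state machine: find a shortest message sequence that drives the component into the state whose behavior is still unknown. The function name, the transition encoding, and the example states are assumptions of this sketch, not ROSInfer's actual implementation.

```python
from collections import deque

def plan_messages(transitions, initial, target):
    """BFS over an inferred state machine for a shortest message sequence
    that drives the component from `initial` to `target`.

    `transitions` maps (state, message) -> next_state. Returns None when the
    target is unreachable via the statically known transitions.
    """
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, msgs = queue.popleft()
        if state == target:
            return msgs
        for (src, msg), nxt in transitions.items():
            if src == state and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, msgs + [msg]))
    return None

# Hypothetical component whose timing in state "driving" is still unknown:
trans = {
    ("idle", "/start"): "ready",
    ("ready", "/goal"): "driving",
}
plan = plan_messages(trans, "idle", "driving")  # ["/start", "/goal"]
```

Sending the planned messages to a deployed component instance would then put it into the state of interest, where instrumentation can record the missing values.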

Model-Checking of Common Properties in Robotics Systems
Combining behavioral component models with component-port-connector models allows for analyses of inter-component data flow. Structural models alone contain no information about how the inputs of a component are used and what is needed for the component to produce an output. Input-output state machine models like the ones ROSInfer infers make it possible to trace which messages at one component cause messages to be sent in other parts of the system. To check whether the components of a system are composed correctly, properties such as "an input at input port i₁ of component C can/must result in an output at output port o₁ of C" can be checked via discrete event simulation [7] or logical reasoning [20]. I started implementing PlusCal/TLA+ generation so that existing model checking techniques can be used to verify behavioral properties. With this approach, ROSInfer can already find three real-world bugs fully automatically.
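A translation to TLA+ could, for instance, render the inferred transitions as a disjunction of guarded next-state actions. The following sketch emits such a fragment; it is a hypothetical illustration of the idea, not ROSInfer's actual PlusCal/TLA+ generator.

```python
def to_tla(name, transitions):
    """Emit a minimal TLA+ next-state relation for an inferred state machine.

    `transitions` is a list of (state, trigger, next_state) triples; each one
    becomes a disjunct of the Next action over the variables `state` and
    `trigger`. Message contents and durations are abstracted away.
    """
    disjuncts = [
        f'  \\/ (state = "{s}" /\\ trigger = "{t}" /\\ state\' = "{n}")'
        for s, t, n in transitions
    ]
    return "\n".join([f"Next_{name} ==", *disjuncts])

# Hypothetical single-transition component:
spec = to_tla("planner", [("idle", "/start", "active")])
```

A model checker such as TLC could then explore the composed next-state relations of all components to check reachability properties like the input/output property quoted above.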
Furthermore, after adding dynamic analysis to collect execution times for each state transition, I plan to add timing analysis to check real-time properties of systems when components get composed in a resource-constrained environment.
Additionally, knowing the frequencies at which periodic messages are published makes it possible to propagate these frequencies to all transitive receivers of a data stream, and thus to check the desired frequency of message publishing further down the data stream to avoid unexpectedly high publishing frequencies.
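Such a frequency propagation can be sketched as a fixed-point computation over the connector graph. The graph, the component names, and the simplifying assumption that a forwarding component republishes at the upstream rate are all illustrative choices for this sketch.

```python
def propagate_frequencies(edges, sources):
    """Propagate publishing frequencies along connectors to a fixed point.

    `edges` maps a publishing component to the components subscribing to its
    output; `sources` maps periodic publishers to their rate in Hz. For this
    sketch we assume a forwarding component republishes at the rate at which
    data arrives from upstream.
    """
    freq = dict(sources)
    changed = True
    while changed:
        changed = False
        for src, dsts in edges.items():
            for dst in dsts:
                if src in freq and freq.get(dst) != freq[src]:
                    freq[dst] = freq[src]  # inherit the upstream rate
                    changed = True
    return freq

# /lidar publishes at 10 Hz; /filter and /planner transitively receive it.
edges = {"/lidar": ["/filter"], "/filter": ["/planner"]}
rates = propagate_frequencies(edges, {"/lidar": 10.0})
```

Comparing the propagated rate at each receiver against its declared expectation would flag unexpectedly high (or low) publishing frequencies down the stream.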

EVALUATION
The goal of the end-to-end evaluation is to answer the following two high-level research questions:

RQ 1 (Effectiveness)
Methods in Section 3.1. How effective is ROSInfer at finding realistic source-level behavioral architecture composition bugs?
The static model inference of ROSInfer has been evaluated in Section 2.2, and a proposed evaluation of the dynamic inference will follow a similar methodology of comparing the inferred models to ground-truth models. This research question closes the gap between the individual pieces of ROSInfer by evaluating its end-to-end ability to find bugs in source code. I propose to answer this research question by injecting realistic bugs into the source code and checking how many of them can be found in the inferred models.

RQ 2 (Practical Usefulness)
Methods in Section 3.2. How useful is ROSInfer in a practical setting?
I propose to answer this research question by comparing the performance of ROSInfer to typical ROS developers.

Measuring Effectiveness
To evaluate the effectiveness of finding bugs, I propose to inject realistic bugs into the source code of real-world ROS systems; run ROSInfer, the TLA+ generation, and the model-based analyses to identify potential bugs; and compare the detected bugs with the ground truth.
Creating a Data Set of Real-World Bugs: Since there is no existing data set of behavioral architecture composition bugs in ROS, I will need to create one as part of the research contribution.
Bug Injection: In the case that the data set of real-world bugs is not large enough for a proper evaluation and/or does not cover a large enough variety of behavioral architecture composition bugs, I propose to manually inject realistic bugs into the source code of real-world ROS systems.
Data Analysis: To analyze the resulting data, I propose to quantitatively measure precision and recall, as both false positives and false negatives impact how effective ROSInfer can be in a practical setting. Additionally, a qualitative root cause analysis of the kinds of false positives and false negatives will provide further insights into limitations and potential future work to improve the effectiveness of the approach.
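The planned precision/recall computation over detected versus injected bugs amounts to a simple set comparison; the bug identifiers below are hypothetical.

```python
def precision_recall(reported, injected):
    """Precision and recall of reported bugs against the injected ground truth."""
    reported, injected = set(reported), set(injected)
    tp = len(reported & injected)                       # true positives
    precision = tp / len(reported) if reported else 1.0  # penalizes false positives
    recall = tp / len(injected) if injected else 1.0     # penalizes false negatives
    return precision, recall

# Hypothetical run: 3 reported bugs, 4 injected, 2 of them overlap.
p, r = precision_recall({"b1", "b2", "b5"}, {"b1", "b2", "b3", "b4"})
# p == 2/3, r == 0.5
```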

Measuring Practical Usefulness
To measure how useful ROSInfer is in a practical setting, I propose to answer the following sub-research questions: How useful do typical developers perceive ROSInfer to be in their development activities?
Participants will be provided with a git repository containing a real-world ROS system with one or more bugs from the data set described in Section 3.1. Then, we will ask the participants to list all architecture misconfiguration bugs that they can find in the system while measuring the time it takes them to come to a conclusion.
Then we show them the output of the tool, ask them to locate the bug, and ask for a description of how they would fix it (without having to implement the fix). A final survey of their experience with ROSInfer evaluates the developers' satisfaction with the results. Furthermore, if possible, letting professional developers use ROSInfer in their daily work on their own projects, followed by semi-structured interviews about their experience, might provide deeper insights.

Figure 1: Simplified example of ROS code implementing a component that waits for an input message and then periodically publishes a message with a frequency of 10 Hz.

RQ 2.1 (Practical Usefulness) How does ROSInfer's effectiveness compare to typical ROS developers in finding bugs?

RQ 2.2 (Practical Usefulness) How does ROSInfer's efficiency compare to typical ROS developers in finding bugs?

RQ 2.3 (Practical Usefulness)

I plan to finish the dynamic inference by the end of Spring 2024, the PlusCal/TLA+ generation by the end of Fall 2024, and the end-to-end evaluation by the end of Spring 2025.