ROSInfer: Statically Inferring Behavioral Component Models for ROS-based Robotics Systems

Robotics systems are complex, safety-critical systems that can consist of hundreds of software components that interact with each other dynamically during run time. Software components of robotics systems often exhibit reactive, periodic, and state-dependent behavior. Incorrect component composition can lead to unexpected behavior, such as components passively waiting for initiation messages that never arrive. Model-based software analysis is a common technique to identify incorrect behavioral composition by checking desired properties of given behavioral models that are based on component state machines. However, writing state machine models for hundreds of software components manually is a labor-intensive process. This motivates work on automated model inference. In this paper, we present an approach to infer behavioral models for systems based on the Robot Operating System (ROS) using static analysis by exploiting assumptions about the usage of the ROS API and ecosystem. Our approach is based on searching for common behavioral patterns that ROS developers use for implementing reactive, periodic, and state-dependent behavior with the ROS framework API. We evaluate our approach and our tool ROSInfer on five complex real-world ROS systems with a total of 534 components. For this purpose, we manually created 155 models of components from the source code to be used as a ground truth and as a data set available to other researchers. ROSInfer can infer causal triggers for 87% of component architectural behaviors in the 534 components.


INTRODUCTION
Ensuring that robotics systems operate safely and correctly is an important challenge in software engineering. As robots are becoming increasingly integrated in work environments and the daily lives of many people [30,35,52,79], their faults can potentially cause physical damage, injuries, and even fatalities [5,49,78]. However, ensuring that robotics software systems are safe and operate correctly is hindered by their large size and complexity [42,54].
Robotics systems, especially systems written for the Robot Operating System (ROS) [66], the most popular robotics framework, are often component-based, i.e., implemented as independently deployable run-time units that communicate with each other primarily via messages, i.e., asynchronous data exchange [2,13,36,66,76]. Robotics systems can comprise hundreds of software components, each of which can have complex behavior [10,45,76]. Many ROS systems are predominantly composed of reusable components implemented by external developers [41]. In this context, the main challenge is their correct composition [14,76].
The composition and evolution of software components is error-prone, since components regularly make undocumented assumptions about their environment, such as receiving a set of initialization messages before starting operation. When composed inconsistently, these systems can behave unexpectedly: a component may wait indefinitely, fail to change to the desired state, ignore inputs, lose messages, or publish messages at an unexpectedly high frequency [14,27,75]. In this paper we call these bugs "behavioral architectural composition bugs", because they are caused by inconsistent compositions and manifest in inter-component communication, i.e., on the architectural level. Finding and debugging behavioral architectural composition bugs in robotics systems is challenging, because components frequently fail silently, failures can propagate through the system, and state-dependent behavior is hard to reason about [1,17,31,42].
Fortunately, a large body of formal methods research uses model-based architecture analysis to ensure the safety and correct composition of components [4,12,20,32,51,59,60,80]. Based on structural and behavioral models of the current system, such as state machines, architects can find inconsistencies or predict the impact of changes on the system's behavior using existing analyses [9,11,23,26,27,40,47,83]. However, in practice, due to the complexity of robotics systems, creating models manually and keeping them consistent with the code is time-consuming and difficult [19,20,80].
To reduce the modeling effort and make formal analysis more accessible in practice, this paper presents a static analysis technique to infer architecturally-relevant behavior for ROS-based systems. Architecturally-relevant component behavior is the set of all behaviors required to describe what causes inter-component communication, i.e., what causes a component to send messages.
Architectural recovery techniques, such as ROSDiscover [76], HAROS [67,69], and the tool by Witte et al. [82], can reconstruct structural models, such as component-port-connector models that describe the organization of components and the relationships between them, including what types of inputs and outputs a component handles. However, they do not reconstruct component behavior, i.e., the dynamic aspects that describe how the component reacts to inputs and how it produces outputs, such as whether a component sends a message in response to receiving an input, whether it sends messages periodically or sporadically, and what state conditions or inputs determine whether it sends a message.
Existing approaches for inferring behavioral models, such as Perfume [61], use dynamic analysis to infer state machines from execution traces. However, these approaches cannot guarantee that the relationships they find are causal, since they observe only correlations within behavior. Furthermore, for complex systems they come with a trade-off between long analysis times and low coverage.
To address the challenge of automatically inferring behavioral component models for ROS-based systems, we propose to use static analysis of the system's C++ source code. Static inference of behavioral models is challenging, because the analysis needs to infer what subset of a component's code communicates with other components, i.e., is architecturally-relevant, and what causes message sending behavior, i.e., in what situations the code that implements inter-component communication is executed.
Fortunately, the following observations about the ROS ecosystem make ROS components easier to analyze than general C++ code:
• Inter-component communication happens almost exclusively via Application Programming Interface (API) calls that have well-understood architectural semantics [68].
• The causes of message sending behavior (e.g., periodic loops, reacting to receiving a message) are usually implemented using features provided by the ROS framework and often follow common behavioral patterns.
Based on these observations, we make the following contributions:
(1) An approach of API-call-guided static analysis to infer state machine models that capture architecturally-relevant component behavior for ROS systems (Section 3).
(2) A prototypical implementation of the approach as an extension to ROSDiscover, an existing tool for structural architecture recovery [76] (in the replication package).
(3) An empirical evaluation of the approach's recovery rate, recall, and precision on five real-world open-source systems with a total of 534 components (Section 4).
(4) A data set of 155 handwritten behavioral models of components from the evaluation systems that can be used by other researchers (in the replication package).
While this work focuses on the ROS ecosystem, the approach of API-call-guided static recovery of component behavioral models seems promising to generalize to other frameworks and ecosystems for which the observations above hold.

ARCHITECTURALLY-RELEVANT COMPONENT BEHAVIOR IN ROS
Fortunately, only a small part of the overall behavior of a component is relevant for describing the component's behavior on an architectural level. This makes it practically feasible for static analysis to infer behavioral component models for complex systems. This section defines architecturally-relevant behavior and describes the corresponding architectural styles offered by ROS.

Architecturally-Relevant Component Behavior
Architecturally-relevant component behavior is the set of all behaviors required to describe what causes a component to send messages (e.g., triggers, state variables, state transitions).

The Robot Operating System (ROS)
ROS is the most popular framework for robotics systems and supports a large ecosystem of more than 7,500 software packages. In practice, ROS is used by many companies, such as Amazon, BMW, Boeing, Bosch, Boston Dynamics, NASA, PAL Robotics, and Siemens. From a software engineering research perspective, ROS is a typical example of a framework for component-based architectures. That means ROS systems are developed as sets of independently deployable run-time units, i.e., components, known in ROS as nodes, that primarily communicate with each other via messages [2,13,36,66,76]. Each node is implemented as an independent process and is typically responsible for providing a single function (e.g., transforming depth images into point clouds, planning the robot's trajectory, or translating movements into low-level motor commands). Nodes communicate with each other over named channels (i.e., topics, services, actions). In this paper, we focus on topic-based communication and service calls, which represent the vast majority of communication in ROS systems [76].
Services represent a synchronous two-way call-return style. Topics use a publish-subscribe model to provide asynchronous, message-based, multi-endpoint communication between nodes. Each topic can have multiple publishers and subscribers. Node-topic connections are defined in the node's source code and are established at run time by providing the name of a topic as a string to the ROS API. Topics are typically used both for reporting periodic information (e.g., camera data, sensor data, position) and for sporadic requests (e.g., disabling a motor).
Figure 1 shows a typical example of how ROS developers use the ROS API to implement architectural component behavior. To define behavior that handles input messages, the ROS framework lets users register callback methods that are called by the framework when a component receives a message (see the subscribe call in Figure 1). To define periodic behavior, ROS offers the Timer API and the Rate API (see the periodic sleep in Figure 1).

Formalization of Architecturally-Relevant Component Behavior
To define the semantics of the behavioral models that we infer, this section introduces the formalism of architecturally-relevant component behavior used throughout the paper.

Unknown Value ⊤
The formalism needs a special element ⊤ (pronounced "top") that represents an unknown value for cases in which the static analysis is unable to infer the value of an expression (e.g., the frequency of periodic publishing, values of initial states, or the right-hand side of assignments to state variables). It is included in all data types: for every data type 𝑇, ⊤ ∈ 𝑇.
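The absorbing behavior of ⊤ during constant folding can be sketched in a few lines of Python. This is an illustrative toy, not ROSInfer's implementation: a `Top` marker poisons any arithmetic it participates in, so an unknown subexpression makes the whole folded expression unknown.

```python
class Top:
    """Stand-in for the unknown value ⊤ (illustrative only)."""
    def __repr__(self):
        return "⊤"

TOP = Top()

def fold_mul(a, b):
    # If either operand is unknown, the product is unknown.
    if a is TOP or b is TOP:
        return TOP
    return a * b

# Two known constants fold to a value...
assert fold_mul(2, 5) == 10
# ...but any unknown operand poisons the result.
assert fold_mul(TOP, 5) is TOP
```

The same absorption rule would apply to any operator the folder supports, which is what makes ⊤ a member of every data type in the formalism.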
Figure 2 shows the model for the example code shown in Figure 1.

Behavioral Architecture Composition Bugs
Behavioral component models can be used to find behavioral architecture composition bugs, i.e., bugs that result from incorrect component composition and impact the software architecture's behavior. Figure 1 shows an example of a bug from the Autoware.AI [39] system in which the lattice_trajectory_gen component requires an input to perform its main functionality although no other component sends this message. Hence, lattice_trajectory_gen waits indefinitely. Approaches that recover only component-port-connector (CPC) models, such as ROSDiscover [76], cannot find this bug, because they cannot infer that the input is required, i.e., that the component's main functionality depends on it. Our approach classifies the input as required by inferring component state machines; identifying that the component's expected behavior happens in the state g_pose_set == true, which is triggered only by the message input; and detecting that no other component sends this message.
Compared to ROSDiscover, ROSInfer adds models of internal component behavior in the form of component state machines, which support finding behavioral architecture composition bugs. In particular, it can find three real-world bugs from the original ROSDiscover evaluation that ROSDiscover was unable to find (autoware-02, autoware-03, and autoware-10) [76].
Besides these Autoware bugs, our approach increases the practicality and accessibility of a large number of existing analyses on component state machine models for finding behavioral architecture composition bugs. To check whether the components of a system are composed correctly, properties such as "An input at input port i₁ of component C can/must result in an output at output port o₁ of C" can be checked via discrete event simulation [11] or logical reasoning [40].
Furthermore, synchronizing the resulting component state machines on their input/output messages creates a system-wide state machine model over which system-level Linear Temporal Logic (LTL) properties can be checked via approaches such as the ROS theorem prover by Kortik and Shastha [43] or PlusCal/TLA+ [48]. On these models, properties such as a component changing to a desired state, no messages getting lost or ignored, or a component eventually publishing a certain message can be checked [23,26].
Additionally, knowing the frequencies at which periodic messages get published allows reasoning about the frequencies of transitive receivers of a message to check for unexpectedly high publishing frequencies (e.g., if behavior is not supposed to be periodic but reacts to periodically sent input).
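The frequency reasoning described above can be sketched as a small fixed-point propagation over the connector graph. This is a hedged toy model (component names and the one-output-per-input assumption are illustrative): a reactive component that republishes on every input inherits its upstream publisher's rate, so periodic rates propagate to transitive receivers.

```python
def propagate_frequencies(periodic, reacts_to):
    """periodic: {component: hz}; reacts_to: {component: upstream component}.
    Returns inferred publishing frequencies for all reachable components."""
    freq = dict(periodic)
    changed = True
    while changed:
        changed = False
        for comp, upstream in reacts_to.items():
            if comp not in freq and upstream in freq:
                freq[comp] = freq[upstream]  # republishes once per received input
                changed = True
    return freq

freq = propagate_frequencies(
    {"camera": 30.0},                                  # periodic publisher
    {"detector": "camera", "planner": "detector"})     # reactive chain
assert freq["planner"] == 30.0  # transitively driven at the camera's 30 Hz
```

A check for unexpectedly high frequencies would then compare each inferred value against the rate a component is designed for.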

APPROACH
This section describes our approach of API-call-guided static inference of architecturally-relevant component behavior for ROS systems and its implementation in our tool ROSInfer. The analysis process is shown in Figure 3.
In general, even inferring only architecturally-relevant behavior is challenging, because theoretically any piece of code could send a message. Fortunately, the following observations about how the ROS framework is used in practice allow us to narrow down the analysis:
Component Framework API: Inter-component communication for sending and receiving messages is implemented almost exclusively via API calls that have well-understood architectural semantics [68,76]. This simplifies the static inference of architecturally-relevant behavior by reducing the problem of tracing message-sending behavior to code locations to finding the corresponding API calls and inferring their arguments.
Behavioral Patterns Usage: The triggers of message sending behavior are usually implemented using common behavioral patterns (e.g., implementing periodic behavior by sending messages in an unbounded loop that sleeps for the remainder of a periodic interval, as shown in Figure 1). This simplifies the problem of identifying the triggers and effects of message sending/receiving behavior by looking for common patterns.
We consider behavior to be reactive if it is triggered by receiving a message or by a component event (e.g., component started/stopped or (un)subscribed from/to a topic). We consider behavior to be periodic if it is triggered with a constant target frequency. Periodic and reactive behavior can both be state-based, i.e., triggered only under conditions that depend on the state of the component.
The key idea of our approach is to find ROS API calls that implement the triggers or outputs of architecturally-relevant behavior, infer the API call arguments, find control flow leading to message sending behavior, and reconstruct the state variables and state transitions on which architecturally-relevant behavior depends. While our approach focuses on the ROS ecosystem, it can generalize to any framework or ecosystem for which the observations listed above hold.
The remainder of the section describes each analysis step.

API Call Detection
The first step in API-call-guided inference of component behavioral models is to detect API calls that implement elements of architecturally-relevant behavior. ROSInfer accomplishes this by traversing the Abstract Syntax Tree (AST) and detecting syntactic patterns that identify architecturally-relevant API calls (see below).
ROSInfer detects API calls via AST matchers that look for method calls based on the fully qualified method name and argument list.
For most kinds of API calls, ROSInfer then attempts to recover the values of arguments and the object on which the function is called to infer additional details, such as which port owns the behavior or the frequency/duration of sleep calls.
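The matching step can be illustrated with a toy sketch. The real tool uses Clang AST matchers on C++; this hedged Python analogue (the dict-based "AST nodes" and the set of qualified names are purely illustrative) shows the core idea of filtering call sites by fully qualified method name.

```python
# Qualified names of architecturally-relevant ROS API calls (illustrative subset).
ARCH_CALLS = {"ros::NodeHandle::advertise", "ros::NodeHandle::subscribe",
              "ros::Publisher::publish", "ros::Rate::sleep"}

def match_calls(ast_nodes):
    """ast_nodes: toy call-expression nodes, each a dict with a
    'qualified_name' and 'args' key. Returns the architecturally-relevant ones."""
    return [n for n in ast_nodes if n["qualified_name"] in ARCH_CALLS]

nodes = [
    {"qualified_name": "ros::NodeHandle::subscribe", "args": ["chatter", 10, "cb"]},
    {"qualified_name": "std::vector::push_back", "args": [1]},  # not relevant
]
hits = match_calls(nodes)
assert [h["qualified_name"] for h in hits] == ["ros::NodeHandle::subscribe"]
```

In the real analysis, the matched node's arguments (e.g., the topic-name string) are then handed to the argument-recovery step described above.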

ROSInfer detects the following API calls and behaviors:
Inferring Message Outputs: To infer message outputs, a behavioral inference approach needs to identify the points in the component's source code that send messages to other components. For publish-subscribe styles, these are API calls that publish a message and corresponding API wrappers (e.g., sendTransform, diagnostic_updater, and CameraPublisher).
To identify the corresponding output port, ROSInfer infers the publisher object on which each publish call is made and traces the object to its creation via the NodeHandle::advertise API call.
Inferring Reactive Triggers: To infer reactive triggers, behavioral inference needs to look for control flow entry points (i.e., callbacks that handle a received message or a requested service, or the component being started). In publish-subscribe styles, subscriber callbacks define the component's behavior in response to receiving a certain message. Analogously, in call-return styles, service call callbacks need to be identified.
To identify the control flow entry points of detected input ports, ROSInfer looks for callbacks that are passed as parameters to the ROS API calls subscribe, registerCallback, and advertiseService.
Inferring Periodic Triggers: To infer periodic triggers, behavioral inference needs to identify sleep calls. There are two kinds of sleep calls: (1) constant-time sleep calls that sleep for the same amount of time every time they are called, and (2) filling-time sleep calls that sleep for the remainder of a periodic interval every time they are called. Filling-time sleep calls allow accurate static inference of the target frequency (unless the execution of a cycle takes longer than the cycle time, resulting in a lower actual frequency), while constant-time sleep calls can provide only an upper bound on the frequency, since the execution times of other statements are not captured.
C++ offers three common constant-time sleep calls: usleep, sleep, and std::this_thread::sleep_for. The ROS framework offers Duration::sleep. ROSInfer detects these calls and infers their durations and units from the arguments using constant folding.
ROS offers two filling-time API calls: Rate::sleep, which is called on a rate object (see the periodic sleep in Figure 1), and NodeHandle::createTimer, which has a rate object and a callback as arguments. Since the frequency is specified in the constructor of the Rate object, ROSInfer uses constant folding to infer the frequency's value and denotes it with ⊤ if it cannot be constant-folded.
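The rate-inference step can be sketched as follows. This is an illustrative toy, not the Clang-based implementation: the constructor argument is either a literal (folds directly), a name bound to a known constant (folds via the binding), or something only known at run time (yields ⊤).

```python
TOP = "⊤"  # marker for values that cannot be constant-folded

def infer_rate_hz(ctor_arg, constants):
    """ctor_arg: a numeric literal or a variable name appearing in the
    Rate(...) constructor; constants: known compile-time bindings."""
    if isinstance(ctor_arg, (int, float)):
        return float(ctor_arg)
    return float(constants[ctor_arg]) if ctor_arg in constants else TOP

assert infer_rate_hz(10, {}) == 10.0             # ros::Rate rate(10);
assert infer_rate_hz("hz", {"hz": 20}) == 20.0   # const int hz = 20; ros::Rate rate(hz);
assert infer_rate_hz("param_hz", {}) == "⊤"      # rate loaded from a parameter at run time
```

The third case mirrors the components reported later in the evaluation whose rates come from component parameters and therefore cannot be recovered statically.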

Behavioral Pattern Detection
After API call detection, ROSInfer runs a control-flow analysis on the program to construct an abstract representation that contains API calls, control flow statements, function calls, and assignments. On this abstract representation, ROSInfer detects behavioral patterns that describe the architecturally-relevant behavior.
Detecting Reactive Behavior: Reactive publishing behavior is message sending that is caused by receiving a message or a component event. To detect message outputs that react to message inputs, ROSInfer looks for the behavioral pattern of publish calls happening (transitively) within a subscriber callback method by checking for a path in the call graph from the callback method for the input to any publish call of the output. Since some systems pass publisher objects as arguments to functions that then call publish on their arguments, ROSInfer tracks the object identity of arguments when traversing the call graph. This pattern is shown in Figure 4.
The other pattern for reactive behavior that ROSInfer identifies is a publish call happening (transitively) within the main method with its control dependencies satisfied in the initial state; such a call responds to the component event "component-started".
Detecting Periodic Behavior: Periodic publishing behavior is the repeated sending of a message of the same type with a constant upper target frequency (note that messages do not necessarily have to be sent every interval). The pattern to detect periodic behavior is publish calls that happen (transitively) within unbounded loops that (transitively) contain a sleep call, as these sleep calls happen periodically throughout the normal execution of the program. To identify unbounded loops, ROSInfer considers loops whose conditions are either true or ros::ok(). This pattern is shown in Figure 5.
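The core of the reactive-behavior check — is any publish call reachable in the call graph from a subscriber callback? — can be sketched as a breadth-first search. This toy omits the publisher-object identity tracking described above; function names are illustrative.

```python
from collections import deque

def reaches_publish(call_graph, entry, publish_sites):
    """call_graph: {function: [callees]}; entry: callback to start from;
    publish_sites: functions that (directly) contain a publish call."""
    seen, queue = {entry}, deque([entry])
    while queue:
        fn = queue.popleft()
        if fn in publish_sites:
            return True  # publish is (transitively) reachable from the callback
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False

graph = {"poseCallback": ["updateState", "emitTrajectory"],
         "emitTrajectory": ["publishHelper"]}
assert reaches_publish(graph, "poseCallback", {"publishHelper"})
assert not reaches_publish(graph, "updateState", {"publishHelper"})
```

The same traversal, started from main instead of a subscriber callback, corresponds to the "component-started" pattern.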

State Variable Detection
The key idea for inferring state variables statically is to look for variables in the code that store state information, such as ready in Figure 1. We use the following heuristics to identify variables that represent component state.
Usage Heuristic: The variable is used in the control conditions of architecturally-relevant behavior (i.e., of functions that send messages, functions that change state variables, and their transitive callers). Control conditions describe the conditions that determine whether a statement is executed.
Scope Heuristic: The variable is in global or component-wide scope, such as member variables of component classes or non-local variables. Since local variables are used close to their assignments, they are less likely to capture state information than variables that can be changed in callbacks or other functions. This heuristic limits the search space and the complexity of the resulting models, because control conditions can contain complex logic that defines behavior that is not architecturally relevant.
To implement the usage heuristic, ROSInfer first infers all control conditions for all publish calls and their transitive callers, removes conditions on variables that do not satisfy the scope heuristic, and uses constant folding to replace variables and constants with the literals they represent.
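The interplay of the two heuristics can be sketched as a simple filter. This is an illustrative toy (the scope labels and variable names are assumptions, not ROSInfer's data model): a variable counts as state only if it both guards architecturally-relevant code and has component-wide scope.

```python
def state_variables(variables, guard_vars):
    """variables: {name: {"scope": "member" | "global" | "local"}};
    guard_vars: names appearing in control conditions around publish calls,
    state-variable assignments, and their transitive callers."""
    return {name for name, info in variables.items()
            if name in guard_vars                      # usage heuristic
            and info["scope"] in ("member", "global")}  # scope heuristic

vars_ = {"g_pose_set": {"scope": "global"},  # guards the main publish -> state
         "i":          {"scope": "local"},   # loop counter, filtered by scope
         "frame_id":   {"scope": "member"}}  # member, but never in a guard
assert state_variables(vars_, {"g_pose_set", "i"}) == {"g_pose_set"}
```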

Transition Inference
After detecting state variables and inferring behavioral patterns for reactive and periodic behaviors, the only information that remains to be inferred to create complete transition functions are the conditions on state variables and the state changes. ROSInfer identifies the intra-procedural control conditions for each publish call and its transitive function calls. In an inter-procedural analysis on the call graph starting from the behavior's trigger, ROSInfer then combines the control conditions of function calls ending in the publish call. Conditions are combined using a logical AND and negated when the else branch of an if-statement is taken.
To infer state changes, ROSInfer detects assignments to state variables, constant-folds the right-hand side of the assignments, and infers the assignments' triggers in the same way as for other architecturally-relevant behavior, as described above. ROSInfer then groups behaviors by triggers and state conditions and constructs the union of all outputs and state changes with the same triggers and conditions.
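The condition-combining rule (AND along the path, negation for else branches) can be sketched as follows. The pair representation of guards is an illustrative assumption, not ROSInfer's internal format.

```python
def path_condition(guards):
    """guards: list of (expr, taken_else) pairs encountered on the control-flow
    path from the trigger to a publish call; expr is the guard's source text."""
    terms = [f"!({expr})" if taken_else else f"({expr})"
             for expr, taken_else in guards]
    return " && ".join(terms) if terms else "true"

# Path: if (g_pose_set) { ... if (!stopped) { ... } else { publish(...); } }
cond = path_condition([("g_pose_set", False), ("!stopped", True)])
assert cond == "(g_pose_set) && !(!stopped)"
```

Constant folding would then simplify the folded literals inside each term, and guards on non-state variables are dropped by the scope heuristic before this combination step.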

Initial Value Inference
To infer the initial state of a component (i.e., the initial values of its state variables), ROSInfer searches for the first definitions of the variables. These can be in their declarations, in the program entry point of the component (e.g., main) and its transitive calls, or in statements or initializers of component class constructors. If an initializing expression is found, ROSInfer attempts to constant-fold it. Analogous to the previous cases, values that cannot be constant-folded are denoted with ⊤.

Implementation
We implemented ROSInfer using Clang for C/C++ as an extension of ROSDiscover [76]. (ROSDiscover and ROSInfer do not support Python, because C++ is the most used language in ROS [68].) The ROS system is analyzed within a Docker container that contains the source code, ROS packages, and other dependencies. ROSInfer infers the API calls and their parameters, as well as AST elements such as while statements, if statements, and assignments, and creates an abstraction of the component. Then it looks for the patterns described in the previous section and creates a JSON output that contains reactive, periodic, and state-based behavior from which the state machines described in Section 2.2 can be reconstructed. Finally, ROSInfer generates a PlusCal/TLA+ [48] specification containing the behavior and connectors of all components in the system configuration. To detect bugs, users can either specify their own LTL properties or provide a list of expected outputs, from which ROSInfer auto-generates the corresponding LTL properties. The implementation is openly accessible on GitHub: https://github.com/cmu-rss-lab/rosdiscover-evaluation

EVALUATION
In this section we describe how we evaluate the overall approach of API-call-guided static inference of architecturally-relevant component behavior in ROS systems, as well as the results of our evaluation of ROSInfer's recovery rate, recall, precision, and execution time on five large, real-world open-source ROS systems. As noted earlier, our approach is based on the assumption that developers of ROS systems commonly use the ROS API and behavioral patterns to implement architecturally-relevant component behavior. So even if the static analysis could recover all elements of detected behaviors, it might miss behaviors that violate this assumption. To validate this assumption and to evaluate how many behaviors ROSInfer missed, we measured the recall compared to a ground truth. This metric measures the degree of completeness of the set of inferred behaviors on real-world ROS systems. To measure it, we executed ROSInfer on real ROS components with corresponding ground truth models and compared the output for different behavior types.

RQ 3 (Precision)
How high is ROSInfer's precision for real-world ROS systems, i.e., what percentage of inferred architecturally-relevant component behaviors are true positives? (Results in Section 4.4.)
Since ROSInfer uses heuristics to infer architecturally-relevant behaviors, it can incorrectly classify behaviors as periodic or as reactive to a component event or component input, and can include unnecessary or incorrect state variables or state transitions. To evaluate how many false positives are in the inferred models, i.e., how often ROSInfer infers behaviors that do not exist in the real program, we measured the precision of inferred models compared to a ground truth. This metric measures the degree of soundness of ROSInfer's inference heuristics on real-world ROS systems.
Overview of Evaluation Systems: For these research questions we evaluated ROSInfer on the data set presented by Timperley et al. [76], consisting of five large real-world open-source systems: Autoware.AI [39], AutoRally [25], Fetch [81], Husky, and Turtlebot [72]. To demonstrate the complexity and size of the systems used for evaluation, statistics are shown in Table 1. Some components are part of several of these systems due to component reuse, leaving 550 unique components in total.
Ground Truth Models: Measuring recall and precision requires a ground truth to compare against. Unfortunately, no reliable ground truth is available for the architectural behavior of ROS components. Therefore, we created ground truth models by hand via manual source code inspection of the five ROS systems mentioned in Section 4.1. Due to the large size and complexity of these systems, we could not construct models for all 550 components. Therefore, we randomly picked components from each system (excluding test or demo components that do not contain architecturally-relevant behavior). The first two authors of the paper (one who is closely familiar with the implementation of ROSInfer and one who is not) evenly split the work.
To ensure consistency, we first created a protocol for manual model inference, which is included in the replication package. The protocol includes steps to infer behaviors, special cases to look for, a consistent format notation, and descriptions of how to handle exceptional cases that do not fit into the given format.
To validate the accuracy of the manually inferred models, the two authors who created the models measured their agreement on an overlap of 21 models (14.1% of the total) that were inferred by both authors, intentionally including some of the most complex models in this overlap. They agreed completely on 86% of these components and partially on the remaining three components. After a discussion of the few differences in the inferred models, they identified one case in which one author missed a type of publishing behavior, which resulted in revising existing models to fix their representation, and two cases of inaccurately modeled behavior, which resulted in refined ground truth models, an updated protocol for inferring models, and updates to existing models to ensure their correctness.
All 155 hand-written models are also included in the replication package and are available as a data set for other researchers studying behavioral component models of ROS-based systems.
Threats to Validity: With respect to internal validity, the ground-truth models were inferred by two of the paper's authors, who were not involved in the development of the case study systems. Since the creation of formal models for complex component behavior is error-prone and requires deep understanding of the domain, we cannot guarantee the correctness or completeness of all models. We attempted to reduce this threat by measuring agreement between authors on a portion of the handwritten models.
With respect to external validity, the results of the evaluation might not generalize to other ROS systems whose usage of the ROS API or patterns for implementing architecturally relevant behavior differs significantly from the five case study systems. We reduced this threat by selecting diverse case studies, with Autoware and AutoRally being mostly self-contained, industrially developed systems and Husky, Fetch, and Turtlebot being representative of typical open-source systems that are developed without central organization.

Measuring Recovery Rate (RQ1)
Methodology: As discussed in Section 3, ROSInfer denotes values that cannot be statically recovered with ⊤ to indicate unknown values. So the main metric for the recovery rate is how often ⊤ appears in parts of the resulting model.
We ran ROSInfer on all 550 components of the five systems presented in Section 4.1. Components that are included in multiple systems count only once. For 16 components the static analysis crashed due to errors in Clang, so these components are excluded from the evaluation, leaving 534. For each type of architectural behavior we then calculated the percentage of unknowns among the inferred values (i.e., target frequencies for periodic behavior, triggering events or callbacks for reactive behavior, initial values for state variables, and new values for state transitions). These numbers represent how well ROSInfer can infer all parameters of a detected behavior.
Further, the trigger types recovery rate metric measures how often ROSInfer can recover the trigger for detected publishing behavior.

Trigger Types Recovery Rate (Evaluation Metric)
The trigger types recovery rate approximates the inferred proportion of the total architecturally-relevant component behavior by measuring the percentage of message publishing calls for which ROSInfer can infer the cause of the behavior (i.e., for which a behavioral pattern with corresponding trigger was detected).
Note that this metric overapproximates recall in cases in which publish calls are hidden in inaccessible source code (e.g., in DLLs) but underapproximates recall in cases in which publish calls happen within uncalled callbacks (e.g., XbeeCoordinator and obstacle_sim).
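As a concrete illustration, the metric reduces to the fraction of detected publish calls for which some trigger was recovered. The following function is a hypothetical sketch, not part of ROSInfer's implementation; the dictionary keys and trigger strings are made-up names for illustration only.

```python
def trigger_recovery_rate(publish_calls):
    """Fraction of publish calls whose trigger (periodic, reactive, or
    component event) could be recovered. `trigger is None` models a
    publish call for which no behavioral pattern matched."""
    if not publish_calls:
        return 0.0
    recovered = sum(1 for call in publish_calls if call["trigger"] is not None)
    return recovered / len(publish_calls)

# A toy component with three publish calls, one of unknown cause
# (e.g., hidden behind a project-specific serial-device API):
calls = [
    {"topic": "/status", "trigger": "periodic@10Hz"},
    {"topic": "/pose", "trigger": "reactive:/odom"},
    {"topic": "/debug", "trigger": None},
]
```

On this toy input the rate is 2/3; over all 534 evaluated components the paper reports 87 %.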
After the quantitative analysis, we manually inspected each case of unknown values to conduct an in-depth qualitative analysis of the limitations of ROSInfer using open coding and linked examples.

Results for RQ 1 (Recovery Rate)
See Table 2. In an exhaustive analysis of five large real-world ROS systems with 534 components, the overall trigger types recovery rate is 87 %. The proportion of inferable values is 91 % for periodic rates, 100 % for reactive triggers, 72 % for state variable initial values, and 84 % for state changes. This supports our assumption about common pattern usage, because 87 % of all message-sending behavior in the evaluated components could be automatically classified as conforming to one behavioral pattern. AutoRally has the lowest trigger types recovery rate, because many of its components respond to inputs of serial devices through project-specific APIs (e.g., AutoRallyChassis, GPSHemisphere). Cases in which ROSInfer cannot recover periodic rates include rates that are loaded from component parameters (e.g., runStop, fake_camera, adis16470_node, robot_pose_ekf, yocs_virtual_sensor, lidar_fake_perception, AutoRallyChassis, watchdog_node), computed from return values of function calls (e.g., robot_pose_ekf), or set conditionally (e.g., yocs_virtual_sensor).
State transitions include unknowns if and only if the right-hand side of an assignment to a state variable cannot be constant-folded.
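The constant-folding criterion can be sketched as follows. This is a minimal, hypothetical illustration operating on Python expressions (ROSInfer itself analyzes C++ via Clang); `TOP` stands for the ⊤ marker used for unrecoverable values.

```python
import ast

TOP = "⊤"  # marker for an unknown value

def fold_rhs(expr, known_consts):
    """Constant-fold the right-hand side of a state-variable assignment.
    Returns the folded value, or TOP if the expression depends on
    anything that is not a compile-time constant (function calls,
    run-time inputs, unresolved variables, ...)."""
    tree = ast.parse(expr, mode="eval").body

    def fold(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name) and node.id in known_consts:
            return known_consts[node.id]
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not fold(node.operand)
        raise ValueError("not statically foldable")

    try:
        return fold(tree)
    except ValueError:
        return TOP
```

For example, `fold_rhs("True", {})` folds to a concrete new state value, while `fold_rhs("read_rate()", {})` yields ⊤, producing an unknown in the inferred state transition.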
Reactive triggers can always be recovered completely, since in ROSInfer's current implementation component events and message inputs cannot contain unknown values.

Measuring Recall (RQ2)
Methodology: After creating the handwritten models (see Section 4.1), we executed ROSInfer on the source code and compared the results, treating the handwritten models as ground truth. The existence of model elements is compared automatically, while expressions in conditions are compared by humans to judge whether they are logically equivalent. After the quantitative analysis, we then manually inspected each false negative to conduct a qualitative root cause analysis of missed behaviors.
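The quantitative comparison reduces to standard recall (and, for RQ3, precision) over true positives, false positives, and false negatives against the ground-truth models. A minimal sketch of the two formulas:

```python
def recall(tp, fn):
    """Fraction of ground-truth behaviors that were also inferred."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(tp, fp):
    """Fraction of inferred behaviors that appear in the ground truth."""
    return tp / (tp + fp) if (tp + fp) else 0.0
```

The TP/FP/FN counts per behavior category are reported in Table 3; the numbers here are only illustrative.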

Results for RQ 2 (Recall)
See Table 3. In a ground-truth comparison with 149 components, ROSInfer has a recall of 93 % for periodic behavior, 82 % for reactive behavior, 71 % for state variables, and 69 % for state transitions.
Results: Detailed quantitative results are shown in Table 3.
Cases in which ROSInfer cannot detect reactive behavior include the use of virtual methods (e.g., joystick_teleop) and behavior that is triggered by events other than receiving a message in a subscriber callback, such as reacting to messages from external devices received via serial ports (e.g., vg440_node), MQTT messages (e.g., mqtt_receiver), or the CAN bus (e.g., vehicle_receiver); our approach cannot infer the triggers for such behavior.
Cases in which ROSInfer cannot recover state variables include complicated object logic, such as checking whether a list or map is empty (e.g., vscan2image); Figure 6 shows an example. Handling such cases requires a deeper understanding of the objects owned by the component that are used to represent its state and is therefore a limitation of the approach. Conditions on object fields can contain implicit dependencies that cannot easily be inferred statically. For example, when a subscriber callback initializes the image stored in a state variable whose width and height are checked to be positive numbers in a control condition (image.width > 0 && image.height > 0), a human developer can infer that this condition checks whether the initialization in the subscriber callback has been called, implying that the component has received the message. This dependency, which is implicit in complex logic within the image object, cannot be inferred statically.

Figure 6: Simplified code snippet showing an example from waypoint_clicker in Autoware.AI for which our approach cannot recover the state machine. The analysis would need to model the state of a vector map containing multiple arrays and identify that the assignment in the cache_point subscriber callback affects the return value of the empty call.
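The implicit-state idiom described above can be sketched as follows. This is a hypothetical, framework-free illustration of the pattern, not code from the case-study systems; all names are made up.

```python
class Image:
    def __init__(self):
        self.width = 0   # zero until the first message arrives
        self.height = 0

class Component:
    """Sketch of a component whose 'message received' state is implicit
    in the fields of an owned object rather than an explicit boolean."""
    def __init__(self):
        self.image = Image()

    def image_callback(self, width, height):
        # Receiving a message implicitly flips the component's state.
        self.image.width = width
        self.image.height = height

    def ready(self):
        # A static analysis would need to understand that this condition
        # encodes "has an image message been received".
        return self.image.width > 0 and self.image.height > 0
```

A human reads `ready()` as "initialized by the callback"; recovering that fact statically requires reasoning about the object's field values across callbacks, which is what makes such state variables hard to infer.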

Measuring Precision (RQ3)
Methodology: For each behavior category we calculated the number of inferred behaviors that are part of the output of ROSInfer but not part of the ground truth models.We then manually inspected each false positive to conduct a qualitative root cause analysis of incorrectly classified behaviors.

Results for RQ 3 (Precision)
See Table 3. In a ground-truth comparison with 149 components, ROSInfer has a precision of 100 % for periodic behavior, 91 % for reactive behavior, 76 % for state variables, and 88 % for state transitions.
Results: Detailed quantitative results are shown in Table 3.
False positives for reactive behavior are caused by a limitation of our current implementation that ignores state conditions of periodic behavior in main and therefore treats such behavior as reacting to the event "component started"; this can be fixed in the future.
False positives for state variables are caused by mistaking a configuration parameter for a state variable (e.g., amcl), mistaking variable identity due to overloaded variable names, and control dependencies on assignments to another state variable (e.g., pos_downloader).
False positives for state transitions are caused by false positives for the corresponding state variables.

Measuring Execution Time
When running ROSInfer on Autoware.AI on a server with 4 Intel(R) Xeon(R) Gold 6240 CPUs (each with 18 cores at 2.6 GHz) and 256 GB RAM, the static analysis took on average 36.5 s per component. The fully automated analysis of the entire Autoware.AI system took 3.8 h; the other systems completed much faster (AutoRally: 10.2 min, Fetch: 29.9 min, Husky: 29.2 min, Turtlebot: 37.7 min). This indicates that the static analysis scales to real-world systems and could integrate well into iterative software development practices.
In practice, static model inference approaches like the one presented would integrate well into iterative development processes, since model inference supports automatic regeneration of models when the sources change. Changes to the code base require regenerating only the models of the components affected by the change, since ROSInfer tracks which source files are required to infer each component model. This would dramatically reduce the time to update the system's behavioral model after the initial execution.
The effort to create the 155 handwritten models of this evaluation was approximately 120 hours of manual labor.3 In practice, the developer time saved would be lower than 120 hours, because developers potentially need to replace the known unknowns (⊤) with correct values and cannot fully rely on the inferred models being complete. While in this paper we do not quantify the saved effort, we present these numbers to demonstrate that the approach can save a significant portion of the time to infer models, making model-based analysis more accessible and economical.

DISCUSSION
In this section we discuss how the advantages and limitations of the approach fit into a practical software engineering context.

Lessons Learned about ROS Components
When building the behavioral models for ROS components and inspecting the root causes for missed behaviors we noticed: (1) Many components are designed to process input streams and publish processed outputs, resembling a pipes-and-filters architecture. These components are stateless and usually produce a single output for each input that they receive. (2) Components that maintain state often start to publish periodically after receiving a set of input messages that are used to initialize the component, such as the example shown in Figure 1.
(3) Only a few components implement a complex state machine.
Most explicit or implicit state variables are booleans, and only a few components have more than three state variables. (4) While the state machines that model the behavior of a component might be simple, developers sometimes use more complex language features to express them than would be necessary (see Figure 6). This makes the code more extensible and easier to read for human developers, but harder to analyze using static analysis.

3 Models were inferred by authors of this paper who have advanced knowledge of C++, a background in software architecture and formal modeling, and are knowledgeable in robotics and Autoware.AI, but have not been involved in its development. Model inference times will vary based on expertise in the domain and experience with the system. This number is only intended to provide an informal estimate of the effort.
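Observation (2) — components that begin publishing periodically only after their initialization inputs have arrived — is the idiom from Figure 1. The following framework-free Python sketch illustrates the pattern with hypothetical names; a real ROS component would use a subscriber callback and a timer instead of plain methods.

```python
class PosePublisher:
    """Sketch of a stateful component: a subscriber callback sets a
    boolean state variable, and the periodic trigger publishes only
    once that variable is true."""
    def __init__(self):
        self.pose_set = False  # explicit boolean state variable
        self.pose = None
        self.published = []    # stands in for messages sent on a topic

    def pose_callback(self, pose):
        # Reactive behavior: receiving the input transitions the state.
        self.pose = pose
        self.pose_set = True   # state transition: False -> True

    def timer_tick(self):
        # Periodic behavior (would fire at e.g. 10 Hz under ROS),
        # guarded by the state variable: state-dependent publishing.
        if self.pose_set:
            self.published.append(self.pose)
```

This is exactly the shape of model shown in Figure 2: one reactive transition that flips `pose_set`, and one periodic transition that publishes while `pose_set` is true.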

Incomplete Models
As discussed in the approach, inferred models can be incomplete due to the limitations discussed in the evaluation. There are two types of incompleteness: known unknowns (i.e., the analysis can infer the type of behavior but cannot reconstruct all its required elements, so that the resulting model contains the keyword ⊤ representing an unknown value) and unknown unknowns (i.e., the analysis does not detect an instance of architecturally-relevant behavior, so that this behavior is entirely missing from the resulting model). Examples of known unknowns are frequencies of periodic publishing, topic names, initial values or other assignments of state variables, or values that state variables are compared to in conditions. They occur when other variables are referenced that cannot be constant-folded, when C++ language features are used that the static analysis implementation does not yet support, when values are read from external sources such as run-time inputs or files, or when developers follow the behavioral pattern but use language features too dynamic for the static analysis to identify the values.
In practice, users of ROSInfer can deal with known unknowns more easily, because ROSInfer directly points them to the place in the code for which it was unable to reconstruct the value. Users can then determine the values and replace the known unknowns in the model with accurate values. Since they only need to fill in the blanks for some values, this task is much easier and less time-consuming than building the entire model from scratch. In some cases, known unknowns can be reduced with more engineering effort to improve the static analysis, but they cannot be fully eliminated. Having incomplete models is still preferable to having no models, because even incomplete models allow finding behavioral architecture composition bugs that would not have been found otherwise.
Unknown unknowns are more limiting in practice, since it is much harder for users to identify that information is missing from the generated models.Unknown unknowns can be reduced by extending the list of behavioral patterns to look for or by adding the APIs of commonly used libraries, but cannot be fully eliminated.

Real-World Bugs Found
To demonstrate the real-world effectiveness of ROSInfer, we identified documented behavioral architecture composition bugs within the data set presented by Timperley et al. [76], the only open set of architecture composition bugs in ROS currently known to us. Three of these bugs (autoware-02, autoware-03, autoware-10) can be classified as behavioral architecture composition bugs. Autoware-02 is shown in Figure 1. The other two also result from required inputs for important component behavior not being connected to publishers. We ran ROSInfer on these systems, generated PlusCal/TLA+ specifications, and checked that expected outputs eventually happen. ROSInfer found all of these bugs given a list of components in the system configuration, a desired output to check for, and configuration parameter assignments. The resulting models are available in the supplemental material.

Coding Style Guidelines
Unlike many open-source ROS systems, most industrially developed projects follow coding style guidelines that narrow down the expected kinds of behaviors by telling developers to implement certain types of code in a certain way. We expect the recall of our approach to benefit from this, because fewer unnecessarily complex versions of simpler code would exist. This effect can become even stronger if coding styles related to specifying architecturally relevant behavior are established, since in almost all cases in which our approach cannot correctly infer architecturally-relevant behavior, the corresponding code could be refactored toward more analyzable code. Examples of such coding style guidelines are "component states should be explicitly modeled as variables in the code" to avoid the limitation described in Figure 6 by replacing empty() calls with a state variable, "state variables should be initialized explicitly" to avoid unknown or ambiguous initial states, and "ROS connectors should be used where possible" to avoid over-use of project-specific APIs. Similarly to how testability became a goal of software design to reduce the effort of ensuring correctness via testing, analyzability of code could become a future design goal of ROS code to support the automatic inference of rich behavioral models for automated formal analysis.
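The guideline "component states should be explicitly modeled as variables in the code" amounts to a refactoring like the following sketch. Both classes are hypothetical illustrations (the `waypoints` container stands in for the vector map from Figure 6), not code from the case-study systems.

```python
class BeforeRefactor:
    """State is implicit: 'initialized' is encoded as container
    non-emptiness, which is hard to recover statically."""
    def __init__(self):
        self.waypoints = []

    def cache_callback(self, wp):
        self.waypoints.append(wp)

    def can_publish(self):
        return len(self.waypoints) > 0  # implicit state check

class AfterRefactor:
    """Same behavior, but with an explicit boolean state variable
    that a static analysis can detect and constant-fold."""
    def __init__(self):
        self.waypoints = []
        self.waypoints_received = False  # explicit state variable

    def cache_callback(self, wp):
        self.waypoints.append(wp)
        self.waypoints_received = True   # analyzable state transition

    def can_publish(self):
        return self.waypoints_received
```

The two versions behave identically, but the second makes the state variable, its initial value, and its transition directly visible to a pattern-based static analysis.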

RELATED WORK
To demonstrate the novelty of this work, we discuss other analyses that have been performed on robotics systems, approaches to recover static architectures, and dynamic analyses of behavioral models and explain how our work differs from them.

Analysis of Robot Systems
Static analysis and formal model-based analysis have been used before to automatically find bugs in robot systems [4,32,51,63]. For example, the tools Phriky [62], Phys [38], and PhysFrame [37] use type checking to find inconsistencies in assignments based on physical units or 3D transformations in ROS code.
Furthermore, Swarmbug [34] finds configuration bugs in robot systems that result from misconfigured algorithmic parameters, causing the system to behave unexpectedly.
These approaches focus on bugs that result from coding errors localized in a few places of the system. In contrast, our work aims to reconstruct models that can be used to identify incorrect composition or connection of components and therefore focuses on architectural bugs.

Inference of Structural Architectures
Most approaches for static recovery of software architectures reconstruct structural views of software modules from the perspective of a developer [3,15,18,21,22,24,28,53,55,56,64,71,73]. Since module views show the code before it is compiled, they do not show the relationships of components during run time [16] and therefore cannot be used to find behavioral bugs.
DeSARM [65] and SAMEtech [33] are dynamic approaches for reconstructing component-port-connector (CPC) models. ROSDiscover [76], HAROS [67,69], and the tool by Witte et al. [82] can statically reconstruct CPC models for robotics systems. CPC models describe the types of inputs a component receives, the types of outputs it produces, and to which other components its input and output ports are connected. However, CPC models do not contain information about how a component reacts to inputs (e.g., what kind of output it produces in response to an input), whether an output port is triggered sporadically or periodically, or whether the component's behavior depends on states. Therefore, CPC models cannot be used to analyze the data flow within a system.
While CPC models identify the communication channels that components use to interact with the rest of the system, they do not describe the conditions under which communication actually occurs. In contrast, ROSInfer builds upon ROSDiscover's implementation and output to produce behavioral component models that describe how a node reacts to incoming messages (e.g., publishing a message to a different topic; switching into a different state) and to timing-based triggers (e.g., publishing a status message every 50 ms). Therefore, ROSInfer makes it possible to find behavioral architectural composition bugs, such as the ones described in Section 5.3.

Dynamic Inference of Behaviors
Behavioral models of components can be inferred using dynamic analysis by observing component behavior in representative execution traces. For example, Kieker [29,77], DiscoTect [70], and Perfume [61] construct state machines from event traces obtained by run-time monitoring. Similar approaches also use method invariants [46] or LTL property templates [50] to increase their effectiveness. Domain-specific approaches have been proposed for CORBA systems [58] and telecommunication systems [57]. The main limitation of dynamic approaches based on mere observation is that they can only measure correlations among inputs, states, and outputs and cannot make claims about causal relationships. Additionally, approaches relying on dynamic execution might miss rarely executed code. In contrast, our approach analyzes the control and data flow of the source code and can therefore differentiate concurrent behavior that coincidentally happens after an input or state change from behavior that is caused by an event. Furthermore, dynamic approaches require either an accurate simulator or real robot hardware to produce reliable results and need to execute a large number of representative traces through the system in real time, which can increase the time and cost of model creation for computation-intense systems compared to static analysis approaches such as ROSInfer.

CONCLUSIONS
In this paper we have shown that looking for specific API calls of the ROS framework and commonly used behavioral implementation patterns enables the effective static inference of models that capture architecturally-relevant component behavior with high precision and recall. This work is a contribution towards making well-proven and powerful but infrequently used methods of model-based analysis more accessible and economical in practice, potentially leading to safer and more robust robotics systems. Due to its potential to integrate well into practical software development environments with continuous integration, and its potentially higher accuracy once coding style guidelines are established, we believe that API-call-guided static inference can have a significant impact on practice. While this paper focused only on ROS-based systems, we believe that API-call-based inference of component behavior is a promising approach that could generalize to other frameworks within the domain of cyber-physical systems and inspire future work that applies it to other ecosystems.

FUTURE WORK
We envision this contribution to enable the following future work: Automatic Generation of Documentation: Automated inference of behavioral models can support the generation of documentation for components, especially for reusable components. In cases in which components need to receive a set of initialization inputs to function properly (such as the example from Figure 1), inference of architecturally-relevant component models can be used to document which inputs are required for which output.
Automated Program Repair: Since models inferred from source code retain a mapping between source code locations and model elements, a repair patch for the model could be translated back to code. This motivates future work on translating model repairs back to code.
Repository Mining: Automatic inference of component behavioral models enables large-scale empirical research on the development and evolution of component behavior and inter-component communication patterns in complex robotics systems.
Test Generation: Information about what input messages change the component to a state in which it executes different behavior can be helpful to systematically generate test cases to cover a larger portion of the component's behavior.
Combination of Static and Dynamic Analysis: The limitations shown in the results of RQ1 can be overcome with the use of dynamic analysis. Furthermore, static analysis cannot infer execution times of tasks, producing models that cannot be used for most kinds of performance analysis, bottleneck analysis, or analysis of race conditions. Results from static analysis can inform systematic test generation and instrumentation of code to specifically obtain information that is missing in statically inferred models, increasing recovery rate and recall. Furthermore, testing for the existence of inferred behavior can reduce false positives.
Generalization to Other Frameworks: The approach of API-call-based static inference of component behavior is not inherently specific to the ROS framework. Other component frameworks, such as NASA's FPrime framework [8], that provide APIs for component interaction mechanisms could implement this approach as well.

Figure 2 :
Figure 2: Example model for the code shown in Figure 1. The first transition handles _ inputs and changes the state variable g_pose_set to true without an output. The second transition triggers periodically with a frequency of 10 Hz if the state variable g_pose_set is true. Then it sends a message.

Figure 5 :
Figure 5: Simplified behavioral pattern to look for periodic behavior.

Input Triggers I ⊆ M_in ∪ P ∪ E: An input trigger is a message handled by an input port m ∈ M_in, a periodic trigger p_f ∈ P with frequency f ∈ Float, or a component event e ∈ E, such as "component started". To keep the model simple, we do not model the content of messages.

[Figure: Overview of the ROSInfer pipeline, with the stages 1. API Call Detection, 2. Behavioral Pattern Detection, 3. State Variable Detection, and 5. Initial State Inference, producing the state variables and the initial state.]
To evaluate ROSInfer we asked the following research questions: RQ1: How high is ROSInfer's recovery rate for real-world ROS systems, i.e., what percentage of inferred architecturally-relevant behaviors can be recovered completely? When static analysis detects message sending behavior within a component's source code (e.g., a message-sending API call), it attempts to infer a complete behavioral model of what causes the component to send this message (e.g., to what input it reacts, at what periodic frequency it is sent, in what state it is sent). Since static analysis cannot always recover all parts of this behavior, resulting models can be partial (i.e., include known unknowns ⊤). To measure how often static analysis fails to infer parts of the resulting model, as an indicator of how complete and precise inferred models are in practice, we calculate the recovery rate for the different behaviors of real-world ROS components. RQ2: How high is ROSInfer's recall for real-world ROS systems, i.e., what percentage of architecturally-relevant component behavior can ROSInfer infer correctly?

Table 1 :
Systems used for evaluation with their stars on GitHub (as of 17th July 2023), lines of XML configuration files, lines of code including their dependent ROS packages, and number of components.

Table 2 :
Results for RQ1: The trigger types recovery rate is the percentage of inferred publish calls for which ROSInfer can infer what kind of trigger causes that behavior (periodic or reactive). For each sub-type of behavior, percentages show how many of that type do not contain unknowns (⊤) in the inferred models, over a total of 534 components of the five large real-world systems presented in Section 4.1. is the total number of behaviors of the respective type inferred by ROSInfer (all publish calls in the case of trigger types). The All row counts components only once, even if they are included in multiple systems.

Table 3 :
Recall and precision of ROSInfer based on a comparison with 149 manually inferred component models. TP, FP, and FN are the numbers of true positives, false positives, and false negatives compared to the ground-truth models.