Refinery: Graph Solver as a Service
Refinement-based Generation and Analysis of Consistent Models

Various software and systems engineering scenarios rely on the systematic construction of consistent graph models. However, automatically generating a diverse set of consistent graph models for complex domain specifications is challenging. First, the graph generation problem must be specified with mathematical precision. Moreover, graph generation is a computationally complex task, which necessitates specialized logic solvers. Refinery is a novel open-source software framework to automatically synthesize a diverse set of consistent domain-specific graph models. The framework offers an expressive high-level specification language using partial models to succinctly formulate a wide range of graph generation challenges. Moreover, it provides a modern cloud-based architecture for a scalable graph solver as a service, which uses logic reasoning rules to efficiently synthesize a diverse set of solutions to graph generation problems by partial model refinement. Applications include system-level architecture synthesis, test generation for modeling tools, and traffic scenario synthesis for autonomous vehicles.
Video demonstration: https://youtu.be/Qy_3udNsWsM
Continuously deployed at: https://refinery.services/


INTRODUCTION
Motivation and challenge. Model-based systems engineering is a popular approach in the industry for the design of critical software-intensive cyber-physical systems [31] supported by a variety of modeling tools (like Artop, Capella, Yakindu, OpenModelica and many closed-source alternatives). These tools help reveal design flaws early, thus reducing costs and improving product quality. In this context, there is an increasing need for a diverse set of synthetic graph models to represent test cases and benchmarks for modeling tools or candidate designs in systems engineering. However, such synthetic graph models need to be consistent to comply with the underlying domain-specific standards (e.g. AUTOSAR, SysML) captured in the form of metamodels and well-formedness constraints.
The automated synthesis of a diverse set of consistent domain-specific graph models is very challenging. (A) Random model generators can derive large and diverse graphs, but the derived graphs are not consistent. (B) Search-based model generators [17,24] can derive large and consistent graphs using evolutionary algorithms, but without any guarantees for completeness or diversity. (C) Solver-based model generators [6,7,8,32] map the graph generation problem to a SAT or SMT problem in the background to detect inconsistencies of specifications, but they fail to derive a diverse set of graphs with more than a few hundred graph nodes. Conceptually, a graph solver can also be regarded as an SMT solver for the domain of complex graph (or relational) data, but its main focus is to derive a diverse set of consistent models (if they exist).
Objective and scope. The Refinery framework supports the efficient generation of consistent and diverse domain-specific graph models. It offers (1) a high-level specification language to capture the domain and control the range of graphs requested by end users, (2) a semantically well-founded graph generation approach based on refinement of partial models using 4-valued logic, and (3) a modern cloud-based architecture that provides a partial modeling editor, a partial model reasoner and a graph solver for engineers, made available in a web browser or programmatically as a Java library.
First, the end user needs to provide a domain specification, which consists of a metamodel, an (optional) initial partial model, a set of predicates and constraints, and a scope definition to restrict the size of the generated models. Then the server-side automated graph generation backend can be initiated by the push of a button (or programmatically). The diverse set of auto-generated consistent graph models can be visualized or serialized in a textual format.
A main use case of Refinery is to synthesize graphs as test cases in applications such as the testing of modeling tools or system-level testing of autonomous vehicles [2]. Refinery also helps experts to semi-automatically provide a variety of consistent design candidates with complex graph structures as part of design space exploration [14], partial modeling [5], or feature modeling [12].
Envisioned users. The Refinery framework primarily targets systems and software engineers who need to derive complex test suites for industrial modeling tools (like Artop, Capella, Yakindu, OpenModelica and many closed-source alternatives). An ongoing initiative is to provide test models for the new SysML standard [13]. However, the modern web interface enables the use of the framework by other domain experts, e.g. safety experts developing test scenarios for autonomous vehicles, or blockchain experts (see Section 4). During the development of the framework, we have received regular feedback from researchers at Budapest University of Technology and Economics (Hungary), McGill University (Canada) and Linköping University (Sweden), and from engineers at IncQuery Labs.

CONCEPTUAL OVERVIEW AND USAGE
Our framework offers a high-level specification language for model generation using refinement of 4-valued partial models.

Specification Language
The framework provides a concise yet precise specification language for generating graphs based on the syntax for partial models proposed in [10]. The language allows the user to control the range of generated models using four kinds of language elements:
⟨problem⟩ := (⟨domain⟩|⟨assertion⟩|⟨predicate⟩|⟨scope⟩)*
A domain specification in Refinery captures the key concepts and relations of the domain using an essential subset of XCore [27], a popular textual metamodeling language integrated with Eclipse Modeling Framework [26]. In a domain specification, the user can declare classes and associations as relational symbols (denoted by ⟨s⟩ in the grammar below), while a large set of logic constraints imposed by the structure of the metamodel is automatically translated to assertions and predicates (including the type hierarchy, multiplicities, and containment hierarchy, as illustrated in [10]).
.⟨max⟩]?⟨s⟩(opposite⟨s⟩)?) * }
For example, graphs representing file structures may provide classes such as FileSystem, File, Directory (Dir), and symbolic link (SymLink). A FileSystem contains a File as a root; each Directory contains multiple Files, and a SymLink can refer to other Files.
A non-annotated ⟨truth-value⟩ denotes a true value assignment, and the ! symbol denotes a false value assignment. For example, one can prescribe that a directory called resources exists in a graph, which contains an img file and a symbolic link pointing to the image, while one can state that the image file cannot be a Dir:
Dir(resources).
element(resources, img).
element(resources, link).
target(link, img).
!Dir(img).
Logic predicates provide custom model views, while constraints (error patterns) allow the user to further restrict the range of valid graphs. A logic predicate in Refinery declares a new n-ary relational symbol (with header variables ⟨v⟩), and defines a constraint formulated as a disjunction of multiple bodies (separated by the ";" character), each of which is a conjunction of literals (separated by the "," character). A literal refers to the truth-value of a symbol with variables: by default, the literal refers to a true value, the "!" prefix refers to a false value, and "*" denotes transitive closure. The keyword error denotes error patterns: in a valid model, such predicates must be false for each node [4].
⟨predicate⟩ := (error)? pred ⟨s⟩(⟨v⟩(,⟨v⟩)*) <-> ⟨body⟩(;⟨body⟩)*.
⟨body⟩ := ⟨literal⟩(,⟨literal⟩)*
⟨literal⟩ := (!)?⟨s⟩(*)?(⟨v⟩(,⟨v⟩)*)
For example, we can identify self-loops with a predicate that matches symbolic links targeting themselves and forbid their occurrence with the error keyword. Similarly, we can ban empty directories with a predicate that matches nodes that are directories and have no elements. Finally, we can refer to some files as important if multiple different links point to them.
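To make the three example predicates more tangible, the following Python sketch evaluates them on a small, fully known graph. It is only an illustration: the set-based graph encoding, the function names `self_loop`, `empty_dir` and `important`, and the reading of "multiple links" as "at least two distinct links" are assumptions of this sketch, and it shows neither Refinery's query engine nor its concrete predicate syntax.

```python
# Concrete file-system graph encoded as plain sets (hypothetical encoding).
dirs = {"resources", "empty"}
element = {("resources", "img"), ("resources", "link"), ("resources", "link2")}
target = {("link", "img"), ("link2", "img")}


def self_loop(target):
    """Error pattern: symbolic links that target themselves."""
    return {src for (src, dst) in target if src == dst}


def empty_dir(dirs, element):
    """Error pattern: directories that do not contain any element."""
    non_empty = {parent for (parent, _child) in element}
    return dirs - non_empty


def important(target):
    """Files targeted by at least two distinct links (assumed threshold)."""
    links_by_file = {}
    for src, dst in target:
        links_by_file.setdefault(dst, set()).add(src)
    return {f for f, links in links_by_file.items() if len(links) >= 2}


print(self_loop(target))         # matches of the selfLoop error pattern
print(empty_dir(dirs, element))  # matches of the empty-directory error pattern
print(important(target))         # matches of the important view
```

A graph is valid with respect to the two error patterns only if the first two result sets are empty, whereas matches of `important` are merely reported as a model view.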
The scope controls the size of the generated models by defining the minimum and maximum number of instances (or predicate occurrences) of the scoped symbol with a true truth-value.
One may generate models with at most 30 nodes, exactly two file systems, and at least one match for the predicate detecting important files:
scope node = 0..30, FileSystem = 2, important = 1..*.
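The meaning of such a scope can be illustrated with a short Python sketch that checks instance counts against ranges like `0..30`, `2` and `1..*`. The helper names `parse_range` and `within_scope` and the dictionary-based encoding are hypothetical and serve only to illustrate the lower and upper bounds; Refinery enforces scopes inside the solver rather than by checking finished models.

```python
def parse_range(text):
    """Parse '2', '0..30' or '1..*' into (lower, upper); upper None means unbounded."""
    if ".." in text:
        low, high = text.split("..")
        return int(low), None if high == "*" else int(high)
    return int(text), int(text)


def within_scope(counts, scope):
    """Check that every scoped symbol has a count inside its range."""
    for symbol, text in scope.items():
        low, high = parse_range(text)
        n = counts.get(symbol, 0)
        if n < low or (high is not None and n > high):
            return False
    return True


scope = {"node": "0..30", "FileSystem": "2", "important": "1..*"}
print(within_scope({"node": 28, "FileSystem": 2, "important": 3}, scope))
print(within_scope({"node": 28, "FileSystem": 1, "important": 3}, scope))
```

The second call fails the check because only one file system is present while the scope requires exactly two.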

Generation with 4-Valued Partial Models
Model generation can be initiated by the user (Generate button) to derive a consistent model of the specification, if such a model exists. Simple inconsistencies can be highlighted in Refinery by error markers on the derived graph. To obtain a diverse set of graphs, each newly generated graph is structurally different from the previous ones, which is ensured by shape-based graph diversity metrics [22].
The Refinery framework uses 4-valued logic [3,9] to explicitly represent incomplete, partial (paracomplete) models, and to tolerate errors and inconsistencies (paraconsistency) arising during the evaluation of computations over such models. 4-valued logic contains the usual false and true truth values, the unknown value introduced for uncertain (unspecified) properties, and the error value that signals inconsistencies. The subset {false, true, unknown} of logic values can express partial, but potentially consistent information (such as incomplete models). Conversely, the subset {false, true, error} can express complete, but potentially inconsistent information.
The Refinery framework collects all assertions and predicates of the problem specification, and semantically merges their information content into a 4-valued partial model. The framework continuously (and incrementally) checks and visualizes the problem specification, and immediately pinpoints any inconsistencies (e.g. related to type errors or multiplicities).
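The merging of assertions into a 4-valued partial model can be sketched in a few lines of Python. The sketch assumes that merging corresponds to the least upper bound in the information ordering that is customary for 4-valued logic (unknown carries no information, and combining true with false yields error). The dictionary encoding and the names `merge` and `assert_fact` are hypothetical and are not taken from Refinery's implementation.

```python
UNKNOWN, TRUE, FALSE, ERROR = "unknown", "true", "false", "error"


def merge(a, b):
    """Combine the information content of two truth values."""
    if a == b or b == UNKNOWN:
        return a
    if a == UNKNOWN:
        return b
    return ERROR  # true merged with false (or anything with error) is inconsistent


def assert_fact(model, symbol, args, value):
    """Merge one assertion into the partial model (missing entries are unknown)."""
    key = (symbol, args)
    model[key] = merge(model.get(key, UNKNOWN), value)


model = {}
assert_fact(model, "Dir", ("resources",), TRUE)
assert_fact(model, "target", ("link", "img"), TRUE)
assert_fact(model, "Dir", ("img",), FALSE)
assert_fact(model, "Dir", ("img",), TRUE)  # contradicts the previous assertion

print(model[("Dir", ("img",))])                   # contradictory information
print(model.get(("SymLink", ("img",)), UNKNOWN))  # never asserted
```

Because the merge is commutative and idempotent, the order in which assertions are collected does not change the resulting partial model.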
During model generation, instance graphs are derived along refinements of partial models [2,20]: at each generation step, (1) an uncertain element is selected and resolved by decision rules, and (2) the consequences of this decision are investigated by unit propagation rules (which also handle continuous type checking and multiplicity validation). As Refinery supports the incremental and partial evaluation of constraints [21], a (certain) match of an error predicate immediately triggers backtracking, while potential matches of error predicates provide search heuristics [18].
Figure 1 shows a partial model with inconsistent and incomplete values. Previously, node img was defined not to be a Dir, while SymLink was not explicitly excluded from its type, hence the node is marked as potential SymLink (white type label). If we now add the assertion target(link,link), it causes an inconsistency flagged by an occurrence of the error predicate selfLoop (see the respective error marker in the partial model). Truth-values of other predicates are also automatically calculated and updated in the model: the node link is denoted as potentially important, while no other node in the model has the potential to be important.
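The interplay of decision, propagation and backtracking can be sketched as a small depth-first search in Python. This is a deliberately simplified illustration and not Refinery's algorithm: facts are three-valued (`None` stands for unknown), the only propagation rule is a hypothetical "the single node link has exactly one target" multiplicity, the only error pattern is the self-loop, and heuristics as well as diversity checks are omitted.

```python
def refine(model, propagate, has_error):
    """Depth-first refinement: return a fully decided consistent model or None."""
    model = propagate(dict(model))
    if model is None or has_error(model):
        return None  # certain violation: backtrack
    unknown = [fact for fact, value in model.items() if value is None]
    if not unknown:
        return model  # nothing left to decide: a concrete model
    fact = unknown[0]
    for decision in (True, False):  # decision rule: try both refinements
        candidate = dict(model)
        candidate[fact] = decision
        result = refine(candidate, propagate, has_error)
        if result is not None:
            return result
    return None


def propagate(model):
    """Unit propagation for the assumed rule 'link has exactly one target'."""
    targets = [f for f in model if f[0] == "target"]
    true_targets = [f for f in targets if model[f] is True]
    if len(true_targets) > 1:
        return None  # multiplicity violated
    if len(true_targets) == 1:
        for f in targets:
            if model[f] is None:
                model[f] = False  # all other targets are excluded
    else:
        open_targets = [f for f in targets if model[f] is None]
        if not open_targets:
            return None  # no target can be chosen any more
        if len(open_targets) == 1:
            model[open_targets[0]] = True  # the last candidate is forced
    return model


def has_error(model):
    """Certain match of the self-loop error pattern."""
    return model.get(("target", "link", "link")) is True


initial = {("target", "link", "link"): None, ("target", "link", "img"): None}
solution = refine(initial, propagate, has_error)
print(solution)
```

In this run, the first decision (making the self-loop true) is rejected by the error pattern, so the search backtracks, and propagation then forces the link to target `img`.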

ARCHITECTURE
The architecture of the Refinery graph solver as a service is illustrated in Figure 2. Refinery follows a modern multi-tier software architecture: graph generation problems are input via the frontend web application (or a Java library), while the backend comprises auto-scalable services deployed as containers behind a load balancer.

Frontend
Web application. A Single-Page Application (SPA) was created for editing and visualizing partial models and initiating model generation. To provide editor support, we opted to perform the bulk of the parsing and semantic analysis of the partial models in the backend in order to reuse the analyses already required by the model generation. In particular, we display the logical consequences of the statements in the partial model by executing propagation operations on the backend immediately after the user edits the partial model.
The SPA establishes a WebSocket connection and sends editing operations (text deltas) from the textual partial model editor (based on the CodeMirror framework) to the backend. Features like syntax error checking, content assist, find occurrences, semantic highlighting, and automatic formatting are initiated via the WebSocket when the editor is idle or upon user request.
Additionally, when the partial model description is free of syntactic errors, the corresponding partial model semantics, including any discovered (semantic) inconsistencies, are obtained back from the server. The user may apply further filtering (e.g., hide some nodes or relations) before visualizing the model as a graph using Graphviz and D3. When the user initiates graph generation, the generated solution is also visualized. Subsequent generation requests return different solutions by choosing a different random seed.
Client library. Refinery is also available as a library in the Java programming language. Model generation tasks may be programmatically submitted using our textual partial modeling language. Alternatively, more direct interaction with the partial model management library and the model generator is available for more specialized use-cases, such as iterative model generation [23].
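The idea of keeping a server-side copy in sync through text deltas can be sketched as follows. The delta format used here (an offset, the number of deleted characters, and the inserted text) and the function name `apply_delta` are assumptions made for illustration; the source does not describe the actual message format of Refinery's WebSocket protocol.

```python
def apply_delta(text, offset, delete_length, inserted):
    """Apply one editing operation to the server-side copy of the document."""
    if offset < 0 or delete_length < 0 or offset + delete_length > len(text):
        raise ValueError("delta does not fit the server-side copy")
    return text[:offset] + inserted + text[offset + delete_length:]


server_copy = "Dir(resources)."
# The user appends a new assertion at the end of the document.
server_copy = apply_delta(server_copy, len(server_copy), 0, " !Dir(img).")
# The user then renames 'img' to 'logo' in place.
start = server_copy.index("img")
server_copy = apply_delta(server_copy, start, len("img"), "logo")
print(server_copy)
```

Sending only such small deltas, rather than the whole document, keeps the traffic per keystroke low; a delta that does not fit the server-side copy signals that client and server have diverged.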

Backend
The backend consists of three main components: the (1) partial model editor, the (2) partial model reasoner, and the (3) graph solver for generating consistent graph models.
Container-based packaging provides easy deployment to cloud providers, such as AWS. Components (1)+(2) may be deployed as a single Docker container to serve as a backend of the web-based editor, while components (1)+(2)+(3) deployed together in a container enable model generation. In both cases, Refinery does not rely on any other server-side state. Thus, it can be automatically scaled behind a load balancer that can handle WebSocket connections.
A Docker image containing (1)+(2)+(3) as a monolith is also available for local or on-premises deployments (e.g., for use-cases where the resource limits provided by the cloud service are insufficient).
Partial model editor. The editor contains a web server to handle incoming WebSocket connections and uses Eclipse Xtext [28] to parse partial models and provide syntactic analysis features. Then partial models are transformed into an internal semantic representation according to the Refinery language semantics.
In order to enable multiple concurrent users per server instance, we implemented an optimized, WebSocket-based protocol over the Xtext Web feature of the Xtext framework to maintain a server-side copy of the edited partial models. Serving multiple users with a single backend instance substantially lowers the resource utilization per user and the latency on initial connection.
Partial model reasoner. The core of partial model reasoning in Refinery is an efficient model management library that enables the compact representation of multiple versions of partial models [25] as relational logic structures. This component can also be used as a standalone Java library to store and query partial models. Consistency checking and refinement of partial models require reasoning about the type system describing the specified and unspecified parts of graphs and the graph constraints that capture consistency rules. To this end, we integrated an incremental graph query engine based on Viatra Query [4] in combination with advanced approximation techniques for partial query evaluation [21].
Propagation steps are partial model refinements that encode logical consequences of the type system and constraints. Refinery derives a refined partial model and analyzes its consistency every time the user edits the partial model. Numerical constraints are handled by external solvers, including GLOP for linear constraints.
Graph solver. The solver takes the initial partial model provided by the user and generates a set of consistent graphs as output [21].
At each decision step, a new partial model is derived (as a new exploration state) by decreasing the number of uncertain nodes and edges in the partial model while simultaneously increasing its size.
State space exploration needs to repeatedly detect whether a partial graph has already been reached (special isomorphism detection), and whether a graph constraint is surely violated, i.e., no subsequent refinement can ever lead to a consistent model. We rely on graph shapes [16] to detect isomorphic graphs and enforce diversity [22].
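The role of graph shapes can be illustrated with a simplified neighbourhood abstraction in Python. The sketch is written in the spirit of shape-based state coding and is not Refinery's implementation: the function name `shape`, the graph encoding and the default depth are assumptions. Each node is described by its type and by the labelled edges to and from its neighbours, and a graph is summarized by how often each description occurs. Isomorphic graphs always receive equal shapes, whereas equal shapes do not guarantee isomorphism, so the check is an approximation.

```python
from collections import Counter


def shape(node_types, edges, depth=1):
    """Return a hashable neighbourhood shape of a typed, edge-labelled graph."""
    labels = {n: (t,) for n, t in node_types.items()}  # depth 0: node type only
    for _ in range(depth):
        new_labels = {}
        for n in node_types:
            outgoing = sorted((lbl, labels[dst]) for src, lbl, dst in edges if src == n)
            incoming = sorted((lbl, labels[src]) for src, lbl, dst in edges if dst == n)
            new_labels[n] = (labels[n], tuple(outgoing), tuple(incoming))
        labels = new_labels
    # Node identities are dropped: only the multiset of descriptions remains.
    return frozenset(Counter(labels.values()).items())


g1 = ({"r": "Dir", "a": "File", "l": "SymLink"},
      [("r", "element", "a"), ("r", "element", "l"), ("l", "target", "a")])
g2 = ({"x": "Dir", "y": "File", "z": "SymLink"},  # g1 with renamed nodes
      [("x", "element", "z"), ("z", "target", "y"), ("x", "element", "y")])
g3 = ({"r": "Dir", "a": "File", "l": "SymLink"},  # g1 without the target edge
      [("r", "element", "a"), ("r", "element", "l")])

visited = {shape(*g1)}
print(shape(*g2) in visited)  # same shape as g1: treated as already explored
print(shape(*g3) in visited)  # structurally different: a new state
```

Because shapes are hashable, previously visited states can be stored in a set and looked up cheaply; requiring a new shape for every accepted solution is also one simple way to keep solutions structurally different.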

EVALUATION
Scalability evaluation. An initial evaluation of the Refinery framework is provided in Table 1 across five different domains used as case studies in previous research [2,11,20]. The number of classes, associations, and constraints in each domain is listed in Table 1 together with the largest model successfully generated, determined as follows. In this initial experiment, each generation run had a 60-second timeout. We incremented the model size by 250 until the generation for a given size timed out five times in a row. In the case of the Social Network domain, because of the limited size of the generated models, we increased the number of objects by 10. Finally, it is worth pointing out that model generation runs for each of these domains are available as part of the integration test suite of Refinery.
Theoretical properties. The graph solver algorithm provides multiple formal guarantees [30], including correctness (if a model is generated, then it satisfies the constraints) and completeness (if a model satisfies the constraints, it will be generated eventually).
Use cases in research and education. The development of the Refinery framework has been supported by an Amazon Research Award, and it has been successfully used in several practical applications, which include (1) the automated synthesis of test scenes for autonomous vehicles, (2) generation of dependable blockchain architectures, and (3) automated synthesis of system architectures for early mission planning. The framework has also been used by MSc students as part of an advanced modeling lab offered at Budapest University of Technology and Economics. Moreover, a public tutorial of Refinery has been delivered at ASE 2023 by the authors.
Related work. There are other software tools that offer the automated synthesis of consistent graph models. Most notable examples include Alloy [8], Sterling [29], USE [6], Pledge [1,24], UMLtoCSP [7], TAF [17] and VIATRA Solver [19]. However, we wish to point out that the size of models generated by Refinery compares very favorably to alternative model generation approaches (see e.g. [2,11,20] for detailed measurements of other tools). Compared to our previous work [19], Refinery provides a fundamentally novel, modern cloud-based architecture, a 4-valued partial modeling framework, and new decision procedures for type inference and propagation.
Running example. The example is available at [15].
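The measurement protocol can be summarized by the following Python sketch. The callable `generate(size, timeout_s)` is a hypothetical stand-in for one generation run that returns True on success and False on timeout; starting the search at the first step size is likewise an assumption, as the source does not state the initial model size.

```python
def largest_generated_size(generate, step=250, timeout_s=60, max_failures=5):
    """Increase the model size until one size times out max_failures times in a row."""
    largest = 0
    size = step
    while True:
        for _attempt in range(max_failures):
            if generate(size, timeout_s):
                largest = size
                break  # this size succeeded: move on to the next size
        else:
            return largest  # every attempt for this size timed out
        size += step


# Stand-in generator that succeeds for sizes up to 1000 objects.
print(largest_generated_size(lambda size, timeout_s: size <= 1000))
# Smaller increment, as used for the Social Network domain.
print(largest_generated_size(lambda size, timeout_s: size <= 35, step=10))
```

Retrying each size several times before giving up makes the reported maximum less sensitive to a single unlucky run, since generation depends on random seeds.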

Figure 1: Example of a partial model in Refinery

Figure 2: Architectural overview of the Refinery graph solver as a service

Table 1: Largest models generated in 60 seconds