From Decision Models To User-Guiding Configurators Using SMT

As decision models for the correct derivation of customizable products grow ever more complex, user-facing configurators are developed to manage this complexity by emulating the models as closely as possible. By design, decision models closely resemble the structures of configurators. Therefore, a classical configurator that implements a decision model often copies the model’s (linear) ordering of decisions. To enable users to jump into arbitrary points of a decision model, non-linear analysis is necessary. We present a novel way of automatically encoding a complex decision model into Satisfiability Modulo Theories (SMT) formulas, which are then solved by an SMT solver. Users interact with a configurator front-end, which internally calls an incremental SMT solver that returns open choices. In this paper, we present the architecture of our framework, introduce a general direct SMT encoding of decision models, and illustrate with a small case study the potential of our approach for product configuration and reconfiguration. Our approach successfully converts a legacy ordered decision model from an industrial application into a fully interactive configurator.


INTRODUCTION
Configure-to-order (CTO) is an efficient production method that provides customers with high variability in possible products. When the customization options offered by a product are sufficient for a customer, no additional engineering is required. This stands in contrast to engineer-to-order (ETO) products, which first have to be (more or less extensively) adapted in order to meet customer requirements. CTO requires automated decision procedures to determine which additional options are valid for any given product, forming complex relations between them. Different industrial approaches exist to handle the resulting complexity, many of them based on proprietary decision models (DMs) [14].
In order to build highly customized machines, our industry partner established the following workflow: (1) Select the category of machines that is capable of the task required by a customer. (2) Apply pre-configured CTO options. (3) Derive the fully configured artifact by applying other options and required parts from their proprietary DM. (4) If required, engineer custom ETO extensions.
These steps are carried out by a salesperson together with a configuration expert to map a customer's requirements to a valid machine. Our partner is iterating on their CTO method with the goals of decreasing the number of ETO orders and increasing the visibility of possible options. The requirements to capture the exact semantics of the proprietary DM and to integrate more complex rules lead to the following two challenges.
• Non-Linear Configurators with User Guidance. To improve existing configuration solutions, an already existing (ordered) DM should automatically be usable for building advanced user-guiding configurators. These do not force the internal ordering of the original DM (e.g., from its tabular or flat-list notation or its code representation) on users. The user should be presented with multiple available decisions at once and have freedom of navigation [15], with each choice possibly affecting other choices anywhere in the list of decisions. Derivation of a configured artifact remains purely ordered, but arriving at the right decisions to achieve the desired outcome is optimized, while the original workflow with the already existing legacy DM is maintained.
• Upgrade Paths. Once a product has been shipped, it leaves the scope of the DM, effectively becoming a unique branch. It is desirable to re-integrate a product into the DM to check which new features or upgrades may be added at a later stage. When a new feature conflicts with some other property already configured, it cannot be applied, making the upgrade infeasible without an ETO component. If it can be applied directly, the already engineered upgrade package can be applied without individualized engineering effort.
To address these challenges, we developed DMalizer, a framework for analyzing DMs as implemented by our partner's proprietary DM tool. We contribute a (to the best of our knowledge) novel encoding of DMs into Satisfiability Modulo Theories (SMT) [4], which makes it possible to build non-linear product configurators, compute upgrades, and analyze conflicts between configuration decisions. We describe our encoding in detail and demonstrate our framework in a case study.

RELATED WORK
According to [7,14], variability modelling as used for product configuration is most commonly categorized into decision modelling and feature modelling. Following this categorization, the legacy configuration tool PropDM of our industrial partner is based on decision modelling, because it implements its variability as decisions to be made during derivation, it presents its configuration user interface as an "(ordered) list of questions", and it always maps results to artifacts. While the categorization as decision model applies, PropDM differs from the decision modelling approaches described in [17] in several aspects. Decisions in PropDM are independent of variables, which are introduced separately (unlike decision variables, as in, e.g., Schmid and John [16] and DOPLER [9]). A decision's actions may assign values to arbitrary variables at arbitrary times during the derivation, similar to the free-form actions of Synthesis [1]. Actions may also jump to other decision groups (called a GOTO action), similar to calling a function in imperative programming languages. Variables set in the callee are set globally. These jumps proved to be a major challenge for which we could not find related work in the variability modelling literature, so we rely on inlining the called decision groups at their call-sites, similar to the function inlining technique employed by compilers [12].
Many approaches have been presented to automatically analyze various kinds of variability models [5,20]. Several approaches employ SMT for analyzing and testing feature models: FeatureIDE detects dead features and redundant constraints with SMT [19]. An approach to automatically generate test cases for variability models is described in [2] and [3]. Arcaini et al. repair broken models with the help of SMT solvers. Nieke et al. present an SMT encoding for anomaly analyses in the context of feature model evolution [13]. None of these approaches employ SMT as the backend of a user-guiding DM configurator as done in DMalizer.
While tools exist to convert DMs to feature models [10], the complex legacy of PropDM pushed us to create a new tool. Our industrial instance relies on comparing and matching rational numbers, complex string expressions (not always abstractable by an enumeration of values), and decision groups (DGs) connected by jumps that change the order of DG evaluation, a combination we did not find in related approaches [10,18]. Additionally, its size of over 10 000 interconnected DGs with an average of 22 decisions each, together with its high reliance on linearity, results in tens of millions of possible configurations and made a single-shot conversion of the whole model to a different modelling paradigm impossible. Instead, we analyze sub-models directly and put them into context with the rest of PropDM. Our direct approach also enables future optimizations, such as expressing GOTOs as macro-invocations on the SMT-solver's side, increasing throughput between DMalizer and its SMT backend.

PRELIMINARIES
We formalize the semantics of DMs in the context of PropDM with a subset of first-order logic (see [11] for an in-depth introduction).
In particular, we rely on SMT, established as a powerful, highly customizable subset of first-order logic for which efficient reasoning tools are available. For our application, we need comparison operations for string and numeric (integer and real) variables.
We follow the structure of the decision models as used in PropDM, the proprietary configuration tool of our industrial partner. We receive their DM in a proprietary XML structure (DMX). The primitive elements of a decision model are variables $x \in V$ and their sorts. Each sort describes a (potentially infinite) set of possible values that a variable of that sort can hold. The state $\sigma$ of the DM is an interpretation that maps each variable $x \in V$ to a value $\sigma(x)$ within the domain of its sort. We consider formulas that are built from predicates of different arities. A predicate of arity $n$ has a signature over $n$ sorts. A predicate $p$ is defined over the Cartesian product of the domains of the sorts in its signature.
A decision $d$ is a pair $(p, a)$ consisting of a predicate $p$ and an action $a$. An action is a mapping from state to state. If a decision's predicate $p$ evaluates to true in a state $\sigma$, we say the decision is applied in $\sigma$, and $\sigma$ is mapped to $\sigma'$ as defined by action $a$. If $p$ evaluates to false, $\sigma$ is mapped to itself. A decision group (DG) is a mapping from state to state and defines an ordered list of $(p, a)$ pairs. When a DG is executed, the decisions are evaluated in order. After the first decision is applied, the remaining decisions of the DG are not checked and therefore not applied. A decision model is an ordered list of DGs.
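The semantics above can be made concrete with a small executable sketch. It assumes that states are Python dicts and that predicates and actions are plain functions, which is an illustrative choice rather than PropDM's or DMalizer's internal representation; the helper names and the two-DG model are hypothetical. A DG applies only its first applicable decision, and a DM runs its DGs in order.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Predicate = Callable[[State], bool]
Action = Callable[[State], State]
Decision = Tuple[Predicate, Action]
DecisionGroup = List[Decision]


def run_dg(dg: DecisionGroup, state: State) -> State:
    """Apply the first decision whose predicate holds; otherwise keep the state."""
    for predicate, action in dg:
        if predicate(state):
            return action(state)  # remaining decisions are not checked
    return state  # no decision applicable: the DG maps the state to itself


def run_dm(dm: List[DecisionGroup], state: State) -> State:
    """A DM is an ordered list of DGs, executed one after the other."""
    for dg in dm:
        state = run_dg(dg, state)
    return state


def assign(**updates: Any) -> Action:
    """Build an action that returns a copy of the state with some variables set."""
    return lambda s: {**s, **updates}


# Hypothetical two-DG model: pick a frame size, then a wheel size.
frame_dg: DecisionGroup = [
    (lambda s: s["height"] < 170, assign(frame="S")),
    (lambda s: s["height"] < 185, assign(frame="M")),
    (lambda s: True, assign(frame="L")),
]
wheel_dg: DecisionGroup = [
    (lambda s: s["frame"] == "S", assign(wheel=27.5)),
    (lambda s: True, assign(wheel=29)),
]

print(run_dm([frame_dg, wheel_dg], {"height": 175}))
```

For a height of 175, the second decision of the first DG applies (frame `M`), so the catch-all decision of the second DG sets the wheel size to 29.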
PropDM's DMs are designed to be called multiple times, once for each part of the product under configuration. The results of the applied decisions are shown in the final state. The order of these invocations matters, as DMs tend to have internal dependencies on prior runs, with each state update $\sigma'$ being carried over to the next invocation. For reasoning over this behavior, we linearize multiple invocations into one big invocation with an all-encompassing DG. For an example, see section 6. Other development aids offered by PropDM, such as GOTO actions that continue the DG evaluation in another DG and DGs that apply multiple decisions (multi-DGs), are also reduced to the simplified formalization described above.
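A minimal sketch of two of these reductions, under the same dict-based state assumption: a GOTO is inlined by composing the decision's action with the execution of the target DG, and several invocations are unrolled into one longer DM, with a hypothetical `part` variable set before each copy. The names `inline_goto`, `linearize`, and all variables are hypothetical and only illustrate the idea, not DMalizer's implementation.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Action = Callable[[State], State]
Decision = Tuple[Callable[[State], bool], Action]
DecisionGroup = List[Decision]


def run_dg(dg: DecisionGroup, state: State) -> State:
    # First-match semantics: apply the first decision whose predicate holds.
    for predicate, action in dg:
        if predicate(state):
            return action(state)
    return state


def inline_goto(action: Action, target: DecisionGroup) -> Action:
    """Replace 'apply action, then GOTO target' by one composed action."""
    return lambda s: run_dg(target, action(s))


def always(action: Action) -> Decision:
    # A decision whose predicate is always true.
    return (lambda s: True, action)


def linearize(dm: List[DecisionGroup], parts: List[str]) -> List[DecisionGroup]:
    """Unroll one DM invocation per part into a single, longer DM.

    Each copy is preceded by a DG that sets the hypothetical variable 'part',
    so state updates of earlier invocations carry over to later ones.
    """
    flat: List[DecisionGroup] = []
    for part in parts:
        flat.append([always(lambda s, p=part: {**s, "part": p})])
        flat.extend(dm)
    return flat


count_dg: DecisionGroup = [always(lambda s: {**s, "count": s.get("count", 0) + 1})]
light_dg: DecisionGroup = [(lambda s: s["part"] == "frame", lambda s: {**s, "light": "dynamo"})]
main_dg: DecisionGroup = [always(inline_goto(lambda s: {**s, "seen": s["part"]}, light_dg))]

state: State = {}
for dg in linearize([count_dg, main_dg], ["frame", "fork"]):
    state = run_dg(dg, state)
print(state)
```

The counter reaches 2 because the second invocation sees the state left by the first, and `light` stays set even though the inlined target DG no longer applies for the fork.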

ARCHITECTURE OF DMALIZER
The software architecture of our DMalizer framework is shown in Figure 1. First, a DMalizer executable is compiled from a source DM in DMX format (see subsection 4.1). Then, three modes of interaction are offered: transpiling the source DM into Common Lisp or Python (see subsection 4.2 and Figure 2a), emulating a PropDM execution on the source DM (using the embedded and compiled Common Lisp transpilation, see Figure 2b), and interactively configuring an artifact from the source DM (see Figure 2c). The configurator mode internally relies on the SMT-solver Z3 [8], which keeps track of possible decisions and reduces the set of choices offered to the user. We describe how DMalizer internally calls Z3 in section 5.

Extracting a Model from DMX into our IR
The production-size DMX files of an industrial problem can be large (> 400 MB), making it impractical to rely on a DOM-style XML parser. Instead, we built a SAX parser based on a state machine that produces our intermediate representation (IR). The IR is stored in memory as a set of Common Lisp objects, but can be persisted to disk using the core-dump feature of SBCL, a state-of-the-art Common Lisp implementation. All proprietary languages embedded in DMX are parsed using recursive descent parsers. While parsing, all literals are reduced to their most space-efficient representation, e.g., strings only containing digits are reduced to number literals. This improves our IR's space efficiency and makes deducing variable sorts easier.
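The streaming idea can be sketched with Python's standard `xml.sax` module instead of the Common Lisp SAX parser DMalizer uses. The element names `dg`, `decision`, `predicate`, and `action` are assumptions, since the DMX schema is proprietary. The handler keeps a single state variable rather than a document tree, so only the IR it builds is held in memory. The literal reduction is shown as a separate helper.

```python
import xml.sax
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Decision:
    predicate: str = ""
    action: str = ""


@dataclass
class DecisionGroup:
    name: str
    decisions: List[Decision] = field(default_factory=list)


def reduce_literal(text: str) -> Union[int, str]:
    """Store digit-only strings as numbers, the more compact representation."""
    return int(text) if text.isdecimal() else text


class DmxHandler(xml.sax.ContentHandler):
    """Streaming state machine over hypothetical <dg>/<decision>/<predicate>/<action> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.groups: List[DecisionGroup] = []
        self.state = "outside"  # current state of the parser's state machine
        self.text: List[str] = []

    def startElement(self, name, attrs):
        if name == "dg":
            self.groups.append(DecisionGroup(attrs.get("name", "")))
            self.state = "in_dg"
        elif name == "decision" and self.state == "in_dg":
            self.groups[-1].decisions.append(Decision())
            self.state = "in_decision"
        elif name in ("predicate", "action") and self.state == "in_decision":
            self.state = name
            self.text = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is collected and joined later.
        if self.state in ("predicate", "action"):
            self.text.append(content)

    def endElement(self, name):
        if name in ("predicate", "action") and self.state == name:
            setattr(self.groups[-1].decisions[-1], name, "".join(self.text).strip())
            self.state = "in_decision"
        elif name == "decision":
            self.state = "in_dg"
        elif name == "dg":
            self.state = "outside"


DMX = b"""<model>
  <dg name="Frame">
    <decision><predicate>height &lt; 170</predicate><action>frame = "S"</action></decision>
    <decision><predicate>true</predicate><action>frame = "M"</action></decision>
  </dg>
</model>"""

handler = DmxHandler()
xml.sax.parseString(DMX, handler)
print(handler.groups[0].name, len(handler.groups[0].decisions))
print(reduce_literal("0042"), reduce_literal("42a"))
```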
Optimizing the IR and Deducing Sorts. While the IR directly extracted from a DMX is already valid, some idiosyncrasies stemming from PropDM remain. We remove assignments of variables to themselves (used to express database queries in PropDM) and convert lists with only one element to scalars. Most importantly, we deduce the sort of every variable in the given DM by iterating over all occurrences of each variable and choosing the sort that can hold all assignments and is defined for all given comparisons. Strings can only be compared to other strings, while only numbers can be compared by magnitude. The result is the most fitting representation for every variable in the DM.
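A minimal sketch of this rule follows. It assumes three candidate sorts ordered from most to least specific (Int, Real, String), rationals represented with `fractions.Fraction`, and occurrences given as (operator, literal) pairs; the operator spellings are hypothetical. The first sort that can hold every literal and supports every operator is chosen, and a variable that is both compared by magnitude and assigned a string is rejected.

```python
from fractions import Fraction
from typing import Iterable, Tuple, Union

Value = Union[int, Fraction, str]
SORTS = ("Int", "Real", "String")  # from most to least specific


def holds(sort: str, value: Value) -> bool:
    """Can a variable of this sort hold the literal?"""
    if sort == "Int":
        return isinstance(value, int)
    if sort == "Real":
        return isinstance(value, (int, Fraction))
    return True  # String: any literal can be kept as text


def supports(sort: str, op: str) -> bool:
    # Ordering comparisons (<, <=, >, >=) are only defined for numbers.
    return op in ("=", "!=", ":=") or sort in ("Int", "Real")


def deduce_sort(occurrences: Iterable[Tuple[str, Value]]) -> str:
    """Pick the most specific sort compatible with all occurrences of a variable."""
    occ = list(occurrences)
    for sort in SORTS:
        if all(holds(sort, v) and supports(sort, op) for op, v in occ):
            return sort
    raise TypeError(f"no sort fits occurrences {occ}")


print(deduce_sort([(":=", 3), ("<", 10)]))
print(deduce_sort([(":=", 3), ("=", Fraction(1, 2))]))
print(deduce_sort([(":=", "red"), ("=", "blue")]))
```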

Transpilation of Decision Models
In order to make iterating on legacy DMs and building new ones easier, we developed multiple tools for working with DMs outside of PropDM. These tools are not related to the SMT representation of DMs; instead, they provide alternative formats, namely Common Lisp and Python, in which a modelling engineer can work with the linear representation of models. These alternative formats are functional copies of a given DM and emulate the execution behavior of PropDM, but do not themselves implement an interactive user-guiding configurator. The latter is built by converting the linear DM from our IR into an SMT formula, as we introduce below in section 5.
Export to Common Lisp. As PropDM does not document the semantics of how it implements DMs, we reverse engineered them by writing a transpiler from our IR to executable code. We tested our interpretation of the semantics by matching (input, output) pairs of PropDM models with DMalizer and incrementally adapting our code generator to reduce the differences between the outputs of PropDM and DMalizer. In order to produce an executable model from DMX, DMalizer relies on Common Lisp's compile function, which directly optimizes the generated code into native processor instructions. This sets DMalizer apart from the purely interpreter-based approach of PropDM and drastically speeds up the DM's execution. The compilation process itself is parallelized over all available cores and takes approximately one minute for the largest DM used in production by our partner. If the DM contains an unsupported operation not yet observed during reverse engineering, the unrecognized pattern produces an error.
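The reverse-engineering loop described above amounts to differential testing. The following Python sketch assumes that recorded (input, output) pairs are available as dicts; the pairs and the candidate model are hypothetical. It reports which variables differ per run, pointing the engineer to the construct whose semantics were guessed wrong.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


def variable_diff(expected: State, actual: State) -> Dict[str, Tuple[Any, Any]]:
    """Map each differing variable to its (expected, actual) value."""
    keys = expected.keys() | actual.keys()
    return {k: (expected.get(k), actual.get(k))
            for k in sorted(keys) if expected.get(k) != actual.get(k)}


def differential_test(model: Callable[[State], State],
                      pairs: List[Tuple[State, State]]) -> List[Dict[str, Tuple[Any, Any]]]:
    """Run the transpiled model on each recorded input and collect output diffs."""
    report = []
    for given, expected in pairs:
        diff = variable_diff(expected, model(dict(given)))  # copy: model may mutate
        if diff:
            report.append(diff)
    return report


def candidate(s: State) -> State:
    # Current guess of the reference semantics: strict comparison.
    s["frame"] = "S" if s["height"] < 170 else "M"
    return s


# Hypothetical pairs recorded from the reference tool.
recorded = [({"height": 160}, {"height": 160, "frame": "S"}),
            ({"height": 170}, {"height": 170, "frame": "S"})]
print(differential_test(candidate, recorded))
```

The single difference at `height = 170` would suggest that the reference uses an inclusive comparison, which is the kind of adjustment the incremental adaptation of the code generator consists of.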
Export to Python. To make reverse engineering PropDM's semantics easier and to better understand existing DMs, we developed a Python code generator alongside the Common Lisp code generator. It converts our IR into a list of functions containing if statements: one function for each DG and one if statement for each decision. Each function receives a dict of all variables in the DM, which may be modified in place. This code generator proved highly useful for exploring large DMs given only in DMX format, as the well-known Python syntax is fully independent of PropDM and standard Python tooling offers the same reference-jumping capabilities as the native PropDM editor, while needing neither the full PropDM software nor a network connection. The derived code itself is not meant to be executed in a configurator but can be used as a modelling language. See our case study in section 6 for an example of how we represent a DM in Python.
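The export format can be sketched as follows: one function per DG, one if statement per decision, and a shared dict mutated in place. The IR layout, the DG names, and the `DGS` list are assumptions for illustration; DG names must be valid Python identifiers, and a real generator would also have to translate PropDM's embedded expression languages, which is omitted here.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical IR: DG name -> ordered (predicate, assignments) decisions, with
# predicates already translated to Python expressions over the dict 'v'.
IR: Dict[str, List[Tuple[str, Dict[str, Any]]]] = {
    "frame_size": [("v['height'] < 170", {"frame": "S"}),
                   ("True", {"frame": "M"})],
    "wheels": [("v['frame'] == 'S'", {"wheel": 27.5}),
               ("True", {"wheel": 29})],
}


def export_python(ir: Dict[str, List[Tuple[str, Dict[str, Any]]]]) -> str:
    """One function per DG, one if statement per decision; 'v' is mutated in place."""
    out: List[str] = []
    for dg_name, decisions in ir.items():
        out.append(f"def {dg_name}(v):")
        for predicate, assignments in decisions:
            out.append(f"    if {predicate}:")
            for var, value in assignments.items():
                out.append(f"        v[{var!r}] = {value!r}")
            out.append("        return  # first applicable decision wins")
        out.append("")
    out.append(f"DGS = [{', '.join(ir)}]")  # DGs in evaluation order
    return "\n".join(out)


source = export_python(IR)
print(source)

namespace: Dict[str, Any] = {}
exec(source, namespace)  # load the generated module
variables = {"height": 160}
for dg in namespace["DGS"]:
    dg(variables)
print(variables)
```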
Import from Python. To streamline writing test inputs for DMalizer, we created another package called PyToDMX. It reads a Python script as exported by DMalizer and produces valid DMX, which can again be read as a new source DM by DMalizer. This enables efficient co-evolution: the Python script can be modified and the changes imported into a DMX file readable by both PropDM and DMalizer. PyToDMX is architecturally separate from a compiled DMalizer executable and is written in Python. It uses the ast standard library module to parse a Python-encoded DM into an abstract syntax tree, which it then converts into DMX.
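The ast-based approach can be sketched in a few lines: each function becomes a DG and each top-level if statement a decision. The output element names are hypothetical, since DMX is proprietary, and `ast.unparse` requires Python 3.9 or newer. This is an illustration of the idea, not PyToDMX itself.

```python
import ast
import xml.etree.ElementTree as ET

SOURCE = '''
def frame_size(v):
    if v['height'] < 170:
        v['frame'] = 'S'
        return
    if True:
        v['frame'] = 'M'
        return
'''


def to_dmx(source: str) -> ET.Element:
    """Turn each DG function into a <dg>, each top-level if into a <decision>."""
    model = ET.Element("model")
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        dg = ET.SubElement(model, "dg", name=node.name)
        for stmt in node.body:
            if not isinstance(stmt, ast.If):
                continue
            decision = ET.SubElement(dg, "decision")
            ET.SubElement(decision, "predicate").text = ast.unparse(stmt.test)
            for action in stmt.body:
                if isinstance(action, ast.Assign):  # skip the trailing 'return'
                    ET.SubElement(decision, "action").text = ast.unparse(action)
    return model


dmx = to_dmx(SOURCE)
print(ET.tostring(dmx, encoding="unicode"))
```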

DECISION MODELS AS SMT FORMULAS
In this section, we first explain how we convert our IR into an SMT-LIB [4] formula. Afterwards, we describe how DMalizer calls the SMT-solver during a configurator session and how it encodes user choices into SMT. This conversion to SMT is required, as incomplete configurations containing undefined variables cannot otherwise be evaluated correctly with PropDM's semantics.
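The formula $\varphi[i,j]$ reads
$$\varphi[i,j] := v_i^j \rightarrow \mathrm{ite}\left(p_i^j(\sigma_i),\ \neg v_i^{j+1} \wedge \alpha(a_i^j),\ v_i^{j+1}\right).$$
It states that if a decision $d_i^j$ is active and its predicate evaluates to true in the current state, then this decision is applied, i.e., the next decision of the DG is not active and the state is updated according to the action $a_i^j$. Otherwise, the next decision has to be active. If $d_i^j$ is the last decision of the DG and is not applied, the variable $v_i^{|G_i|+1}$ takes the role of the next decision and has to be true, indicating that no decision of this DG was applicable.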
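A sketch that emits this decision constraint as SMT-LIB text: if $v_i^j$ holds, either the predicate holds, the action's activation predicate $\alpha(a_i^j)$ holds, and $v_i^{j+1}$ is false, or the predicate fails and $v_i^{j+1}$ is true. It assumes the predicate and $\alpha(a_i^j)$ are already available as SMT-LIB terms over state-indexed variables; the naming scheme `v_i_j` for $v_i^j$ and the variables in the example are hypothetical.

```python
from typing import List


def act(i: int, j: int) -> str:
    """Hypothetical SMT-LIB name of activation variable v_i^j."""
    return f"v_{i}_{j}"


def encode_decision(i: int, j: int, predicate: str, action: str) -> str:
    """If d_i^j is active, either apply it or activate its successor.

    `predicate` is an SMT-LIB term over state sigma_i; `action` is a term
    relating sigma_i to sigma_{i+1}, i.e., the activation predicate alpha(a_i^j).
    """
    return (f"(=> {act(i, j)} (ite {predicate} "
            f"(and (not {act(i, j + 1)}) {action}) {act(i, j + 1)}))")


def declare_activations(i: int, n_decisions: int) -> List[str]:
    # |G_i| + 2 Boolean activation variables v_i^0 .. v_i^{|G_i|+1}.
    return [f"(declare-const {act(i, j)} Bool)" for j in range(n_decisions + 2)]


print("\n".join(declare_activations(1, 2)))
print(encode_decision(1, 1, "(< height_1 170)", '(= frame_2 "S")'))
```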
Variable Links. A DM maps the initial state $\sigma_1$ to the final state $\sigma_{|D|+1}$, each state mapping each variable from $V$ to a value of its domain. Every $G_i$ is executed in a state $\sigma_i$, and the execution leads to state $\sigma_{i+1}$, possibly updating the variables occurring in the DG, i.e., $\sigma_i$ is mapped to $\sigma_{i+1}$. To link the DGs, we use the linking chain
$$\sigma_1 \xrightarrow{G_1} \sigma_2 \xrightarrow{G_2} \cdots \xrightarrow{G_{|D|}} \sigma_{|D|+1}.$$
To reduce the number of required equalities by an order of magnitude, we only link the variables occurring inside a DG's predicates or actions, and skip over a DG otherwise. All variables in $\sigma_1$ are linked to their first occurrence in the chain, with the last occurrence linking to $\sigma_{|D|+1}$.

Decision Groups. A decision group is a conjunction over all its decisions $\varphi[i,j]$ with two implication chains for their visibility conditions $v_i^j$. One ensures that a decision becomes active if the prior decision's action was not applied, or if it is the first decision of a DG and the DG is active (i.e., $v_i^0 = \top$). The other ensures that no decision may be active if the prior visibility condition was already inactive, i.e., $v_i^{j-1} = \bot$. If the DG is inactive or if no action could be applied, the state has to be carried over by setting $\sigma_i = \sigma_{i+1}$. The $i$th DG is thus encoded as
$$\psi[i] := (v_i^0 \rightarrow v_i^1) \wedge \bigwedge_{j=1}^{|G_i|} \varphi[i,j] \wedge \bigwedge_{j=1}^{|G_i|+1} \left(\neg v_i^{j-1} \rightarrow \neg v_i^{j}\right) \wedge \left(\left(\neg v_i^0 \vee v_i^{|G_i|+1}\right) \rightarrow \sigma_i = \sigma_{i+1}\right).$$
The full DM is encoded as a conjunction over all DGs and variable links. By default, all $v_i^0$ variables are asserted to $\top$ to activate all DGs.
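The skipping variable links can be computed with one pass per variable. In this sketch, the copy `x_k` denotes the value of variable `x` in state $\sigma_k$ (a hypothetical naming scheme), and each DG is described by the set of variables occurring in its predicates or actions. A variable untouched by all DGs is linked directly from $\sigma_1$ to $\sigma_{|D|+1}$, and consecutive DGs using the same variable share a state copy without any equality.

```python
from typing import List, Set, Tuple


def link_variables(dg_vars: List[Set[str]], all_vars: Set[str]) -> List[Tuple[str, str]]:
    """Equalities linking sigma_1 to sigma_{|D|+1}, skipping DGs that ignore a variable.

    dg_vars[i-1] holds the variables occurring in G_i's predicates or actions.
    """
    final = len(dg_vars) + 1
    links: List[Tuple[str, str]] = []
    for x in sorted(all_vars):
        current = 1  # index of the state copy holding x's latest value
        for i, used in enumerate(dg_vars, start=1):
            if x in used:
                if current != i:
                    links.append((f"{x}_{current}", f"{x}_{i}"))  # input of G_i
                current = i + 1  # G_i produces x_{i+1}
        if current != final:
            links.append((f"{x}_{current}", f"{x}_{final}"))  # last occurrence to sigma_{|D|+1}
    return links


dgs = [{"height", "frame"}, {"frame", "wheel"}, {"basket"}]
links = link_variables(dgs, {"height", "frame", "wheel", "basket", "color"})
for lhs, rhs in links:
    print(f"(assert (= {lhs} {rhs}))")
```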
Calling the SMT-Solver. After transforming the DM into the SMT-LIB representation described above, DMalizer sends a (push) to the SMT-solver. During interaction with its configurator, DMalizer repeatedly adds (simulated) choices as (temporary) assertions on the final state $\sigma_{|D|+1}$ and calls (check-sat) to check whether the new constraints change the satisfiability of the formula. Simulated choices are followed by a (pop) call to remove the temporary assertions and prepare the solver for the next check. Actual user choices are not followed by a (pop) call, so that the selected options stay active during the following checks, exploiting the incremental solving capabilities of state-of-the-art SMT-solvers. During configuration, the SMT-solver tries to find an assignment of all variables that satisfies the given constraints while respecting all decisions encoded in the DM. The DM effectively constrains how variables may be assigned at its end in relation to each other, as some assignments would lead to inconsistencies with decisions (actions setting variables to values other than their asserted ones) or to errors (actions setting some error variable).
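The push/pop pattern can be sketched without an SMT solver. Below, a brute-force class over finite domains stands in for Z3 (whose Python `Solver` offers analogous `push`, `pop`, `add`, and `check` methods), and the final-state constraints are hypothetical Python predicates. In this sketch, each simulated choice gets its own push/pop scope, while actual user choices are asserted permanently.

```python
from itertools import product
from typing import Callable, Dict, List

Assignment = Dict[str, str]
Constraint = Callable[[Assignment], bool]


class BruteForceSolver:
    """Tiny stand-in for an incremental SMT solver over finite domains."""

    def __init__(self, domains: Dict[str, List[str]]) -> None:
        self.domains = domains
        self.assertions: List[Constraint] = []
        self.scopes: List[int] = []

    def push(self) -> None:
        self.scopes.append(len(self.assertions))

    def pop(self) -> None:
        del self.assertions[self.scopes.pop():]

    def add(self, constraint: Constraint) -> None:
        self.assertions.append(constraint)

    def check(self) -> bool:
        names = list(self.domains)
        for values in product(*(self.domains[n] for n in names)):
            model = dict(zip(names, values))
            if all(c(model) for c in self.assertions):
                return True
        return False


def open_options(solver: BruteForceSolver, decided: Dict[str, str]) -> Dict[str, List[str]]:
    """Keep every (variable, value) option whose simulated choice stays satisfiable."""
    options: Dict[str, List[str]] = {}
    for var, domain in solver.domains.items():
        if var in decided:
            continue
        options[var] = []
        for value in domain:
            solver.push()  # temporary scope for the simulated choice
            solver.add(lambda m, var=var, value=value: m[var] == value)
            if solver.check():
                options[var].append(value)
            solver.pop()  # drop the simulated choice again
    return options


# Hypothetical final-state constraints induced by a DM.
domains = {"type": ["city", "ebike"], "brake": ["rim", "disc"], "fork": ["rigid", "suspension"]}
solver = BruteForceSolver(domains)
solver.add(lambda m: m["type"] != "ebike" or m["brake"] == "disc")
solver.add(lambda m: m["type"] != "city" or m["fork"] == "rigid")

decided = {"brake": "rim"}  # actual user choice: asserted, never popped
solver.add(lambda m: m["brake"] == "rim")
print(open_options(solver, decided))
```

After choosing rim brakes, only the city type and the rigid fork remain selectable, and the solver holds exactly the three permanent assertions again.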
Interacting with the Configurator. DMalizer's configurator keeps a list of possible options to be presented to the user. Each such option is a possible user choice, i.e., an assignment of some variable to some value from its (finite) domain. After each user choice, all remaining options of the same variable are removed, and all possible assignments to other variables with finite domains are pushed to the solver as simulated user choices. The solver incrementally checks which of them, together with all prior user choices, still yield a satisfiable formula. If the formula remains satisfiable under a given simulated choice, the respective option is kept and presented as a possibility for the next user choice. Otherwise, the option is removed from the list of possible options, reducing the number of available choices in the next configuration step. Variables with infinite domains may be assigned freely or be inferred by the SMT solver; the remaining choices are checked afterwards.

EVALUATION AND BIKESHOP CASE-STUDY
GOTO jumps between DGs are costly, as DMalizer has to insert the entire nested hierarchy of DGs to replace the initial action that contained the GOTO. We expect our direct approach to have the potential to further optimize this scenario, as hinted in section 2. Due to time constraints, we have not yet implemented parsers for other configuration languages (e.g., the Kconfig language [6]) in DMalizer. We now introduce a case study of DMalizer, based on a fictitious shop for bicycles¹, that shows how we address the challenges from the introduction. The DM is defined in Python in Listing 1, which we then transform into DMX using PyToDMX. Its logical view is pictured in Figure 3.
Non-Linear Bike Configuration. The legacy configurator of the shop is tightly coupled to its decision model. It asks customers step by step about the type of the bike and the desired fork type. A customer cannot directly jump to the decisions they care most about; instead, they have to follow the predefined script of the configurator. Our approach can improve the user experience by using our SMT-LIB representation of the shop's DM to build a new configurator that does not depend on the order of decisions as given by the DM. We present the user with a list of options, each option reducing the remaining choices to the ones that keep the resulting configuration valid.
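Order independence can be illustrated on a toy model. The rules below are hypothetical and are not the Bikeshop DM of Listing 1, and exhaustive enumeration replaces the SMT solver, which is only feasible for tiny domains. A customer may start with the fork or the tires instead of the bike type, and the open values of all other variables are narrowed accordingly.

```python
from itertools import product
from typing import Dict, List

DOMAINS = {"type": ["city", "mountain"],
           "fork": ["rigid", "suspension"],
           "tires": ["slick", "knobby"]}


def valid(c: Dict[str, str]) -> bool:
    # Hypothetical rules standing in for a DM's final-state constraints.
    if c["type"] == "city" and c["fork"] == "suspension":
        return False
    if c["type"] == "mountain" and c["tires"] == "slick":
        return False
    return True


def remaining(choices: Dict[str, str]) -> Dict[str, List[str]]:
    """Values still selectable for each open variable, whatever was chosen first."""
    names = list(DOMAINS)
    configs = [dict(zip(names, vals)) for vals in product(*DOMAINS.values())]
    fitting = [c for c in configs
               if valid(c) and all(c[k] == v for k, v in choices.items())]
    return {n: sorted({c[n] for c in fitting}) for n in names if n not in choices}


print(remaining({}))
print(remaining({"fork": "suspension"}))  # fork chosen before the bike type
print(remaining({"tires": "slick"}))
```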
Upgrade Paths for Bikes. If customers want to upgrade a prior purchase, they want to know what the shop can offer that fits their existing configuration. In the legacy system, the shop has to manually check which extensions may fit onto the old bike, as the knowledge encoded in the configurator remains hidden. In our approach, we can assert the end result of the old configuration to be the result of the evolved DM. The red rectangle in Figure 3 shows some added options for baskets (at potentially arbitrary positions). The SMT-solver then reduces the remaining options for the old DM to the ones that fit the new DM.
¹ See https://maximaximal.pages.sai.jku.at/vamos24/ for an interactive demonstration.
Conflict Analysis. While customers are using the configurator, they may wonder why they cannot choose a specific value for a yet undecided variable. In this case, some other decision is blocking the current one. In legacy configurators, customers then have to backtrack their decisions in order, remember what they applied before, and redo their decisions in the order forced by the configurator, hoping to find the path that allows the desired option to be selected. In our SMT-based approach, the blocked values of the remaining decision variables can be extracted and asserted to the solver regardless. Then, we query the solver for a minimal UNSAT core (a set of conflicting assertions from which no assertion can be removed without resolving the conflict) and suggest to the user the next best choice to undo. If multiple choices block the desired value, this process may have to be repeated.
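SMT solvers such as Z3 can report UNSAT cores directly. As a solver-free sketch, the deletion-based technique below shrinks a conflicting set of named assertions, checked by brute force, until every remaining assertion is needed for the conflict. The rule, the user choices, and the wanted value are hypothetical; user choices left in the core are the candidates to suggest for undoing.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

DOMAINS = {"type": ["city", "ebike"], "brake": ["rim", "disc"], "basket": ["none", "front"]}
Named = Tuple[str, Callable[[Dict[str, str]], bool]]


def satisfiable(constraints: List[Named]) -> bool:
    names = list(DOMAINS)
    return any(all(c(dict(zip(names, vals))) for _, c in constraints)
               for vals in product(*DOMAINS.values()))


def minimal_core(constraints: List[Named]) -> List[str]:
    """Deletion-based minimization; expects an unsatisfiable input."""
    core = list(constraints)
    i = 0
    while i < len(core):
        trial = core[:i] + core[i + 1:]
        if satisfiable(trial):
            i += 1  # core[i] is needed for the conflict
        else:
            core = trial  # the conflict persists without core[i]
    return [name for name, _ in core]


rules: List[Named] = [("ebike needs disc", lambda m: m["type"] != "ebike" or m["brake"] == "disc")]
choices: List[Named] = [
    ("user: type=ebike", lambda m: m["type"] == "ebike"),
    ("user: basket=front", lambda m: m["basket"] == "front"),
]
wanted: Named = ("wanted: brake=rim", lambda m: m["brake"] == "rim")

core = minimal_core(rules + choices + [wanted])
print(core)
print([n for n in core if n.startswith("user:")])  # choices to suggest undoing
```

The basket choice is dropped from the core because it does not contribute to the conflict, so only undoing the e-bike choice is suggested.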

CONCLUSION AND OUTLOOK
We presented a novel way of directly encoding decision models into SMT and introduced our DMalizer framework with its embedded interactive configuration tool based on our encoding. Using our encoding, we can remove the previously fixed ordering from a given DM and automatically build a non-linear product configurator that provides maximum flexibility, can compute possible upgrades to previous configuration results, and can analyze conflicting configuration options. Our DMalizer framework can process industrial-scale DMs with more than 10 000 DGs. We developed a non-linear user-guiding configurator for a subset of PropDM's features, which can be used to interactively configure artifacts defined in PropDM models. We demonstrated it using a minimal example DM. We also demonstrated how our configurator can be used to find upgrades to existing solutions. We showed how DMalizer addresses the two challenges introduced in this paper and argued that our direct approach scales better than a single-shot conversion of large legacy PropDM DMs into standard feature models.
The next step is enabling DMalizer to combine multiple SMT solver calls to support configuring a full industrial-scale DM containing multiple products end-to-end.Encoding such a model into a single formula remains a challenge.We will approach this by both extracting dependencies between DGs in order to partition a DM into appropriate sub-configurators and by further optimizing our SMT encoding to also directly support more complex PropDM features such as jumps (GOTOs) between DGs.

Figure 1: Architecture of the DMalizer Framework

Figure 3: Bikeshop Decision Model

The number of DGs of decision model $D$ is $|D|$, and the number of decisions in decision group $G$ is $|G|$. The $i$th decision group is denoted by $G_i$, and its $j$th decision $d_i^j$ is given by $(p_i^j, a_i^j)$. Every $G_i$ has $|G_i| + 2$ Boolean activation variables, where $v_i^j$ with $1 \le j \le |G_i|$ (de-)activates the $j$th decision. Further, variable $v_i^0$ (de-)activates the entire DG, and variable $v_i^{|G_i|+1}$ is true if $G_i$ is activated but no decision could be applied. By $\sigma_i$ we denote the DM's state before the execution of $G_i$, and by $\sigma_{i+1}$ the DM's state after the execution of $G_i$. The activation predicate $\alpha(a_i^j)$ is true if action $a_i^j$ is executed in state $\sigma_i$.

Decisions. The encoding consists of several parts. We start with the description of the SMT formula $\varphi[i,j]$, which maps the $j$th decision $d_i^j$ of the $i$th DG into first-order constraints that are given to the SMT-solver. A decision's predicate should only be evaluated if it is active, which we model using the activation variable $v_i^j$.

Listing 1: The Bikeshop's Python code to be read by PyToDMX

We successfully tested DMalizer with supported subsets of our partner's large industrial instance and used it to convert this instance (and other, smaller industrial DMX models) to Python. While configuring such supported sub-models with hundreds of decisions, we did not encounter SMT-related performance bottlenecks with our approach, but could not yet further increase the test size because of currently unsupported PropDM functionality. Bottlenecks were encountered when the DM contains several layers of DGs that jump to other DGs using GOTOs.