How Easy is SAT-Based Analysis of a Feature Model?

With feature-model analyses, stakeholders can improve their understanding of complex configuration spaces. Computationally, these analyses are typically reduced to solving satisfiability problems. While this has been found to perform reasonably well on many models, estimating the efficiency of a given analysis on a given model is still difficult. We argue that such estimates are necessary due to the heterogeneity of feature models. We discuss inherently influential factors and suggest potential algorithmic solutions.


INTRODUCTION
Feature models [7,10,21] describe the user-visible characteristics, known as features, of software product lines (SPLs) [6,45]. A configuration of features is valid when it fulfills all feature dependencies, known as constraints [6]. The valid configurations of a feature model usually form large configuration spaces [21,50], which quickly become difficult to comprehend. Thus, automated feature-model analyses have been proposed [2,8,9,36,46,60], with which stakeholders can improve their understanding of a feature model (e.g., to spot modeling errors or guide business decisions). Furthermore, feature-model analyses enable more advanced SPL analyses, which support several activities in the software development life cycle (e.g., design [8,52], implementation [32,56], testing [29,35], and economic estimates [15]).
To implement such analyses, feature models are often represented as propositional formulas [6,7,39,47], which are then passed to off-the-shelf analysis tools, such as satisfiability (SAT) solvers [36,38]. This approach is usually tractable in practice, although SAT is NP-complete. Two well-known publications have investigated this phenomenon empirically on large collections of feature models, concluding that "SAT-based analysis of (large real-world) feature models is easy" [31,36].
While many studies and experiences confirm this overall sentiment, feature models are also known to be heterogeneous (e.g., in terms of origin, domain, and size) [5,50]. Indeed, there are several large feature models (e.g., the Linux kernel, Freetz-NG, or Automotive02) that still challenge state-of-the-art analysis techniques [28,42,44,50,51,55]. This is because not all analyses are equally tractable: For example, while a single call to a SAT solver is usually cheap to compute, some analyses are difficult or impossible to phrase in terms of a single SAT call [51]. Instead, they require several SAT calls (e.g., reasoning about edits [57], core/dead features [12,20], type-checking [22,23]), specialized solvers (e.g., #SAT [50] or AllSAT [18]), or algebraic reasoning (e.g., slicing [1,26] or differencing [2]). Thus, a more conservative interpretation of previous results might be: "many SAT-based analyses on most (large real-world) feature models are comparably easy".
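To illustrate why an analysis such as core/dead features needs several SAT calls, the following minimal Python sketch issues one satisfiability check for the model itself and two further checks per feature. It is only an illustration under stated assumptions: the brute-force `is_satisfiable` is a stand-in for a real SAT solver, and the four-feature model, its clause encoding, and all function names are hypothetical and not taken from the cited work.

```python
from itertools import product

def is_satisfiable(clauses, num_vars):
    """Naive stand-in for a SAT solver: try all 2^n assignments.

    Clauses are lists of non-zero integers (DIMACS-style literals):
    v means "feature v is selected", -v means "feature v is deselected".
    """
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

def core_and_dead(clauses, num_vars):
    """Return (core features, dead features, number of SAT calls)."""
    calls = 1
    if not is_satisfiable(clauses, num_vars):
        raise ValueError("feature model has no valid configuration")
    core, dead = set(), set()
    for v in range(1, num_vars + 1):
        calls += 2
        # v is core if no valid configuration deselects it.
        if not is_satisfiable(clauses + [[-v]], num_vars):
            core.add(v)
        # v is dead if no valid configuration selects it.
        if not is_satisfiable(clauses + [[v]], num_vars):
            dead.add(v)
    return core, dead, calls

# Hypothetical toy model: 1 = Root, 2 = A (mandatory), 3 = B, 4 = C (optional).
MODEL = [
    [1],               # the root is always selected
    [-1, 2], [-2, 1],  # A is a mandatory child of Root
    [-3, 1], [-4, 1],  # B and C imply their parent Root
    [-3, 4],           # B requires C
    [-4, -2],          # C excludes A (a modeling error: C and B become dead)
]

core, dead, calls = core_and_dead(MODEL, 4)
print(core, dead, calls)
```

Even this naive variant needs 2n + 1 solver calls for n features, so the cost of the analysis depends on the chosen algorithm and not only on the cost of a single SAT call.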
However, this naturally raises several questions: Which analyses are easy on which feature models? What does easy mean for feature-model analysis? What factors influence the answers to these questions? We argue that it is time to pivot from a class-based point of view on feature-model complexity to an instance-based perspective. That is, instead of making sweeping statements about the entire class of feature models, it may be more illuminating to try to estimate the difficulty of computing a given analysis for a given feature model (potentially taking into account other influential factors). With this shift in perspective, we aim to foster a more nuanced discussion of feature-model complexity in the SPL community, which takes the heterogeneity of feature models into account. Both perspectives are cases of feature-model meta-analysis, which provides a general framework for talking about properties of feature-model analyses. In the following, we outline our idea of meta-analysis, the shift in perspective we propose, and initial suggestions for working towards instance-based meta-analysis.

FEATURE-MODEL META-ANALYSIS
We define feature-model meta-analysis as the practice of asking (and answering) questions about feature-model analyses as follows:
• First, one must ask a question about a (non-)functional property of feature-model analysis. The actual analysis results are not of interest here; instead, one asks for correctness or efficiency (e.g., regarding runtime, memory usage, or energy consumption). The question fixes some factors (e.g., feature model and analysis), while leaving others open (e.g., algorithm and solver).
• Second, one must define criteria to answer the question (either exactly or as an estimate) and propose an algorithm to do so.
Here, we discuss two opposed kinds of meta-analysis: class- and instance-based meta-analyses. While this distinction is not clear-cut, it serves well for demonstrating the shift in perspective we propose.

Class-Based Meta-Analysis
Class-based meta-analyses ask questions about a whole class of feature models and/or analyses. Thus, they can illuminate the feasibility of computing certain analyses on certain models. In previous work, several questions of this kind have been asked (and answered).
"Is SAT-based analysis of feature models easy?" This is an open meta-analysis question that binds few factors and can only be answered with "yes" or "no". Mendonça et al. [36] actually pose and answer a more specific variant of this question: They focus on artificial feature models and, although they acknowledge that some analyses require repeated SAT calls, they evaluate only a single SAT call. They conclude that analysis is indeed "easy" because they find no phase transition.
"Is SAT-based analysis of large real-world feature models easy?" Analogously, Liang et al. study the feasibility of single SAT calls on feature models of several open-source SPLs, which are specified using the KConfig language [31]. They give new insights as to why determining feature-model satisfiability is comparably easy. Still, they do not consider complex feature-model analyses or distinguish difficulty on a per-instance basis.

Instance-Based Meta-Analysis
The above meta-analyses have likely been helpful in establishing the widespread use of SAT solvers for feature-model analyses. However, they neither acknowledge the vast gap between feature models that are computationally "simple" (e.g., the graph product line [33]) and those that are "complex" (e.g., the Linux kernel [42,55]), nor do they distinguish how this computational complexity may depend on the computed analysis (or other, more subtle factors). We discuss briefly how to both ask and answer instance-based meta-analysis questions.
Asking Meta-Analysis Questions
To acknowledge the differences in complexity between feature models, we can ask more precise questions about analysis tasks, such as: "How much time does analysis X need on feature model Y when using solver Z?" or "Which algorithm is most memory-efficient for computing X on Y?" These questions still leave room for filling in details (e.g., system specifications), so their answers can only be estimated. Still, they will yield more useful answers for a given use case than the more sweeping statements obtained with class-based meta-analyses. The appropriate level of parametrization depends on the use case and represents a trade-off: Binding more factors allows for more accurate estimates (improving internal validity); binding fewer factors allows for more general settings (improving external validity) [49]. As a starting point for posing interesting questions, we list several factors that we know or suspect to influence the correctness or efficiency of feature-model analyses:
• Feature Model [50]: origin [5], domain, size, and expressiveness of constraints [24,54]
• Propositional Encoding [6,7,47]: extractor (e.g., for KConfig specifications [16,17,42]), non-Boolean variability [11,40,43], CNF transformation [28,34], and preprocessing
• Analysis: class (consistency, cardinality, enumeration, or algebraic) [51], the question it answers [8,52], the chosen algorithm [12,20], and its implementation
• Solver (if needed): class (e.g., SAT [36], #SAT [50], AllSAT [18], or VSAT [59]), solver parametrization (e.g., exact or approximate, optional preprocessing steps), name/version
• Knowledge Compilation (if needed): class (e.g., BDD [19,37,55] or d-DNNF [53]), name/version
• Prior Information (if given): existing analysis results, revisions (incremental analysis [25]), and interfaces [48]
• Execution Environment: CPU, RAM, and deep variability [30]
It is one purpose of feature-model meta-analysis to study the influence of these (and other) factors and how they interact. To do so, we must find techniques to answer meta-analysis questions.
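One way to make the trade-off between bound and open factors concrete is to treat a meta-analysis question as a small data structure. The following Python sketch is only an illustration: the class `MetaAnalysisQuestion`, its methods, and the identifiers in `FACTORS` are hypothetical names that mirror the factor list above and are not part of any existing tool.

```python
from dataclasses import dataclass, field

# Factor categories taken from the list above (identifier names are assumptions).
FACTORS = (
    "feature_model", "encoding", "analysis", "solver",
    "knowledge_compilation", "prior_information", "execution_environment",
)

@dataclass(frozen=True)
class MetaAnalysisQuestion:
    prop: str                                   # queried property, e.g. "runtime"
    bound: dict = field(default_factory=dict)   # factor name -> fixed value

    def __post_init__(self):
        unknown = set(self.bound) - set(FACTORS)
        if unknown:
            raise ValueError(f"unknown factors: {sorted(unknown)}")

    def unbound(self):
        """Factors the question leaves open, in the order of FACTORS."""
        return [f for f in FACTORS if f not in self.bound]

    def bind(self, **more):
        """Return a more specific question with additional factors fixed."""
        return MetaAnalysisQuestion(self.prop, {**self.bound, **more})

# "How much time does analysis X need on feature model Y when using solver Z?"
question = MetaAnalysisQuestion(
    "runtime", {"analysis": "X", "feature_model": "Y", "solver": "Z"}
)
narrower = question.bind(execution_environment="some fixed machine")
print(question.unbound())
print(narrower.unbound())
```

Each call to `bind` moves a question towards the instance-based end of the spectrum: fewer open factors remain, so an estimate can be more accurate but applies to fewer settings.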
Answering Meta-Analysis Questions
Ideally, we want to answer instance-based meta-analysis questions without actually computing the analysis in question, which can be costly or even infeasible. Instead, one usually investigates surrogate metrics (e.g., on an ordinal or interval scale) to estimate analysis complexity. For example, we can characterize feature models using metrics:
• Syntactic Metrics [50]: number of features, variables, constraints, clauses, literals; constraint size, density [31]
• Semantic Metrics [3,4]: phase transition [36], community structure [41], self-similarity
While syntactic metrics are easy to compute, they seem to allow for rough estimates at most [19,50]. Semantic metrics are probably better indicators for the inherent complexity of a feature model, but they are themselves usually NP-hard to compute and may therefore have to be approximated.
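Several of the syntactic metrics named above can be read off a CNF encoding in a single pass. The following Python sketch assumes a DIMACS-style clause list and interprets "density" as the clause-to-variable ratio, which is a common reading in the SAT literature but an assumption here; the function name and the toy clause list are hypothetical.

```python
def syntactic_metrics(clauses):
    """Compute simple syntactic metrics of a CNF formula.

    Clauses are lists of non-zero integers (DIMACS-style literals).
    """
    if not clauses:
        raise ValueError("need at least one clause")
    variables = {abs(lit) for clause in clauses for lit in clause}
    num_literals = sum(len(clause) for clause in clauses)
    return {
        "variables": len(variables),
        "clauses": len(clauses),
        "literals": num_literals,                         # literal occurrences
        "mean_clause_size": num_literals / len(clauses),
        "clause_density": len(clauses) / len(variables),  # assumed definition
    }

# Hypothetical toy encoding of a four-feature model.
TOY_CNF = [[1], [-1, 2], [-2, 1], [-3, 1], [-4, 1], [-3, 4], [-4, -2]]
metrics = syntactic_metrics(TOY_CNF)
print(metrics)
```

Such metrics cost time linear in the size of the formula, which is what makes them attractive as surrogates, even though they say little about the structure of the configuration space.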
Once we have determined suitable metrics for studying a meta-analysis question, we must also choose an algorithm to answer it. To this end, previous work uses simple criteria (e.g., "yes/no" for a phase transition [36]) or otherwise handcrafted models and hypotheses (e.g., the number of features correlates with analysis time [50]). Alternatively, machine learning techniques might be applicable, but we are not aware of any studies in this direction.

CONCLUSION
By pivoting from class- to instance-based meta-analysis, many directions for discussions and future work open up: What meta-analysis questions are worth asking, what factors are relevant, and how do they interact? Is feature-model complexity intrinsic, regardless of the chosen analysis or solving technique? When do knowledge compilation and incremental analysis pay off? By improving our ability to answer instance-based meta-analysis questions, we lay a foundation for implementing meta-analyzers that (semi-)automatically choose the best (e.g., fastest) analysis plan (i.e., algorithm, solver, ...) for a given analysis task, analogous to what portfolio solvers do for SAT [58]. Thus, analysis plans render analyses into first-class objects, which we can precisely describe, manipulate, and optimize, as has been done for databases [14] and, to some degree, has also been proposed for SPL analyses [13,27].
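As a rough illustration of what such a meta-analyzer might look like, the following Python sketch selects the analysis plan with the lowest estimated cost for a given task. Everything in it is hypothetical: the names `AnalysisPlan` and `choose_plan`, the two candidate plans, and in particular the cost formulas and their constants are placeholders and not measurements. In practice, the estimators would have to come from answering instance-based meta-analysis questions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

@dataclass(frozen=True)
class AnalysisPlan:
    name: str
    # Maps task metrics to an estimated cost in arbitrary units.
    estimate_cost: Callable[[Dict[str, float]], float]

def choose_plan(plans: Sequence[AnalysisPlan],
                metrics: Dict[str, float]) -> AnalysisPlan:
    """Pick the plan with the lowest estimated cost for the given task."""
    if not plans:
        raise ValueError("no candidate plans")
    return min(plans, key=lambda plan: plan.estimate_cost(metrics))

# Placeholder estimators: repeated solver calls pay per query, whereas
# knowledge compilation pays a one-time cost and answers queries cheaply.
PLANS = [
    AnalysisPlan("repeated SAT calls", lambda m: 1.0 * m["queries"]),
    AnalysisPlan("knowledge compilation", lambda m: 50.0 + 0.01 * m["queries"]),
]

few = choose_plan(PLANS, {"queries": 10})
many = choose_plan(PLANS, {"queries": 1000})
print(few.name, "|", many.name)
```

With these placeholder estimators, the one-time compilation cost only pays off once enough queries are issued, which mirrors the question of when knowledge compilation is worthwhile.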