Exploring the Computational Complexity of SAT Counting and Uniform Sampling with Phase Transitions

Uniform Random Sampling (URS) is the problem of selecting solutions (models) from a Boolean formula such that each solution gets the same probability of being selected. URS has many applications. In large configurable software systems, one wants an unbiased sample of configurations to look for bugs at an affordable cost [12], [13]. Other applications of URS include deep learning verification (to sample inputs from unknown distributions) [2] and evolutionary algorithms (to initialize the input population) [4].


PROBLEM
Model counting (#SAT), the problem of counting the number of solutions of a Boolean formula, is closely related to URS. The two problems generally rely on the same principles and heuristics, and URS can be reduced to #SAT [11]. Beyond URS, #SAT has many applications in (configurable) software engineering, such as variability reduction [15], variability evolution and analysis [7,16], feature prioritization [15], and bug-fix prioritization [8]. URS and #SAT are both challenging to solve efficiently: existing solutions scale poorly to real-world formulas [13]. Unlike the reasons behind the complexity of the traditional satisfiability problem (SAT), those behind the complexity of URS and #SAT remain underexplored.
Problem. This poster aims to understand the computational complexity of URS and #SAT in order to determine whether a given formula is computationally prohibitive. Similarly to the research efforts made for the conventional SAT problem over the past 30+ years [9,10], our objective is to unveil the factors behind this complexity through systematic analyses.

METHODS
Phase transitions are sudden changes in the properties of a system observed after a small variation of a parameter. For SAT, Mitchell et al. [9] showed that a phase transition occurs when the clause-to-variable ratio of random 3-SAT formulas reaches around 4.25. In other words, they observed that the time a SAT solver needs to process a formula changes abruptly, peaking near this ratio. This discovery later guided the development of new algorithms [3,5,14] specifically targeting instances lying at the phase transition. The analysis of phase transitions for SAT has thus had considerable theoretical and practical importance [6].
We contribute to a principled understanding of URS and #SAT complexity by studying whether phase transitions also occur in these problems. Our investigations require both experimental analysis (based on controlled experiments and artifacts) and empirical analysis (uncontrolled observations made on existing practices). While empirical observations on real-world formulas are needed to validate conclusions in practice, such formulas are scarce and highly heterogeneous, so they do not cover enough structural variations to support general conclusions. We therefore also experiment on synthetic formulas created through systematic, controlled procedures to identify general trends and limit cases. Doing so lets us isolate the role of specific characteristics in the complexity of URS and #SAT.
First, we experimentally study the complexity of model counters and uniform random samplers on random k-CNF formulas, i.e., random CNF formulas in which every clause has exactly k literals. Each clause is generated by randomly choosing k variables and negating each of them with probability 1/2; we repeat this process until the desired number of clauses is reached. We control the generation to cover different values of the clause-to-variable ratio, the structural characteristic previously used to reveal phase transitions in SAT [9].
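The generation procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' generator: the function names `random_kcnf` and `clauses_for_ratio` are hypothetical, literals use DIMACS-style signed integers, and we assume the k variables of a clause are distinct (the text does not say whether repeats are allowed).

```python
import random


def random_kcnf(num_vars, num_clauses, k, rng):
    """Generate a random k-CNF formula as a list of clauses.

    Variables are numbered 1..num_vars; a negative integer is a negated
    literal. Assumption: the k variables of a clause are distinct.
    """
    formula = []
    for _ in range(num_clauses):
        # Pick k distinct variables uniformly at random.
        chosen = rng.sample(range(1, num_vars + 1), k)
        # Negate each chosen variable with probability 1/2.
        clause = [v if rng.random() < 0.5 else -v for v in chosen]
        formula.append(clause)
    return formula


def clauses_for_ratio(num_vars, ratio):
    """Number of clauses needed to reach a target clause-to-variable ratio."""
    return round(ratio * num_vars)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducible benchmarks
    n = 20
    for ratio in (1.0, 2.0, 3.0, 4.25, 5.0):
        f = random_kcnf(n, clauses_for_ratio(n, ratio), k=3, rng=rng)
        print(f"ratio={ratio:.2f} clauses={len(f)} first clause={f[0]}")
```

Fixing the number of variables and varying only the number of clauses is one simple way to sweep the clause-to-variable ratio; passing a seeded `random.Random` instance keeps each generated benchmark reproducible.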
Second, we explore the reasons behind the phase transitions. The phase transition for SAT solving has been explained by a sudden change in the probability that a generated formula is satisfiable. Because #SAT and URS techniques do not rely on the same principles as classical SAT (e.g., SAT aims to find one solution to the formula, whereas #SAT and URS have to consider all solutions), we look for another explanation for phase transitions in #SAT and URS. Instead of the clause-to-variable ratio, we explore the ratio $\rho = \log_2(|\mathit{sol}(F)|) / |\mathit{var}(F)|$, i.e., the base-2 logarithm of the number of solutions of the studied formula $F$ divided by its number of variables. Thus, $\rho$ represents the proportion of variables that are theoretically necessary to encode the solutions of the formula.
ICSE-Companion '24, April 14-20, 2024, Lisbon, Portugal Olivier Zeyen, Maxime Cordy, Gilles Perrouin, and Mathieu Acher
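To make the ratio concrete, the following sketch computes it exactly for tiny formulas by brute-force model counting. It is illustrative only: enumerating all 2^n assignments is infeasible beyond a few dozen variables, the names `count_models` and `log_solution_ratio` are hypothetical, and returning `None` for unsatisfiable formulas (where log2 of 0 is undefined) is a convention chosen for this sketch.

```python
import itertools
import math


def count_models(formula, num_vars):
    """Exact model count (#SAT) by enumerating all 2**num_vars assignments.

    Only feasible for tiny formulas. Literals are DIMACS-style signed ints.
    """
    count = 0
    for bits in itertools.product((False, True), repeat=num_vars):
        # bits[v - 1] is the truth value of variable v; a literal l is true
        # when the variable's value matches the literal's sign.
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause) for clause in formula):
            count += 1
    return count


def log_solution_ratio(formula, num_vars):
    """log2(#solutions) / #variables, or None if the formula is unsatisfiable."""
    models = count_models(formula, num_vars)
    if models == 0:
        return None
    return math.log2(models) / num_vars
```

At the extremes, a formula with no clauses over n variables has all 2^n assignments as models and gets a ratio of 1, while a formula with exactly one model gets 0; for example, `log_solution_ratio([[1, 2]], 2)` equals log2(3)/2, about 0.79.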
Finally, we empirically analyze real-world formulas to check whether the general trends observed on synthetic data are confirmed. A key difference is that real-world formulas have a heterogeneous structure; e.g., they mix clauses of different sizes. This heterogeneity impedes drawing general conclusions from our sample. To limit its influence, we first preprocess the formulas with Boolean constraint propagation. Next, we eliminate subsumed clauses (i.e., clauses that are supersets of other clauses in the CNF formula). Finally, we remove variables that are either unconstrained (i.e., they do not appear in any clause) or constants (i.e., variables appearing in clauses containing only one literal). We then compute the clause-to-variable ratio of the preprocessed formula.
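The preprocessing steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' tooling: the function names are hypothetical, clauses are lists of DIMACS-style signed integers, tautological clauses are not handled, and a variable counts toward the ratio only if it still occurs in some clause after simplification, which drops both unconstrained variables and the constants fixed by propagation.

```python
def unit_propagate(clauses):
    """Boolean constraint propagation (BCP) on a CNF formula.

    Repeatedly picks a unit clause, fixes its literal to true, drops the
    clauses it satisfies and removes the opposite literal elsewhere.
    Returns (simplified_clauses, assignment), or (None, assignment) if an
    empty clause (conflict) is derived.
    """
    clauses = [list(c) for c in clauses]
    assignment = {}
    while True:
        unit = next((c[0] for c in clauses if len(c) == 1), None)
        if unit is None:
            return clauses, assignment
        assignment[abs(unit)] = unit > 0
        simplified = []
        for c in clauses:
            if unit in c:
                continue  # clause satisfied by the fixed literal
            reduced = [lit for lit in c if lit != -unit]
            if not reduced:
                return None, assignment  # conflict: formula is unsatisfiable
            simplified.append(reduced)
        clauses = simplified


def remove_subsumed(clauses):
    """Remove duplicate clauses and clauses that are supersets of another clause."""
    kept = []
    # Shorter clauses first: a clause can only be subsumed by a shorter one
    # (or an identical one, which the set comprehension already removed).
    for c in sorted({frozenset(c) for c in clauses}, key=len):
        if not any(k <= c for k in kept):
            kept.append(c)
    return [sorted(c, key=abs) for c in kept]


def preprocess_and_ratio(clauses):
    """Preprocess a CNF formula, then compute its clause-to-variable ratio.

    Returns (clauses, variables, ratio), or None if propagation finds a
    conflict; ratio is None when no variable remains.
    """
    simplified, _ = unit_propagate(clauses)
    if simplified is None:
        return None
    simplified = remove_subsumed(simplified)
    # Variables that no longer occur (unconstrained ones, and constants fixed
    # by propagation) are excluded from the count.
    variables = {abs(lit) for c in simplified for lit in c}
    ratio = len(simplified) / len(variables) if variables else None
    return simplified, variables, ratio
```

For instance, on `[[1], [-1, 2, 3], [2, 3, 4], [2, 3], [4, 5, 6], [5, -6]]`, propagation fixes variable 1, subsumption removes the duplicate `[2, 3]` and the clause `[2, 3, 4]`, and three clauses over five variables remain, giving a ratio of 0.6.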
Methods. We use phase transitions [9,10] as a tool to understand URS and #SAT complexity. Because URS and #SAT differ theoretically from SAT, we propose the ratio of the base-2 logarithm of the number of solutions to the number of variables as a replacement for the clause-to-variable ratio. Our methods mix experimental analysis (based on controlled experiments and artifacts) on curated synthetic data and empirical analysis (uncontrolled observations made on existing practices) on real-world formulas.

RESULTS
With k = 3, we observe that phase transitions do occur for both URS and #SAT, though at a different clause-to-variable ratio than for SAT (2.00 versus 4.25). This observation holds regardless of the modularity of the formulas, where modularity measures the extent to which formula clauses are independent [1]. This is notable because Ansótegui et al. have shown that real-world formulas have high modularity. However, we find that modularity does affect the amplitude of the phase transitions: a lower modularity leads to a greater increase in computation time.
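The text characterizes modularity only informally, citing [1]. One common formalization, assumed here rather than taken from this work, scores a partition of the variable incidence graph (one node per variable, an edge between variables that share a clause) with Newman's modularity Q. In the sketch below, the per-clause edge weighting and the function names are assumptions, and the partition is given as input; the modularity of a formula is usually taken as the best Q found by a community-detection heuristic such as the Louvain method, which is out of scope here.

```python
from collections import defaultdict
from itertools import combinations


def variable_incidence_graph(clauses):
    """Weighted variable incidence graph as {(u, v): weight} with u < v.

    Assumption: each clause spreads a total weight of 1 evenly over its
    variable pairs; single-variable clauses add no edges.
    """
    weights = defaultdict(float)
    for clause in clauses:
        variables = sorted({abs(lit) for lit in clause})
        pairs = list(combinations(variables, 2))
        for u, v in pairs:
            weights[(u, v)] += 1.0 / len(pairs)
    return weights


def modularity(weights, community):
    """Newman modularity Q of a partition (community maps variable -> label)."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("graph has no edges; modularity is undefined")
    inside = defaultdict(float)  # edge weight with both ends in the community
    degree = defaultdict(float)  # summed weighted degree of the community
    for (u, v), w in weights.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    # Q = sum over communities of (internal fraction - expected fraction)
    return sum(inside[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)
```

For two clauses over disjoint variables, `[[1, -2], [3, 4]]`, the partition that separates them scores Q = 0.5, whereas putting all variables in one community scores 0, matching the intuition that independent clauses yield high modularity.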
We generate random k-CNF formulas with k = 3 and k = 4. For k = 4, the phase transition occurs at a different clause-to-variable ratio than for k = 3, but it is more stable with respect to the ratio of the base-2 logarithm of the number of solutions to the number of variables. We thus uncover that the complexity is related to the number of models of a formula relative to its number of variables.
Finally, our observations on 503 real-world formulas indicate that phase transitions also occur in practice: the hard instances all have a clause-to-variable ratio higher than 2.
Results. We observe phase transitions on synthetic 3-CNF formulas at a clause-to-variable ratio of 2.00 (instead of 4.25 for SAT), independently of modularity. Modularity affects the amplitude of the phase transition. Hard real-world formulas have a clause-to-variable ratio higher than 2.
This work is licensed under a Creative Commons Attribution 4.0 International License.
2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)