Speeding up SMT Solving via Compiler Optimization

SMT solvers are fundamental tools for reasoning about constraints in practical problems like symbolic execution and program synthesis. Faster SMT solving can improve the performance and precision of those analysis tools. Existing approaches typically speed up SMT solving by developing new heuristics inside particular solvers, which requires nontrivial engineering effort. This paper presents a new perspective on speeding up SMT solving. We propose SMT-LLVM Optimizing Translation (SLOT), a solver-agnostic pre-processing approach that utilizes existing compiler optimizations to simplify SMT problem instances. We implement SLOT for the two most application-critical SMT theories, bitvectors and floating-point numbers. Our extensive evaluation based on the standard SMT-LIB benchmarks shows that SLOT can substantially increase the number of solvable SMT formulas given fixed timeouts and achieve mean speedups of nearly 3× for large benchmarks.


INTRODUCTION
Satisfiability Modulo Theories (SMT) constraints are first-order logical formulas with functions and variables from various theories, such as real numbers, integers, and, as relevant to software engineering, bitvectors and floating-point numbers. State-of-the-art solvers like CVC5 [2] and Z3 [15] use a complex mix of heuristics, theory-specific engines, and SAT solver calls to efficiently reason about SMT constraints. Yet many constraints still take a prohibitively long time to solve. Improving solver performance can improve results for real-world applications. For example, in symbolic execution, lower solving time equates to greater code coverage [12].
The most popular approach to speeding up SMT solving is developing more powerful solving strategies. These have sometimes taken the form of new solvers like Boolector [36], or of new algorithms in existing solvers. For example, Berzish et al. [6] introduce new heuristics for string constraints involving regular expressions, while Bjørner et al. [7] improve Z3's performance for custom theories. FastSMT [1] speeds up solving by using machine learning to choose the best solver heuristics.
This paper proposes a new perspective on improving SMT solving: instead of developing more advanced solving tactics, our key insight is to repurpose existing compiler optimization techniques for the SMT problem. In particular, we propose a translation-based pre-processing step, SMT-LLVM Optimizing Translation (SLOT), which can directly optimize input SMT-LIB formulas. Conceptually, our approach has three main advantages:
• Simplicity: End-users of SMT solvers can benefit from compiler optimizations as a black box, without detailed knowledge of SMT-specific optimizations.
• Solver-independence: Because it is a pre-processing step on SMT constraints, semantics-preserving optimization can be used in applications that use any solver(s).
• Extensibility: New compiler optimizations can be directly applied to further improve SMT solving without the need to make complex changes to solvers.
SLOT bypasses the need to re-implement compiler optimizations in SMT solvers by translating the constraints, rather than the optimizations. While not all compiler optimizations are useful in the SMT context, the combination of semantics-preserving optimization with existing solvers creates a sieve: some constraints are caught quickly by existing solver heuristics, while others are handled better by SLOT.
We have implemented SLOT for the SMT theories of bitvectors and floating-point numbers. Constraints in these theories are the most relevant to software engineering because they model machine arithmetic; for example, they are used in practical tools for symbolic execution [12], translation validation [26], and program synthesis [8]. In Section 4, we show that the semantics of these two theories can be exactly represented in LLVM IR. The key challenge for SLOT is bridging the semantic gap between SMT constraints and LLVM IR, which exists because the two languages, one declarative and the other imperative, were designed for entirely different purposes.
Figure 1 illustrates the three components of SLOT. The frontend translates SMT constraints to LLVM IR. This step ensures that every SMT function is converted to an equivalent sequence of LLVM instructions. Optimization uses the LLVM optimizer to simplify the translated constraint almost for free. Finally, SLOT's backend translates the optimized LLVM IR back into an SMT constraint. The complex structures created by the optimizer must be translated back without semantic gaps. We have applied SLOT to the quantifier-free benchmarks for bitvectors, floating-point numbers, and their combination included in the SMT-LIB specification [3]. Our extensive evaluation demonstrates that SLOT can substantially speed up SMT solving, especially for complex constraints which would otherwise take a long time to solve. Our approach increases the number of solvable constraints by up to 20% for bitvector, 15% for floating-point, and 80% for mixed benchmarks. Moreover, SLOT is more effective than existing solvers combined: it can solve constraints for which all tested solvers time out. We also observe mean speedups above 2× for bitvector and floating-point, and as high as 3× for mixed constraints. By measuring which optimization passes contribute most to the speedup, we find that simple peephole optimizations and global value numbering are sufficient to improve solver performance.
In summary, we make the following primary contributions:
• We present an easy-to-use, solver-agnostic framework for speeding up SMT solving by translating constraints to a compiler IR and back.
• We define, prove, and implement SLOT, and show that it improves the performance of solvers on standard benchmarks.
• We measure which LLVM optimization passes contribute most to speeding up SMT formulas, giving users access to well-tested simplifications and solver developers insight into possible solver improvements.
The rest of the paper is structured as follows. Section 2 motivates SLOT with an example SMT constraint. Section 3 presents background on constraints in SMT-LIB, while Section 4 describes SLOT's translation and proves its fidelity. Section 5 describes the evaluation results, and Section 6 puts the results in context. Finally, Section 7 surveys related work, and Section 8 concludes.

MOTIVATING EXAMPLE
This section presents a concrete example (Figure 2) to motivate SLOT. Specifically, it takes Z3 390 seconds to solve the original formula (Figure 2a). After applying SLOT, the optimized formula (Figure 2d) can be solved almost instantly.
Input SMT constraint. Figure 2a gives an SMT formula from the SMT-LIB QF_BV benchmark set. 1 It checks whether multiplication can overflow (lines 3-7) when the inputs x and y are subject to a division constraint (line 8). The formula is unsat because any value of x which satisfies the second assertion causes the multiplication x × y to overflow. Even though this constraint is concise and simple, Z3 takes 390 seconds to return the unsat result.
Frontend. Figure 2b gives the result of SLOT's frontend: an LLVM function that is semantically equivalent to the SMT constraint in Figure 2a. This function returns true on an input (x, y) if and only if (x, y) satisfies the original constraint. Its instructions mirror the function applications in Figure 2a. For example, zext is equivalent to the SMT zero_extend operation and mul is equivalent to bvmul.
Optimization. Figure 2c gives the result of LLVM optimization on the function in Figure 2b using all available optimization passes. In this example, only three passes affect the function's instructions: instcombine, reassociate, and dce. These simplify away all substantive code, producing a function that always returns false.
Backend. Finally, Figure 2d shows the result of translating Figure 2c back to an SMT constraint. Since the LLVM function always returns false, the corresponding SMT constraint simply asserts falsity. Z3 can now trivially produce the unsat result in 0.02 seconds.

Challenges.
From Figure 2, we can see that SLOT allows an SMT solver to leverage the power of existing LLVM optimizations. The key technical challenge of SLOT is bridging the semantic gap between SMT constraints and LLVM IR, i.e., ensuring the input and output SMT constraints are equivalent. While multiplication and bit extension are equivalent in the two languages, SMT functions cannot always be directly mapped to LLVM IR. For example, line 10 of Figure 2b adds a check to ensure the division on line 8 of Figure 2a does not introduce division by zero, because division by zero has undefined behavior in LLVM.

PRELIMINARIES
This section gives background on the SMT problem [4] and describes the bitvector and floating-point theories of the SMT-LIB standard [3].
Definition 1 (SMT formula). Given a theory T with signature Σ and a set of interpretations I, an SMT formula φ is an expression made up of symbols (function applications or variables) from Σ. I is the set of maps from variables in Σ to sort-appropriate values. φ is satisfiable if there exists an interpretation in I that satisfies φ.
Intuitively, a theory T provides definitions of sorts (i.e., types) and functions, and an SMT formula is a set of variables and a constraint on those variables using functions from T. If there exists an assignment of the variables which fulfills the constraint, we call the formula sat; otherwise, it is unsat. The SMT-LIB standard defines eight theories and, from these, 29 logics: combinations of functions from one or more theories, possibly with extensions. All logics rely on the Core theory, which defines basic boolean operations like logical and, logical or, and equality. We restrict our discussion to the Core theory and the quantifier-free logics of bitvectors and floating-point numbers.
SMT-LIB has a sort for each width of bitvector ((_ BitVec n)), several unary and binary operations on bitvectors like bvneg and bvadd, and bitvector comparisons like bvuge. The theory of floating-point numbers defines the sorts (_ FloatingPoint eb sb) for integers eb, sb > 1. Operations on floating-point values follow standard IEEE-754 semantics [35], though the sizes of the exponent and significand are not limited to those defined in IEEE-754. As with the bitvector logic, there are unary and binary operations on floating-point values (many of these require the specification of one of the five rounding modes) and comparisons that yield booleans. There are also conversions from floating-point to bitvectors, from bitvectors to floating-point values, and between different-size floating-point values. Table 1 gives a full list of QF_BV and QF_FP functions.
Following the work of Kroening and Strichman [21], we summarize Table 1 using the grammar shown in Figure 3. The grammar consists of formulas, bitvector comparisons, floating-point comparisons, bitvector values, and floating-point values. Intuitively, formulas are expressions with boolean sort; bitvector and floating-point comparisons are expressions of boolean sort which take bitvector or floating-point expressions, respectively; and values are expressions of bitvector or floating-point sort.

SLOT: SMT-LLVM OPTIMIZING TRANSLATION
This section formalizes the translation described in Section 2 and presents proofs of semantics preservation for bitvectors and floating-point values.

Overview
Given an SMT constraint φ, SLOT translates each operation to an LLVM equivalent, creating an LLVM function F. It then invokes the LLVM optimizer, producing an optimized function F′. Finally, it translates F′ back into an SMT constraint φ′. Intuitively, equivalence between φ and φ′ means that their sets of satisfying assignments are equal. Equivalence between a constraint and an LLVM function means that, given a variable assignment, evaluating the constraint produces the same result as executing the LLVM function. Algorithm 1 performs the translation from an SMT-LIB constraint φ into an LLVM function F. Each function recursively builds the LLVM statements corresponding to an SMT expression. The GetLLOp function represents fetching an LLVM instruction or instructions which have the same effect as the input SMT operation op. The variables of φ are converted to arguments of F.
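As a concrete illustration of the recursive scheme in Algorithm 1, the following minimal Python sketch (not SLOT's actual C++ implementation; Expr, GET_LL_OP, and translate are hypothetical names) walks an SMT expression tree bottom-up and emits a flat list of LLVM-like instructions, returning the SSA name that holds each subexpression's value:

```python
# Hypothetical sketch of Algorithm 1: recursively translate an SMT
# expression tree into a flat list of LLVM-like instructions.

class Expr:
    def __init__(self, op, *args):
        self.op, self.args = op, args

# Assumed one-to-one opcode table for the simple cases (Section 4.2).
GET_LL_OP = {"bvadd": "add", "bvmul": "mul", "and": "and", "or": "or"}

def translate(expr, out, counter=[0]):
    """Emit instructions for expr; return the SSA name holding its value."""
    if isinstance(expr, str):          # a variable becomes a function argument
        return "%" + expr
    args = [translate(a, out) for a in expr.args]
    counter[0] += 1
    name = f"%t{counter[0]}"
    out.append(f"{name} = {GET_LL_OP[expr.op]} {', '.join(args)}")
    return name

instrs = []
ret = translate(Expr("bvadd", "x", Expr("bvmul", "y", "z")), instrs)
```

Running this on (bvadd x (bvmul y z)) emits the mul before the add, mirroring how each LLVM instruction consumes the SSA names of its already-translated operands.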
Optimization from F to F′ is performed by the LLVM optimizer. For black-box style processing of SMT constraints, SLOT uses all the passes included in LLVM's O3 optimization level. However, not all optimization passes are relevant to SLOT's translation (for instance, SLOT does not introduce any memory operations). Section 5 discusses in detail which LLVM passes are most important for SLOT. Finally, we translate F′ back into an SMT constraint φ′ with Algorithm 2. This translation is straightforward; we proceed along the F′ syntax tree and convert each instruction to its equivalent SMT-LIB function as in Algorithm 2. Because frontend translation, optimization, and backend translation all preserve the semantics of the constraint, we can then use the satisfiability of φ′ as a proxy for the satisfiability of φ; this property is formalized in Theorem 1.
The key challenge of translation is defining GetLLOp without introducing undefined behavior.Some functions can be translated one-to-one, but bitvector division, shifts, and floating-point comparisons have subtly different semantics in LLVM and SMT-LIB.In addition, some SMT operations have no direct LLVM equivalent and vice versa, requiring their semantics to be built from existing operations in each language.
SMT-LIB supports floating-point values of arbitrary width (and even of arbitrary exponent and significand widths), but LLVM supports only a few fixed floating-point widths. We therefore limit our translation to the standard 16-, 32-, 64-, and 128-bit floating-point types with their standard IEEE-754 exponent and significand sizes.
Each SMT constraint φ is translated to an LLVM function F whose arguments correspond to the variables of φ and whose return type is i1 (boolean).
Simple operations. Many operations have the same semantics in LLVM and SMT-LIB; we list these here without detailed proof. SLOT translates these operations by simply applying the equivalent LLVM operation to the same arguments.
• The boolean functions and, or, and xor have the same names and semantics in LLVM and SMT-LIB. (⇒ a b) is reduced to (or (not a) b).
• Bitvector and floating-point comparisons (including = for bitvectors and fp.eq for floating-point) are equivalent to the LLVM icmp and fcmp instructions with the appropriate condition codes (ordered for floating-point).

%zero = icmp eq iN %b, 0
%div = udiv iN %a, %b
%out = select i1 %zero, iN -1, iN %div
(a) LLVM equivalent of (bvudiv a b).
%zero = icmp eq iN %b, 0
%rem = {u,s}rem iN %a, %b
%out = select i1 %zero, iN %a, iN %rem
(c) LLVM equivalent of (bv{u,s}rem a b).

• to_fp on a single bitvector argument is equivalent to the LLVM bitcast instruction. to_fp on a floating-point argument is equivalent to either fpext or fptrunc, depending on the relative widths. to_fp on a rounding mode and a bitvector (i.e., signed numeric conversion to floating-point) has the same semantics as sitofp in LLVM. Similarly, to_fp_unsigned is equivalent to uitofp.
• The SMT-LIB conversions fp.to_ubv and fp.to_sbv are equivalent to the LLVM instructions fptoui and fptosi, respectively.
In addition to functions with equivalent LLVM instructions, we express several SMT-LIB functions using LLVM intrinsics.These are common LLVM functions invoked using the call ⟨intrinsic⟩ syntax.In the context of SLOT, there is no cost to using intrinsics instead of instructions, as we do not use the LLVM IR to generate a binary.
• Floating-point fused multiply-add (FMA), square root, and absolute value have the same names and semantics in SMT-LIB and LLVM (as intrinsics).
• fp.min and fp.max are equivalent to the llvm.minnum and llvm.maxnum intrinsics, respectively. These match the SMT-LIB semantics in that if one argument is NaN, the other argument is returned.
• The SMT-LIB floating-point class predicates like fp.isNaN, fp.isInfinite, etc. are equivalent to the llvm.is.fpclass intrinsic. This intrinsic takes a bitmask representing which classes to check for; each of the SMT-LIB predicates can be represented with the flags.
Division and bit shifting. There are several functions whose SMT-LIB and LLVM versions differ in subtle ways because of undefined behavior. In SMT-LIB, bitvector division by 0 is defined as a fixed value depending on the dividend. In LLVM, it produces a poison value, which is propagated by the optimizer through all subsequent operations. To build an equivalent series of LLVM instructions, we must add a check for this case.
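The guarded division semantics can be modeled in Python on n-bit unsigned integers (an illustrative sketch, not SLOT code): bvudiv by zero yields the all-ones bitvector and bvurem by zero yields the dividend, mirroring the select-based LLVM translations.

```python
# Sketch of the guarded translation: SMT-LIB (bvudiv a b) returns the
# all-ones bitvector when b = 0, so the LLVM sequence selects between
# -1 and the udiv result. Modeled here on n-bit unsigned integers.

def bvudiv(a, b, n):
    mask = (1 << n) - 1
    if b == 0:                  # select i1 %zero, iN -1, iN %div
        return mask             # all ones, per SMT-LIB semantics
    return (a // b) & mask      # udiv %a, %b

def bvurem(a, b, n):
    # Remainder by zero returns the dividend in SMT-LIB.
    mask = (1 << n) - 1
    return a & mask if b == 0 else (a % b) & mask
```

For example, bvudiv(10, 0, 8) gives 255 (all ones at width 8), while bvurem(10, 0, 8) gives back the dividend 10.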
In addition to the different handling of division by 0, LLVM and SMT-LIB have different semantics for bitvector shift operations. In LLVM, shifts by the bit width or more have undefined behavior. In SMT-LIB, they always result in a bitvector of all 0s or all 1s (in the case of arithmetic shift right of a negative value). The translations of these three shift operations are shown in Figure 5.
Floating-point equality. LLVM floating-point math does not have undefined behavior as in the integer case, but we must handle two distinct notions of equality: fp.eq and "=". The function fp.eq checks for floating-point equality in the IEEE sense; it has the same semantics as LLVM's fcmp oeq. SMT-LIB "=", on the other hand, is a Core theory operation that checks for the equality of two SMT expressions. Unlike IEEE-754, every value must be uniquely represented in SMT-LIB, so there is one NaN "object" which is equal to any other NaN. In all other cases, "=" means bitwise equality; this translation is shown in Figure 6.
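The contrast between the two equalities can be sketched in Python for 32-bit floats (an illustrative model of the distinction, not SLOT's translation): fp.eq follows IEEE-754 (NaN ≠ NaN, +0 = -0), while "=" treats all NaNs as one object and compares every other value bitwise.

```python
import math, struct

def bits(x):
    """Bit pattern of x as a 32-bit float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def fp_eq(a, b):                      # fcmp oeq: IEEE equality
    return a == b

def smt_eq(a, b):                     # core-theory "="
    if math.isnan(a) and math.isnan(b):
        return True                   # single NaN object
    return bits(a) == bits(b)         # otherwise bitwise equality
```

Note that the two notions also disagree on signed zeros: fp_eq(0.0, -0.0) is true, while smt_eq(0.0, -0.0) is false because their bit patterns differ.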
%zsg = zext i1 %sign to iN
%zex = zext iE %exp to iN
%zsi = zext iS %sig to iN
%ssg = shl iN %zsg, N-1
%sex = shl iN %zex, S
%right = or iN %sex, %zsi
%all = or iN %ssg, %right
%out = bitcast iN %all to fptype
Figure 7: LLVM translation of the floating-point constructor fp.

Changing bit widths. There are three bitvector operations that change the bit widths of their arguments: concatenation (combining multiple bitvectors), repeat (repeating a single bitvector a constant number of times), and bit extraction. Each of these operations is translated to a sequence of LLVM instructions that simulates its semantics. Suppose we have a bitvector b of length n and want to extract the bits from i down to j (both inclusive in SMT-LIB). Intuitively, to extract this portion of b, we move the bits of interest to the right end of the bitvector (a shift by j), and then truncate to the appropriate size (i - j + 1).
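The extraction scheme just described (shift right by j, then truncate to i - j + 1 bits) can be modeled in one line of Python:

```python
# Sketch of ((_ extract i j) b): shift right by j, then keep the
# low i - j + 1 bits (the LLVM lshr followed by trunc).
def extract(b, i, j):
    return (b >> j) & ((1 << (i - j + 1)) - 1)
```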
Concatenation involves extending both arguments to the new width, shifting one into the newly-added all-zero bits of the other, and then combining with bitwise or. The SMT-LIB ((_ repeat k) a) operation is translated by chaining multiple concatenations. The number of repetitions is a constant parameter and is therefore known statically. The translations of concatenation and repetition may add redundant overhead for some inputs, but any such overhead is eliminated during the optimization phase.
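These two translations can be sketched in Python (an illustrative model on unbounded integers; the widths are carried explicitly): concat zero-extends, shifts the high part, and ors; repeat chains concatenations.

```python
# Sketch of concat and repeat: zero-extend both arguments to the
# combined width, shift the high part left by the low width, and
# combine with or; repeat is a statically-unrolled chain of concats.

def concat(hi, hi_w, lo, lo_w):
    return (hi << lo_w) | lo          # zext + shl + or

def repeat(b, w, times):
    out, out_w = b, w
    for _ in range(times - 1):
        out = concat(out, out_w, b, w)
        out_w += w
    return out
```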
Floating-point construction. The SMT-LIB floating-point constructor fp takes a bitvector each for the sign, exponent, and significand and returns a floating-point value. In LLVM, we need to concatenate (shift and bitmask) the three parts and interpret (i.e., bitcast) the result as a floating-point value. This translation is shown in Figure 7.
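For the 32-bit case (1-bit sign, 8-bit exponent, 23-bit significand), the shift-and-bitcast sequence of Figure 7 can be sketched in Python, with struct playing the role of the final bitcast:

```python
import struct

# Sketch of the fp constructor for 32-bit floats: place the sign,
# exponent, and significand with shifts and or, then reinterpret
# the 32-bit pattern as a float (the bitcast in Figure 7).

def fp(sign, exp, sig):
    word = (sign << 31) | (exp << 23) | sig
    return struct.unpack("<f", struct.pack("<I", word))[0]
```

For instance, fp(0, 127, 0) encodes an unbiased exponent of 0 and an empty significand, i.e., the value 1.0.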
Rounding modes. SMT-LIB defines five separate floating-point rounding modes: roundNearestTiesToEven, roundNearestTiesToAway, roundTowardPositive, roundTowardNegative, and roundTowardZero. These modes specify the semantics of floating-point operations like addition and subtraction when rounding is required. By default, LLVM floating-point instructions follow round to nearest with ties to even, so SLOT translates SMT function applications with this rounding mode directly to LLVM instructions. For other rounding modes, the tool must generate a call to a constrained floating-point intrinsic in LLVM, which in most cases allows the specification of the rounding mode. For example, a floating-point addition under the roundTowardPositive mode is translated to the following:

call float @llvm.experimental.constrained.fadd.f32(
    float %a, float %b,
    metadata !"round.upward",
    metadata !"fpexcept.ignore")

However, constrained floating-point intrinsics do not exist for all SMT-LIB operations; in these cases, SLOT chooses the correct intrinsic based on the rounding mode. For example, SMT-LIB conversion from floating-point to signed bitvector (fp.to_sbv) becomes one of constrained.roundeven, constrained.lround, llvm.ceil, llvm.floor, or fptosi, depending on the rounding mode. Analogous measures are required for fp.to_ubv and fp.roundToIntegral.
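The mapping from SMT-LIB rounding modes to the metadata argument of LLVM's constrained intrinsics is a small lookup; a hypothetical sketch (the table layout and helper name are illustrative, though the mode strings follow the LLVM LangRef):

```python
# Hypothetical lookup from SMT-LIB rounding modes to the metadata
# string passed to LLVM's constrained floating-point intrinsics.
ROUNDING = {
    "roundNearestTiesToEven": "round.tonearest",
    "roundNearestTiesToAway": "round.tonearestaway",
    "roundTowardPositive":    "round.upward",
    "roundTowardNegative":    "round.downward",
    "roundTowardZero":        "round.towardzero",
}

def constrained_fadd(mode):
    """Render a constrained fadd call for the given SMT rounding mode."""
    return (f'call float @llvm.experimental.constrained.fadd.f32('
            f'float %a, float %b, metadata !"{ROUNDING[mode]}", '
            f'metadata !"fpexcept.ignore")')
```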

Backend translation
Simple operations, variables, and types. During backend translation, the straightforward operations listed in Section 4.2 can be translated just as during frontend translation. Function arguments of F′ are converted to variables in φ′ with the same names and types. LLVM's optimizer may add or remove intermediate SSA variables in F, but only the function arguments are converted to variables in φ′. The optimizer may render one of the function arguments dead; in this case, it is not translated back into a variable. Therefore, the set of variables in φ′ is a subset of the variables in φ. Bitvector and floating-point types are also translated as during frontend translation. However, LLVM does not distinguish between booleans and width-1 bitvectors, while SMT-LIB does. In most cases, this distinction is immaterial, and we treat i1 as a boolean, but where the optimizer introduces bitvector operations on an i1 (for instance, sign extension), we convert the argument to a bitvector, rather than a boolean.
Undefined behavior. During frontend translation, great care must be taken not to introduce undefined behavior into F. This is because the SMT versions of operations are defined on more inputs than those in LLVM; in other words, SMT bitvector division, for instance, matches the outputs of LLVM division on all inputs where the latter is defined, but not the reverse. LLVM optimization does not introduce any undefined behavior, so, during backend translation, we need not insert any checks around operations like division and shifting. There is one LLVM operation that is undefined in SMT-LIB: bitcast conversions from floating-point values to integers. SLOT handles these conversions by introducing an extra integer variable and constraining the result of converting it to a floating-point value.

Bit operations intrinsics
Frontend translation produces only those intrinsics listed in Section 4.2. However, the optimizer may introduce other intrinsics which must be handled by backend translation. The llvm.bswap intrinsic reverses the order of the bytes of its input and is translated to a sequence of extract and concatenate operations representing these semantics. Similarly, the llvm.bitreverse intrinsic reverses all of the bits; this is also achieved by composing extraction and concatenation. Finally, the llvm.ctpop intrinsic counts how many bits are set (have value 1) in a bitvector; this is achieved through several extractions followed by addition.
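The behavior these extract-and-concatenate sequences must reproduce can be sketched in Python on n-bit values (an illustrative model of the intrinsics' semantics, not the SMT-LIB terms SLOT emits):

```python
# llvm.ctpop: extract each bit and add; llvm.bswap: extract each byte
# and concatenate in reverse order (n must be a multiple of 8).

def ctpop(x, n):
    return sum((x >> i) & 1 for i in range(n))

def bswap(x, n):
    out = 0
    for i in range(n // 8):
        byte = (x >> (8 * i)) & 0xFF              # extract byte i
        out |= byte << (n - 8 * (i + 1))          # place it mirrored
    return out
```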
Math intrinsics. LLVM includes the intrinsics umin, umax, smin, and smax to take the unsigned minimum, unsigned maximum, signed minimum, and signed maximum, respectively, of two bitvector arguments. These intrinsics are translated into an SMT-LIB ite operation with the appropriate comparison. For example, a umin call on bitvector arguments a and b is translated to the SMT-LIB expression (ite (bvult a b) a b).
In addition, LLVM includes "saturated" math operations like llvm.usub.sat. These instructions prevent over- and underflow by clamping the return value; llvm.usub.sat, for example, returns 0 if underflow would have occurred. Like the minimum and maximum operations, these intrinsics are translated to an ite expression. Rounding mode intrinsics produced by frontend translation must also be converted back to the corresponding SMT function with the correct rounding mode argument.
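The ite translations of this subsection can be sketched in Python (an illustrative model on n-bit unsigned integers):

```python
# llvm.umin becomes (ite (bvult a b) a b); llvm.usub.sat clamps the
# result to 0 whenever the subtraction would underflow.

def umin(a, b):
    return a if a < b else b          # (ite (bvult a b) a b)

def usub_sat(a, b, n):
    return (a - b) & ((1 << n) - 1) if a >= b else 0
```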
Property 1 asserts that the LLVM optimizer preserves the input/output behavior of the functions it transforms. This property of the optimizer may not always hold because the optimizer may contain bugs. But in our work, we take it as ground truth that the input and output of the optimizer are equivalent. Because we focus on relatively simple optimizations (e.g., we do not deal with memory operations), compiler bugs changing our results are likely to be rare. In our testing of more than 100,000 SMT benchmarks, we have encountered no compiler bugs. We now prove that SLOT is semantics-preserving.
Lemma 1 (Frontend translation). Let φ be an SMT constraint with variables v1, v2, . . ., vn, and F be the function produced by the frontend translation of φ. Then φ is satisfiable if and only if there exists an input to F for which F returns true.
Proof. (⇒) Assume φ is satisfiable. Then there exists an assignment of the variables of φ, A = {a1, a2, . . ., an}, for which φ evaluates to true. Let F denote the LLVM function resulting from frontend translation. From Section 4.2, at each instruction i in F, the value produced by i is the same as the value produced by the corresponding function application in φ. In particular, with the assignment A, φ's outermost function application produces true. This means that the last instruction in F must return true.
(⇐) Assume that there is an input A = {a1, a2, . . ., an} such that F(A) is true, and consider whether the assignment of the values A to the variables of φ satisfies φ. By the translations in Section 4.2, for each instruction i, the equivalent function application in φ yields the same value. In particular, we assume that the last instruction in F returns true; this corresponds to the final result of evaluating φ, so the assignment A must satisfy the constraint φ. □
Lemma 2 (Backend translation). Let F be an LLVM function over integer (bitvector) types with n arguments, and let φ be the SMT constraint resulting from performing backend translation on F. Then, φ is satisfiable if and only if there exists a set of inputs a1, a2, . . ., an such that F(a1, a2, . . ., an) returns true.
Proof. (⇒) Assume that there exists an input a1, a2, . . ., an on which F returns true. Let the set of values of the internal SSA variables of F under the given input be s1, s2, . . ., sm, and call A = {a1, a2, . . ., an, s1, s2, . . ., sm} the set of all variables from F. At each instruction i in F, the corresponding function application in φ gives the same value as i. In particular, because F returns true, the last instruction, and so the result of evaluating φ, must be true, which means that the assignment A satisfies the constraint φ.
(⇐) Assume that φ is satisfiable. This means that there exists a set of variables A = {a1, a2, . . ., am} for which φ produces true. Now take the subset of A which corresponds to the input variables of F. At each instruction in F, we know the value must be the same as the corresponding SMT function application. In particular, the outermost function of φ corresponds to F's return instruction, so since the assignment A caused φ to evaluate to true, F must also return true.
□
Theorem 1 (Preservation of satisfiability). Given an SMT constraint φ on floating-point and bitvector values, the new constraint φ′ produced by SLOT is satisfiable if and only if φ is satisfiable.
Proof. (⇒) Let F be the LLVM function produced by frontend translation of φ, F′ be the result of optimizing F, and φ′ be the result of conducting a backend translation of F′. Assume φ is satisfiable. Then by Lemma 1, there exists an input on which F returns true. Moreover, from Section 4.2, we know that F contains no undefined behavior. Therefore, by Property 1, there is an input to F′ such that F′ also returns true. But then, by Lemma 2, φ′ is satisfiable.
(⇐) Assume that φ′ is satisfiable. Then by Lemma 2, there exists an input on which F′ returns true. Again, we know that F′ has no undefined behavior by the translations in Section 4.2, so by Property 1, there is an input to F for which F returns true. But then, by Lemma 1, φ is satisfiable. □
Theorem 1 means that the sequence of translation, optimization, and translation described in this section produces a new constraint that has the same satisfiability as the original. Moreover, because of the construction of the translation, SLOT also preserves models between φ and φ′. That is, if φ is satisfiable, an assignment that satisfies φ′ directly gives an assignment that satisfies the original constraint; we just ignore the extra variables introduced by the translation and optimization process. The theoretical guarantee of Theorem 1 gives us a practical, solver-agnostic tool for preprocessing and optimizing SMT constraints.

EVALUATION
We evaluate SLOT by applying it to the SMT-LIB benchmark suites for the subject theories [3].We highlight our most important results as follows.
• SLOT increases the number of solvable formulas at specified timeouts by up to 24% for bitvector-only benchmarks, 14% for floating-point-only benchmarks, and 80% for mixed benchmarks, allowing the solving of all but one QF_BVFP benchmark within 600 seconds.
• On average, SLOT slows down the smallest benchmarks but speeds up the largest benchmarks. Geometric mean speedups are up to 2.8× for bitvector-only benchmarks and over 3× for floating-point benchmarks.
• Most of SLOT's speedup is the result of just a few simple LLVM optimization passes like instcombine. Our approach shows which optimizations are "missing" from SMT solvers and allows the effort involved in developing these passes to be instantly available in the SMT context.

Experimental setup
Given a solver, a benchmark, and a timeout t*, we follow a three-step process to test the effectiveness of SLOT. First, we measure how long it takes for the solver to conclude either sat or unsat for the benchmark; call this t_pre. Then, we apply SLOT to the benchmark, producing a new SMT constraint; we call the time it takes to do this t_SLOT. Finally, we measure how long the solver takes to solve the optimized benchmark, t_post. For a fair comparison, we must offset the overhead of running SLOT against the speedup achieved. Thus, a formula has been improved if t_SLOT + t_post < t_pre. This is the SLOT-only result. In addition, we adopt the portfolio methodology [44]; by running SLOT optimization in parallel with a solver, a user can simply take whichever result is produced first. When discussing speedups, we report this (i.e., min{t_pre, t_SLOT + t_post}) as the portfolio result. We report all proportional speedups as geometric means, reducing the impact of large outliers.

Table 2: Timeout improvement results produced by SLOT. Each column denotes a different time limit with the total number of original unknown formulas ("Total"), the number improved ("Imp."), and the percentage ("%"). The "All" rows denote the number of formulas for which all solvers timed out, but at least one of the solvers produced a solution after SLOT was applied.

We answer the following research questions:
• RQ1: How many more formulas can be solved? Given a time limit, how many formulas from benchmark sets can SLOT convert from unknown to either sat or unsat?
• RQ2: How much faster can formulas be solved? What is the proportional speedup produced by SLOT for constraints with low and high original solving times?
• RQ3: Which LLVM optimization passes contribute? Which optimization strategies in LLVM are most effective at simplifying SMT formulas beyond the capabilities of existing solvers?
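The evaluation criteria above can be sketched directly in Python (the function names are illustrative): a benchmark is improved when t_SLOT + t_post < t_pre, the portfolio time is the faster of the two pipelines, and speedups are aggregated by geometric mean.

```python
from math import prod

def improved(t_pre, t_slot, t_post):
    """SLOT-only criterion: optimization plus solving beats solving alone."""
    return t_slot + t_post < t_pre

def portfolio(t_pre, t_slot, t_post):
    """Run both pipelines in parallel and take whichever finishes first."""
    return min(t_pre, t_slot + t_post)

def geomean(speedups):
    """Geometric mean, reducing the impact of large outliers."""
    return prod(speedups) ** (1.0 / len(speedups))
```

For the motivating example (t_pre = 390 s, t_post = 0.02 s), any SLOT overhead under about 390 seconds still counts as an improvement.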
Implementation. We have implemented SLOT in about 2,500 lines of C++ and made it publicly available on GitHub. 2 We use Z3's built-in parser for the SMT-LIB language and the standard LLVM C++ API, but provide input and output in the standard SMT-LIB format for use with solvers other than Z3. Frontend and backend translation is carried out as described in Section 4. SLOT has been tested with LLVM version 16.0.0.
As discussed in Section 4, we restrict floating-point variables to the standard 16-, 32-, 64-, and 128-bit widths and exclude any constraints which contain variable rounding modes. These limits are minimal: only 26 benchmarks from QF_FP (all for variable rounding modes) and 128 QF_BVFP benchmarks (89 for unsupported widths and 39 for variable rounding modes) are excluded, amounting to just 0.2% of all mixed and floating-point benchmarks.
2 https://github.com/mikekben/SLOT
Solvers. We test with the state-of-the-art general SMT solvers Z3 and CVC5 used in prior literature [43, 46]. For the bitvector-only benchmarks, we also test with Boolector [36], a solver specifically optimized for bitvectors [45]. SLOT has been tested with Z3 version 4.12.1, CVC5 version 1.0.5, and Boolector version 3.2.2.
Testing environment. All experiments are performed on a server with two AMD EPYC 7402 CPUs and 512 GB RAM, running Ubuntu 20.04. We test with timeouts between 30 and 600 seconds, in line with those used in applications for translation validation [26] (zero to five minutes) and symbolic execution (between five and 129 solver calls within one hour) [12]. Finally, when measuring speedups, we count solver and SLOT timeouts as 600-second contributions.
5.2 RQ1: How many more formulas can be solved?
Table 2 shows the number of constraints that are changed from unknown to solved by SLOT for each of the three benchmark sets. The "Total" column denotes the number of unknown benchmarks at each timeout, and the "Imp." column gives the number of those constraints which can be solved after SLOT is applied.
The results include SLOT's running time, i.e., we report a benchmark as improved only if t_post + t_SLOT < t*. Since solvers are typically run with a fixed timeout, e.g., during symbolic execution [12], the proportion of constraints that move from timeout to solved at fixed values of t* represents an improvement for users. SLOT is most effective at speeding up constraints with a mix of floating-point and bitvector variables. It allows all but one mixed benchmark to be solved within 600 seconds and reduces the number of unsolvable constraints by about one-third at all time limits. SLOT renders solvable roughly 10% of timeout floating-point constraints and 15%-20% of bitvector benchmarks. The results are comparable for each of the tested solvers, showing that SLOT's speedup is not solver-specific.
Most importantly, SLOT not only improves each solver's performance but also does better than all solvers combined. The "All" rows in Table 2 show the number of benchmarks for which all of the solvers timed out and the number which became possible to solve with at least one of the solvers. The improvements in these rows show that SLOT outperforms even a portfolio of existing solvers, decreasing the number of unknown constraints by as much as 24% for small bitvector benchmarks.

5.3 RQ2: How much faster can formulas be solved?
Figure 8 shows the mean speedups observed for each benchmark set.
Values below one indicate a slowdown. For the smallest benchmarks, SLOT slows down solving, often substantially. However, while the proportional slowdown is large, the absolute slowdown is typically small, and occurs because the overhead of translating outweighs the cost of simply solving the benchmark (i.e., t_SLOT > t_pre). For example, one benchmark3 with Z3 is sped up from 0.06 seconds to 0.02 seconds (a 3× speedup), but SLOT takes 0.24 seconds to translate and optimize it, creating an overall proportional slowdown. The effect reverses for more complex constraints: for constraints that take longer than 300 seconds, we improve mean solving time by more than 1.25× for floating-point, about 1.6× for mixed, and between 1.6× and 2× for bitvector benchmarks. The portfolio methodology yields even greater running-time improvements, in the range of 3× for QF_BVFP and 2× for QF_BV. Even small constraints below 60 seconds of initial running time see appreciable speedups under all solvers with the portfolio method. The difference between the SLOT-only and portfolio results exists because the dramatic speedup of some constraints is offset by a slowing down of others.
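The arithmetic behind the small-benchmark slowdown can be checked with the numbers quoted above (variable names are ours, for illustration only):

```python
# Numbers from the example above: solving drops from 0.06 s to 0.02 s,
# but SLOT itself takes 0.24 s to translate and optimize the constraint.
t_pre, t_slot, t_post = 0.06, 0.24, 0.02

solver_only_speedup = t_pre / t_post            # ~3x faster solving alone
end_to_end_speedup = t_pre / (t_slot + t_post)  # < 1: a net slowdown

assert abs(solver_only_speedup - 3.0) < 1e-6
assert end_to_end_speedup < 1.0

# Under the portfolio methodology the effective time is capped at t_pre,
# so such small benchmarks are never made worse overall.
assert min(t_pre, t_slot + t_post) == t_pre
```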

5.4 RQ3: Which LLVM optimization passes contribute?
To understand why SLOT produces performance improvements, we investigate which LLVM passes contribute most to the underlying results. The structure of SMT constraints means that most LLVM optimization passes are irrelevant to SLOT. Translated SMT constraints differ from most programs in that:
• They perform no memory operations; SMT variables are directly translated into LLVM function arguments.
• They contain only one function; this maintains the equivalence definitions described in Section 4.
• The single function has only one basic block (i.e., there is no branching). This is a consequence of the nature of SMT constraints; the only branch-like operation is ite, which is translated to an LLVM select instruction.
The majority of LLVM's 58 optimization passes affect only memory operations (7), are interprocedural (10), or optimize branching (17). An additional 16 passes do not apply to translated SMT constraints for a variety of other reasons (they are architecture-specific, they optimize debug information, etc.). We also exclude bb-vectorize because vectorization introduces substantial translation overhead while providing no benefit in the SMT context. This leaves eight passes that are relevant to SLOT: instcombine (regular and aggressive), instsimplify, dead code elimination (regular and aggressive), global value numbering, reassociate, and sparse conditional constant propagation (SCCP). SLOT runs these passes in the same order in which they are performed during LLVM O3 optimization. Table 3 shows how many benchmarks each optimization pass affects; a benchmark is counted for a pass if the pass application caused any change to the LLVM IR function. Aggressive dead code elimination did not change any constraints, and aggressive instcombine only changed a few; this may be because, in the SMT context, their features are usually handled by the non-aggressive versions. The most effective passes are instcombine, which changes almost every constraint, reassociate, and global value numbering (for QF_BV). Notably, many more passes change bitvector benchmarks than floating-point; this is a combined result of lower structural complexity in the QF_FP and QF_BVFP benchmark sets and greater difficulty in optimizing floating-point operations.
Table 4 shows the mean speedup observed for benchmarks which were and were not affected by each optimization pass, for QF_BV with initial solving times above 30 seconds; the results for floating-point and mixed benchmarks are comparable but have small sample sizes. Even passes with negative spread, like dce, are beneficial under the portfolio methodology. The results confirm that reassociate and instcombine speed up benchmarks the most, followed by global value numbering and instsimplify.
The instcombine pass consists mostly of simple peephole optimizations, which shows that solver performance could benefit from simple theory-specific reasoning already implemented in compilers. Global value numbering and reassociation also contribute substantially; while some solver heuristics already eliminate common subexpressions, the more advanced implementations of these algorithms in LLVM provide further benefits. Our results show that the extensive effort expended to perfect compiler optimizations indeed provides benefits beyond those available in existing SMT solvers. Using SLOT, solver users can benefit from that effort without deep knowledge of solver implementation.

DISCUSSION
Compiler optimization vs. SMT simplification. Because the purpose of compiler optimizations is to reduce the number of processor instructions, some operations which are "simpler" in LLVM may not provide any advantage in SMT. For example, bitvector multiplication by 2^n is equivalent to shifting left by n, and LLVM prefers the shift; however, Z3 can take much longer to solve constraints involving shifts. In our implementation of SLOT, we therefore provide a flag to force backend translation to generate multiplication, rather than shift, where possible. However, there may be more complex analogous examples arising from the fundamentally different purpose of LLVM. So while compiler optimization unlocks logic not present in existing SMT solvers, it acts more as a sieve than as a magic bullet. Running SLOT and a solver as a portfolio allows solver heuristics and SLOT each to shine where they perform best, each doing well on some benchmarks but slowing down on others. Only those benchmarks that neither can handle slip through the sieve.
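The multiplication/shift equivalence is easy to verify exhaustively for a small width. This sketch models fixed-width bitvector semantics with masked Python integers; the width and helper names are illustrative, and SLOT itself works on LLVM IR, not Python values:

```python
# Exhaustive check over all 8-bit values: multiplying by 2**k equals
# shifting left by k, with both results taken modulo 2**WIDTH.
WIDTH = 8
MASK = (1 << WIDTH) - 1

def bv_mul_pow2(x, k):
    return (x * (1 << k)) & MASK  # multiplication form, often easier for solvers

def bv_shl(x, k):
    return (x << k) & MASK        # shift form preferred by LLVM

assert all(bv_mul_pow2(x, k) == bv_shl(x, k)
           for x in range(1 << WIDTH) for k in range(WIDTH))
```

The two forms are semantically interchangeable, which is why SLOT's flag can safely re-emit multiplication where LLVM produced a shift.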
SLOT overhead. The main driver of the proportional slowdown for small constraints is the cost of running SLOT. In contrast to a solver, which just needs to parse the constraint, SLOT must parse, translate, optimize, translate again, and then write. SLOT's runtime is roughly linear in the size of the AST of the original constraint, while SMT solving has unpredictable, possibly exponential performance. This means that the proportion of t_SLOT + t_post contributed by SLOT generally decreases as t_pre increases, as shown in Table 5.
The moderate increase for QF_BV benchmarks above 300 seconds is a result of SLOT timeouts; these are counted as 100% contributions in Table 5. Frontend translation makes up about 60% of the running time of SLOT for bitvectors and about one-third for mixed and floating-point constraints, while optimization contributes between 7% and 10%. Backend translation takes longer for constraints including floating-point numbers (about 60%) because floating-point intrinsics require additional steps to be translated back to SMT-LIB.
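The shrinking-overhead argument can be illustrated with a toy model. The numbers below are invented for illustration and are not measurements from Table 5:

```python
# Toy model: SLOT's cost grows ~linearly with constraint size while
# solving time can grow exponentially, so SLOT's share of the total
# shrinks for harder constraints.
def slot_share(t_slot, t_post):
    return t_slot / (t_slot + t_post)

sizes = [2, 4, 6, 8]                       # hypothetical AST sizes
t_slot = [0.1 * n for n in sizes]          # ~linear preprocessing cost
t_post = [0.05 * 2 ** n for n in sizes]    # possibly exponential solving time

shares = [slot_share(s, p) for s, p in zip(t_slot, t_post)]
assert all(a > b for a, b in zip(shares, shares[1:]))  # strictly shrinking
```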
Other SMT theories. Program analysis tools make use of just a few SMT theories: bitvectors [26], floating-point [25], and, more recently, strings [6]. SLOT improves the performance of solvers on the theories relevant to these applications, and we leave to future work the extension of SLOT's general method to other theories, like real numbers, of use outside software engineering [22]. While the optimization process used in SLOT may provide benefits outside bitvectors and floating-point numbers, applying translation and

RELATED WORK
SMT constraint transformation. Existing work presents a number of strategies for simplifying SMT constraints. Dillig et al. [17] introduce a solver-internal constraint simplification algorithm that preserves satisfiability. Reynolds et al. [38, 39] introduce simplifying transformations for unbounded string constraints. Transformations of SMT formulas are also employed to test solvers. StringFuzz [9] focuses on generating well-formed formulas but also provides some transformations on string constraints. STORM [28] transforms Boolean formulas to perform black-box testing. Bugariu et al. [11] introduce constant assignment and term synthesis as transformations for string constraints. Sparrow [46] and YinYang [43] expand transformations to real numbers and integers. The transformations performed by these tools are designed to test solvers; our approach uses a related method to speed up solving instead.
Speeding up SMT solving. Most work on improving SMT solver performance focuses on algorithms to be implemented within a solver. In addition to Z3, CVC5, and Boolector, such work has taken the form of new solvers like MathSAT [14], Bitwuzla [32], and Yices [19]. Early work on improving solver performance used symmetry to reduce the constraint-solving search space [16]. More recently, Niemetz et al. introduced syntax-guided quantifier instantiation to speed up solving for quantified constraints [33]. For particular theories, Z3str3 speeds up solving of string constraints [5], and Berzish et al. introduce new methods for solving constraints involving regular expressions [6]. Sadhak [30] combines CVC4 with fuzzing techniques for uninterpreted functions to improve performance. FastSMT [1] uses a neural network to find better ways to combine existing solver heuristics, thereby speeding up solving. MBA-Solver [45] departs from solver-specific approaches by preprocessing bitvector constraints involving alternating bitwise and arithmetic operations. Our approach is most similar to MBA-Solver, as SLOT uses pre-processing rather than solver-internal improvements. However, it differs in that we apply the broad range of optimizations performed by LLVM, including floating-point transformations. We harness an existing source of optimizations as a black box rather than hand-crafting one for the SMT problem.
The constraint-code nexus. Work on symbolic execution and translation validation has used SMT constraints to represent the semantics of LLVM programs. KLEE [12] converts LLVM IR programs into SMT formulas that encode symbolic execution constraints; many symbolic execution tools are built on the LLVM-SMT core provided by KLEE [37]. Alive and its progeny [26, 27] generate SMT constraints from LLVM instructions to verify the optimizations performed by the LLVM optimizer, which Lee et al. [23] extend to LLVM's memory model. LifeJacket [34] and Alive-FP [29] use SMT formulas to verify floating-point computation. VeRA [10] also translates C++ code to SMT constraints for program verification and faces some engineering challenges analogous to SLOT's. Constraints in the formal refinement-based B-method have also been translated to SMT-LIB [20, 40]. These approaches use SMT solvers to reason about programs and compilers. SLOT does the opposite, using a compiler to reason about SMT problems and exactly preserving constraint semantics instead of solving analysis constraints.
Optimizations outside compilers. Dong et al. [18] apply compiler optimizations directly to programs which serve as inputs for KLEE. They find that those optimizations can slow down symbolic execution because the optimizer complicates branching structure. LEO [13] attempts to remedy this limitation by using machine learning to choose which optimization passes to apply. SLOT's results differ because it operates at the level of constraints, not programs. This allows SLOT to work in contexts outside symbolic execution and to avoid analysis-frustrating branching optimizations.
Declarative and imperative code. Existing work has explored conversion from declarative to imperative languages to allow platform flexibility [41] or give access to greater optimization opportunities [42]. More recently, Li and Slind [24] convert functions in higher-order logic to a simplified intermediate representation.
Steno [31] translates declarative queries into imperative code to speed up operations over collections. SLOT, on the other hand, goes beyond translation to an imperative language by adding backend translation, and achieves a simplification of the declarative constraints rather than transforming them into executable code.

CONCLUSION
This paper has presented a general pre-processing tool, SLOT, which allows solver users to apply compiler optimizations to SMT constraints as a black box. SLOT practically improves solvers' performance on standard benchmarks and increases the number of solvable constraints at fixed time limits. Furthermore, the speedup is achieved using only the simplest compiler optimization passes, giving solver developers insight into possible improvements to solver tactics.

Figure 1: Overview of SLOT's translation and optimization process. The output constraint (SMT') is satisfiable if and only if the original constraint (SMT) is satisfiable.

Figure 2: SLOT translation and optimization process.

Figure 4: LLVM equivalents of SMT-LIB division and remainder.

Figure 6: LLVM equivalent of floating-point equality (= a b). fp indicates half, float, double, or fp128 with the corresponding width. The constant i32 3 indicates a check for NaN.

Figure 7: LLVM equivalent of (fp sign exp sig). w is the width of the floating-point values, e is the width of the exponent, and s is the width of the significand. fptype may be any of half, float, or double.

Example 2. The SMT operation with 32-bit floating-point variables a and b: (fp.add roundTowardPositive a b)

Figure 8: Geometric mean speedup from original constraint to optimized constraint produced by SLOT for each benchmark set under Z3, CVC5, and Boolector (for QF_BV). Constraints are grouped into ranges of original solving time along the x-axis. All measurements include t_SLOT.

Table 1: List of functions in the bitvector and floating-point theories by type. We abbreviate bitvectors BV, floating-point values FP, and rounding modes RM. τ represents any type. "*" indicates a function parameterized by integer constants, and "†" indicates a function that changes bit widths.

Figure 3: The grammar of constraints in the QF_BV and QF_BVFP logics. bvc is any of the bitvector comparisons from Table 1. fpc means any of the floating-point comparisons, and class means any of the floating-point class operations. bvop1 and bvop2 mean any of the unary and binary bitvector operations, respectively, and the same for fpop1 and fpop2.

Table 3: Percentage of benchmarks affected by each optimization pass.

Table 4: Mean speedups for benchmarks which are and are not affected by each pass, from the QF_BV benchmark set with initial solving time above 30 seconds, under Z3. The spread is the difference in mean speedup between benchmarks which are affected by the pass and those which are not. Z3 takes much longer to solve constraints involving shifts. On one benchmark,4 for example, Z3 takes less than a second if doubling is expressed as x + x or 2 × x, but does not finish within 24 hours if it is expressed as a left shift by one.

Table 5: t_SLOT / (t_SLOT + t_post) (as a percentage) for bitvector and floating-point benchmarks.