On Optimal QUBO Encoding of Boolean Logic, (Max-)3-SAT and (Max-)k-SAT with Integer Programming

We present an asymptotic improvement in the number of variables (n + m⌊log 2(k − 1)⌋) required for state-of-the-art formulation of (max-)k-SAT problems when encoded as a quadratic unconstrained binary optimization (QUBO) problem. We further show a variable reduction technique (max-)3-SAT formula which achieves the optimal number of substitution variables. We show optimality empirically by presenting an integer linear programming (ILP) construction for arbitrary Boolean formulas. We show how various goals can be encoded, and this model can be generalized to searching for arbitrary substitution variable functions. Lastly, we show the optimal high-order substitution reduction in cubic QUBO equations which has smaller coefficients than the typical construction used.


INTRODUCTION
The problem of polynomial-time reductions between domains of NP-complete problems is well studied.Less well-known are the maximally efficient reductions which minimize the polynomial complexity as well as the constant factor in such reductions.Additionally, parsimonious reductions are also less commonly known where the solution space is bijective between an NP-complete problem and its reduced counterpart.We investigate the use of integer linear programming (ILP) to make the possibility of both parsimonious and maximally efficient encoding possible specifically when converting from a Boolean logic formula to a QUBO.1: Nomenclature used to describe Boolean logic satisfiability problem and the Ising model to which QUBO directly translates are both NP-complete problems.
For -SAT problems, conjunctive normal form (CNF) is typically used so we will analyze this case specifically.However, we do not require CNF but algebraic normal form (ANF), or any form as conjunctions of Boolean functions of  variables  which represent literals  ∈ {, ¬ }.Then a (max-)-SAT problem with  variables and  clauses is conjunctive function The nomenclature used in this paper in terms of logic names, symbols and their CNF form equivalents are listed in Table 1.This assigns a symbol to all ten non-trivial Boolean logic operators as well as explains its relation to CNF form.
A QUBO formulation of a problem seeks to optimize the energy of a binary vector  = {0, 1}  subject to a symmetric matrix  ∈ R  × , a coupling matrix such that we evaluate the function minimize  () = , ∈ [1.. ]    ,   .
During constructions of such QUBO problems, a carefully chosen encoding can allow the minimum to be known in advance such that termination can be completed immediately when the solution is found.We define variables as of two types in the context of encoding to QUBO: original variables and substitution variables.Substitution variables represent an arbitrary Boolean function of some of the original variables.We further subdivide substitution variables into two classes: weak substitution variables and strong substitution variables.A weak substitution variable is used exclusively in the context of substitutions, and can allow a relaxed constraint where substitution failure is not penalized.A strong substitution variable on the other hand, is penalized in all cases where it is incorrect.This is similar conceptually to assignment versus equivalence.By making each term of a conjunctive expression have zero energy, the global minimum if satisfiable will occur at zero.Non-zero energy can loosely correlate to the number of unsatisfied clauses, though it will also include strong substitution failures that may have occurred as well as any possible amount of weak substitutions.
An Integer Linear Programming (ILP) solver takes a problem

Background
The study of 3-SAT problems specifically was reduced to the maximum independent set problem by Choi [2] and also appearing in the comprehensive NP-problem to QUBO transformations of Lucas [5] requiring 3 QUBO variables.Chancellor improved upon this to  +  QUBO variables in [1].We extend upon the work of Nüßlein [9] who made observations in their construction that a common substitution term was present and noted the need for mathematical solutions for optimal encoding.We further extend upon Nüßlein's work on (max-)-SAT in [8] which required  () substitution variables where  () = ⌊/3⌋  ≤ 3 ⌈log 2 ( + 1)⌉ +  (⌈log 2 ( + 1)⌉) otherwise .

ILP PROCEDURE AND RESULTING FORMULAE AND ALGORITHMS
We define our goal as follows: first, we want to minimize the total number of substitution variables   in the QUBO.Secondly, we want to minimize the number of non-zero coefficients.Thirdly, we want to minimize the values of the non-zero coefficients.
We can set a constant maximum and minimum boundary  = {}  ,  = {−}  for all coefficients except the constant which should be in interval [0..2].A reasonable choice for  is ()( + 1) since this represents all pairs of original variables linear and quadratic terms.A more relaxed choice of  would be 2 +  .So  = 12 will be sufficient for three variable Boolean functions regardless of substitutions.
For a Boolean function  and an arbitrary set of substitution variables  = { 1 ..   } based on function   =   ( ) we consider a generic constraint function: ,  , ∪ {1}.Then we construct our equalities as .This is as strict as we require since when   = 0 the goal is to achieve zero energy.Then we can continue by adding inequality constraints to make unsatisfied sub-formula be any value greater than or equal to 1.So relaxing this case when   ≠ 0, we have This is not yet sufficient, as bad substitutions must also not allow for the energy in the QUBO to go below .So we add the following additional inequality constraints −  .This provides sufficient constraints to get valid solutions.However, if the intention is a parsimonious reduction, then we can merely strengthen the inequality by changing −  to −  − 1 only if   = 0.This preserves the solutions by making substitution failures not coincide with them.
The first goal can be realized by solving the ILP without any substitution variables introduced making  = 0 and not adding bad substitution constraints.After that confirms no solution, then all ten various Boolean operators can be tried in succession to see which of them best meets the other goals.
To realize the other two goals, we require some mathematical transformation as the third goal requires taking an absolute value.For this, we can introduce an auxiliary variable  into the ILP. to the minimization goal function.The worst case contribution of the tertiary goal   is   hence this is a sufficient way to prioritize the secondary goal.By combining all of this together for a total of 3  + 1 variables to solve (padding  appropriately with zeros elsewhere), we have an ILP system that satisfies all goals, and a procedure that allows for maximally efficient QUBO encoding of an arbitrary three variable Boolean formula.We used the ILP solver from scipy library for Python via its scipy.optimize.linprogfunction to correctly solve this system based on arbitrary functions and zero or more substitution functions.From a practical standpoint, this construction will scale well to functions of 16 variables giving results within seconds.
The freedom in the choice of substitution functions will become a problem when trying to scale the model.The obvious way to generalize this is by turning the substitution functions into variables.This will lead to 2    Boolean decision variables however and make solving the problem much harder.Some immediate problems further are introduced as quadratic terms appear where substitution variables are multiplied by coefficients.Furthermore there are cubic terms, where substitution variables are multiplied by each other and a coefficient.The convenience of Boolean variables allows for reduction back to a linear system.Notice that a substitution variable  , =     can be constrained via the following linear constraints:  , ≤   ,  , ≤   ,  , ≥   +   − 1.These are the well-known Fortet inequalities [3] representing a Boolean AND substitution.This leaves only the quadratic cases but as these are now also exclusively multiplied by one Boolean variable, we can use the following linear constraints for  , =     when   ∈ [−..]: − , −  ≤ 0, − , +  +  ≤ ,  , −  ≤ 0,  , +  −  ≤ .These are also well-known as the McCormick inequalities [6] which are exact in the case of multiplication by a Boolean variable.These two techniques are sufficient to generalize the ILP model 2: Optimally sparse (Max-)3-SAT QUBO encoding formulae to a search for substitution functions.We note that this technique scales well to handle only some specific extra cases that would be slow to exhaustively search with the simpler model: 4 variable Boolean functions with one substitution variable, 5 variable Boolean functions one or two substitution variables, 6 variable Boolean functions with one substitution variable.By 6 variable Boolean functions requiring two substitutions, the search space becomes very large.Simple techniques to reduce it like enforcing an order on substitution functions do not provide an ample reduction.Certain cases beyond the practical range may still be possible due to having an abundance of solutions, symmetry or other properties which the heuristics in modern solvers can take advantage of.We give a complete example, abbreviating the  and  substitutions and constraints to only the first one (corresponding to function index 150 in Table 4), and with an extension to minimize unnecessary substitutions variables via   checking if any the  { } containing any  whose label contains  in Figure 1.
We first note that most Boolean operator combinations of substitutions are universal.The ILP solver will find a formula for every 3-variable Boolean function regardless of choice, though the sparseness will vary.The exceptions are XOR (inequality) and XNOR (equality) which are never necessary over all 256 possible formulae.Of the possibilities, 226 of them can be done without a substitution.This is a surprising fact as it indicates that only 30 of the 3 variable formulae actually require substitutions.This also gives indication that formulae with more variables may more often than not have representations with fewer than expected or no substitutions.All of the substitution functions in these cases merely require a function of two variables, though three is also possible.
We analyze the 3-SAT optimal encoding from two perspectives.In the first, we look at all maximally sparse encoding values with any substitution function.Next we look at the most sparse encoding across an a fixed substitution function.In both cases, we consider all of the universal binary substitutions.
We specifically compare to the prior paper where for 0, 1, 2 and 3 negated literals, they require 6, 6, 6 and 10 non-zero values respectively as seen in Table 2. Our encoding values reduce this to 5, 6, 5 and 5 respectively with smaller coefficients.So interestingly, with 1 negated literal, there is no further reduction in couplings available.
More importantly, to reduce the number of overall variables in a QUBO, which is in general more beneficial than reducing the couplings as it ties directly to the time complexity, we analyze the 3: Average (for random formula) optimal sparse (Max-)3-SAT QUBO encoding formulae for substitution using NIM-PLY best choice for an unknown clause grouping in terms of maximizing sparsity.In practice, this could be optimally computed on-the-fly as terms can cancel out in the sums, but what is clear is that a single substitution Boolean operator should be chosen.We note that if we consider all 8 possibilities of a random Boolean formula, that AND, NAND, OR, NOR, NIMPLY, IMPLY, CNIMPLY, CIMPLY have total zero values of 30, 14, 30, 26, 32, 23, 32 and 23 respectively.Therefore a reasonable suggestion is to use the negative implication operator or it's converse and this result is found in Table 3.The primary importance is that any of these are sufficient for variable reduction and having all the best ILP formulations for each Boolean operator pre-computed can allow for correct sums to be checked for zero values which will not incur significant computation expense, merely vector additions, which is the method we have followed.
The theoretical maximum number of variables remains  +  with this new encoding.In random 3-SAT formulae, depending on the clause to variable ratio, it might not make a large difference.However, in real world 3-SAT problems, clauses sharing 2 literals are in abundance.For simplicity, we consider where the number of clauses is around 4 making 5 QUBO variables.With our simplifications, this can likely reduce to around 2 assuming there are 4 shared literal pairs between clauses on average.To find the optimal clause groups, a greedy algorithm can make a dictionary of all literal pairs and remove the literal pair with the maximum count and all clauses associated with it are added to a group.This simple procedure is repeated until no clauses remain.This ensures a minimum number of variables (although not guaranteed to be the most sparse which would be harder to account for).The ILP construction trivially as a calibration test can confirm the optimal 2-SAT formula as from [4].
Another satisfiability problem is the circuit satisfiability problem (Circuit-SAT).We outline an efficient algorithm to convert any Circuit-SAT problem to a QUBO.If the Circuit-SAT is expressed in terms of any Boolean operators other than XOR and XNOR, then no QUBO substitution variables will be needed.Otherwise one 0 0 1 0 0 0 0 0 0 0 1 0 ... 0 ... 0 0 1 0 0 0 0 0 1 0 1 1 0 ... 0 ... 0 1 0 0 0 0 0 1 0 0 1 1 0 ... 0 ...  substitution QUBO variable corresponding to every unique XOR and XNOR clause in the formula is needed.To do this, we say a Circuit-SAT equation is well-formed if it does not contain any double negations, which are trivial to remove by a simple tree traversal or by construction.A well-formed Circuit-SAT problem can be converted to 3-SAT CNF form via the method of Tseitin transformation [10].In Tseitin form, all clauses will be of trivial form (single literals, or two-literal clauses) or a three literal clause.Regardless of operator choice of AND, OR, NAND, NOR, IMPLY, NIMPLY, CIMPLY or CNIMPLY, the equivalent QUBO encoding will not require any substitutions.Only those containing XOR and XNOR clauses will require substitutions.The possibility of common literal pair simplifications also remains.Although Tseitin transformation does not guarantee a simplified formula, various SAT solving methods can be used to reduce the formula further.The Conflict-Driven Clause Learning (CDCL) is of the most powerful techniques to achieve simplification.As the goal is simplification, not solving, a depth limit in CDCL can be applied which avoids exponential complexity.We hereby show that the optimal solution for (max-)-SAT which requires considering clauses separately is  + ⌊log 2 ( − 1)⌋.We will consider a reduction for the decision problem of an entire Boolean formula below.This is an asymptotic improvement over the previous known results.The intriguing aspect is a new idea in how to use substitution variables to do counting.Rather than the typical idea of representing a count as a sum of powers of two, we consider the substitution variable to directly encode various count groupings.
We define a Boolean counting function  ( 0 , ...  ) =  0 + ...  ∈  where  ⊆ {0.. + 1} is the set of counted values.In the previous approach which required an iterated logarithm amount of variables, the  sets chosen would correspond to the power of two ranges 2  ..2 +1 .However, as it turns out, this is not the optimal method of counting when converting to quadratic form.Rather the counting can be done in a manner that partitions the function output, noting that a clause in CNF form is only false for a single input value.This was deduced by analyzing the results of the ILP solver on cases of 5 to 9 variables disjunctive clauses which showed the optimal formulations.We note that the single false output of the CNF formula is not of importance and will not affect the substitutions at all whether it is counted or not.In addition, the construction relies on the fact that the any mixing of variables or substitution variables can always solve one or two bit counts due to the quadratic nature of the QUBO encoding.This means that the two highest counted values { − 1, } Index Min.Form.4: Optimal sparse QUBO encoding of all Boolean functions 3 variable requiring a substitution will not be required at all and solved through the existing variables.The number of substitution variables required is   = ⌈log 2 ⌉ − 1.With each substitution variable accounting for 2 more counts, as well as their quadratic mixing, and the fact that overlap across more than 2 substitution variables can also cover further variables, it is clear that up to 2   −1 count values are covered.Including the other values mentioned, we have 2   − 1 + 3 = 2   +1 which is precisely the definition above.
Each substitution variable   ∈ {0, 1}∀ ∈ [1..  ] is represented by the  function with its counting sets as   ∀ ∈ [1..  ].Decomposing all   by repeated intersection and differences, must yield all decomposed subsets with length one or two.We now explain a general method to actually construct these sets.
We wish to construct a QUBO formula (− + .Note the careful choice of  = [(− + 1), 1, ] as coefficients of the isolated and quadratic variables as well as for the constant term.This corresponds to the highest two counts { −1, } having a zero result and the constant takes care of the falsity case for the clause.Overlapping between counting sets is accounted for by making coefficients of quadratic substitution terms as the product of their variable substitution terms.So for each substitution variable, we must only determine its quadratic and isolated coefficients.We note another property that each covering set can have at least one unique value.This makes determination of their isolated coefficients merely evaluating the equation with that unique value, making all other substitution terms zero.
To determine if a covering set is valid, the equation when evaluated with all 2   possible combinations of substitution variables for each count must contain at least one zero value.If there are multiple zero values, any of them can be chosen.So many possible valid covering sets are possible, but our algorithm is designed to generate one in every case efficiently.We present a general method in Algorithm 1 running in a very reasonable O (2   ) to determine both the coefficients and their respective covering sets, and by using the ILP solver with the covering sets, the coefficients in some cases were further reduced.This is a self-validating algorithm which was empirically verified for  ∈ [3..1024  .]into the generated formula, and using the same for the iscount function, it is easy to adjust the constant term, the signs of coefficients, and the substitution variable coefficients.This strategy being independent of the literals sign, if potentially used with an arbitrary Boolean formula thereby can compute solutions for entire classes of formula, as is the case here.We present the negation formula based on this approach for 4-SAT in Table 6.
Another common QUBO encoding which is often in a sub-optimal form is a simple substitution that can convert cubic terms to quadratic (or quartic to cubic, etc).By using our ILP solver and changing to equalities and reversing the inequality or effectively by multiplying −1, we obtain a substitution formula based on if the current coefficient of the term is positive of negative as in Table 7.
We can generally conjecture that for any computable function of  Boolean variables, there exists an optimal QUBO encoding with  + log 2  variables.This would mean the theoretical optimum implies that any -SAT problem is not dependent on  or the number of clauses but only the number of variables.However, determining the correct substitutions and coefficients is not a simple problem in the general case, especially where integer coefficients are desired.The other caveat, is that the coefficients will scale with order of 2  making it so subdividing a formula to limit the coefficients may be necessary in practical applications, a strategy commonly used.

EXPERIMENTAL RESULTS
To validate the result experimentally, we considered random 3 and 8-SATs with a 4.26 clause to variable ratio per We compare the of our 3-SAT encoding for random max-3/8-SAT problems versus the older Nüßlein methods [9] [8] using TABU search-based QBSolv and the D-Wave Quantum Annealer (QA) on an Advantage System 6.2 with 5614 working qubits through the D-Wave Leap interface using the Pegasus embedding.All results represent the average for a given number of SAT variables over 20 iterations with a chain strength as the maximum absolute value of the QUBO coefficients.We use only the best result of the QA over 100 reads.QBSolv uses 1000 iterations as it is not a limited resource like the QA.For context, the QA at full capacity can solve up to 136 variable random max-3-SAT before no embedding can be found.Finding an optimal embedding is a difficult problem and we used the minorminer library provided by D-Wave but better approaches are possible such as in [11].
In Figure 2, we compare the number of QUBO variables and non-zero couplings requirements.Figure 3 shows physical qubit counts and the results of the actual experiments.As seen, the number of variables does correlate to the optimality of the solution as the underlying problem hardness remains the same, but it very significantly reduces the computational demand of a hardware solver and increases the problem size which can fit on the QA.
We provide the reference code for all the constructions, experiments and generated tables in a publicly accessible repository at https://github.com/GregoryMorse/ILPSatQubo.

CONCLUSION
Using ILPs to find optimal formulations of QUBO problems represents a powerful technique in at least solving common small cases of encoding including an arbitrary 3, 4 or 5 variable Boolean function, as well as looking for general patterns in specific function families such as we have done with the disjunctive clauses of a (max-)-SAT CNF formula.
However, further research is needed to find ways to more easily determine the substitution functions without running into an exponential number of Boolean variables (2 2    ), the burden of which is placed upon the ILP solver.Given the covering sets and partitioning ideas from the success in (max-)-SAT, alternative ways of achieving this are of great interest for future research.

Figure 2 :
Figure 2: Comparison of number of variables, non-zero couplings in QUBO Encoding

Figure 3 :
Figure 3: Number of physical qubits required on D-Wave QA and comparison of results from QBSolv and QA for selected sizes The Boolean ( ∧ ) ∨ (¬ ∧ ) Table To add the mathematical relationship   = |  |, the simple inequality constraints −  ≤   ≤   .This has indirectly helped solve the sec-Therefore we set   <=   .These inequalities are sufficient if we adjust   = {1} and   = {  },   =