Optimal Polynomial-Time Compression for Boolean Max CSP

In the Boolean maximum constraint satisfaction problem, Max CSP(Γ), one is given a collection of weighted applications of constraints from a finite constraint language Γ, over a common set of variables, and the goal is to assign Boolean values to the variables so that the total weight of satisfied constraints is maximized. There exists a concise dichotomy theorem providing a criterion on Γ for the problem to be polynomial-time solvable and stating that otherwise, it becomes NP-hard. We study the NP-hard cases through the lens of kernelization and provide a complete characterization of Max CSP(Γ) with respect to the optimal compression size. Namely, we prove that Max CSP(Γ) parameterized by the number of variables $n$ is either polynomial-time solvable, or there exists an integer $d \ge 2$ depending on Γ, such that: (1) an instance of Max CSP(Γ) can be compressed into an equivalent instance with $\mathcal{O}(n^d \log n)$ bits in polynomial time, and (2) Max CSP(Γ) does not admit such a compression to $\mathcal{O}(n^{d-\varepsilon})$ bits unless NP ⊆ co-NP/poly. Our reductions are based on interpreting constraints as multilinear polynomials combined with the framework of “constraint implementations”, formerly used in the context of APX-hardness. As another application of our reductions, we reveal tight connections between optimal running times for solving Max CSP(Γ). More precisely, we show that obtaining a running time of the form $\mathcal{O}(2^{(1-\varepsilon)n})$ for particular classes of Max CSPs is as hard as breaching this barrier for Max $d$-SAT for some $d$.


Introduction
Background and motivation. The framework of constraint satisfaction problems (CSPs) allows the computational complexity of a large class of problems to be studied through a common lens [10]. A typical instance of such a problem asks whether it is possible to assign each of the variables $x_1, \ldots, x_n$ a value from a finite domain $D$, such that a given list of constraint applications is satisfied. A constraint is applied to a fixed number of variables, and indicates which combinations of values are legal. In the Max CSP problem, the goal is to maximize the number of satisfied constraints. See Section 2 for formal definitions.
The investigation of CSPs has led to deep theorems characterizing the complexity of a CSP based on the type of constraints allowed in the instance [6,22]. For example, the long-awaited CSP dichotomy theorem [5,32] provides a criterion separating the NP-complete from the polynomial-time solvable CSPs; the work of Khanna, Sudan, Trevisan, and Williamson characterizes how well the maximization version of a Boolean CSP can be approximated [19] (see [12,18] for larger domains; see [11,25] for optimal approximation factors); and Cai and Chen [7] present a dichotomy that separates CSPs for which the number of complex-weighted solutions can be counted in polynomial time, from those where the problem is #P-hard.
In this work we analyze the complexity of constraint satisfaction in an algorithmic regime that is currently far from understood: polynomial-time compression and kernelization [14]. Here, the goal is to analyze how much (in terms of the number of variables $n$) an instance can be compressed by a polynomial-time algorithm without changing the answer, and to understand how the compressibility depends on the type of available constraints. A compression is a polynomial-time algorithm that reduces instances of one problem to equivalent, small instances of a potentially different problem; a kernelization compresses to an instance of the same problem (see Section 2.4). A kernelization of small size allows an instance to be stored, manipulated, and solved more efficiently. It is therefore of interest to find the smallest possible kernelizations. Since every kernelization yields a compression, one can prove lower bounds on the size of kernelizations by establishing lower bounds on compressions.
In recent years, there have been a number of advances in the understanding of compressibility of CSPs [8,13,17,23]. A foundational result by Dell and van Melkebeek [13] states that for $d \ge 3$, CNF-SAT with clauses of size at most $d$ ($d$-CNF-SAT) parameterized by the number of variables $n$ admits no (polynomial-time) compression of size $O(n^{d-\varepsilon})$ for any $\varepsilon > 0$, unless NP ⊆ co-NP/poly. As an instance of $d$-CNF-SAT can trivially be compressed to $O(n^d)$ bits via a bitstring that encodes for each of the $O(n^d)$ possible clauses whether or not it is present in the instance, the $d$-CNF-SAT problem does not admit any non-trivial compression. The situation is different for the related problem $d$-Not-All-Equal SAT ($d$-NAE-SAT), which is the variant where a clause is satisfied when its literals do not all evaluate to the same value. Jansen and Pieterse showed [16,17] that for $d \ge 3$, the $d$-NAE-SAT problem has a compression of size $O(n^{d-1} \log n)$, but not of size $O(n^{d-1-\varepsilon})$ unless NP ⊆ co-NP/poly. This example shows that the type of constraints affects the compressibility of a CSP.
The notion of a constraint language is used to rigorously analyze how the complexity of a CSP depends on the type of constraints. In this work, we will only consider CSPs over the Boolean domain: we work exclusively with Boolean constraints and constraint languages. A constraint is therefore a function of the form $f : \{0,1\}^k \to \{0,1\}$, where $k \ge 1$ is the arity of the constraint, also denoted as $ar(f)$. A constraint language Γ is a finite set of constraints. The input of the corresponding decision problem, denoted CSP(Γ), consists of a set of constraint applications of the form $f(x_{j_1}, \ldots, x_{j_{ar(f)}}) = 1$ over $n$ common variables, where $f$ is some constraint from Γ. The question is whether there is an assignment $\{x_1, \ldots, x_n\} \to \{0,1\}$ satisfying all the constraint applications.
In this terminology, Chen, Jansen, and Pieterse [8] characterized for all (Boolean) constraint languages Γ consisting of constraints of arity at most three, what the optimal compression size is for CSP(Γ). Lagerkvist and Wahlström [23] gave universal-algebraic conditions on Γ which ensure that CSP(Γ) has a compression of size $O(n \log n)$, and a characterization is known of the constraint languages $\Gamma_{sym}$ consisting entirely of symmetric functions for which CSP($\Gamma_{sym}$) has a compression of near-linear size [8, §5]. Hence there is some understanding of the optimal compressibility of CSP(Γ).
However, when we move from the question of whether all constraints can be satisfied to the task of maximizing the number of satisfied constraints (Max CSP), the situation is much less understood. To the best of our knowledge, no non-trivial compressions are known for any Max CSP(Γ), and no compression lower bounds are known for Max CSP(Γ) other than those already implied by CSP(Γ). In this paper, we therefore analyze the compressibility of Max CSP(Γ).
Before presenting our results, we briefly summarize the main algorithmic approach for compressing CSP(Γ) and illustrate why it fails completely for Max CSP. Consider for example 3-NAE-SAT. The number of constraint applications in an $n$-variable instance of this problem can be reduced to $O(n^2)$ without changing the solution space, which allows it to be encoded in $O(n^2 \log n)$ bits. The sparsification to $O(n^2)$ constraint applications is achieved by a linear-algebraic approach. Note that a not-all-equal constraint on variables $(x, y, z) \in \{0,1\}^3$ is satisfied if and only if $x + y + z - xy - xz - yz - 1 = 0$. Observe that if $p_1(x_1, \ldots, x_n) = 0, \ldots, p_m(x_1, \ldots, x_n) = 0$ are polynomial equalities which are satisfied by an assignment to $x_1, \ldots, x_n$, then also $\sum_{i=1}^{m} \alpha_i \cdot p_i(x_1, \ldots, x_n) = 0$ holds for any linear combination as determined by $\alpha_1, \ldots, \alpha_m$. To sparsify a 3-NAE-SAT instance with this insight, proceed as follows. Transform each constraint $c_i$ into an equality $p_i(x_1, \ldots, x_n) = 0$ for a degree-2 polynomial $p_i$, substituting $1 - v$ for negated variables $\neg v$ in the constraint. This yields a system of equations of degree-2 polynomials in $n$ variables, which have $O(n^2)$ distinct monomials. The rank of the corresponding vector space is therefore $O(n^2)$, which yields a basis of $O(n^2)$ equalities such that all others can be expressed as their linear combinations. All constraints not corresponding to an element of this basis can be safely omitted from an instance of 3-NAE-SAT, since they will be automatically satisfied by any assignment that satisfies all basis constraints. This yields the claimed sparsification of $O(n^2)$ constraints. Note, however, that this approach fails completely for the variant Max 3-NAE-SAT: if an assignment does not satisfy all constraints of the basis, this does not give any satisfaction guarantees on the linearly-dependent constraints. Hence the sparsification approach for CSP(Γ) is not applicable for Max CSP(Γ), and a priori it is not clear whether any Max CSP(Γ) problem admits a non-trivial polynomial-time compression.

Our results
We provide a new route to compression for Max CSP(Γ), and prove that the resulting compressions are essentially optimal for all constraint languages Γ, assuming NP ⊈ co-NP/poly. Our results characterize the optimal compressibility of all Boolean Max CSPs in terms of degrees of characteristic polynomials, and uncover a wide range of Max CSP(Γ) problems that admit a non-trivial compression. For a Boolean function $f : \{0,1\}^k \to \{0,1\}$, its characteristic polynomial is the unique $k$-variate multilinear polynomial $P_f(x)$ over $\mathbb{R}$ that agrees with $f$ on all $x \in \{0,1\}^k$. The fact that this representation is unique is well-known (cf. [27]). For a constraint language Γ, define $\deg(\Gamma) = \max_{f \in \Gamma} \deg(P_f)$. We prove that $\deg(\Gamma)$ characterizes the compressibility of Max CSP(Γ).
To state our results precisely, we have to address a feature of the problem that is particular to the maximization variant: repetitions of constraint applications. While such repetitions are irrelevant in the CSP setting when all constraint applications have to be satisfied, they become relevant when maximizing the number of satisfied constraint applications. The standard approach in the Max CSP literature is therefore to give each constraint application a positive integer weight value [10,19]. The decision problem Max CSP(Γ) then takes as input a system of Γ-constraint applications with weights from $\mathbb{N}$, and a threshold value $t$, and asks whether there is an assignment such that the weight of the satisfied constraint applications is at least $t$.
Let Γ be a (finite, Boolean) constraint language. Our main positive result is the following.
Theorem 1.1. Max CSP(Γ) parameterized by the number of variables $n$, with positive integer weights bounded by $n^{O(1)}$, admits a compression of size $O(n^{\deg(\Gamma)} \log n)$.
In fact, we are even able to reduce any instance of Max CSP(Γ) to an equivalent instance of the same problem, having $O(n^{\deg(\Gamma)})$ weighted constraint applications. We prove matching lower bounds whenever Max CSP(Γ) is NP-complete. It is known [9,10,19] that for inputs with positive integer weights, Max CSP(Γ) is polynomial-time solvable if Γ is 0-valid, 1-valid, or 2-monotone (see Section 2.1), and NP-complete otherwise.
Our results uncover an interesting contrast in compressibility between decision CSPs and maximization CSPs. While both involve the analysis of the degrees of polynomials, the type of polynomials which is used differs, leading to differences in compressibility. The linear-algebraic approach to sparsify CSP(Γ) yields a compression of size $O(n^d \log n)$ when for each constraint $f \in \Gamma$, for each assignment $u \in \{0,1\}^{ar(f)}$ for which $f(u) = 0$, there exists a ring $R$ and a degree-$d$ polynomial $P$ over $R$ for which $P(u) \neq 0$ and $P(x) = 0$ for all $x \in \{0,1\}^{ar(f)}$ with $f(x) = 1$. Different rings can be used for different constraints, and all that matters is that the polynomial for $u$ is nonzero on $u$ but zero on all satisfying assignments. In contrast, for Max CSP we can only use polynomials over $\mathbb{R}$ (or, equivalently, $\mathbb{Q}$), and must ensure that the value of the polynomial coincides with the value of the constraint on all Boolean assignments. This means that a higher-degree polynomial may be needed, which also translates into worse compressibility. For example, while $d$-NAE-SAT has a compression of size $O(n^{d-1} \log n)$ for all $d \ge 3$, the corresponding Max $d$-NAE-SAT problem with weights of absolute value $n^{O(1)}$ has a compression of size $O(n^{d-1} \log n)$ for odd $d \ge 3$, but no compression of size $O(n^{d-\varepsilon})$ for even $d$. Another example is $d$-Exact SAT, where we require exactly one literal in each clause to be true. Whereas $d$-Exact SAT admits a compression of size $O(n \log n)$ for every fixed $d$ [8], we show that Max $d$-Exact SAT cannot be compressed to $O(n^{d-\varepsilon})$ bits.
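The parity behaviour of $d$-NAE-SAT and the full degree of $d$-Exact SAT can be checked by brute force for small $d$. The following is a minimal sketch, not the method of the paper: it computes the coefficients of the characteristic polynomial by Möbius inversion over subsets, and the helper names `char_poly`, `degree`, `nae` and `exactly_one` are hypothetical, chosen for this illustration only.

```python
from itertools import combinations

def char_poly(f, k):
    """Coefficients of the multilinear polynomial agreeing with f on {0,1}^k.

    Returns a dict mapping a frozenset S of positions to the coefficient of
    prod_{i in S} x_i, using c_S = sum_{T subset of S} (-1)^(|S|-|T|) f(1_T).
    """
    coeffs = {}
    for size in range(k + 1):
        for S in combinations(range(k), size):
            c = 0
            for tsize in range(size + 1):
                for T in combinations(S, tsize):
                    point = tuple(1 if i in T else 0 for i in range(k))
                    c += (-1) ** (size - tsize) * f(point)
            if c != 0:
                coeffs[frozenset(S)] = c
    return coeffs

def degree(f, k):
    """Degree of the characteristic polynomial (0 for constant functions)."""
    return max((len(S) for S in char_poly(f, k)), default=0)

def nae(x):
    """Not-all-equal: satisfied unless all entries coincide."""
    return int(len(set(x)) > 1)

def exactly_one(x):
    """Exact SAT clause on positive literals: exactly one entry is 1."""
    return int(sum(x) == 1)

if __name__ == "__main__":
    for d in range(3, 7):
        print(d, degree(nae, d), degree(exactly_one, d))
```

For the arities tried here, the degree of NAE drops to $d-1$ exactly when $d$ is odd, while the exactly-one constraint keeps degree $d$, in line with the compression bounds quoted above.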
Techniques. On a high level, our results are obtained by combining two ingredients: (1) a characterization of the complexity of a constraint language as $\deg(\Gamma)$, via the degree of the characteristic polynomials, and (2) reductions between different problems Max CSP(Γ) and Max CSP(Γ′) by implementing constraints of one language by combinations of constraints from the other. While both ingredients have been used in isolation [9,10,19,24,31], their combination is novel and is the key to understanding compressibility. To comprehend how characteristic polynomials help to compress an instance of Max CSP(Γ), observe that since the characteristic polynomial gives 1 when a constraint is satisfied and 0 otherwise, the total value of satisfied constraint applications can be written as a weighted sum of applications of characteristic polynomials. If $\deg(\Gamma) = k$, then this weighted sum contains $O(n^k)$ distinct monomials. An instance can therefore be compressed by expanding this weighted sum, and storing the coefficient of each monomial. If all weights in the input instance are bounded by $n^{O(1)}$, each coefficient will have absolute value $n^{O(1)}$ and can therefore be encoded in $O(\log n)$ bits.
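The compression idea can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the construction used in the proofs: a formula is represented as a list of hypothetical triples `(f, indices, weight)`, the function names are invented for this example, and repeated variables inside one application are merged using $x^2 = x$ on Boolean inputs.

```python
from collections import defaultdict
from itertools import combinations, product

def char_poly(f, k):
    """Multilinear coefficients of f on {0,1}^k, keyed by frozenset of positions."""
    coeffs = {}
    for size in range(k + 1):
        for S in combinations(range(k), size):
            c = 0
            for tsize in range(size + 1):
                for T in combinations(S, tsize):
                    point = tuple(1 if i in T else 0 for i in range(k))
                    c += (-1) ** (size - tsize) * f(point)
            if c != 0:
                coeffs[frozenset(S)] = c
    return coeffs

def compress(formula):
    """Expand the weighted sum of characteristic polynomials into one table.

    The table maps a monomial (frozenset of variable indices) to its integer
    coefficient; its size is bounded by the number of monomials of degree at
    most the maximum degree of the constraints used.
    """
    total = defaultdict(int)
    for f, indices, weight in formula:
        for S, c in char_poly(f, len(indices)).items():
            monomial = frozenset(indices[i] for i in S)  # x*x = x on {0,1}
            total[monomial] += weight * c
    return {m: c for m, c in total.items() if c != 0}

def evaluate_compressed(table, x):
    """Value of the stored polynomial at the Boolean assignment x."""
    return sum(c for m, c in table.items() if all(x[v] == 1 for v in m))

def evaluate_formula(formula, x):
    """Total weight of the constraint applications satisfied by x."""
    return sum(w for f, idx, w in formula if f(tuple(x[i] for i in idx)) == 1)

def nae(x):
    return int(len(set(x)) > 1)

def or_(x):
    return int(any(x))

formula = [(nae, (0, 1, 2), 2), (nae, (1, 2, 3), 3), (or_, (0, 3), 1)]
table = compress(formula)

if __name__ == "__main__":
    for x in product((0, 1), repeat=4):
        assert evaluate_compressed(table, x) == evaluate_formula(formula, x)
    print(sorted((sorted(m), c) for m, c in table.items()))
```

The table agrees with the original formula on every assignment, so any threshold question about the formula can be answered from the table alone.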
Our lower bounds are obtained by parameterized reductions between Max CSPs in which the number of variables does not grow significantly. By a careful analysis of the terms of the characteristic polynomial, we show that if $\deg(\Gamma) = \deg(\Gamma')$, then constraint applications from Γ can effectively be simulated by combinations of constraints from Γ′. Here, we use the framework of implementations from an earlier work [19]. Since the characteristic polynomial of $d$-CNF clauses has degree $d$, this yields a reduction from $d$-CNF-SAT to Max CSP(Γ) for $\deg(\Gamma) = d$ that preserves the asymptotic size of the variable set, therefore transferring the cited lower bound for $d$-CNF-SAT [13] to Max CSP(Γ). The same reduction is also used to turn the compression sketched above into a kernelization, which outputs an instance of the original problem.

Consequences for exponential-time algorithms
The framework we develop for parameterized reductions among Max CSPs also has consequences for exponential-time algorithms, which we believe to be of independent interest. The Max 3-SAT Hypothesis [24] states that Max 3-CNF-SAT with $n$ variables cannot be solved in time $O(2^{(1-\varepsilon)n})$ for any $\varepsilon > 0$ (cf. [1,4]). Our reductions imply that this hypothesis is equivalent to the version where Max 3-CNF-SAT is replaced by Max CSP(Γ) for any constraint language Γ with $\deg(\Gamma) = 3$ in which negated literals can be expressed (formal details follow later). In particular, the Max 3-SAT Hypothesis is equivalent to the statement that Max E3-Lin cannot be solved in time $O(2^{(1-\varepsilon)n})$. What is more, for any $k \ge 2$, our reductions uncover an equivalence class of NP-hard problems whose optimal exponential-time running times coincide with the one for Max $k$-SAT.

Related work
Representations of Boolean functions as polynomials have been studied frequently in the literature [2,3,24,26,27,29,30] revealing, e.g., a relation between the degree of the representation and the decision tree complexity [27]. Algorithms for CSPs via their characteristic polynomials were first given by Williams [31]. He used the split-and-list technique to give accelerated exponential-time algorithms for Max 2-SAT and Max CSP(Γ) for $\deg(\Gamma) = 2$. In recent work, Lincoln, Williams, and Vassilevska Williams [24] give an exponential-time split-and-list reduction from Max CSP(Γ) for $\deg(\Gamma) = k$ to the problem of detecting an $\ell$-hyperclique in a $k$-uniform hypergraph, for $\ell > k$, in support of the $(k, \ell)$-hypothesis which states that detecting such a hyperclique in an $n$-vertex input requires time $n^{\ell - o(1)}$ on a Word-RAM with $O(\log n)$-bit words. If this hypothesis fails for some $k$ and $\ell$, their reduction implies that each Max CSP(Γ) problem with $\deg(\Gamma) = k$ can be solved in time $O(2^{(1-\varepsilon)n})$ for some $\varepsilon > 0$. As their reductions run in exponential time, they are very different from ours.
Organization. We begin with Section 2 containing all the necessary definitions about CSPs and a summary of the important properties that are already known, including the implementation framework. In Section 3 we explain the idea of representing constraints by polynomials and provide an algebraic background for our reductions. It is followed by Section 4, where the notion of reduction between constraint systems is formalized, and the main reductions are presented. It serves as a toolbox for proving the main results for compression (Section 5) and exponential-time algorithms (Section 6).

Preliminaries
For a set $S$ and integer $d \ge 0$, let $\binom{S}{d}$ be the collection of all unordered size-$d$ subsets of $S$. We use $[n]$ as a shorthand for $\{1, \ldots, n\}$. A $k$-ary constraint is a function $f : \{0,1\}^k \to \{0,1\}$. We refer to $k$ as the arity of $f$, denoted $ar(f)$. We always assume that the domain is Boolean. A constraint $f$ is satisfied by an input $s \in \{0,1\}^k$ if $f(s) = 1$. A constraint language (sometimes called constraint family) Γ is a finite collection of constraints $\{f_1, f_2, \ldots, f_\ell\}$, potentially with different arities. A constraint application, of a $k$-ary constraint $f$ to a set of $n$ Boolean variables, is a triple $\langle f, (i_1, i_2, \ldots, i_k), w \rangle$, where the indices $i_j \in [n]$ select $k$ of the $n$ Boolean variables to which the constraint is applied, and $w$ is a weight, described formally below. The variables can repeat in a single application.
Definition 2.1. A constraint system is a pair CS(Γ, W), where Γ is a constraint language and the weight range W is either $\mathbb{Z}$ or $\mathbb{N}$. An instance (or formula) of CS(Γ, W) is a set of constraint applications from Γ over a common set of variables, each application having a weight from W.
We denote the number of constraint applications in formula Φ by $|\Phi|$ and the sum of absolute values of all weights in Φ by $\|\Phi\|$. For an assignment vector $x$, the integer $\Phi(x)$ is the sum of weights of the constraint applications satisfied by $x$.
In the decision problem Max CSP(Γ, W, c) we are given a formula Φ from CS(Γ, W) over $n$ variables such that $\|\Phi\| \le n^c$, together with an integer $t$, and we ask if there is an assignment $x$ such that $\Phi(x) \ge t$. We indicate the parameter $c$ in order to be accurate about the specific decision problems for which we can show hardness results. When it does not lead to confusion, e.g., when some property holds for all $c$, we refer to this family of problems shortly as Max CSP(Γ, W). Whenever we use the $O$-notation, we do it with respect to a fixed problem, that is, we treat Γ and $c$ as constants.
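To make the quantities $|\Phi|$, $\|\Phi\|$ and $\Phi(x)$ concrete, here is a small brute-force sketch. It assumes a hypothetical representation of a formula as a list of triples `(f, indices, weight)` with 0-based variable indices, and it decides the threshold question by enumerating all $2^n$ assignments, which is only meant for tiny examples.

```python
from itertools import product

def phi(formula, x):
    """Phi(x): sum of weights of the constraint applications satisfied by x."""
    return sum(w for f, idx, w in formula if f(tuple(x[i] for i in idx)) == 1)

def num_applications(formula):
    """|Phi|: number of constraint applications."""
    return len(formula)

def total_weight(formula):
    """||Phi||: sum of absolute values of all weights."""
    return sum(abs(w) for _, _, w in formula)

def max_csp_decision(formula, n, t):
    """Brute-force answer to: is there an assignment x with Phi(x) >= t?"""
    return any(phi(formula, x) >= t for x in product((0, 1), repeat=n))

def or2(x):
    return int(x[0] or x[1])

def F(x):
    """Unary constraint F(x) = not x."""
    return 1 - x[0]

# Weighted formula over two variables: OR(x0, x1) with weight 2,
# and F(x0), F(x1) with weight 1 each.
formula = [(or2, (0, 1), 2), (F, (0,), 1), (F, (1,), 1)]

if __name__ == "__main__":
    print(num_applications(formula), total_weight(formula))
    print(max_csp_decision(formula, 2, 3), max_csp_decision(formula, 2, 4))
```

In this toy formula the best assignments set exactly one variable to 1, so the threshold 3 is reachable and the threshold 4 is not.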
The most commonly studied case is expressed by W = $\mathbb{N}$ [12,19,25], where the weights can be interpreted as repetitions of constraint applications. It is important to make this distinction because it can be the case that Max CSP(Γ, $\mathbb{N}$) is polynomially solvable whereas Max CSP(Γ, $\mathbb{Z}$) is NP-hard [18]. Although our main reduction framework works for W = $\mathbb{Z}$, we are able to transfer the compression lower bounds to the case W = $\mathbb{N}$ as long as Max CSP(Γ, $\mathbb{N}$) is NP-hard.
Another decision problem that is related to constraint systems is Exact CSP(Γ, W), where we ask whether there is an assignment for which the satisfied weights sum up exactly to a given integer [24,31]. Even though we focus on the maximization variant, we formulate our reductions so that they could be employed for other problems over constraint systems or larger weight domains.

Types of constraints
We start by formally defining the most important constraints and constraint properties. They allow us to formulate the dichotomy theorem for Max CSP. We use the Boolean notation for negation, i.e., $\neg x = 1 - x$ for $x \in \{0,1\}$.
• A constraint is trivial if it is either always 1 or always 0 regardless of the arguments.
• T and F are unary constraints given by T (x) = x and F (x) = ¬x.
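• OR$_k$ and AND$_k$ are $k$-ary constraints, such that $\mathrm{OR}_k(x_1, \ldots, x_k) = \bigvee_{i=1}^{k} x_i$ and $\mathrm{AND}_k(x_1, \ldots, x_k) = \bigwedge_{i=1}^{k} x_i$.
• A constraint $f$ is 0-valid if $f(0, \ldots, 0) = 1$, and 1-valid if $f(1, \ldots, 1) = 1$.
• A constraint $f$ is 2-monotone if $f(x_1, \ldots, x_k) = (x_{i_1} \wedge \cdots \wedge x_{i_p}) \vee (\neg x_{j_1} \wedge \cdots \wedge \neg x_{j_q})$ for some $p, q \ge 0$, $(p, q) \neq (0, 0)$, i.e., $f$ is equivalent to a DNF-formula with at most two terms: one containing only positive literals and the other containing only negative literals.
• A constraint $f$ is C-closed if $f(x) = f(\bar{x})$ for all $x \in \{0,1\}^{ar(f)}$, where $\bar{x}$ stands for the bit-wise complement of $x$.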
• A constraint f is symmetric if for any two assignments x 1 , x 2 ∈ {0, 1} ar(f ) having the same number of ones, it holds that f (x 1 ) = f (x 2 ).
A constraint language Γ is called 0-valid, 1-valid, 2-monotone, C-closed, or symmetric, if all non-trivial constraints in Γ satisfy the respective property. We call Γ non-trivial if it contains at least one non-trivial constraint. This regime is convenient for formulating the fundamental dichotomy theorem for Boolean Max CSP. For our purposes it is only important that APX-hardness entails NP-hardness.
Theorem 2.2 ([19, Theorem 2.11], cf. [9]). Max CSP(Γ, $\mathbb{N}$) is solvable in polynomial time if Γ is either 0-valid, 1-valid, or 2-monotone. Otherwise, the problem is APX-hard.

Closures of constraint languages
Definition 2.3. Let constraints $f, g$ be respectively $k$-ary and $d$-ary. We say that $g$ is expressible by $f$ with constants if the identity $g(x_1, \ldots, x_d) = f(\xi_1, \xi_2, \ldots, \xi_k)$ holds for a vector $(\xi_1, \xi_2, \ldots, \xi_k)$, where each $\xi_j$ is either a variable $x_i$ for some $i \in [d]$ or one of the constants 0, 1.
We say that $g$ is expressible by $f$ with literals if such an identity holds for a vector $(\xi_1, \xi_2, \ldots, \xi_k)$, where each $\xi_j$ is a literal: either a variable $x_i$ or its negation $\neg x_i$ for some $i \in [d]$.
For a constraint language Γ we introduce its closures:
• the language $\Gamma_{T,F}$ contains all functions expressible by $f \in \Gamma$ with constants,
• the language $\Gamma_{LIT}$ contains all functions expressible by $f \in \Gamma$ with literals,
• the language $\Gamma_{NEG}$ is the negation-wise closure of Γ, i.e., $\Gamma_{NEG} = \bigcup_{f \in \Gamma} \{f, \neg f\}$.
It is easy to see that the closures satisfy $(\Gamma_{T,F})_{T,F} = \Gamma_{T,F}$, $(\Gamma_{LIT})_{LIT} = \Gamma_{LIT}$, $(\Gamma_{NEG})_{NEG} = \Gamma_{NEG}$. We will be particularly interested in those constraint languages in which negated literals can be expressed, as in, e.g., $d$-CNF-SAT or $d$-NAE-SAT. These are the languages that satisfy $\Gamma = \Gamma_{LIT}$. Below we present examples on how to capture important CSPs using our definitions.

Constraint implementations
We describe a technique that has been introduced in order to prove Theorem 2.2 [19]. The idea is to implement a constraint $f$ by a collection of other constraints, so that satisfying $f$ is equivalent to maximizing the number of satisfied constraints in that collection. It allows us to express formulas from Max CSP($\Gamma_1$, $\mathbb{N}$) by those from Max CSP($\Gamma_2$, $\mathbb{N}$), as long as constraints in $\Gamma_1$ can be implemented by those in $\Gamma_2$.
The caveat is that each implementation may introduce new auxiliary variables, whereas for our purposes we need reductions that increase the number of variables only by a multiplicative constant. Therefore the reductions by Khanna et al. [19] do not transfer to the compression paradigm and we will use the implementations in a different way. On the other hand, our reductions do not preserve approximation factors.

Definition 2.4 ([19, Definition 3.1]).
A collection of unit-weighted constraint applications $C_1, C_2, \ldots, C_m$ over a set of variables $x = \{x_1, x_2, \ldots, x_p\}$ called primary variables and $y = \{y_1, y_2, \ldots, y_q\}$ called auxiliary variables, is an α-implementation of a constraint $f(x)$ for a positive integer α if the following conditions hold.
1. For any assignment to $x$ and $y$, at most α constraint applications from $C_1, C_2, \ldots, C_m$ are satisfied.
2. For any $x$ such that $f(x) = 1$, there exists $y$ such that exactly α constraint applications are satisfied.
3. For any $x, y$ such that $f(x) = 0$, at most $\alpha - 1$ constraint applications are satisfied.
An implementation is a strict α-implementation if for every $x$ such that $f(x) = 0$, there exists $y$ such that exactly $\alpha - 1$ constraint applications are satisfied.
We say that a constraint language Γ (strictly) implements a constraint $f$ if there exists a (strict) α-implementation of $f$ using constraints of Γ for some constant α. We use $\Gamma \Longrightarrow f$ to denote that Γ implements $f$, and $\Gamma \overset{s}{\Longrightarrow} f$ to denote that Γ strictly implements $f$. The above notation is also extended to allow the target to be a family of functions.
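The three conditions of Definition 2.4 and the strictness condition can be tested exhaustively for small collections. The checker below is a sketch written for this illustration; the function `check_implementation` and the toy example are assumptions, not taken from [19]. The example uses the two applications $x \vee y$ and $\neg x \vee \neg y$ (both available in $\{\mathrm{OR}_2\}_{LIT}$) with no auxiliary variables, and tests whether they form a strict 2-implementation of XOR.

```python
from itertools import product

def check_implementation(f, p, q, applications, alpha):
    """Brute-force test of Definition 2.4.

    f            -- target constraint on p primary variables (takes a tuple)
    q            -- number of auxiliary variables
    applications -- list of (g, indices); indices 0..p-1 are primary,
                    p..p+q-1 are auxiliary
    Returns (is_alpha_implementation, is_strict).
    """
    strict = True
    for x in product((0, 1), repeat=p):
        # Number of satisfied applications for every auxiliary assignment y.
        counts = []
        for y in product((0, 1), repeat=q):
            z = x + y
            counts.append(sum(g(tuple(z[i] for i in idx))
                              for g, idx in applications))
        if max(counts) > alpha:                  # condition 1
            return (False, False)
        if f(x) == 1:
            if alpha not in counts:              # condition 2
                return (False, False)
        else:
            if max(counts) > alpha - 1:          # condition 3
                return (False, False)
            if (alpha - 1) not in counts:        # strictness
                strict = False
    return (True, strict)

def xor(x):
    return x[0] ^ x[1]

def or2(x):
    return int(x[0] or x[1])

def nand2(x):
    """OR applied to negated literals: (not x0) or (not x1)."""
    return int(not (x[0] and x[1]))

applications = [(or2, (0, 1)), (nand2, (0, 1))]

if __name__ == "__main__":
    print(check_implementation(xor, 2, 0, applications, alpha=2))
```

When XOR is satisfied both applications hold, and otherwise exactly one of them holds, which is what the checker confirms.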
We omit the machinery that was employed to construct implementations and we will just exploit the following results in a black-box manner. We are not going to rely on the strictness property in further sections as we need it only to ensure transitivity of implementations for the sake of proving Lemma 2.9, which was not stated explicitly in [19].
Lemma 2.9. Let Γ be a constraint language that is neither 0-valid, 1-valid, nor 2-monotone. Then $\Gamma \overset{s}{\Longrightarrow}$ XOR.
Proof. If Γ is C-closed, then the claim follows from Lemma 2.6. Otherwise, we can strictly implement T, F (Lemma 2.7) and then rely on transitivity (Lemma 2.5) to implement XOR with Lemma 2.8.

Parameterized complexity
A parameterized problem is a decision problem in which every input has an associated positive integer parameter that captures its complexity in some well-defined way. In our study of CSPs we use the number of variables as the parameter, but other choices have been considered [15,20,21]. For a parameterized problem $A \subseteq \Sigma^* \times \mathbb{N}$, a decision problem $B \subseteq \Sigma^*$, and a function $f : \mathbb{N} \to \mathbb{N}$, a compression of $A$ into $B$ of size $f$ is an algorithm that, on input $(x, k) \in \Sigma^* \times \mathbb{N}$, takes time polynomial in $|x| + k$ and outputs an instance $y \in \Sigma^*$ such that $(x, k) \in A$ if and only if $y \in B$, and such that $|y| \le f(k)$. A kernelization algorithm of size $f$ for problem $A$ reduces any instance $(x, k)$ to an $f(k)$-sized equivalent instance of the same problem in polynomial time.

Characteristic polynomials
In this section we provide the technique necessary for expressing one constraint system by another without introducing too many auxiliary variables. This insight is based on interpreting constraints as multilinear polynomials.
Definition 3.1. For a $k$-ary constraint $f : \{0,1\}^k \to \{0,1\}$ its characteristic polynomial $P_f$ is the unique $k$-ary multilinear polynomial over $\mathbb{R}$ satisfying $f(x) = P_f(x)$ for any $x \in \{0,1\}^k$.
It is easy to construct such a polynomial. First define, for each $s \in \{0,1\}^k$, the polynomial $P_s(x) = \prod_{i : s_i = 1} x_i \cdot \prod_{i : s_i = 0} (1 - x_i)$. Formally, $P_s$ is the sequence of coefficients one gets by expanding all parentheses. It is easy to see that they are all integers. It holds that $P_s(s) = 1$, while $P_s(x) = 0$ for any $x \in \{0,1\}^k$ with $x \neq s$. For a constraint $f$ its characteristic polynomial is given as $P_f(x) = \sum_{s : f(s) = 1} P_s(x)$. It is known that no other multilinear polynomial can take identical values on $\{0,1\}^k$ [27,31]. This also means we can interchangeably analyze polynomials as formal objects and as functions on $\{0,1\}^k$.
Observation 3.2. For any Boolean function $f$, the coefficients of the characteristic polynomial $P_f$ are integers.
Let $\deg(f) = \deg(P_f)$ and $\deg(\Gamma) = \max_{f \in \Gamma} \deg(f)$. For a $k$-ary constraint $f$ we refer to the coefficient at the unique $k$-ary monomial in $P_f$ as the leading coefficient. The leading coefficient is non-zero if and only if $\deg(f) = k$.
If $g$ is expressible by $f$ with literals, then we can obtain $P_g$ from $P_f$ by replacing each literal with negation $\neg x_i$ by $1 - x_i$ and expanding the parentheses within monomials. If $g$ is expressible by $f$ with constants, then we just substitute 0 or 1 for particular variables and remove monomials containing 0. These transformations imply $\deg(\Gamma_{T,F}) = \deg(\Gamma_{LIT}) = \deg(\Gamma_{NEG}) = \deg(\Gamma)$.
As an example, consider Max 3-NAE-SAT. The function $\mathrm{NAE}_3(x_1, x_2, x_3)$ has the degree-2 characteristic polynomial $x_1 + x_2 + x_3 - x_1 x_2 - x_1 x_3 - x_2 x_3$, which allows us to construct a compression for Max 3-NAE-SAT of size $O(n^2 \log n)$ by summing coefficients at all $O(n^2)$ monomials. On the other hand, $\mathrm{OR}_2(x_1, x_2) = x_1 + x_2 - x_1 x_2 = \mathrm{NAE}_3(x_1, x_2, 0)$, which indicates that solving Max 3-NAE-SAT should not be easier than Max 2-CNF-SAT. We will formalize these arguments in the next section.
We are now going to show that the set of characteristic polynomials of all constraints expressible by $f$ with constants spans the linear space of multilinear polynomials over $\mathbb{Q}$ with degrees at most $\deg(f)$. We first prove that this set contains polynomials of all degrees up to $\deg(f)$ and then use them to express a basis of the linear space.
Lemma 3.3. Let $f$ be a $k$-ary constraint. For any $1 \le d \le \deg(f)$ there exists a $d$-ary constraint $g$ expressible by $f$ with constants, such that its characteristic polynomial $P_g$ has degree exactly $d$, i.e., its leading coefficient is non-zero.
Proof. The degree of a $d$-ary $P_g$ can be at most $d$, so we just need to show that the leading coefficient is non-zero. We proceed by induction over $i = \deg(f) - d$. Suppose that $d = \deg(f)$ and consider indices $j_1, \ldots, j_d$ for which the monomial $\prod_{i=1}^{d} x_{j_i}$ has a non-zero coefficient α. We define $g_0(x_1, x_2, \ldots, x_d) = f(\xi_1, \xi_2, \ldots, \xi_k)$, where $\xi_{j_i} = x_i$ and the other $\xi_j$ are set to 0. It is easy to verify that the leading coefficient of $P_{g_0}$ is α, which proves the induction basis for $i = 0$.
Suppose now the induction hypothesis holds for $i - 1$, that is, $g_{d+1}(x_1, x_2, \ldots, x_{d+1})$ is expressible by $f$ with constants and $\deg(P_{g_{d+1}}) = d + 1$. First consider the case where $P_{g_{d+1}}$ has a non-zero coefficient at some monomial of degree $d$. Let $j$ be the unique index for which $x_j$ does not occur in this monomial. Then $g_d(x_1, x_2, \ldots, x_d) = g_{d+1}(x_1, \ldots, x_{j-1}, 0, x_j, \ldots, x_d)$ is expressible by $f$ with constants and has degree $d$.
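Finally, suppose that $P_{g_{d+1}}$ has zero coefficients at all monomials of degree $d$. We define $g_d(x_1, x_2, \ldots, x_d) = g_{d+1}(x_1, x_2, \ldots, x_d, 1)$. Since the monomial $\prod_{i=1}^{d} x_i$ does not appear in $P_{g_{d+1}}$, the leading coefficient in $P_{g_d}$ is the same as in $P_{g_{d+1}}$.
Lemma 3.4. Let $f$ be a non-trivial constraint and $P$ be a multilinear polynomial over $\mathbb{Q}$ on $\ell$ variables of degree $0 \le d \le \deg(f)$. There exists a sequence of constraint applications $\langle f_i, (j_i^1, \ldots, j_i^{ar(f_i)}), \alpha_i \rangle_{i=1}^{M}$ on $\ell$ variables, where each constraint $f_i$ is expressible by $f$ with constants, and $\alpha_i \in \mathbb{Q}$, such that the following polynomial identity holds: $P(x_1, \ldots, x_\ell) = \sum_{i=1}^{M} \alpha_i \cdot P_{f_i}(x_{j_i^1}, \ldots, x_{j_i^{ar(f_i)}})$.
Before proving this claim, let us demonstrate it on a less obvious example than the one with $\mathrm{NAE}_3$ and $\mathrm{OR}_2$. Let $P(x_1, x_2, x_3)$ be the characteristic polynomial of the constraint $\mathrm{OR}_3(x_1, x_2, \neg x_3)$: $P(x_1, x_2, x_3) = 1 - x_3 + x_1 x_3 + x_2 x_3 - x_1 x_2 x_3$. We will represent it with characteristic polynomials from $\{\mathrm{EX}_3\}_{T,F}$, where $\mathrm{EX}_3(x_1, x_2, x_3) = 1$ if and only if exactly one variable is 1. Its characteristic polynomial $Q$ is given as: $Q(x_1, x_2, x_3) = x_1 + x_2 + x_3 - 2 x_1 x_2 - 2 x_1 x_3 - 2 x_2 x_3 + 3 x_1 x_2 x_3$. We can express $P$ as the following linear combination: $P(x_1, x_2, x_3) = -\tfrac{1}{3} Q(x_1, x_2, x_3) + \tfrac{1}{3} Q(x_1, x_2, 0) - \tfrac{1}{6} Q(x_1, x_3, 0) - \tfrac{1}{6} Q(x_2, x_3, 0) + \tfrac{1}{6} Q(x_1, 0, 0) + \tfrac{1}{6} Q(x_2, 0, 0) - \tfrac{1}{3} Q(x_3, 0, 0) + Q(1, 0, 0)$, where, e.g., $Q(x_1, x_2, 0)$ is the characteristic polynomial for $\mathrm{EX}_3(x_1, x_2, 0)$, which is a binary constraint expressible by $\mathrm{EX}_3$ with constants: $Q(x_1, x_2, 0) = x_1 + x_2 - 2 x_1 x_2$.
Proof of Lemma 3.4. We proceed by induction over the degree of $P$. Since $f$ is non-trivial, it admits a satisfying assignment $s_T$. If $P$ is constant, then $P(x) = \alpha \cdot f(s_T)$ for some $\alpha \in \mathbb{Q}$. Suppose now $d = \deg(P) \ge 1$. For each $S \in \binom{[\ell]}{d}$ let $\alpha_S$ denote the (potentially zero) coefficient in $P$ at the monomial $\prod_{i \in S} x_i$. By Lemma 3.3 there is a $d$-ary constraint $f_d$, which is expressible by $f$ with constants and $\deg(P_{f_d}) = d$. Let $\beta \neq 0$ be the leading coefficient of $P_{f_d}$, and write $x_S$ for the variables indexed by $S$. The polynomial $P'(x) = P(x) - \sum_{S \in \binom{[\ell]}{d}} \frac{\alpha_S}{\beta} \cdot P_{f_d}(x_S)$ has no monomials of degree $d$, since each term in the sum subtracts exactly one of them. The polynomial $P'$ has degree at most $d - 1$, so we can apply the induction hypothesis to it and represent $P'$ as a linear combination of characteristic polynomials of constraints from $\{f\}_{T,F}$. We obtain the claim by adding these polynomials to the sum above.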
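The example with $\mathrm{OR}_3(x_1, x_2, \neg x_3)$ and $\mathrm{EX}_3$ can be verified mechanically. The sketch below evaluates one linear combination of constraints from $\{\mathrm{EX}_3\}_{T,F}$, obtained for this illustration by cancelling monomials degree by degree as in the proof of Lemma 3.4; the coefficients and the helper names `ex3`, `target`, `terms` and `combination` are assumptions of this sketch. Since multilinear polynomials are determined by their values on $\{0,1\}^3$, agreement on all eight points suffices.

```python
from fractions import Fraction
from itertools import product

def ex3(a, b, c):
    """EX_3: satisfied when exactly one argument equals 1."""
    return int(a + b + c == 1)

def target(x1, x2, x3):
    """OR_3(x1, x2, not x3)."""
    return int(x1 or x2 or not x3)

# Rational weights paired with constraints expressible by EX_3 with constants.
terms = [
    (Fraction(-1, 3), lambda x1, x2, x3: ex3(x1, x2, x3)),
    (Fraction(1, 3),  lambda x1, x2, x3: ex3(x1, x2, 0)),
    (Fraction(-1, 6), lambda x1, x2, x3: ex3(x1, x3, 0)),
    (Fraction(-1, 6), lambda x1, x2, x3: ex3(x2, x3, 0)),
    (Fraction(1, 6),  lambda x1, x2, x3: ex3(x1, 0, 0)),
    (Fraction(1, 6),  lambda x1, x2, x3: ex3(x2, 0, 0)),
    (Fraction(-1, 3), lambda x1, x2, x3: ex3(x3, 0, 0)),
    (Fraction(1),     lambda x1, x2, x3: ex3(1, 0, 0)),  # constant 1
]

def combination(x):
    """Value of the weighted sum of EX_3-derived constraints at x."""
    return sum(alpha * g(*x) for alpha, g in terms)

if __name__ == "__main__":
    for x in product((0, 1), repeat=3):
        assert combination(x) == target(*x)
    print("identity holds on all 8 Boolean points")
```

Exact rational arithmetic is used so that the comparison is not affected by floating-point rounding.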
Since f i and P f i coincide as functions on {0, 1} ar(f i ) , Lemma 3.4 allows us to represent any constraint of degree at most deg(f ) as a linear combination of constraints from {f } T,F .Proposition 3.5.Let g, f be constraints, such that f is non-trivial and deg(g) ≤ deg(f ).There exists a sequence of constraint applications f i , (j 1 i , . . ., j ar(f i ) i ), α i M i=1 on ar(g) variables, where each constraint f i is expressible by f with constants, and α i ∈ Q, such that the following identity holds for all binary vectors.
This resembles the idea of implementation, where we additionally equip constraints with (potentially negative) rational weights, but in return it does not require introducing any auxiliary variables.

Reductions between constraint systems
We first formalize our notion of reduction.The objects we work with are the constraint systems and the reductions between them imply analogous relations between the associated decision problems.The reduction is crafted in such a way that it preserves the numbers of variables and constraints up to a constant factor, and the total weight up to a polynomial factor.Definition 4.1.A linear transformation from a constraint system CS(Γ 1 , W 1 ) to another constraint system CS(Γ 2 , W 2 ) is a polynomial-time procedure that given a formula Φ 1 of CS(Γ 1 , W 1 ) over n 1 variables and integer t 1 , returns a formula Φ 2 ∈ CS(Γ 2 , W 2 ) over n 2 variables and integer t 2 , so that the following conditions hold: ) and we want to show that CS(Γ 1 , W 1 ) ≤ LIN CS(Γ 3 , W 3 ).Since n 2 = O(n 1 ) and n 3 = O(n 2 ), then n 3 = O(n 1 ).To ensure properties (2, 3) are preserved, note that . The properties (4,5) are equivalences that are transitive.Moreover, if we replace relation ≤ LIN with ≤ ADD , then we have that n 3 = n 1 + O(1) and the other properties follow since an additive transformation is always a linear one.
We continue with two simple additive transformations, which will allow us to use negations of constraints as an alternative to setting negative weights.

Lemma 4.5. For every constraint language $\Gamma$ we have (1) $\mathrm{CS}(\Gamma^{\mathrm{NEG}}, \mathbb{Z}) \le_{\mathrm{ADD}} \mathrm{CS}(\Gamma, \mathbb{Z})$ and (2) $\mathrm{CS}(\Gamma^{\mathrm{NEG}}, \mathbb{Z}) \le_{\mathrm{ADD}} \mathrm{CS}(\Gamma^{\mathrm{NEG}}, \mathbb{N})$.

Proof. Suppose we are given $\Phi_1 \in \mathrm{CS}(\Gamma^{\mathrm{NEG}}, \mathbb{Z})$ and an integer $t$. Observe that $(\neg f)(x) = 1 - f(x)$. We can thus remove each constraint application $\langle \neg f, (i_1, i_2, \dots, i_k), w \rangle$ of a negated constraint, replace it with $\langle f, (i_1, i_2, \dots, i_k), -w \rangle$, and subtract $w$ from the target weight $t$. We iterate this operation until all applied constraints belong to $\Gamma$. We obtain a new formula $\Phi_2$ of $\mathrm{CS}(\Gamma, \mathbb{Z})$ over the same set of variables and a new threshold $t'$, so that $\Phi_2(x) = t' \iff \Phi_1(x) = t$ (and the same holds with '=' replaced by '≥'). This proves (1).
For claim (2), consider a formula $\Phi_1$ of $\mathrm{CS}(\Gamma^{\mathrm{NEG}}, \mathbb{Z})$. As before, we have $f(x) = 1 - (\neg f)(x)$, so we can replace each constraint application of $f$ having a negative weight by an application of $\neg f$ with the opposite weight, and subtract the weight from $t$. By erasing all negative weights, we obtain a formula $\Phi_2$ of $\mathrm{CS}(\Gamma^{\mathrm{NEG}}, \mathbb{N})$ over the same set of variables and a new threshold $t'$, so that the new instance is equivalent.
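The rewriting step in point (1) of the proof above can be illustrated with a minimal Python sketch. The representation of a formula as a list of tuples, the helper names `evaluate` and `remove_negations`, and the sample constraints are assumptions made for this illustration only; they do not come from the paper.

```python
from itertools import product

# Hypothetical representation: a formula is a list of
# (constraint, variable_indices, weight, negated) tuples.
def evaluate(formula, x):
    """Total weight collected by assignment x."""
    total = 0
    for f, idx, w, negated in formula:
        val = f(*(x[i] for i in idx))
        total += w * ((1 - val) if negated else val)
    return total

def remove_negations(formula, t):
    """Replace each application of (not f) with weight w by f with weight -w,
    subtracting w from the threshold t, as in point (1) of the proof."""
    out = []
    for f, idx, w, negated in formula:
        if negated:
            out.append((f, idx, -w, False))
            t -= w
        else:
            out.append((f, idx, w, False))
    return out, t

def or2(a, b):
    return int(a or b)

def xor2(a, b):
    return a ^ b

# Sample formula over three variables with two negated applications.
phi1 = [(or2, (0, 1), 3, True), (xor2, (1, 2), -2, False), (or2, (0, 2), 5, True)]
phi2, t_new = remove_negations(phi1, 0)

# The two formulas differ by the same constant on every assignment.
same_shift = all(
    evaluate(phi2, x) == evaluate(phi1, x) + t_new
    for x in product((0, 1), repeat=3)
)
```

Since the values of the two formulas differ by a fixed constant, an assignment reaches the old threshold in the old formula exactly when it reaches the shifted threshold in the new one.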
The rest of this section contains four lemmas that form a chain of reductions from $\mathrm{CS}(\Gamma_1, \mathbb{Z})$ to $\mathrm{CS}(\Gamma_2, \mathbb{N})$, which is valid as long as $\deg(\Gamma_1) \le \deg(\Gamma_2)$ and Max CSP$(\Gamma_2, \mathbb{N})$ is NP-hard. The proofs are based on the frameworks of characteristic polynomials and constraint implementations.
Proof. Let $f \in \Gamma_2$ be a constraint of degree $\deg(\Gamma_2)$. By Proposition 3.5 we can represent each $g \in \Gamma_1$ as a linear combination of constraint applications $\langle f_i, (j_i^1, \dots, j_i^{\mathrm{ar}(f_i)}) \rangle$ with rational coefficients, where the functions $f_i$ are expressible by $f$ with constants. Since $\Gamma_1$ is finite, there is a common denominator of all rational numbers occurring in these linear combinations. Therefore, we can find a single positive integer $\beta$, so that for each $g \in \Gamma_1$, the function $\beta \cdot g$ can be represented as a combination of constraints from $\{f\}^{T,F}$ with integer coefficients.
Given a formula $\Phi_1 \in \mathrm{CS}(\Gamma_1, \mathbb{Z})$, we replace each constraint application $\langle g, (j_1, \dots, j_k), w \rangle$ with the set of constraint applications from $\{f\}^{T,F}$ that represents $\beta \cdot g$, with their integer weights multiplied by $w$. We obtain a formula $\Phi_2 \in \mathrm{CS}(\Gamma_2^{T,F}, \mathbb{Z})$ over the same set of variables, such that $\Phi_2(x) = \beta \cdot \Phi_1(x)$. This implies properties (4, 5) from Definition 4.1. We have not increased the number of variables, so the transformation is additive, and $|\Phi_1|$, $\|\Phi_1\|$ have been increased by a constant factor depending only on $\Gamma_1$ and $\Gamma_2$.
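The choice of the multiplier $\beta$ in the proof above amounts to clearing denominators. The following is a minimal sketch; the helper name `common_multiplier` and the sample coefficients are hypothetical and only serve to illustrate the step.

```python
from fractions import Fraction
from math import lcm  # available in Python 3.9+

def common_multiplier(combinations):
    """Smallest positive integer beta such that beta * alpha is an integer
    for every rational coefficient alpha in every linear combination."""
    dens = [a.denominator for combo in combinations for a in combo]
    return lcm(*dens) if dens else 1

# Hypothetical rational coefficients for two constraints g1, g2
# (illustrative values, not taken from the paper).
combos = {
    "g1": [Fraction(1, 2), Fraction(-1, 3)],
    "g2": [Fraction(2), Fraction(3, 4)],
}
beta = common_multiplier(combos.values())
scaled = {g: [int(beta * a) for a in combo] for g, combo in combos.items()}
```

Because the constraint language is finite, the multiplier is a constant that depends only on the two languages and not on the input formula.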
Consider two formulas $\Phi_1, \Phi_2$ over sets of variables $V_1, V_2$, respectively, which might have a nonempty intersection. We define the sum of these formulas, $\Phi_1 + \Phi_2$, over the set of variables $V_1 \cup V_2$ by taking a union of their sets of constraint applications and merging pairs of applications that share the same constraint and the same tuple of variables, i.e., replacing the pair with a single application with a weight being the sum of the respective weights. For an integer $\alpha$, the formula $\alpha \cdot \Phi$ has the same constraint applications as $\Phi$, but with weights multiplied by $\alpha$.

Lemma 4.7. Let $\Gamma$ be a non-trivial constraint language, which is neither 0-valid nor 1-valid. Then $\mathrm{CS}(\Gamma^{T,F}, \mathbb{Z}) \le_{\mathrm{ADD}} \mathrm{CS}(\Gamma, \mathbb{Z})$.
Proof. Given a formula $\Phi \in \mathrm{CS}(\Gamma^{T,F}, \mathbb{Z})$, we add two auxiliary variables $x_T, x_F$ and we translate each constraint application $\langle f', (j_1, \dots, j_k), w \rangle$, where $f'$ is expressible by $f \in \Gamma$ with constants, into an application of $f$ by replacing constants 1, 0 with variables $x_T, x_F$, respectively. Let us refer to this formula as $\Phi_1 \in \mathrm{CS}(\Gamma, \mathbb{Z})$ and note that $\Phi_1$ agrees with $\Phi$ whenever $x_T = 1$ and $x_F = 0$.

In the next step, we will use implementations to impose particular conditions on $x_T, x_F$. We refer to $x_T, x_F$ and all the new variables introduced within the implementations as auxiliary. Let $A = O(1)$ denote their number. In the new formula we assume that the first $n$ variables $x_1, \dots, x_n$ are the primary variables and $x_{n+1}, \dots, x_{n+A}$ are auxiliary. For $x \in \{0,1\}^{n+A}$ let $x|_n$ stand for the projection on the first $n$ coordinates.
Assume first that Γ is not C-closed.Then it contains non-trivial functions f 0 , f 1 , and g, possibly identical, which are not 0-valid, not 1-valid, and not C-closed, respectively.By Lemma 2.7 we know that Γ α 1 -implements function T and α 2 -implements function F for some integers We implement constraint applications T (x T ) and F (x F ), that is, we construct formulas Φ T , Φ F ∈ CS(Γ, N) over the set of auxiliary variables, such that satisfying α constraint applications in Φ T +Φ F is only possible when x T = 1 and that is, we copy all the constraint applications from Φ 1 and add the applications from Φ T +Φ F with weights multiplied by W . Recall that we have (−||Φ||) ≤ Φ(x) ≤ ||Φ|| for all x.By the definition of implementation, any assignment to Φ 2 which does not satisfy x T = 1 or x F = 0 has value at most (α − 1) Now, if Γ is C-closed, it contains a non-trivial constraint which is C-closed and not 0-valid.Due to Lemma 2.6, Γ α-implements function XOR for some constant α.We implement XOR(x T , x F ), that is, we construct formula Φ XOR ∈ CS(Γ, N) over the set of auxiliary variables, such that satisfying α constraint applications in Φ XOR is only possible when XOR(x T , x F ) = 1.As before, we define where x is the bit-wise complement of x.
We summarize the transformation properties for both considered cases: the new number of vari- then we know that all assignments $x$ satisfy $\Phi(x) > t$ and in such case we could return an empty formula over a singleton variable set (so the only possible value is 0) and set threshold $t' = -1$: this is an equivalent instance. To see properties (4, 5) observe that, assuming $t \ge -\|\Phi\|$, $\Phi(x) = t$ (resp. $\Phi(x) \ge t$) holds for some assignment $x$ iff $\Phi_2(y) = t'$ (resp. $\Phi_2(y) \ge t'$) holds for some assignment $y$, where $t' = \alpha W + t$.
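The sum and the integer multiple of formulas, which the proof above uses to combine a formula with weighted implementations, can be sketched as follows. The dictionary representation (a map from a constraint name and a tuple of variable indices to a weight) and the names `add_formulas` and `scale` are assumptions of this sketch. Whether merged applications of total weight zero are dropped is not specified in the text; here they are kept.

```python
from collections import defaultdict

# Hypothetical representation: {(constraint_name, variable_tuple): weight}.
def add_formulas(phi1, phi2):
    """Union of constraint applications; applications sharing the same
    constraint and the same tuple of variables are merged by adding weights."""
    merged = defaultdict(int)
    for formula in (phi1, phi2):
        for key, w in formula.items():
            merged[key] += w
    return dict(merged)

def scale(alpha, phi):
    """Same constraint applications, with weights multiplied by alpha."""
    return {key: alpha * w for key, w in phi.items()}

phi_a = {("OR2", (1, 2)): 3}
phi_b = {("OR2", (1, 2)): -1, ("XOR2", (2, 3)): 4}
combined = add_formulas(phi_a, scale(2, phi_b))
```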
So far we have established a relation between $\Gamma_1$ and $\Gamma_2$, which allows us to transform one constraint system to another by adding only a constant number of new variables. However, it works only when we allow negative weights. The next two lemmas explain how to get rid of negative weights, so that the hardness results can be applied to the natural setting with only non-negative weights.
Proof. For a $k$-ary constraint $f \in \Gamma$ and $S \subseteq [k]$, let $f^S$ be the function expressible by $f$ with literals given by negating variables with indices in $S$. For example, $\mathrm{OR}_3^{\{2\}}(x_1, x_2, x_3) = x_1 \vee \neg x_2 \vee x_3$. Suppose we are given $\Phi \in \mathrm{CS}(\Gamma, \mathbb{Z})$ on $n$ variables. Let $J_f$ denote the family of all tuples $(j_1, \dots, j_k) \in [n]^k$ that appear in some constraint application of a $k$-ary constraint $f$ in $\Phi$. For each $f \in \Gamma$, we construct a formula $\Phi_f \in \mathrm{CS}(\Gamma^{\mathrm{LIT}}, \mathbb{N})$ over the same set of variables as $\Phi$. The formula $\Phi_f$ consists of all constraint applications of the form $\langle f^S, (j_1, \dots, j_k), 1 \rangle$ for all $S \subseteq [k]$ and $(j_1, \dots, j_k) \in J_f$.
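We claim that $\Phi_f$ is a constant formula, i.e., $\Phi_f(x)$ does not depend on $x$. For any assignment $x$ and $k$-tuple $(j_1, \dots, j_k)$, the number of sets $S$ for which $f^S(x_{j_1}, \dots, x_{j_k}) = 1$ equals the number of satisfying assignments of $f$, because negating the coordinates in $S$ maps $(x_{j_1}, \dots, x_{j_k})$ to each vector of $\{0,1\}^k$ exactly once as $S$ ranges over all subsets of $[k]$.

Let $W$ be the absolute value of the minimum negative weight in $\Phi$ (if $\Phi$ has no negative weights, we set $W = 0$). We define $\Phi_1 = \Phi + W \cdot \sum_{f \in \Gamma} \Phi_f$. Every constraint application from $\Phi$ appears in some $\Phi_f$, so after summing the weights we end up with only non-negative numbers. Hence $\Phi_1 \in \mathrm{CS}(\Gamma^{\mathrm{LIT}}, \mathbb{N})$ and $\Phi_1(x) = \Phi(x) + W \cdot \sum_{f \in \Gamma} \Phi_f(x)$. The second summand does not depend on $x$, so we can shift the threshold $t$ by this constant.

Proof. We proceed similarly to Lemma 4.7. Given a formula $\Phi \in \mathrm{CS}(\Gamma^{\mathrm{LIT}}, \mathbb{N})$ on $n$ variables, for each variable $x_i$ we introduce its negation-copy, that is, a new variable $\bar{x}_i$, and translate each constraint application $\langle f', (j_1, \dots, j_k), w \rangle$, where $f'$ is expressible by $f \in \Gamma$ with literals, into an application of $f$ by replacing literals $x_i, \neg x_i$ with variables $x_i, \bar{x}_i$. Let us refer to this formula on $2n$ variables as $\Phi_1 \in \mathrm{CS}(\Gamma, \mathbb{N})$ and note that $\Phi_1$ agrees with $\Phi$ whenever $\bar{x}_i = \neg x_i$ for all $i \in [n]$.

By Lemma 2.9 we know that $\Gamma$ $\alpha$-implements XOR for some constant $\alpha$. We implement $\mathrm{XOR}(x_i, \bar{x}_i)$, that is, for each $i \in [n]$ we construct a formula $\Phi_i \in \mathrm{CS}(\Gamma, \mathbb{N})$ over $x_i, \bar{x}_i$ and $O(1)$ auxiliary variables (the sets of auxiliary variables are disjoint for different $i$), so that $\Phi_i$ can have value $\alpha$ only when $\mathrm{XOR}(x_i, \bar{x}_i) = 1$. We define $\Phi_2 = \Phi_1 + W \cdot \sum_{i=1}^{n} \Phi_i$, where $W = \|\Phi\| + 1$. The total number of variables is $O(n)$ and we have added $O(n)$ new constraint applications. As before, let $x|_n$ denote the projection of the enlarged vector of variables to the original one.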
Since the weights are non-negative, we have $0 \le \Phi_1(x) \le \|\Phi\|$ for all $x$. By the definition of implementation, any assignment to $\Phi_2$ which does not satisfy $\mathrm{XOR}(x_i, \bar{x}_i)$ for some $i \in [n]$ has value at most $(n\alpha - 1) W + \|\Phi\| < n \alpha W$. To see properties (4, 5) observe that we can assume $t \ge 0$ (by the same argument as in Lemma 4.7) and then $\Phi(x) = t$ (resp. $\Phi(x) \ge t$) holds for some assignment $x$ iff $\Phi_2(y) = t'$ (resp. $\Phi_2(y) \ge t'$) holds for some assignment $y$, where $t' = n \alpha W + t$.
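The constant-formula claim used earlier to remove negative weights (summing $f^S$ over all $S \subseteq [k]$ gives a value independent of the assignment) can be checked by brute force. This is a minimal sketch; the names `negate_on` and `phi_f_value` are hypothetical, and positions are 0-indexed in the code while the text uses indices starting at 1.

```python
from itertools import chain, combinations, product

def negate_on(f, S):
    """The function f^S: f applied after negating the inputs at positions in S."""
    return lambda x: f(tuple(1 - b if i in S else b for i, b in enumerate(x)))

def phi_f_value(f, k, x):
    """Value contributed by one k-tuple: sum of f^S(x) over all subsets S."""
    subsets = chain.from_iterable(combinations(range(k), r) for r in range(k + 1))
    return sum(negate_on(f, set(S))(x) for S in subsets)

def or3(x):
    return int(any(x))

def ex3(x):
    return int(sum(x) == 1)

# Collect the set of values taken over all assignments; a constant formula
# yields a single value, equal to the number of satisfying assignments of f.
or3_values = {phi_f_value(or3, 3, x) for x in product((0, 1), repeat=3)}
ex3_values = {phi_f_value(ex3, 3, x) for x in product((0, 1), repeat=3)}
```

For OR on three variables the single value is 7 and for the exactly-one constraint it is 3, matching the numbers of satisfying assignments.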
In particular, all the transformations above are linear; therefore, by transitivity, we can summarize them as the following corollary.

Then there is a constant $c$ such that Max CSP$(\Gamma, \mathbb{N}, c)$ does not admit a compression of size $O(n^{d-\varepsilon})$ for any $\varepsilon > 0$, unless NP ⊆ co-NP/poly.

Proof. First observe that a characteristic polynomial of a constraint with $d = 1$ is linear. Since this polynomial is 0/1-valued, it can depend only on one variable, which means that the constraint is 2-monotone. Hence we can assume that $d \ge 2$.
We know that there is a constant $c_d$ so that

Applications to specific CSPs
Having established both the lower and upper bound, we can refer to $\deg(\Gamma)$ as the optimal compression exponent for Max CSP$(\Gamma, \mathbb{N})$. We are now equipped with a handy but powerful tool for determining the optimal compressibility of Max CSP$(\Gamma, \mathbb{N})$, as this task reduces to computing the degrees of characteristic polynomials in $\Gamma$.
As an example, we apply this technique to compute the optimal compression exponent for Max k-NAE-SAT, Max Ek-Lin, and Max k-Exact-SAT, which are all NP-hard for $k \ge 2$. We have , where $\mathrm{EX}_k(x_1, \dots, x_k) = 1$ if and only if there is exactly one 1 in $(x_1, \dots, x_k)$.
Since $\deg(\Gamma^{\mathrm{LIT}}) = \deg(\Gamma^{\mathrm{NEG}}) = \deg(\Gamma)$, it suffices to analyze the characteristic polynomials for functions $\mathrm{NAE}_k$, $\mathrm{XOR}_k$, and $\mathrm{EX}_k$. Let $e_i(x_1, \dots, x_k)$ denote the $i$-th elementary symmetric polynomial, i.e., the sum of all degree-$i$ monomials on $k$ variables.
It is easy to check these formulas using the binomial theorem. We present the argument for $\mathrm{XOR}_k$ as an example. Suppose the number of 1s in the vector $(x_1, \dots, x_k)$ is $\ell$. Then $e_i(x_1, \dots, x_k)$ equals $\binom{\ell}{i}$ for $i \le \ell$ and 0 for $i > \ell$. The formula for $\mathrm{XOR}_k$ therefore evaluates to $\sum_{i=1}^{\ell} (-2)^{i-1} \binom{\ell}{i} = \frac{1 - (1-2)^{\ell}}{2}$, which is 1 for odd $\ell$ and 0 for even $\ell$, as expected.

By these identities we deduce that the optimal compression exponent for Max k-NAE-SAT is $k$ for even $k$ and $k - 1$ for odd $k$, and the optimal compression exponent for both Max Ek-Lin and Max k-Exact-SAT is $k$.
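The degrees stated above can also be confirmed numerically for small $k$. The sketch below computes the coefficients of the unique multilinear polynomial that agrees with a Boolean function on $\{0,1\}^k$ via Möbius inversion over subsets; the function names `char_poly` and `degree` are assumptions of this illustration, and the brute-force approach is only practical for small arities.

```python
from itertools import combinations

def char_poly(f, k):
    """Nonzero coefficients of the multilinear polynomial agreeing with f on
    {0,1}^k, keyed by the tuple S of variable positions in the monomial.
    Moebius inversion: coeff(S) = sum over T subset of S of
    (-1)^(|S|-|T|) * f(indicator vector of T)."""
    coeffs = {}
    for r in range(k + 1):
        for S in combinations(range(k), r):
            c = 0
            for m in range(len(S) + 1):
                for T in combinations(S, m):
                    x = [1 if i in T else 0 for i in range(k)]
                    c += (-1) ** (len(S) - m) * f(x)
            if c != 0:
                coeffs[S] = c
    return coeffs

def degree(f, k):
    """Degree of the characteristic polynomial (0 for a constant function)."""
    return max((len(S) for S in char_poly(f, k)), default=0)

def nae(x):
    return int(0 < sum(x) < len(x))

def xor(x):
    return sum(x) % 2

def ex(x):
    return int(sum(x) == 1)

# Degrees for arities 2..5, listed as (k, deg NAE_k, deg XOR_k, deg EX_k).
table = [(k, degree(nae, k), degree(xor, k), degree(ex, k)) for k in range(2, 6)]
```

The resulting table shows the degree of NAE dropping by one for odd arities, while XOR and the exactly-one constraint have full degree, in line with the compression exponents stated above.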
An example of a non-symmetric constraint with a non-trivial upper bound on its degree is $f_k$, with $\mathrm{ar}(f_k) = 3^k$, defined recursively:

It is tempting to seek a concise characterization of $\Gamma$ for which one can obtain a non-trivial bound on $\deg(\Gamma)$ and therefore a non-trivial compression for Max CSP$(\Gamma, \mathbb{N})$, where by non-trivial we mean a bound of the form $\deg(f) \le \mathrm{ar}(f) - 1$ for functions depending on all the coordinates. Unfortunately, as far as we are aware, no such conditions are known, not even for symmetric polynomials induced by symmetric constraints. There exist some interesting partial results though, e.g., that if the number of variables $k$ is a prime minus one then the degree is always $k$, and in general $\deg(f) \ge k - O(k^{0.548})$ [30]. On the other hand, there are infinitely many symmetric functions for which $\deg(f) = \mathrm{ar}(f) - 3$ [30]. When it comes to non-symmetric functions, there exist infinitely many examples with $\deg(f) \le \log(\mathrm{ar}(f))$ [28], which is asymptotically the lowest upper bound possible [27].
Negative weights For the sake of completeness, we show that an analogous classification holds for Max CSP$(\Gamma, \mathbb{Z})$, that is, whenever Max CSP$(\Gamma, \mathbb{Z})$ is NP-hard, then the upper bound from Theorem 5.1 is essentially tight. The dichotomy theorem for $W = \mathbb{Z}$ can be stated in a simpler manner, as the problem becomes NP-hard whenever $\deg(\Gamma) \ge 2$ [18] and the case $\deg(\Gamma) = 1$ reduces to linear function maximization. This dichotomy follows also from the reduction below.

Proof. Since $f$ and $\neg f$ cannot be 0-valid (or 1-valid) at the same time, the language $\Gamma^{\mathrm{NEG}}$ is neither 0-valid nor 1-valid. We can thus apply Corollary 4.8 to get $\mathrm{CS}(\Gamma_{d\text{-SAT}}, \mathbb{Z}) \le_{\mathrm{ADD}} \mathrm{CS}(\Gamma^{\mathrm{NEG}}, \mathbb{Z})$. We compose it with $\mathrm{CS}(\Gamma^{\mathrm{NEG}}, \mathbb{Z}) \le_{\mathrm{ADD}} \mathrm{CS}(\Gamma, \mathbb{Z})$ (Lemma 4.5, point (1)) to obtain that an $O(n^{d-\varepsilon})$-size compression for Max CSP$(\Gamma, \mathbb{Z}, c)$ with sufficiently large $c$ entails the same for Max CSP$(\Gamma_{d\text{-SAT}}, \mathbb{Z}, c - O(1))$, which implies NP ⊆ co-NP/poly [13].

Consequences for exponential-time algorithms
As mentioned before, our framework of reductions can be used to preserve the exponential running time as well. Namely, if $\mathrm{CS}(\Gamma_1, W_1) \le_{\mathrm{ADD}} \mathrm{CS}(\Gamma_2, W_2)$, then an algorithm for Max CSP$(\Gamma_2, W_2, c)$ with running time $T(n)$ entails an algorithm for Max CSP$(\Gamma_1, W_1, c - O(1))$ with running time $T(n + O(1))$. All the constructed transformations, except for $\mathrm{CS}(\Gamma^{\mathrm{LIT}}, \mathbb{N}) \le_{\mathrm{LIN}} \mathrm{CS}(\Gamma, \mathbb{N})$ (Lemma 4.10), are additive and in particular they work as long as negative weights are allowed. Alternatively, we can take advantage of other properties of particular constraint languages to remove the negative weights.

Proof. Recall that $\Gamma^{\mathrm{NEG}}$ can be neither 0-valid nor 1-valid, so it satisfies the conditions for $\Gamma_2$ in Corollary 4.8. The same holds for $\Gamma_{d\text{-SAT}}$. We also take advantage of the fact that this language can express negated literals, i.e., $(\Gamma_{d\text{-SAT}})^{\mathrm{LIT}} = \Gamma_{d\text{-SAT}}$. We have the following cycle of reductions.

Alman and Williams [1] have noted that it is not known how to improve the running time for Max 3-CNF-SAT, even for instances with a linear number of clauses. They have therefore formulated a stronger hypothesis. The Sparse Max 3-SAT Hypothesis states that there exists $c > 0$ such that Max 3-CNF-SAT with $cn$ clauses does not admit an $O(2^{(1-\varepsilon)n})$-time algorithm for any $\varepsilon > 0$. Similarly, their Sparse Max 2-SAT Hypothesis states that one cannot beat running time $O(2^{\omega n/3})$ for Max 2-CNF-SAT with $cn$ clauses. Observe that our reductions preserve the property of having $O(n)$ different constraint applications (condition (2) in Definition 4.1). Therefore, as long as we allow negative weights at constraint applications, we can replace Max 2-CNF-SAT (resp. Max 3-CNF-SAT) in this hypothesis with Max CSP$(\Gamma, \mathbb{Z})$ for any constraint language $\Gamma$ of degree 2 (resp. 3) to obtain an equivalent statement.

Conclusions and open problems
We have provided a complete characterization of the optimal compression for Max CSP(Γ) in the case of a Boolean domain. A natural question arises about larger domains. Our approach does not transfer even to the case with a domain of size 3, since there is no unique way to represent functions $\{0, 1, 2\}^k \to \{0, 1\}$ as polynomials. One may consider, e.g., embedding to a Boolean domain or using non-multilinear polynomials, but it is not clear which approach leads to the optimal degree and how to find accompanying lower bounds.
On the exponential-time front, we have shown that Max d-CNF-SAT is as hard as any Max CSP of degree $d$ as long as negative weights are allowed. Although we were able to get rid of the latter assumption in several cases, there is still a gap in this classification: does improving the running time for any degree-$d$ Max CSP$(\Gamma, \mathbb{N})$ imply an improvement for Max d-CNF-SAT?
Max CSP$(\Gamma_{d\text{-SAT}}, \mathbb{N}, c_d)$ does not admit a compression of size $O(n^{d-\varepsilon})$ unless NP ⊆ co-NP/poly: for $d = 2$ it is due to Lemma 5.2 and for $d \ge 3$ it follows from the compressibility hardness for the non-maximization variant of d-CNF-SAT [13]. As noted in Proposition 4.2, if we had such a compression for Max CSP$(\Gamma, \mathbb{N}, c)$ with sufficiently large $c$, then by Corollary 4.11 it would transfer to Max CSP$(\Gamma_{d\text{-SAT}}, \mathbb{N}, c_d)$.

Theorem 5.3 is a formalization of Theorem 1.2 in the introduction. Below we present some important corollaries from it.

Theorem 5.4. Let non-trivial $\Gamma$ be such that $d = \deg(\Gamma) \ge 2$. Then there is a constant $c$ such that Max CSP$(\Gamma, \mathbb{Z}, c)$ does not admit a compression of size $O(n^{d-\varepsilon})$ for any $\varepsilon > 0$, unless NP ⊆ co-NP/poly.

$$\begin{aligned}
\mathrm{CS}(\Gamma_{d\text{-SAT}}, \mathbb{Z}) &\le_{\mathrm{ADD}} \mathrm{CS}(\Gamma_{d\text{-SAT}}, \mathbb{N}) && \text{Lemma 4.9 for } (\Gamma_{d\text{-SAT}})^{\mathrm{LIT}} = \Gamma_{d\text{-SAT}} \\
&\le_{\mathrm{ADD}} \mathrm{CS}(\Gamma^{\mathrm{NEG}}, \mathbb{Z}) && \text{Corollary 4.8} \\
&\le_{\mathrm{ADD}} \mathrm{CS}(\Gamma, \mathbb{Z}) && \text{Lemma 4.5, point (1)} \\
&\le_{\mathrm{ADD}} \mathrm{CS}(\Gamma_{d\text{-SAT}}, \mathbb{Z}) && \text{Corollary 4.8}
\end{aligned}$$

Theorem 6.2. For each $d \ge 2$ and any constant $\alpha > 1$, either all the following problems admit an $\alpha^n n^{O(1)}$ algorithm for all $c$, or none of them do:

The first corollary of this theorem is that problems of the form Max CSP$(\Gamma, \mathbb{Z})$ are divided into equivalence classes with respect to the optimal running time. In particular, solving any Max CSP$(\Gamma, \mathbb{Z})$ with $\deg(\Gamma) \ge 3$ in time $O(2^{(1-\varepsilon)n})$ for any $\varepsilon > 0$ contradicts the Max 3-SAT Hypothesis. Also, the hypothesis remains equivalent if we replace Max 3-CNF-SAT with Max 3-Lin SAT or Max 3-Exact SAT with only positive weights, because their constraint languages satisfy $\Gamma^{\mathrm{NEG}} = \Gamma$ and $\Gamma^{\mathrm{LIT}} = \Gamma$, respectively. Another corollary is that improving the running time $O(2^{\omega n/3})$ for Max Cut or Max DiCut with integer weights or Max 3-NAE-SAT with positive weights would imply an analogous breakthrough for Max 2-CNF-SAT.