Local geometry of NAE-SAT solutions in the condensation regime

The local behavior of typical solutions of random constraint satisfaction problems (CSP) describes many important phenomena including clustering thresholds, decay of correlations, and the behavior of message passing algorithms. When the constraint density is low, studying the planted model is a powerful technique for determining this local behavior which in many examples has a simple Markovian structure. Work of Coja-Oghlan, Kapetanopoulos, and Müller (2020) showed that for a wide class of models, this description applies up to the so-called condensation threshold. Understanding the local behavior after the condensation threshold is more complex due to long-range correlations. In this work, we revisit the random regular NAE-SAT model in the condensation regime and determine the local weak limit which describes a random solution around a typical variable. This limit exhibits a complicated non-Markovian structure arising from the space of solutions being dominated by a small number of large clusters. This is the first description of the local weak limit in the condensation regime for any sparse random CSP in the one-step replica symmetry breaking (1RSB) class. Our result is non-asymptotic, and characterizes the tight fluctuation $O(n^{-1/2})$ around the limit. Our proof is based on coupling the local neighborhoods of an infinite spin system, which encodes the structure of the clusters, to a broadcast model on trees whose channel is given by the 1RSB belief-propagation fixed point. We believe that our proof technique has broad applicability to random CSPs in the 1RSB class.


Introduction
A random constraint satisfaction problem (rcsp) involves $n$ variables $z = \{z_i\}_{i \le n} \in X^n$ drawn from a finite alphabet set $X$, satisfying $m \equiv \alpha n$ random constraints. The aim is to analyze the solution space of rcsps as $n$ and $m$ increase, with $\alpha$ held constant. Major advances have been made by statistical physicists using deep but non-rigorous theory, which describes a series of phase transitions as the constraint density $\alpha$ grows. Their insights apply to a wide class of rcsps belonging to the so-called one-step replica symmetry breaking (1rsb) class, including k-sat, nae-sat, and coloring ([20,18], see also [3,19]). We begin by describing some of the main predictions made by the physicists [18]. See Figure 1 for a pictorial description of the conjectured phase diagram.
When the density of constraints $\alpha$ is below the uniqueness threshold $\alpha_{\mathrm{uniq}}$, all of the solutions lie in a single cluster. Here, a cluster is defined to be a connected component of the solution space, where two solutions are connected if they differ in a small number of variables, say $\log n$. As $\alpha$ increases, the space of solutions passes a shattering threshold $\alpha_{\mathrm{clust}}$, after which it shatters into exponentially many clusters of solutions, each well separated from the others [1]. While the space of solutions becomes more complex at this point, the behavior of a typical solution retains a simple description. In particular, the uniform measure over the solutions is contiguous with respect to the so-called planted model. This was recently established rigorously by Coja-Oghlan, Kapetanopoulos, and Müller [10] up to the condensation threshold $\alpha_{\mathrm{cond}}$ for several models including nae-sat and coloring.
Figure 1 [18,14]. A pictorial description of the conjectured phase diagram of random constraint satisfaction problems in the one-step replica symmetry breaking class. In the condensation regime $(\alpha_{\mathrm{cond}}, \alpha_{\mathrm{sat}})$, a bounded number of clusters contain most of the solutions and the uniform measure over the solutions fails to be contiguous with the planted model.
A second threshold, which is of primary interest in this paper, is the condensation threshold $\alpha_{\mathrm{cond}}$. For $\alpha \in (\alpha_{\mathrm{clust}}, \alpha_{\mathrm{cond}})$, each cluster of solutions contains only an exponentially small fraction of the total number of solutions, while for $\alpha \in (\alpha_{\mathrm{cond}}, \alpha_{\mathrm{sat}})$ most of the solutions are contained in $O(1)$ clusters. Indeed, a more refined prediction is that the cluster sizes follow a Poisson-Dirichlet distribution [18]. This is the regime in which the model is said to be 1rsb, or in the condensation regime. Formally, this means that the normalized Hamming distance between two randomly chosen solutions is concentrated on two points. This corresponds to a positive probability that the two solutions lie in the same cluster, in which case they are close, and a positive probability that they lie in different clusters, in which case they are much further apart. While this is predicted in many models, it has so far only been established for regular nae-sat with large $k$ [28].
It is further conjectured that not only does the structure of the space of solutions exhibit a phase transition at $\alpha_{\mathrm{cond}}$, but so does the local distribution of the individual solutions themselves. In particular, given a solution drawn uniformly at random, consider the empirical distribution of the solution in a ball of radius $2t$ around the variables $1 \le i \le n$. Here, a ball of radius $2t$ is taken with respect to the factor graph induced by the constraints and the variables. For example, if two variables are involved in the same constraint, they are at distance 2.
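As a small illustration of the factor-graph distance used above, the following sketch computes breadth-first distances from a variable to every variable and clause node. The function name `factor_graph_distances` and the encoding of clauses as lists of variable indices are assumptions made for this example only; they are not notation from the paper.

```python
from collections import deque


def factor_graph_distances(clauses, n, source):
    """BFS distances in the bipartite factor graph, starting at variable `source`.

    `clauses` is a list of clauses, each a list of variable indices in
    range(n) (illustrative encoding). Nodes are tagged ("v", i) for
    variables and ("c", a) for clauses.
    """
    # Adjacency from each variable to the clauses it participates in.
    var_adj = [[] for _ in range(n)]
    for a, clause in enumerate(clauses):
        for v in clause:
            var_adj[v].append(a)

    dist = {("v", source): 0}
    queue = deque([("v", source)])
    while queue:
        kind, i = queue.popleft()
        if kind == "v":
            neighbours = [("c", a) for a in var_adj[i]]
        else:
            neighbours = [("v", v) for v in clauses[i]]
        for node in neighbours:
            if node not in dist:
                dist[node] = dist[(kind, i)] + 1
                queue.append(node)
    return dist
```

Variables always sit at even distance from a variable source and clauses at odd distance, so two variables sharing a clause are at distance 2, as in the text.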
For $\alpha < \alpha_{\mathrm{cond}}$, because of contiguity, it suffices to study the planted model to determine the limit of such a local empirical distribution. Here, the planted model means taking a fixed "planted" assignment of the variables and then choosing the constraints conditioned to satisfy the planted assignment. The local empirical distribution of the planted model admits a simple description, as it can be studied with the configuration model. In the case of nae-sat or colorings on random regular graphs, it is simply the uniform distribution on solutions on a regular tree, which is Markovian in the sense that the spins along any path follow a Markov chain whose transition probabilities can be readily calculated. This then describes the behavior of a random solution up to $\alpha_{\mathrm{cond}}$.
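The planted model described above can be sketched for regular nae-sat as follows. This is a minimal illustration under stated assumptions: the function name `planted_naesat`, the encoding of a clause as a list of `(variable, literal)` pairs, and the use of a shuffled half-edge list for the matching are all choices made for this example. It relies on the fact that, for any matching, the literals of distinct clauses can be conditioned independently.

```python
import random


def planted_naesat(z_star, d, k, seed=None):
    """Sample a d-regular k-nae-sat instance conditioned on z_star being a solution.

    Returns a list of clauses; each clause is a list of (variable, literal)
    pairs with literal in {0, 1} (illustrative encoding). Requires k >= 2.
    """
    n = len(z_star)
    if k < 2 or (n * d) % k != 0:
        raise ValueError("need k >= 2 and n*d divisible by k")
    rng = random.Random(seed)
    m = n * d // k

    # Uniform matching of variable half-edges to clause half-edges.
    half_edges = [v for v in range(n) for _ in range(d)]
    rng.shuffle(half_edges)

    clauses = []
    for a in range(m):
        variables = half_edges[a * k:(a + 1) * k]
        # Draw the evaluated values L_e XOR z_v uniformly among the
        # non-constant patterns, then solve for the literals.
        while True:
            pattern = [rng.randint(0, 1) for _ in range(k)]
            if 0 < sum(pattern) < k:
                break
        clauses.append([(v, u ^ z_star[v]) for v, u in zip(variables, pattern)])
    return clauses
```

By construction every clause evaluates to a non-constant pattern under `z_star`, so the planted assignment (and its complement) is a nae-sat solution of the returned instance.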
In this paper, we investigate the regime $\alpha > \alpha_{\mathrm{cond}}$ in the random $d$-regular $k$-nae-sat model for large $k \ge k_0$, where $k_0$ is an absolute constant. This regime presents a complex local empirical distribution, deviating from the planted model. The nae-sat problem offers additional symmetries compared to k-sat that make it more tractable from a mathematical viewpoint. Nevertheless, it is predicted to belong to the same 1rsb universality class of rcsps as random k-sat and random graph coloring, thus sharing similar qualitative behaviors. Let us give an informal statement of our main theorem.
We refer to Theorem 1.3 below for the formal statement. The explicit definition of $P^t_\star$ is given in Section 1.4. Unlike in the planted case, $P^t_\star$ is non-Markovian, as shown in Section 1.5. We emphasize that for $t \ge 2$, the characterization of the local weak limit $P^t_\star$ with $O(n^{-1/2})$ fluctuation poses significant difficulties, demanding novel methods not covered in earlier works [28,29]. Indeed, [28] studied only statistics of depth 2 neighborhoods, and established concentration of the free component profile in $\ell^\infty$-distance at the larger scale $O(n^{-1/2} \log n)$. In order to establish Theorem 1.1, we establish $\ell^1$-type concentration with the optimal fluctuation $O(n^{-1/2})$, which is much stronger since there are typically $n^{\Omega_k(1)}$ types of free trees. Further, we construct a delicate coupling in the infinite spin system encoding the clusters, which we call component coloring, and extend the concentration of depth 2 neighborhoods to greater distances. Section 2 provides a high level overview of our proof methodology, which we believe has wide applicability to random csps in the 1rsb class.
We further remark that the characterization of the local weak limit in the condensation regime $(\alpha_{\mathrm{cond}}, \alpha_{\mathrm{sat}})$ is delicate due to the randomness coming from the weights of the clusters. Indeed, we expect that for a different notion of the local distribution studied in [23], where a uniformly random vertex is chosen first and then the marginal of a neighborhood around the vertex is considered, the local weak limit will be random: it is a mixture of extremal Gibbs measures with weights drawn from a Poisson-Dirichlet distribution. Since showing that the relative sizes of the clusters follow a Poisson-Dirichlet process in the regime $(\alpha_{\mathrm{cond}}, \alpha_{\mathrm{sat}})$ is open for any rcsp in the 1rsb universality class, we leave this different notion of local weak limit as a conjecture. See Section 1.5 for a further discussion.

Definitions and Main result
We first define the random regular nae-sat model. An instance of a $d$-regular $k$-nae-sat problem can be represented by a labelled $(d, k)$-regular bipartite graph as follows. Let $V = \{v_1, \ldots, v_n\}$ and $F = \{a_1, \ldots, a_m\}$ be the sets of variables and clauses, respectively. Connect $v_i$ and $a_j$ by an edge if the variable $v_i$ participates in the clause $a_j$. Denote this bipartite graph by $G = (V, F, E)$, and for $e \in E$, let $L_e \in \{0, 1\}$ denote the literal assigned to the edge $e$. Then, a nae-sat instance is defined by $G = (V, F, E, L) \equiv (V, F, E, \{L_e\}_{e \in E})$.
For each $e \in E$, we denote the variable (resp. clause) adjacent to it by $v(e)$ (resp. $a(e)$). Moreover, $\delta v$ (resp. $\delta a$) is the collection of edges adjacent to $v \in V$ (resp. $a \in F$). Then, a nae-sat solution is formally defined as follows.
The random regular nae-sat instance $G = (V, F, E, L)$ is then generated by a perfect matching between the set of half-edges adjacent to variables and the set of half-edges adjacent to clauses, which are labelled from 1 to $nd = mk$. Thus, $E$ is a uniform permutation in $S_{nd}$. Conditioned on $E$, the literals $L = (L_e)_{e \in E}$ are drawn i.i.d. from $\mathrm{Unif}(\{0, 1\})$. We use the notation $z \sim \mathrm{Unif}(\mathrm{SOL}(G))$ for a nae-sat solution drawn uniformly at random given a random regular nae-sat instance $G$.
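The generation procedure above can be sketched in a few lines. This is an illustrative sketch rather than the construction used in the proofs: the names `random_regular_naesat` and `is_nae_solution`, and the encoding of a clause as a list of `(variable, literal)` pairs, are assumptions made here. The solution check uses the standard not-all-equal condition, namely that the values $L_e \oplus z_{v(e)}$ around each clause are neither all 0 nor all 1.

```python
import random


def random_regular_naesat(n, d, k, seed=None):
    """Sample a d-regular k-nae-sat instance on n variables.

    A uniform shuffle of the n*d variable half-edges plays the role of the
    uniform permutation; clause a receives positions a*k, ..., a*k + k - 1.
    Multi-edges may occur, as in the configuration model.
    """
    if (n * d) % k != 0:
        raise ValueError("n*d must be divisible by k")
    rng = random.Random(seed)
    m = n * d // k
    half_edges = [v for v in range(n) for _ in range(d)]
    rng.shuffle(half_edges)
    clauses = []
    for a in range(m):
        # Literals are i.i.d. uniform on {0, 1}, independent of the matching.
        clause = [(half_edges[a * k + j], rng.randint(0, 1)) for j in range(k)]
        clauses.append(clause)
    return clauses


def is_nae_solution(clauses, z):
    """True if every clause sees both values among L_e XOR z_{v(e)}."""
    return all(len({lit ^ z[v] for v, lit in clause}) == 2 for clause in clauses)
```

For small `n`, one can enumerate all assignments with `is_nae_solution` to obtain $\mathrm{SOL}(G)$ by brute force; this is only feasible for toy sizes.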
We next define the distribution in a local neighborhood of $v \in V$ at depth $t \ge 1$. Hereafter we denote by $d(\cdot, \cdot) \equiv d_G(\cdot, \cdot)$ the graph distance on the factor graph $G$.
We have used the distance $2t - \tfrac{3}{2}$ instead of $2t - 2$ to include the boundary half-edges (half-edges that are not connected within $N_t(v, G)$) hanging from the leaf variables $\{w \in V : d(v, w) = 2t - 2\}$, which will be convenient for the proof. Denote the set of boundary half-edges (resp. full-edges) of $N_t(v, G)$ by $\partial N_t(v, G)$ (resp. $E_{\mathrm{in}}(N_t(v, G))$). We take the convention that the full-edges $e \in E_{\mathrm{in}}(N_t(v, G))$ store the literal information $L_e$ while the boundary half-edges $e \in \partial N_t(v, G)$ do not. To this end, for $G = (V, F, E, L)$ and $z \in \{0, 1\}^V$, denote Note that if $N_t(v, G)$ does not contain a cycle, it is isomorphic to a $(d, k)$-regular tree. Denote the infinite $(d, k)$-regular factor tree rooted at a variable $\rho$ by $T_{d,k} \equiv T_{d,k}(\rho)$. Here, we consider $\rho$ as a variable and its $d$ descendants as clauses. Similarly, all the clauses have $k$ descendant variables. Thus, the variables are located at even depths whereas the clauses are located at odd depths. Then, $T_{d,k,t}$ is defined as the $2t - \tfrac{3}{2}$ neighborhood around the root $\rho$ in $T_{d,k}$ (see Figure 2). We use the notation $V(T_{d,k,t})$, $F(T_{d,k,t})$, $E_{\mathrm{in}}(T_{d,k,t})$, and $\partial T_{d,k,t}$ respectively for the set of variables, clauses, full-edges, and boundary half-edges of $T_{d,k,t}$. For Then the depth $t$ empirical distribution is given by Note that . The total mass is $1 - O(n^{-1})$ with high probability because of rare neighborhoods which contain a cycle. Our main theorem below determines the limit of $P^t_n[G, z]$ with $O(n^{-1/2})$ fluctuation, thus such cyclic neighborhoods may be neglected.
We emphasize that the fluctuation $C/\sqrt{n}$ is optimal, as local neighborhood frequencies have central limit theorem fluctuations. In particular, it is arguably much stronger than the asymptotic statement, where one only specifies the limit $P^t_\star$, and not the rate $O(n^{-1/2})$ in (3). For example, Theorem 1.3 implies that , for any $a_n$ which diverges to $\infty$.

Related works
Local weak convergence of graphs, also known as Benjamini-Schramm convergence, gives a way of describing the local distributional limit of a sequence of (possibly random) graphs. It was developed independently by Aldous [4] to study the Assignment Problem and by Benjamini and Schramm [5] to study recurrence of random walks on planar graphs. The notion of local weak convergence generalizes naturally to graphs that are labeled by a spin system, which we study in this work, and it is conjectured that many global properties of the spin systems are determined by the local weak limit. Indeed, in the context of rcsps, the statistical physics predictions [34,25,18] that describe a series of phase transitions of rcsps in the 1rsb universality class are based on the local weak limit. The shattering and freezing thresholds are described explicitly in terms of transitions in the behaviour of the broadcast model, which corresponds to the local weak limit in this regime. The condensation threshold corresponds to the point at which the local weak limit stops being given by the simple broadcast model description.
Earlier mathematical literature on local weak limits for rcsps was centered around understanding the shattering and freezing thresholds. The shattering threshold is conjectured to coincide with the reconstruction threshold of the local weak limit, which asks whether the spin at the root is dependent on asymptotically far away vertices. Several results relate tree thresholds to their analogues on graphs. Gerschenfeld and Montanari [16] showed that on locally treelike graphs, reconstruction on the graph is equivalent to that on the tree if two independent solutions have approximately independent empirical distributions. Montanari, Restrepo and Tetali [24] extended this to a wider class of rcsps. Later, Coja-Oghlan, Efthymiou, and Jaafari [9] established the local weak convergence in the k-colouring model up to the condensation threshold for large enough $k$. On the other hand, the freezing threshold for colouring models was established by Molloy [21] and for a wider class of models in [22].
As noted above, studying the planted model has been a powerful tool to study the local weak limit of rcsps below condensation. A spectacular application of this method was by Achlioptas and Coja-Oghlan, who established regimes of clustering for solutions of the colouring, k-sat and nae-sat models [1]. This was established via the second moment method, which shows that most graphs have similar numbers of solutions. A more refined picture can be established using Robinson and Wormald's small graph conditioning method [32] to show that the planted model is contiguous with respect to the original distribution [17].
Another setting where local weak convergence has played a key role is in the study of the stochastic block model (SBM). This is a model of inhomogeneous graphs that contains communities and is an important testbed for the statistical theory of community detection in networks. Understanding the local weak limit of the SBM has led to sharp information theoretic bounds on when detection of communities is possible [26,27].

Clusters
A central role in understanding the nae-sat model, and sparse rcsps in general, is played by how the space of solutions splits into small rigid clusters. In a typical solution, a small but constant fraction of variables can be flipped between 0 and 1 without violating any constraints, giving rise to exponentially many nearby solutions. In order to give a combinatorial definition of a cluster, the so-called coarsening algorithm inductively maps variables taking values in $\{0, 1\}$ to $\mathtt{f}$ (free) if they can be flipped without violating any constraints [31,13]. A constraint is considered satisfied if one of its variables is free, and the algorithm continues until no more variables can be set to $\mathtt{f}$, resulting in a $\{0, 1, \mathtt{f}\}$-valued configuration called a frozen configuration. Every valid frozen configuration satisfies the following properties.
Definition 1.5 (Coarsening and Frozen configuration). Given a nae-sat instance $G = (V, F, E, L)$ and a nae-sat solution $z \in \mathrm{SOL}(G)$, the coarsening $x \in \{0, 1, \mathtt{f}\}^V$ of $z$ is defined by the following algorithm. For each variable $v \in V$, whenever the value of $z_v \in \{0, 1\}$ can be flipped between 0 and 1 without violating any constraints, change $z_v$ to $\mathtt{f}$. Iterate until no more variables can be set to $\mathtt{f}$. Note that the resulting configuration must satisfy the following. We call $x \in \{0, 1, \mathtt{f}\}^V$ a (valid) frozen configuration if it satisfies
Frozen model solutions are themselves a random csp but without clusters, because they are in effect clusters of solutions projected to a single point. It was shown in [13] that for $\alpha \in [\alpha_{\mathrm{lbd}}, \alpha_{\mathrm{ubd}}]$, with high probability all solutions map via the coarsening algorithm to frozen configurations with a low density (less than $7/2^k$) of free variables. Thus, the free variables form clusters, almost all of which are small subcritical branching process trees. The size of a cluster is the product over the trees of the number of solutions in each tree.
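The coarsening procedure of Definition 1.5 can be sketched directly. The code below is a minimal illustration under assumptions: the name `coarsen`, the marker `FREE = "f"`, and the encoding of clauses as lists of `(variable, literal)` pairs are choices made for this example. A variable is kept frozen exactly when some adjacent clause has no free variable and would become constant if the variable were flipped.

```python
FREE = "f"


def coarsen(clauses, z):
    """Coarsen a nae-sat solution z into a {0, 1, "f"}-valued configuration."""
    x = list(z)
    adjacent = {}
    for a, clause in enumerate(clauses):
        for v, _ in clause:
            adjacent.setdefault(v, []).append(a)

    def is_forced(v):
        # v is forced if flipping it would violate some clause that
        # contains no free variable.
        for a in adjacent.get(v, []):
            clause = clauses[a]
            if any(x[w] == FREE for w, _ in clause):
                continue  # clauses with a free variable count as satisfied
            flipped = {lit ^ (x[w] ^ 1 if w == v else x[w]) for w, lit in clause}
            if len(flipped) == 1:
                return True
        return False

    changed = True
    while changed:
        changed = False
        for v in range(len(x)):
            if x[v] != FREE and not is_forced(v):
                x[v] = FREE
                changed = True
    return x
```

For instance, with clauses `[[(0, 0), (1, 0)], [(1, 0), (2, 0), (3, 0)]]` and solution `[0, 1, 0, 1]`, the first clause forces variables 0 and 1, while variables 2 and 3 are successively freed, giving `[0, 1, "f", "f"]`.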
An alternative definition of the cluster model is in terms of fixed points of the Belief Propagation (BP) equations. On each directed edge $e$ we define a pair of messages $m_e \equiv (\dot m_e, \hat m_e)$ taking values in the set of probability distributions on $\{0, 1\}$.
The interpretation of $\dot m_e(z)$ is the probability that $v$ is equal to $z$ in a random solution in that cluster after the edge $e$ is removed. Frozen variables correspond to those which have at least one incoming message $\hat m$ that is a point mass. A solution to the BP equations $m \equiv (m_e)_{e \in E}$ can be arrived at from a nae-sat solution $z$ by starting with $\dot m_e(y) = I(z_{v(e)} = y)$ for $e \in E$ and $y \in \{0, 1\}$, and then iteratively calculating $\hat m$ from $\dot m$ and then $\dot m$ from $\hat m$. For almost all solutions $z$, these converge in a finite number of iterations.
The number of solutions in a configuration corresponding to a cluster $x$, or equivalently $m$, is given by for explicit functions ( φ, φ, φlit ) defined in equation (28) of [33] or equation (59) below. The challenge in the condensation regime is that the typical number of solutions is much smaller than the expected number of solutions. This is because the largest contribution to the expected value comes from rare clusters, which are large. But it is exactly these clusters whose local distribution behaves like the planted model. Their absence in a typical realization results in a different empirical distribution. Instead, following the physics heuristics of [18] implemented rigorously in [33,28], we weight frozen model configurations according to $\mathrm{size}(m, G)^\lambda$ and tune $\lambda$ to give clusters that correspond to the largest size that appears. To this end, we define the following measure-valued functions. For $\mu \in P([0, 1])$, where $P(A)$ denotes the set of probability measures on a measurable space $A$, let where $\hat Z(\mu)$ and $\dot Z(\mu)$ are normalizing constants which make $\hat R_\lambda \mu$ and $\dot R_\lambda \mu$ probability measures. Denote $R_\lambda \equiv \dot R_\lambda \circ \hat R_\lambda : P([0, 1]) \to P([0, 1])$. The fixed point of $R_\lambda$ was established in [33].

Local weak limit
We now specify the distribution over solutions given the literals $L_t \in \{0, 1\}^{E_{\mathrm{in}}(T_{d,k,t})}$, where we recall that $T_{d,k}$ is the infinite $(d, k)$-regular factor tree rooted at a variable $\rho$, and $T_{d,k,t}$ is the sub-tree of $T_{d,k}$ up to depth $2t - \tfrac{3}{2}$. First, we choose a random cluster in terms of its BP messages $m = (\dot m, \hat m)$. Note that if we set the incoming messages $\hat m_e$ at the boundary edges $e \in \partial T_{d,k,t}$, then there is a unique extension to the internal edges $E_{\mathrm{in}}(T_{d,k,t})$ solving the BP equations, which gives $m_t \equiv (m_e)_{e \in E(T_{d,k,t})}$. With abuse of notation, identify $\dot m_e, \hat m_e \in P(\{0, 1\})$ with $\dot m_e(1), \hat m_e(1) \in [0, 1]$. For $L_t \in \{0, 1\}^{E_{\mathrm{in}}(T_{d,k,t})}$, we assign the weight where the normalization constant $Z_\lambda$ is given by Given $m_t$, let $x(m_t) \equiv (x_v)_{v \in V(T_{d,k,t})}$ be the frozen configuration associated with $m_t$. That is, if there exists $e \in \delta v$ such that $\hat m_e = \delta_z$ for some $z \in \{0, 1\}$, then set $x_v = z$, and otherwise, set $x_v = \mathtt{f}$. For $z_t \in \{0, 1\}^{V(T_{d,k,t})}$, we write $z_t \sim_{L_t} x(m_t)$ if $z_v = x_v$ whenever $x_v \in \{0, 1\}$ and $z_t$ is a valid nae-sat configuration for the literals $L_t$.
In other words, $z_t$ is a valid assignment of the spins in the free variables of $x(m_t)$. For $\lambda^\star \equiv \lambda^\star[\alpha, k]$ in (5), define the probability measure This construction picks a frozen configuration $x$ according to the $\lambda^\star$-weighted measure and then picks a random solution properly weighted by the effect of $x$ outside of the neighborhood.

Further discussion
Having established the local weak limit $P^t_\star \equiv P^t_\star[\alpha, k]$ for $\alpha \in (\alpha_{\mathrm{cond}}, \alpha_{\mathrm{sat}})$, natural questions arise: is $P^t_\star[\alpha, k]$ a Gibbs measure? Can $P^t_\star[\alpha, k]$ be described in a Markovian fashion? As one might expect, the answer to the first question is yes in a rather simple manner. Note that given $m_t$, the integrand in equation (6) defines a Gibbs measure over $z_t$ with BP messages $(\hat m_e, \dot m_e)_{e \in E(T_{d,k,t})}$. Thus, $P^t_\star$ is a mixture of such measures, which is again a Gibbs measure. Furthermore, let $F_0 \subset F(T_{d,k,t})$ be a subset of clauses and $V_0 := \{v \in V(T_{d,k,t}) : d(F_0, v) = 1\}$ be the variables adjacent to them. Denote the boundary variables in $V_0$ by ∂V Then, conditional on $z_{\partial V_0 \sqcup V_0^c}$ and $L_t$, it follows from the definition that ) is simply a uniform measure over the nae-sat solutions in $V_0 \cup F_0$. Thus, in this sense, all the interesting aspects of $P^t_\star$ come from the boundary conditions $(m_e)_{e \in \partial T_{d,k,t}}$.
For the second question, the measure $P^t_\star$ is non-Markovian. Let us first show this in the limiting case of $t = \infty$. Note that the BP messages induced by $z$ are measurable with respect to $z$. We condition on $z_\rho = 1$, on the literals $L_t$, and on the event that every edge $e \in \delta\rho$ around the root satisfies $\hat m_e[1] \in \{0, \tfrac{1}{2}\}$. Assume the measure were Markovian. Then, conditional on the root, the messages to the root from each subtree are conditionally independent. Let $Y_i$ be the indicator that the $i$-th edge of $\rho$ is forcing. Then, taking a ball of radius 1 around the root, if any of the clauses is forcing then $Z_\lambda = 1$, while if they are all separating then the root is a free singleton and $Z_\lambda = 2^\lambda$. Hence we have that with $p = \frac{\mu_{\lambda^\star}(0)}{\mu_{\lambda^\star}(0) + \mu_{\lambda^\star}(1/2)}$, and so it is not a product measure. The factor of $2^{\lambda^\star - 1}$ comes from $Z_\lambda$ and the probability $2^{-1}$ of $z_\rho = 1$.
As the measure is non-Markovian for $t = \infty$, it must also be non-Markovian for some large fixed $t$. We remark that our description of the local weak limit holds even below $\alpha_{\mathrm{cond}}$. Note that $\mu_\lambda \equiv \mu_\lambda[\alpha, k]$ in Proposition 1.6 is well-defined even for $\alpha \in [\alpha_{\mathrm{lbd}}, \alpha_{\mathrm{cond}}]$. Thus, in this regime, $P^t_\star[\alpha, k]$ is well-defined by equation (6), where $\lambda^\star[\alpha, k] \equiv 1$ for $\alpha \le \alpha_{\mathrm{cond}}$, and a modification of our proof shows that Theorem 1.3 holds for $\alpha \in [\alpha_{\mathrm{lbd}}, \alpha_{\mathrm{cond}}]$. However, when $\lambda^\star = 1$, it can be shown that $P^t_\star$ is just a uniform measure over pairs $(z_t, L_t)$ such that $z_t$ is a nae-sat solution, and it can be described by a simple broadcast model. This coincides with the description of the local weak limit obtained in [10] for the entire range $\alpha \in (0, \alpha_{\mathrm{cond}})$. Thus, below condensation, our method is just a more complicated way of determining the local weak limit.
We further remark that there are two different notions of local weak convergence of a solution [23]. In the terminology of [23], our result is called convergence locally on average, as we have taken the empirical distribution averaged over all the vertices of the graph. A stronger notion is convergence in probability locally, which asks that at almost all fixed variables $i$, the distribution of solutions in a ball of radius $t$ around $i$ converges. This stronger notion was proved by [10] for $\alpha < \alpha_{\mathrm{cond}}$. In contrast, it is not true for $\alpha > \alpha_{\mathrm{cond}}$ because the local distribution is itself random, depending on the clusters and their relative weights. To be more precise, let , where $v$ is drawn uniformly at random from $V$ and $G$ is a random regular nae-sat instance. Then, we conjecture that in the condensation regime $\alpha \in (\alpha_{\mathrm{cond}}, \alpha_{\mathrm{sat}})$, the random element ) equipped with the weak-star topology converges weakly to where the random elements $P^t_{\star,+}$, $P^t_{\star,-}$ are defined as follows.
Here, $(\omega_i)_{i \ge 1}$ follows the Poisson-Dirichlet distribution with parameter $\lambda^\star \equiv \lambda^\star[\alpha, k]$ (see Chapter 2 of [30] for the definition of the Poisson-Dirichlet process) and $m \sim \nu_{\lambda^\star}(\,\cdot\,; L_t)$. Also, $\neg z_t$ is obtained from $z_t$ by flipping 0 and 1. Establishing this conjecture is equivalent to showing that the cluster sizes follow a Poisson-Dirichlet distribution, which is a major open problem for rcsps in the 1rsb class.
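To give a feel for the conjectured cluster weights, the sketch below draws an approximate sample of Poisson-Dirichlet weights with parameter $\lambda \in (0, 1)$. It assumes the standard construction in which the weights are the normalized, decreasingly ordered points of a Poisson process on $(0, \infty)$ with intensity $\lambda x^{-1-\lambda}\,dx$; such points can be written as $\Gamma_i^{-1/\lambda}$, where $\Gamma_i$ are the arrival times of a rate-one Poisson process. The function name `poisson_dirichlet_weights` and the truncation level `num_atoms` are assumptions of this example, and the truncation makes the output only an approximation.

```python
import random


def poisson_dirichlet_weights(lam, num_atoms=10_000, seed=None):
    """Approximate Poisson-Dirichlet(lam) weights, truncated to num_atoms atoms."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    rng = random.Random(seed)
    gamma = 0.0
    points = []
    for _ in range(num_atoms):
        # Arrival times of a rate-one Poisson process.
        gamma += rng.expovariate(1.0)
        # Points of the Poisson process with intensity lam * x^(-1 - lam).
        points.append(gamma ** (-1.0 / lam))
    total = sum(points)
    # The points are already in decreasing order since gamma increases.
    return [p / total for p in points]
```

For $\lambda$ close to 0 a single weight dominates, while for $\lambda$ close to 1 the mass is spread over many small atoms, matching the picture that the number of dominant clusters is bounded but random.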

Proof overview
In this section, we give an overview of the proof of Theorem 1.3 which is equivalent to lim sup Based on 1rsb heuristics, we analyze the law of $z \sim \mathrm{Unif}(\mathrm{SOL}(G))$ by first conditioning on its coarsening $x \in \{0, 1, \mathtt{f}\}^V$: we will see in Section 2.1 that conditioned on $(G, x)$, the law of $z$ can be described in a relatively simple manner based on belief propagation. We then divide into cases according to whether the frozen configuration $(G, x)$ is favorable or not. By Chebyshev's inequality, it suffices to show that for any $\varepsilon > 0$, there exists a set $X_{\mathrm{fav}} \equiv X_{\mathrm{fav}}(\varepsilon)$ of favorable frozen configurations such that $P((G, x) \notin X_{\mathrm{fav}}) \le \varepsilon$ and sup sup where $C \equiv C(\varepsilon, k, t) > 0$ is a constant that depends only on $\varepsilon, k, t$. In the subsequent subsections, we explain the main ideas on how to establish (8) and (9) on some typical event $X_{\mathrm{fav}}$. In Section 2.1, we define the notion of free components, which intuitively are the connected components of the subgraph formed by the free variables. Free components play a crucial role in understanding the law of $z \sim \mathrm{Unif}(\mathrm{SOL}(G))$ conditioned on its coarsening $x$. Indeed, we show in Section 2.2 that the variance control (8) follows from the exponential decay of the frequencies of the free components, which was established in [28]. Obtaining the bias control (9) is where most of the challenges lie. In Sections 2.3 and 2.4, we explain the main ideas to establish (9).
Notations: Throughout, we let $x \in \{0, 1, \mathtt{f}\}^V$ be a valid frozen configuration on a nae-sat instance $G = (V, F, E, L)$. We often identify $V \equiv \{1, 2, \ldots, n\}$ for convenience. For non-negative quantities $f = f_{d,k,n,t}$ and $g = g_{d,k,n,t}$, we use any of the equivalent notations $f = O_{k,t}(g)$, $g = \Omega_{k,t}(f)$, $f \lesssim_{k,t} g$ and $g \gtrsim_{k,t} f$ to indicate that there exists a constant $C_{k,t}$, which depends only on $k$ and $t$, such that $f \le C_{k,t}\, g$. We drop the subscript $t$ when the constant $C$ depends only on $k$.

Free components
A clause $a \in F$ is called separating if there are two adjacent frozen variables that evaluate to 0 and 1, i.e. there exist $e, e' \in \delta a$ such that $L_e \oplus x_{v(e)} = 0$ and $L_{e'} \oplus x_{v(e')} = 1$. A clause $a \in F$ is non-separating if it is not separating. Observe that a separating clause can never be violated no matter how the free variables in $x$ are filled with 0 or 1.
An edge $e \in E$ is called forcing if flipping the value of $x_{v(e)}$ invalidates $a(e)$, i.e. $L_e \oplus x_{v(e)} \oplus 1 = L_{e'} \oplus x_{v(e')} \in \{0, 1\}$ for all $e' \in \delta a(e) \setminus e$. A clause $a \in F$ is forcing if there exists $e \in \delta a$ which is a forcing edge. In particular, a forcing clause is also separating.
Definition 2.1. Given $(G, x)$, a free piece, denoted by $f^{\mathrm{in}}$, is a connected component of the subgraph induced by the free variables and non-separating clauses in $x$. A free component $f$ is the union of $f^{\mathrm{in}}$ and the half-edges adjacent to $f^{\mathrm{in}}$. Thus, $f$ is composed of the free piece $f^{\mathrm{in}}$ and the boundary half-edges hanging from $f^{\mathrm{in}}$. Moreover, the (half-)edges of $f$ are labelled as follows.
• Denote by $V(f)$, $F(f)$, $E(f)$ the set of variables, clauses, and full-edges of $f$ (i.e. edges of $f^{\mathrm{in}}$), respectively. Then, each $e \in E(f)$ is labelled by its literal $L_e$.
Then, $e \in \partial f$ is labelled with the information $(x_{v(e)}, L_e) \in \{0, 1\}^2$. Here $x_{v(e)} \in \{0, 1\}$ is guaranteed, since $v(e)$ must be frozen. The label $x_{v(e)}$ is called the spin-label whereas $L_e$ is called the literal-label. The other boundary half-edges $e \in \partial f$, which must be adjacent to separating clauses in $G$, are unlabelled.
A free tree is a free component $f$ which does not contain a cycle. We often use the notation $t$ to denote a free tree and the notation $f$ for a generic free component, which may contain a cycle. We denote the collection of free components inside $(x, G)$ by $F(x, G)$ and the collection of free trees by $F_{\mathrm{tr}}(x, G)$. We write $v_f = |V(f)|$, $f_f = |F(f)|$, and $e_f = |E(f)|$ for the number of variables, clauses, and edges of a free component $f$.
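The definitions of separating clauses, free pieces, and free trees can be made concrete with a short sketch. The names `is_separating`, `free_pieces`, and `is_tree`, the marker `FREE = "f"`, and the clause encoding as lists of `(variable, literal)` pairs are assumptions made for this illustration; boundary half-edge labels are not tracked here. A free piece is found by a graph search over free variables and non-separating clauses, and it is a tree exactly when its number of edges equals its number of nodes minus one.

```python
FREE = "f"


def is_separating(clause, x):
    """A clause is separating if its frozen variables evaluate to both 0 and 1."""
    values = {lit ^ x[v] for v, lit in clause if x[v] != FREE}
    return values == {0, 1}


def free_pieces(clauses, x):
    """Connected components of free variables and non-separating clauses.

    Returns a list of (sorted variables, sorted clause indices) pairs.
    """
    var_to_clauses = {}
    for a, clause in enumerate(clauses):
        if is_separating(clause, x):
            continue
        for v, _ in clause:
            if x[v] == FREE:
                var_to_clauses.setdefault(v, set()).add(a)

    seen, pieces = set(), []
    for start in range(len(x)):
        if x[start] != FREE or start in seen:
            continue
        seen.add(start)
        comp_vars, comp_clauses, stack = {start}, set(), [start]
        while stack:
            v = stack.pop()
            for a in var_to_clauses.get(v, ()):
                if a in comp_clauses:
                    continue
                comp_clauses.add(a)
                for w, _ in clauses[a]:
                    if x[w] == FREE and w not in seen:
                        seen.add(w)
                        comp_vars.add(w)
                        stack.append(w)
        pieces.append((sorted(comp_vars), sorted(comp_clauses)))
    return pieces


def is_tree(piece, clauses, x):
    """A connected free piece is a tree iff #edges == #nodes - 1."""
    variables, clause_ids = piece
    edges = sum(1 for a in clause_ids for v, _ in clauses[a] if x[v] == FREE)
    return edges == len(variables) + len(clause_ids) - 1
```

A free variable all of whose clauses are separating forms a singleton piece with no clauses, which is the "free singleton" case.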
We remark that an equivalent labelling scheme was also used in [28, Definition 2.18]. However, the notion of 'free tree' in [28] (see Definition 2.16) is slightly different from the one in Definition 2.1. Namely, [28] further introduced an equivalence relation and defined a 'free tree' as an equivalence class. It is crucial that we do not make this reduction, for the purpose of the coupling described in Section 2.4.
Note that a free component $f \in F(x, G)$ is embedded in $G$ by definition. However, it can also be treated as a separate labelled graph. To this end, we denote the set of possible free components (up to graph and label isomorphisms) by $F$ and the set of possible free trees by $F_{\mathrm{tr}} \subsetneq F$. For $f \in F$, define the weight of $f$ as where $z_{V(f)}$ is a nae-sat solution if it satisfies every clause in $f$ (recall that all the (half-)edges adjacent to clauses store literal information) and the coarsening is taken with respect to the clauses of $f$ in the same manner as described in Definition 1.5. A crucial observation then follows.
Observation 2.2. Since a separating clause can never be violated, we have $\mathrm{size}(x, G) = \prod_{f \in F(x, G)} w_f$. Thus, sampling a random regular nae-sat instance $G$ and a uniformly random nae-sat solution $z \sim \mathrm{Unif}(\mathrm{SOL}(G))$ is equivalent to the following sampling procedure.
(a) Sample a random regular nae-sat instance $G$. Then, sample a frozen configuration, or equivalently a cluster, $x \in \{0, 1, \mathtt{f}\}^V$ with probability proportional to its weight $\mathrm{size}(x, G)$, namely
(b) Given $(G, x)$, sample a nae-sat solution $z$ uniformly at random among those which are coarsened to $x$. Equivalently, for each free component $f \in F(x, G)$, independently sample $z_{V(f)} \in \{0, 1\}^{V(f)}$ uniformly at random among those which are coarsened to $f$.
We remark that for a free tree $t \in F_{\mathrm{tr}}$, every nae-sat solution $z_{V(t)}$ of $t$ is coarsened to $t$, so $\mathrm{size}(t)$ is the number of nae-sat solutions of $t$. Moreover, sampling a nae-sat solution $z_{V(t)} \in \{0, 1\}^{V(t)}$ of $t$ uniformly at random can be analyzed in a simple manner. That is, any marginal of the law of $z_{V(t)}$ can be described by belief propagation (see Section A).
For the rest of this subsection, we review the notion of component coloring in [28]. Define the set $C$ as Here, we take the convention that $(f, e) = (f', e')$ if there exists a graph isomorphism from $f$ to $f'$ that keeps the labels unchanged and maps $e$ to $e'$. The symbols $R_0, R_1$ (resp. $B_0, B_1$) represent 'red' (resp. 'blue') spins, and $S$ represents the 'separating' spin. The terminologies were introduced in [33], building on the works [11,8].
Definition 2.3. Let $x \in \{0, 1, \mathtt{f}\}^V$ be a (valid) frozen configuration on $G$. The component coloring $\sigma \equiv (\sigma_e) \in C^E$ corresponding to $x$ is defined by the following procedure.
2. For each separating clause $a$ and an adjacent edge $e \in \delta a$, assign $\sigma_e = S$ if $x_{v(e)} = \mathtt{f}$.
3. For the other edges $e \in E$, which must be contained in a free component of $x$, let $f_e \in F$ be the free component that contains $e$. We then set $\sigma_e = (f_e, e) \in C$.
Given $G$, we call a component coloring $\sigma$ valid on $G$ if there exists a valid frozen configuration $x$ on $G$ which maps to $\sigma$ under the above procedure. Then, it is straightforward to see that the procedure in Definition 2.3 gives a one-to-one correspondence between the valid frozen configurations and the valid component colorings. The following remark plays an important role later in Section 2.4.
Remark 2.4. Given $(G, x)$, suppose $e \in E$ is contained in a free component. Note that by Definition 2.1, $(f_e, e)$ contains the information on the literals of $f_e$ and the spin labels on the boundary half-edges of $f_e$. Therefore, $\sigma_e = (f_e, e)$ completely determines the colors of the adjacent edges, $(\sigma_{e'})_{e' \in (\delta v(e) \setminus e) \sqcup (\delta a(e) \setminus e)}$.

Exponential decay of the frequencies of free components
By Observation 2.2, conditional on $(G, x)$, we have that $X^t_i$ and $X^t_j$ are independent if $N_t(i, G) \cap N_t(j, G) = \emptyset$ and there is no free component intersecting both $N_t(i, G)$ and $N_t(j, G)$. Thus, the critical step in establishing the variance control (8) is to show that large free components are rare in a typical frozen configuration, which enables us to argue that for most $i, j \in V$, $X^t_i$ and $X^t_j$ are (conditionally) independent. We formalize this idea in this subsection. We start with the definition of boundary profile and free component profile introduced in [28, Definition 3.2].
Definition 2.5. Given $(G, x)$, the unnormalized free component profile of $x$ is the sequence $(n_f[G, x])_{f \in F}$, where $n_f[G, x]$ is the number of copies of the free component $f$ inside $(G, x)$. The free component profile is then Finally, the (1-neighborhood) coloring profile of $(G, x)$ is the collection of the boundary profile and the free component profile, which we denote by ξ For $r > 0$, let $E_r$ be the collection of free component profiles satisfying the exponential decay of frequencies in the number of variables with rate $2^{-rk}$. That is, With slight abuse of notation, we also denote ξ = (B, In Section 3, we show that, with high probability, $(G, x)$ is contained in the set $E_{1/4}$ and there are no multi-cyclic free components, i.e. free components with more than one cycle.
Lemma 2.6.For k ≥ k 0 and α ∈ (α cond (k), α sat (k)), we have that Moreover, we show that if p f [G, x ] f∈F ∈ E 1 4 holds, then our desired variance control (8) holds.Lemma 2.7.Consider a frozen configuration (G, x) which satisfy p f [G, x ] f∈F ∈ E 1  4 .Then, we have x] by the collection of (i, j) ∈ V 2 such that N t (i, G) ∩ N t (j, G) = ∅ and there exists no free component f ∈ F (x, G) which intersect both N t (i, G) and N t (j, G).Then, by Observation 2.2, we have that Note that for a fixed i ∈ V , there are at most (kd ) are formed by variables, not clauses), so there are at most v f (kd) t−1 number of such variables i ∈ V .Hence, it follows that where the last inequality holds since . Therefore, our claim (13) follows.

Tight concentration of free component profile
We now consider the bias control in (9).First, let us examine the case where t = 1.Then, N 1 (i, G) is always isomorphic to T d,k,1 , which consists of a single variable ρ and boundary half-edges adjacent to it (with no literals), thus is sampled u.a.r.among those which are coarsened to f.When f is a free tree, this probability can be evaluated using a belief propagation.
Therefore, if we establish the tight O(n −1/2 ) concentration of the number of frozen variables of (G, x) and the analogue ℓ 1 -type concentration on the free component profile {p f [G, x]} f∈F , then we can establish tight bias control (9) for the simplest case t = 1.In fact, the former concentration of the number of the frozen variables can be established using the concentration of the boundary profile B[G, x] around the optimal boundary profile B ⋆ using the results of [28].We review the definition of B ⋆ [28] in Section A below.
However, establishing O(n −1/2 ) concentration of the free component profile {p f [G, x]} f∈F in ℓ 1 -type distance poses significant challenges. Indeed, the results of [28] only imply a much weaker concentration in ℓ ∞ -distance of the free component profile, at the larger scale O(log n/√n), and it is a priori not clear whether the stronger ℓ 1 -type concentration can be established, let alone whether the log n factor can be removed to obtain the optimal fluctuation as in Theorem 1.3. Note that there is an unbounded number of types of free components in a typical frozen configuration (G, x). Nevertheless, we show in Section 3 that by a delicate use of a local central limit theorem for triangular arrays [7] and the exponential decay of the free component profile (cf. Lemma 2.6), the free component profile concentrates in ℓ 1 -type distance on the optimal scale O(n −1/2 ). To be precise, consider the following distance between two coloring profiles We show in Section 3 that the following theorem holds.

From 1-neighborhoods to t-neighborhoods
In this subsection, we discuss the ideas for establishing the bias control (9) for general t ≥ 2. Given (G, x), let σ ∈ C E be the corresponding component coloring (cf.Definition 2.3).Note that by Observation 2.2, the quantity ] is completely determined by the configuration of free components intersecting with N t (i, G).Further, note that σ e , e ∈ E(N t (i, G)) encodes the free component that e is contained in.Here, we emphasize that E(N t (i, G)) contains the boundary half-edges 12 below) with one exception: namely, when there is a free component f ∈ F (x, G) that intersects N t (i, G) twice, i.e. f ∩ N t (i, G) is disconnected.Fortunately, as shown below, this exceptional case is very rare and, thus, can be neglected.
Note that in the exceptional case described above, there must be a self-avoiding cycle, i.e. a cycle that does not self-intersect, within f ∪ N t (i, G). That is, consider two variables v 1 , v 2 that lie in two disjoint connected components of f ∩ N t (i, G). Then, there must exist two distinct self-avoiding paths that connect v 1 and v 2 , where the first one lies within f and the second one lies within N t (i, G), and concatenating them produces a self-avoiding cycle. Observe that the length of this cycle might be as large as Θ k (log n) given the exponential decay in Lemma 2.6, due to the case where f is large. However, if we only count the edges that are not contained in free components, which we call the boundary-transversing length, the number of such edges in the cycle is at most 2t, since v 1 and v 2 are connected by a path of length no larger than 2t. The lemma below shows that such cycles are rare.
Lemma 2.9. Given (G, x) = (V, F, E, L, x) and a cycle C inside the bipartite graph (V, F, E), define the boundary-transversing length of C as the number of edges in C that are not contained in free components of x. Denote by N b cyc (2t; G, x) the number of self-avoiding cycles whose boundary-transversing length is at most 2t. Uniformly over ξ = (B,
Remark 2.10. Hereafter, we let P ξ (resp. E ξ ) denote the probability (resp. expectation) with respect to the conditional measure Observation 2.2), P ξ is a uniform measure over (G, x) with the constraint ξ[G, x] = ξ. Thus, P ξ can be described in terms of a configuration model with colored edges. This fact is further explained in Observation 4.5 below. We remark that a similar idea was also used in [6].
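The boundary-transversing length of Lemma 2.9 is a plain edge count, which the following minimal sketch illustrates. The edge representation as (variable, clause) pairs, the function name, and the toy cycle are assumptions made for illustration only and do not come from the paper.

```python
from typing import Hashable, Iterable, Set, Tuple

# An edge of the bipartite graph (V, F, E), written as (variable, clause).
Edge = Tuple[Hashable, Hashable]


def boundary_transversing_length(cycle_edges: Iterable[Edge],
                                 free_edges: Set[Edge]) -> int:
    """Count the edges of a cycle that are not contained in a free component."""
    return sum(1 for edge in cycle_edges if edge not in free_edges)


# Hypothetical toy cycle v1 - a1 - v2 - a2 - v1 of graph length 4.
cycle = [("v1", "a1"), ("v2", "a1"), ("v2", "a2"), ("v1", "a2")]
# Assume the two edges around clause a2 lie inside one free component.
free = {("v2", "a2"), ("v1", "a2")}

length = boundary_transversing_length(cycle, free)
```

In this toy case the cycle has four edges but only two of them are counted, which mirrors how a long cycle through a large free component can still have small boundary-transversing length.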
Having Lemma 2.9 in hand, we focus our attention on the empirical measure of σ t (i, G), L t (i, G) i∈V .
Definition 2.11.We say ) is a valid (literal-reinforced-) t-coloring if it can be realized as σ t (i, G), L t (i, G) = (σ t , L t ) for some valid component coloring (G, σ) and i ∈ V .We take the convention that the 2 valid t-colorings (σ t ) ≡ (σ e ) e∈E in (T d,k,t ) , ℓ = 1, 2, are the same if there is an automorphism ϕ of T d,k,t such that σ ϕ(e) , e ∈ E in (T d,k,t ).Denote Ω t by the set of all valid t-colorings considered up to automorphisms.Then, the t-coloring profile of a valid frozen configuration (G, x) is the probability measure ν t ≡ ν t [G, x] on the set Ω t ⊔ {cyc} defined by Here, p z t (σ t , L t ) is defined as follows for Note that (σ t , L t ) not only determines the {0, 1, f} configuration on V (T d,k,t ), but also the hanging free trees on ∂T d,k,t .That is, for e ∈ ∂T d,k,t , σ e encodes the free component f e , if any, that e resides in.Assuming that all the free components f e are disjoint for e ∈ ∂T d,k,t , p z t (σ t , L t ) is the probability of obtaining z t when we independently sample for each f ∈ {f e } e∈E(T d,k,t ) , a nae-sat solution z V (f) ∈ {0, 1} V (f) uniformly at random among those which are coarsened to f.When {f e } e∈E(T d,k,t ) are all free trees, the value p z t (σ t ) can be expressed in terms of belief-propagation.
The main component in obtaining the bias control (9) is to enhance the concentration of the coloring profile as stated in Theorem 2.8 to that of the t-coloring profile.
Theorem 2.14. For k ≥ k 0 and α ∈ (α cond (k), α sat (k)) there exists an explicit ν ⋆ t ≡ ν ⋆ t [α, k] ∈ P(Ω t ) in Definition 4.4 below, such that the following holds. For any ε > 0, C > 0, and t ≥ 1, there exists a constant K ≡ K(C, ε, k, t) > 0 such that uniformly over any vector w ∈ [−1, 1] Ωt∪{cyc} and ξ ∈ Ξ C , we have
The proof of Theorem 2.14 is provided in Section 4. For the remainder of this subsection, we discuss the high-level ideas in proving Theorem 2.14.
The measure ν ⋆ t in Theorem 2.14 can be described by a so-called broadcast model based on ξ ⋆ (see Section 4.1).Indeed, this description is what we will use to prove Theorem 2.14.Let E ξ ν t ∈ P(Ω t ∪ {cyc}) be the conditional expectation given ξ of the random element ν t ≡ ν t [G, x ] in P(Ω t ∪{cyc}).Then, by Chebyshev's inequality, Theorem 2.14 is implied by uniformly over ξ ∈ Ξ C , where ν t ⊗ ν t ∈ P (Ω t ∪ {cyc}) 2 is the product measure of ν t 's.
We establish the first claim in (15) by coupling the measures E ξ [ν t ] and ν ⋆ t . Since E ξ [ν t ] can be described by a certain sampling without replacement procedure based on ξ (the exploration of a configuration model) and ν ⋆ t is the analogous sampling with replacement procedure based on ξ ⋆ (the broadcast process), we construct a coupling of the two procedures based on a coupling of ξ and ξ ⋆ . The O k,t (n −1/2 ) probability of error comes from the coupling of ξ and ξ ⋆ , since the difference between sampling with and without replacement only induces an error of probability O k,t (n −1 ). Similarly, we establish the second claim in (15), where the dominant probability of error now comes from the latter difference.
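The O(n −1 ) gap between sampling with and without replacement can be checked exactly in the simplest possible case. The sketch below is a toy computation under the assumption of two ordered draws from an urn of n distinct items; it is not the coupling used in the proof, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def tv_two_draws(n: int) -> Fraction:
    """Exact total variation distance between two ordered draws from n
    distinct items taken with replacement and without replacement (n >= 2)."""
    with_repl = Fraction(1, n * n)          # every ordered pair (i, j)
    without_repl = Fraction(1, n * (n - 1))  # only pairs with i != j
    total = Fraction(0)
    for i, j in product(range(n), repeat=2):
        q = without_repl if i != j else Fraction(0)
        total += abs(with_repl - q)
    return total / 2


distances = {n: tv_two_draws(n) for n in (5, 10, 50)}
```

For two draws the distance equals 1/n exactly, so it is of lower order than the n −1/2 error coming from the fluctuation of the profile.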
In the coupling argument above, rare spins might cause problems.However, recalling Remark 2.4, if e ∈ E is contained in a free component, σ e = (f e , e) completely determines the colors of adjacent edges, (σ e ′ ) e ′ ∈δv(e)\e ⊔ δa(e)\e .Moreover, the condition ∥ξ − ξ ⋆ ∥ □ = o n (1) guarantees that the number of boundary spins {R, B, S} are at least Ω k (n).Therefore, Remark 2.4 rules out the difficulties coming from the rare spins, and this is the sole reason we work with the more complicated component colorings instead of the simpler 'coloring configuration' defined in [33].
Proof of Theorem 1.3.It suffices to prove (7) for any fixed (z t , L t ) ∈ {0, 1} V (T d,k,t ) × {0, 1} E in (T d,k,t ) .To this end, fix any ε > 0.Then, recalling the set Ξ C from Definition 2.13, there exists C = C(ε, k, t) such that Indeed, for large enough C > 0, Theorem 2.8 guarantees that ∥ξ[G, x] − ξ ⋆ ∥ □ ≤ Cn −1/2 holds with probability at least 1−ε/6, and since holds by tower property, Markov's inequality shows that E ξ N cyc (2t; G) ≤ C is also satisfied with probability at least 1 − ε/6.Now, with C = C(ε, k, t) that satisfies ( 16), Theorem 2.14 then shows that there exists where 12 for the definition of p z t (σ t , L t )), and p z t ,L t (cyc) ≡ 0.Moreover, the result of Appendix B in [33] imply that P t ⋆ defined in Section 1.4 satisfies P t ⋆ (z t , L t ) = σ t ∈Ct ν ⋆ t (σ t , L t )p z t (σ t , L t ).To this end, we define X fav ≡ X fav (ε, k, t) to be the favorable set of (G, x) which satisfies the following 4 conditions.
Then, Lemma 2.6, Lemma 2.9, and equations (16), (17) show that Moreover, because of the first condition ξ[G, x] ∈ E 1 4 , the variance control for X fav in (8) holds by Lemma 2.7. With regard to the bias control (9) x], then either v is included in a cycle of length at most 2t or N t (v, G) contains an {R, B, S}-colored edge (i.e. an edge not contained in a free component) in a self-avoiding cycle whose boundary-transversing length is at most 2t (see the paragraph above Lemma 2.9). Thus, we have for (G, x) ∈ X fav that Therefore, both conditions (8) and (9) hold on the set X fav with P((G, x) ∉ X fav ) ≤ ε. By Chebyshev's inequality, this concludes the proof of (7), and thus the proof of Theorem 1.3.

Concentration of coloring profile in ℓ 1 -type distance
In this section, we prove Lemma 2.6 and Theorem 2.8.While Lemma 2.6 follows from the results of [28,29,28], Theorem 2.8 requires a careful control on the frequencies of large free trees.Throughout, we consider k large enough so that the results of [28,29,28] hold and α ∈ (α cond (k), α sat (k)).
Proof.Fix ε > 0. By Theorem 1.1 of [28], there exists where In the proof of Theorem 1.1-(a) in [28] (see equation (3.79) therein), it was shown that We remark that in [28], they introduced a truncation of free and red variables, but this truncation only induces a difference that is exponentially small in n (see Lemma 2.12 of [28], or Lemma 3.3 of [33]).Thus, taking expectation in (18) shows that where we took C 1 ≡ C 1 (ε, α, k) small enough so that e (1−λ ⋆ )C1+C2 ≤ ε/3.We now establish the second condition of Γ(C).Let and similarly, let Z λ ⋆ ≡ Z λ ⋆ (G) defined by the same equation, but without the indicator term above.Then, it was shown in Proposition 3.5 of [28] that where the final estimate is from Corollary 3.6 and Theorem 3.21 of [28].Thus, proceeding in a similar manner as in (18), we have Therefore, this concludes the proof.
Proof of Lemma 2.6.For ε > 0, consider Γ(C) for C ≡ C(ε, α, k) from the conclusion of Proposition 3.1.Since Γ(C) contains the coloring profile with exponential decay and the absence of multi-cyclic free components, we have lim sup Since ε > 0 is arbitrary, the left hand side must equal 0.
Finally, the concentration of the boundary profile B[G, x] can also be established using the first moment estimates from Section 3 of [28]. We first recall the following notation from [28]: for λ ∈ [0, 1] and s ∈ [0, log 2), where R(x) denotes the number of forcing edges in (G, x) and f(x) denotes the number of free variables in x. Similarly, we define the quantities Z tr λ and Z tr λ,s as the contribution to the sum in (19) from frozen configurations x that do not contain cyclic free components, i.

Concentration of free tree profile
Having established the concentration of the boundary profile in Proposition 3.2, we now establish the concentration of the free component profile as in Theorem 2.8. By Proposition 3.1, we have with high probability that the number of cyclic free components is at most log n and the largest free component is of size O k (log n), thus all the interest is in the concentration of the free tree profile. Proposition 3.8 in [28] shows that different free components are independent in the appropriate measure, which enables us to use the local central limit theorem for triangular arrays [7]. We first introduce the necessary notation: for a coloring profile ξ = (B, {p f } f∈F ), let Z λ [ξ] ≡ Z λ [B, {p f } f∈F ] denote the contribution to Z λ from frozen configurations x with coloring profile ξ[G, x] = ξ. Note that in order for Z λ [ξ] ≠ 0, B and {p f } f∈F must be compatible in the following sense (see Definition 3.2 in [28]).
Then, the free component profile {p f } f∈F is compatible with the boundary profile B, which we denote by {p f } f∈F ∼ B, if for τ ∈ {B, S}, and we have f∈F where 1 denotes the all-1 vector.With slight abuse of notation, we call h ≡ h[B] ∈ R 3 determined from the equation ( 20) the induced boundary profile.We also remark that the last equality in ( 21) is actually redundant, since it is implied by the other equalities in (20) and (21).
Remark 3.4.If {p f } f∈F ∼ B and p f = 0 holds if f is multi-cyclic, then (21) implies that Thus, h • ≡ h • [B] determines the number of free trees t∈Ftr n t when there is no multi-cyclic free component.
We then define the set of labeled components L (f) of a free component f ∈ F (see Definition 2.20 in [28]). A labeled component f lab is obtained from f by adding additional labels on the half-edges and f as follows. For a ∈ F (f) (resp. v ∈ V (f)), arbitrarily label the half-edges adjacent to a (resp. v) by 1, ..., k (resp. 1, ..., d). If f is cyclic, then add an additional label by first choosing a spanning tree of f and labeling its edges by "tree". We consider f lab up to graph and label isomorphism, so |L (f)| counts the number of labeled components up to such isomorphism. If we let T f be the number of spanning trees of f (T t = 1), then the embedding number of f is defined by , where v(σ) for σ ∈ {R, B, S} k is defined in (65) below.
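The spanning-tree count T f that enters the embedding number can be computed for any small component with Kirchhoff's matrix-tree theorem. This is a standard tool swapped in here for illustration, not a method taken from the paper; the function name and the integer node labeling are assumptions.

```python
from fractions import Fraction
from typing import List, Tuple


def count_spanning_trees(num_nodes: int, edges: List[Tuple[int, int]]) -> int:
    """Number of spanning trees of a multigraph on nodes 0..num_nodes-1,
    computed as a cofactor of the graph Laplacian (matrix-tree theorem)."""
    if num_nodes == 1:
        return 1
    lap = [[Fraction(0)] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    # Delete the last row and column, then take the determinant exactly.
    size = num_nodes - 1
    mat = [row[:size] for row in lap[:size]]
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if mat[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            mat[col], mat[pivot] = mat[pivot], mat[col]
            det = -det
        det *= mat[col][col]
        for r in range(col + 1, size):
            factor = mat[r][col] / mat[col][col]
            for c in range(col, size):
                mat[r][c] -= factor * mat[col][c]
    return int(det)


# A component whose underlying graph is a single cycle on 6 nodes.
cycle_edges = [(i, (i + 1) % 6) for i in range(6)]
# A tree (path on 4 nodes), matching the convention T_t = 1.
path_edges = [(0, 1), (1, 2), (2, 3)]
```

For a component with exactly one cycle, removing any single cycle edge gives a spanning tree, so the count equals the cycle length, while a free tree has exactly one spanning tree.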
Remark 3.6.There is a slight difference between Proposition 3.5 and Proposition 3.7 of [28] because of the difference of the notion of 'free tree'.Namely, the 'free tree' in [28] corresponds to an equivalence class of our notion of free trees.However, the proof of Proposition 3.7 of [28] proceeds exactly by first establishing Proposition 3.5, and collapsing the equivalence class of free trees.
We now define the optimal free tree profile and optimal coloring profile ξ ⋆ used in Theorem 2.8.
Definition 3.7.For a free tree t ∈ F tr , the optimal free tree profile is {p ⋆ t } t∈Ftr , which is defined by where the quantities Z⋆ , Ż ⋆ , Ẑ ⋆ , q⋆ (B 0 ), q⋆ (S), which only depend on (α, k), are defined in Definition A.12.
For a cyclic free component f ∈ F \ F tr , we define p ⋆ f = 0.Then, with the optimal boundary profile B ⋆ in Proposition 3.2, we define the optimal coloring profile ξ ⋆ to be ξ ⋆ ≡ (B ⋆ , {p ⋆ f } f∈F .An important observation is that p ⋆ t can be expressed by where the vector [28] that the optimal free tree profile {p ⋆ t } and the optimal boundary profile B ⋆ are compatible.That is, h ⋆ ≡ (h ⋆ B0 , h ⋆ B1 , h ⋆ S ) and h ⋆ • induced from B ⋆ by the equations ( 20) and (22) The following proposition shows the ℓ 1 -type concentration of the free tree profile conditional on the (B, s, {p f } f∈F \Ftr ) ∈ Ψ typ , where Ψ typ ≡ Ψ typ (C) is defined by the set of (B, s, {p f } f∈F \Ftr ) which satisfy Proposition 3.9.For any ε > 0 and a constant C > 0, there exists another constant C 0 , which only depends on ε, C, α, k, such that uniformly over (B, s, {p f } f∈F \Ftr ) ∈ Ψ typ (C), Remark 3.10.Note that by the definition of Ψ typ in (24), every cyclic free component has one cycle and has variables v f < log n.Since every clause in a free component must have at least 2 full-edges adjacent to them (otherwise, the clause is forcing), it follows that 2f f ≤ e f .For f ∈ F having at most one cycle, this implies that f f ≤ e f .In particular, we may assume that f f < log n holds if p f ̸ = 0 and (B, s, {p f } f∈F \Ftr ) ∈ Ψ typ .
We now discuss the main ideas behind the proof of Proposition 3.9.By Proposition 3.5, we can express4 where the sum in the denominator in the rhs is restricted to sum over {p t } t∈Ftr which satisfy {p f } f∈F ≡ {{p t } t∈Ftr , {p f } f∈F \Ftr } ∼ (B, s), where The rightmost condition comes from the fact that we defined Z λ,s in (19) as the contribution to Z λ from x such that size(x, G) = f∈F w f ∈ [e ns , e ns+1 ).Thus, apriori, the rhs of ( 26) is a probability of a configuration of free trees conditional on a large deviation event.A classical tool in large deviation theory [12] is to introduce an exponential scaling factor to move from the large deviation regime to a moderate deviation regime.Since we are considering ∥B − B ⋆ ∥ 1 = O(n −1/2 ), we can use the optimal scaling factor θ ⋆ from (23) (see Remark 3.8).To this end, we introduce a probability distribution over the free trees and let X 1 , ..., X nh• be a i.i.d.sample from P θ ⋆ (•), where h • ≡ h • [B] is defined in (22).Recall from Remark 3.4 that when p f = 0 for multi-cyclic f, {p f } f∈F ∼ B implies that the total number of free trees is given by nh • .Thus, from the identity ( 26), the ratio of interest in ( 25) can be expressed by where the event A ≡ A (B, s, {p f } f∈F \Ftr ) regarding X 1 , ..., X nh• is given by Observe that the event A is a moderate deviation event for (B, s, where the first inequality holds since for f that contains at most one cycle, b f 1 ≲ k v f holds (cf.Remark 3.10), and the second inequality holds since B → h(B) is O k (1)-Lipschitz (see (20) and ( 22)).Similarly, n holds.Thus, local central limit theorem with triangular arrays [7] show that uniformly over (B, s, {p f } f∈F \Ftr ) ∈ Ψ typ , Therefore, if in the rhs of ( 27), the sum were replaced by a supremum over free trees and C 0 could be allowed to grow with n, say C 0 = Ω(log n), then we could resort to concentration inequalities (e.g.Chernoff bounds) to show the conditional probability of interest is less than 
ε. Indeed, this was the strategy taken in [28] (see e.g. Section 6 therein). However, to obtain the ℓ 1 -type control as in Proposition 3.9, we need to account for the sum over free trees and the conditioning on A in (27) more carefully. Our strategy is to first divide the free trees into a typical and an atypical set: for n ≥ 2, let For each free tree t ∈ F typ tr ⊔ F atyp tr , we assign a cost ε t that is summable, and bound the probability that the empirical count of t among the X i 's deviates from p ⋆ t by distance ε t /√n conditional on A . For atypical trees t ∈ F atyp tr , we design such a cost ε t crudely based on a Chernoff bound (cf. Lemma 3.11). In contrast, for typical trees t ∈ F typ tr , we assign ε t carefully by showing that the conditioning on A has a negligible effect on the probability of interest for typical trees, which we argue by a local central limit theorem for triangular arrays [7] (cf. Lemma 3.12).
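The exponential tilting used above to pass from a large-deviation event to a moderate-deviation one can be illustrated on a toy scalar example. In the paper the tilted law P θ ⋆ lives on free trees and θ ⋆ is a vector; the sketch below instead assumes a one-dimensional statistic on a three-point space, and all names and numbers in it are hypothetical.

```python
import math
from typing import List, Sequence


def tilt(probs: Sequence[float], values: Sequence[float], theta: float) -> List[float]:
    """Exponentially tilted law: P_theta(x) proportional to P(x) * exp(theta * x)."""
    weights = [p * math.exp(theta * x) for p, x in zip(probs, values)]
    total = sum(weights)
    return [w / total for w in weights]


def mean(probs: Sequence[float], values: Sequence[float]) -> float:
    return sum(p * x for p, x in zip(probs, values))


def solve_tilt(probs: Sequence[float], values: Sequence[float], target: float,
               lo: float = -50.0, hi: float = 50.0, iters: int = 200) -> float:
    """Bisection for theta such that the tilted mean equals target.
    The tilted mean is non-decreasing in theta, and target must lie
    strictly between the smallest and the largest value."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(tilt(probs, values, mid), values) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


base = [0.5, 0.3, 0.2]    # toy base law
sizes = [0.0, 1.0, 2.0]   # toy statistic, e.g. a size parameter
theta = solve_tilt(base, sizes, target=1.2)
tilted = tilt(base, sizes, theta)
```

Under the tilted law the target value 1.2 is the mean, so hitting it becomes a typical event rather than a rare one, which is what makes a local central limit theorem applicable.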
We first start with the easier case of atypical trees.A crucial fact that we use throughout the proof is that p ⋆ f ∈ E 1 2 holds from Lemma 3.13 in [28].Lemma 3.11.For A ≡ A (B, s, {p f } f∈F \Ftr ) in (28) and C > 0, we have uniformly over (B, s, {p f } f∈F \Ftr ) ∈ Ψ typ (C) that Proof.Since P θ ⋆ (A ) ≳ k n −2 uniformly over (B, s, {p f } f∈F \Ftr ) ∈ Ψ typ by (29), it suffices to show that the unconditional probability in ( 30) is ≪ n −2 .Throughout, we abbreviate where where the second inequality is due to {p ⋆ f } ∈ E 1 2 .Thus, we can use Chernoff's bound in the rhs of (31) to have Again, using the fact p ⋆ t ≤ 2 − kv 2 , the rhs above is at most exp − Ω 2 kv 3 /(log n) 2 .Therefore, takes care of the case v(t) ≥ log n k log 2 .Let us now consider the case where v(t) ≤ log n k log 2 and p ⋆ t ≤ n −1/2 .Note that since L v ≤ (Ck) v , the number of such trees is at most n O( log k k ) .Thus, we have for some universal constant C ′ > 0, There are at most 4 v+f number of isomorphism classes of trees with v + f nodes (see Section 9.5 in [15]).The factor k v comes from assigning the number of spin-labels {0, 1} to the clauses in f which have boundary-half edges.

Note that if p
for large enough k. Thus, using Chernoff's bound in the rhs above, it follows that The estimates (32) and (33) conclude the proof.
Next, we consider the more delicate case of typical trees.
Lemma 3.12. For C > 0, there exist constants C ≡ C(C, α, k) and C k , such that the following holds. For any t ∈ F typ tr , and ε t > 0, we have
Proof. We first show that the fraction (nh ) is bounded away from 1 w.h.p. To this end, for t ∈ F typ tr , denote the random set of indices such that X i = t by Then, by a Chernoff bound, Thus, Note that we can express Having (35) in mind, we bound the rhs above by sup where . We now argue that R ≲ k 1 by a local central limit theorem: note that, given I t = I, the variables X 1 , ..., X nh• are i.i.d. from the distribution

By local central limit theorem for triangular arrays, we have for A
Recalling that P θ ⋆ (A ) ≳ k n −2 holds (cf. (29)), we therefore have that R ≤ C for some C ≡ C(C, α, k).
Plugging this estimate into (36) concludes the proof.
Having Lemmas 3.11 and 3.12 in hand, we now prove Proposition 3.9.
Proof of Proposition 3.9.Fix C > 0 and ε > 0. By Lemma 3.11, it suffices to restrict our attention to the typical trees.That is, recalling the identity (27), it suffices to show that there exists To prove (37), we assign a cost ε t to each typical tree t ∈ F typ tr .We choose Thus, Lemma 3.12 shows that the lhs of (37) is bounded by where C ≡ C(C, α, k) is from Lemma 3.12.An important observation is that holds by Lemma 3.13 in [28].Thus, |F typ tr | ≤ (Ck) 1) .Therefore, altogether, (38) The final step is to bound the rhs above by a Chernoff bound.Choose where the first inequality holds for some by definition of Ψ typ (C) in (24).Thus, we have by a Chernoff bound that where C > 0 denotes a universal constant.If we denote the rhs above by f (C ′ 0 ), then f does not depend on any other parameters and clearly satisfy lim C ′ 0 →∞ f (C ′ 0 ) = 0. Therefore, taking C ′ 0 to be large enough so that f (C ′ 0 ) ≤ (2 C) −1 ε concludes the proof by ( 38) and (39).
By a corollary of Proposition 3.2 and Proposition 3.9, we have the following.
Corollary 3.13.Recall the set of coloring profile Ξ 0 from Proposition 3.1.Then, for any ε > 0, there exists Proof.For ξ ∈ Ξ 0 , the quantity ∥ξ − ξ ⋆ ∥ □ defined in ( 14) can be bounded by since by Remark 3.10, f f ≤ v f holds for non multi-cyclic free component f, and 4 .Thus, by taking C 0 ≡ C 0 (ε, α, k) large enough, Proposition 3.2 shows that for some Here, we used the fact that p ⋆ f = 0 for cyclic free components f (cf.Definition 3.7).Note that . Finally, it is straightforward to establish Theorem 2.8 from Proposition 3.1 and Corollary 3.13.
Proof of Theorem 2.8. By Proposition 3.1, there exist where the extra o n (1) comes from the truncation of the number of free variables and forcing edges in the definition of Z λ ⋆ ,s in (19) (cf. Lemma 2.17 in [33]). By Corollary 3.13, the sum in the rhs above can be made small enough compared to ∑ s∈[s•(C1),s•(C2)] EZ λ ⋆ ,s . Moreover, Theorem 3.22 in [28] shows that uniformly over

Coupling t-neighborhoods to a broadcast process on a tree
In this section, we prove Lemma 2.9 and Theorem 2.14. We mostly focus on the proof of Theorem 2.14, and the proof of Lemma 2.9, which is based on a first moment estimate, is provided at the end of this section. Throughout, we let (G, x) be a random frozen configuration drawn with probability proportional to its size (cf. (a) of Observation 2.2). As mentioned in Section 2.4, the key idea in proving Theorem 2.14 is to construct a coupling between the measures E ξ ν t [G, x] and ν ⋆ t , and another coupling between
Proposition 4.1. There exists an explicit ν ⋆ t ≡ ν ⋆ t [α, k] ∈ P(Ω t ) such that the following holds: for any t ≥ 1 and C > 0, there exists a constant K ≡ K(C, k, t) such that
Proposition 4.1 clearly implies Theorem 2.14, and we include the proof for completeness.
Thus, the rest of this section is devoted to the proof of Proposition 4.1 except for a brief moment where we prove Lemma 2.9.

A broadcast process with edge configurations
In this subsection, we define the optimal t-coloring profile ν ⋆ t ≡ ν ⋆ t [α, k] based on the optimal coloring profile ξ ⋆ ≡ ξ ⋆ [α, k] in Definition 3.7.The following notations will be useful throughout.
We say τ ∈ C d (resp.(τ , L ′ ) ∈ (C × {0, 1}) k ) is valid if it can be realized as σ δv = τ (resp.(σ δa , L δa ) = (τ , L ′ )) for a valid component coloring (V, F, E, L, σ) and v ∈ V (resp.a ∈ F ).We say τ ∈ C k is valid if there exists L ′ ∈ {0, 1} k such that (τ , L ′ ) is valid.We denote the collection of valid τ ∈ C d such that τ ̸ ⊂ {R, B, S} d by Ċf , and denote the collection of valid τ ∈ C k such that τ ̸ ⊂ {R, B, S} k by Ĉf Note that for τ ∈ Ċf and σ δv = τ , then the free component f(τ ) ∈ F that contains v is determined solely with τ .Similarly, for τ ∈ Ĉf , the free component f(τ ) induces is well-defined.Conversely, given a free component f ∈ F , v ∈ V (f), and a ∈ F (f), the component colorings σ δv ∈ C d and σ δa ∈ C k are determined up to a permutation of the coordinates.For τ ∈ Ċf , define the multiplicity of τ by ṁ(τ ) := {v ∈ V (f(τ )) : where per(τ ) denotes the set of permutations τ .Similarly, the multiplicity of τ ∈ Ĉf is defined by Remark 4.2.Let σ ∈ C E be a valid component coloring corresponding to (G, x).Note that for τ ∈ Ċf , the quantity ṁ(τ ) • p f(τ ) [G, x] equals the fraction of variables such that σ δv ∈ per(τ ).A similar remark can be made for τ ∈ Ĉf .On the other hand, the empirical profile of spins adjacent to frozen variables and separating clauses is encoded by B[G, x].Thus, the coloring profile ξ = ξ[G, x] alone determines the spin profiles {σ δv } v∈V and {σ δa } a∈F up to a permutation of the coordinates, which serves as a useful fact throughout this section.
We now introduce the broadcast model on an infinite (d, k)-regular tree with edge configurations.
In the last step above, we remark that when σ δa ∈ Ĉf , then there exists a unique L δa ∈ {0, 1} k such that (σ δa , L δa ) is valid since the non-boundary component coloring carries the literal information.
The compatibility {p ⋆ f } f∈F ∼ B ⋆ (cf.Remark 3.8) guarantees that Ḣ⋆ and Ĥ⋆ have total mass 1 and have the same marginals.Consider the sample (σ ⋆ t , L ⋆ t ) drawn from the broadcast process with channel ( Ḣ⋆ , Ĥ⋆ ).Then, the optimal t-neighborhood coloring profile ν ⋆ t ≡ ν ⋆ t [α, k] is the distribution of (σ ⋆ t , L ⋆ t ) ∈ Ω t , considered up to automorphisms as in Definition 2.11.
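The structure of such a broadcast process, as opposed to the specific channel ( Ḣ⋆ , Ĥ⋆ ), can be sketched as follows. The code assumes a toy channel that copies the parent spin with some probability and otherwise resamples it uniformly; the actual channels are joint laws on d-tuples and k-tuples of component colorings with literals, which this sketch does not attempt to reproduce. All function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Spin = str
# channel(parent_spin, fan_out, rng) returns the spins on the child edges.
Channel = Callable[[Spin, int, random.Random], List[Spin]]


def make_copy_channel(alphabet: Sequence[Spin], p_copy: float) -> Channel:
    """Toy channel: each child copies the parent w.p. p_copy, else is uniform."""
    def channel(parent: Spin, fan_out: int, rng: random.Random) -> List[Spin]:
        return [parent if rng.random() < p_copy else rng.choice(alphabet)
                for _ in range(fan_out)]
    return channel


def broadcast(root_spins: List[Spin], var_channel: Channel, clause_channel: Channel,
              d: int, k: int, depth: int, rng: random.Random) -> List[List[Spin]]:
    """Level 0 holds the d edge spins around the root variable. Even steps pass
    through a clause (k - 1 children per edge), odd steps through a variable
    (d - 1 children per edge)."""
    levels = [list(root_spins)]
    for step in range(depth):
        if step % 2 == 0:
            channel, fan_out = clause_channel, k - 1
        else:
            channel, fan_out = var_channel, d - 1
        next_level: List[Spin] = []
        for spin in levels[-1]:
            next_level.extend(channel(spin, fan_out, rng))
        levels.append(next_level)
    return levels


alphabet = ["R", "B", "S"]  # spin names reused as a toy alphabet
rng = random.Random(0)
noisy = make_copy_channel(alphabet, 0.7)
levels = broadcast(["R", "B", "B"], noisy, noisy, d=3, k=4, depth=2, rng=rng)
```

Because each child edge is sampled independently given its parent, this is a sampling with replacement procedure, in contrast with the exploration of a finite graph.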

Coupling based on a configuration model
We now prove Proposition 4.1.We focus on the estimate Then, note that E ξ ν t [G, x] ∈ P(Ω t ⊔ {cyc}) is the law of σ t (v, G), L t (v, G) considered up to automorphisms where (G, σ) ≡ (V, F, E, L, σ) ∼ P ξ and v ∼ Unif(V ).Here, we take the convention that if N t (v, G) contains a cycle, then σ t (v, G), L t (v, G) ≡ cyc.Moreover, the following observation, which follows from Remark 4.2, shows that (G, σ) ∈ P ξ is a sample from a configuration model.It serves as our main intuition in the construction of the coupling.
, where (G, σ) ∼ P ξ and v ∼ Unif(V ), is an exploration process, which can be described in a breadth-first manner as follows.(c) By the previous step, we obtain a 2−neighborhood around v (not necessarily a tree) with the boundary half-edges hanging on clauses.For each boundary half-edges, we repeat the same procedure.We repeat this process until depth 2t to obtain the 2t − 3 2 neighborhood of v denoted by N t (v), and a coloring configuration on N t (v) denoted by σ The description of the exploration process above is a sampling without replacement version (due to the pairs of half-edges already matched) of the broadcast process with the channel ( Ḣsy , Ĥsy ) given as follows: for ξ = (B, {p f } f∈F ), the symmetrized coloring profile of ξ is defined by ( Ḣsy , Ĥsy ) ≡ ( Ḣsy The only difference between the exploration process E ξ ν t [G, x] and the broadcast model with channel ( Ḣsy , Ĥsy ) is that during the process when the boundary half-edge e has spin σ e ∈ {R, B, S}, then the distribution of the spins of the children edges has a slight tilt compared to Ḣsy or Ĥsy due to the pairs of half-edges already matched. 7The following lemma shows that this tilt is O k,t (n For any constant C > 0, there is a constant K(C 0 , t, k) > 0 such that for ( Ḣsy , Ĥsy ) Proof.By definition of ( Ḣsy , Ĥsy ) ≡ ( Ḣsy [ξ], Ĥsy [ξ]) in (48) and Definition 4.4 of Ḣ⋆ , Ĥ⋆ , we have by triangular inequality that where the equality holds since 44), and the last inequality is by Definition 2.13 of ξ ∈ Ξ C .Analogously, we have that Note that the equations above also imply that d TV ( Hsy , H⋆ ) ≤ C √ n holds, where Hsy is the marginal of ( Ḣsy , Hsy ) ≡ ( Ḣsy [ξ], Hsy [ξ]) and H⋆ is the marginal of ( Ḣ⋆ , Ĥ⋆ ).Moreover, the compatibility {p shows that the H⋆ at the boundary spins τ ∈ {R, B, S} is given by H⋆ (τ ) = B⋆ (τ ).From the definition of B⋆ in Definition A. 
).Next, we sequentially reveal the spins and the literals associated with the 'children half-edges' of a boundary half-edge.That is, for each boundary half-edge e adjacent to a variable, we reveal the half-edge e ′ that is matched with e, and reveal the spins and literals associated with children half-edges δa(e ′ ) \ e ′ .If boundary half-edge e is adjacent to a clause, we reveal the half-edge e ′ matched with e and only reveal the spins associated with children half-edges δv(e ′ ) \ e ′ .This procedure is carried out by utilizing a breadth-first search for both the neighbors of v and ρ.
At time ℓ ≥ 1, denote the revealed neighborhood of v by N ℓ (v) and the revealed neighborhood of ρ by N ℓ (ρ).We let ∂N ℓ (v) and ∂N ℓ (ρ) be the set of boundary half-edges of N ℓ (v) and N ℓ (ρ).Also, denote the revealed spins (resp.literals) in N ℓ (v) by σ N ℓ (v) (resp.L N ℓ (v) ) and the revealed spins (resp.literals) in N ℓ (ρ) by σ ⋆ N ℓ (ρ) (resp.L ⋆ N ℓ (ρ) ).Then, define the event E ℓ of success by Now, suppose at time ℓ+1, we take a boundary half-edge e ∈ ∂N ℓ (v) adjacent to a variable v(e) ∈ N ℓ (v), and reveal the connection of e, and the spins and literals of children half-edges of e.Note that the probability of creating a cycle by revealing the connection of e is O k,t (n −1 ) since a priori, the probability of having a cycle in N t (v, G) is O k,t (n −1 ) by definition of Ξ C .Moreover, if the spin at e, σ e , is free, i.e. σ e / ∈ {R, B, S}, then the spins and literals of children half-edges δa(e) \ e is completely determined by σ e (cf.Remark 2.4).
On the other hand, if σ e ∈ {R, B, S}, then conditioned on N ℓ (v) and σ N ℓ (v) , σ δa(e) is drawn from Ĥsy ℓ1, l2 More precisely, ℓ 1 is the number of edges e ′ in N ℓ (v) that have spins σ e ′ = σ e , and l2 (σ) is the number of clauses a in N ℓ (v) that have spin neighborhood σ δa = σ (up to a permutation) times the number of σ e in σ.In particular, note that ℓ 1 , l2 1 ≤ (kd) 2t holds.
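The exploration described above can be mimicked on an uncolored configuration model. The sketch below assumes a plain (d, k)-regular bipartite configuration model without spins or literals, and measures depth in graph distance, which differs from the convention used for N t (v, G) in the text; function names are hypothetical.

```python
import random
from collections import deque
from typing import List, Tuple


def sample_configuration_model(n: int, d: int, k: int,
                               rng: random.Random) -> List[Tuple[int, int]]:
    """Match the n*d variable half-edges to the clause half-edges uniformly at
    random. Returns (variable, clause) edges; multi-edges may occur."""
    assert (n * d) % k == 0, "n * d must be divisible by k"
    m = n * d // k
    var_half = [v for v in range(n) for _ in range(d)]
    clause_half = [a for a in range(m) for _ in range(k)]
    rng.shuffle(clause_half)
    return list(zip(var_half, clause_half))


def neighborhood_has_cycle(edges: List[Tuple[int, int]], root_var: int,
                           depth: int) -> bool:
    """Breadth-first exploration from a variable; True if the ball of the
    given graph-distance radius contains a cycle."""
    adj = {}
    for eid, (v, a) in enumerate(edges):
        adj.setdefault(("v", v), []).append((eid, ("a", a)))
        adj.setdefault(("a", a), []).append((eid, ("v", v)))
    root = ("v", root_var)
    visited = {root}
    queue = deque([(root, 0, -1)])
    while queue:
        node, dist, parent_edge = queue.popleft()
        if dist == depth:
            continue  # boundary nodes are not expanded
        for eid, nb in adj.get(node, []):
            if eid == parent_edge:
                continue
            if nb in visited:
                return True  # a second path reaches an explored node
            visited.add(nb)
            queue.append((nb, dist + 1, eid))
    return False


graph = sample_configuration_model(12, 3, 4, random.Random(0))
four_cycle = [(0, 0), (1, 0), (1, 1), (0, 1)]  # v0 - a0 - v1 - a1 - v0
```

Each revealed matching removes a pair of half-edges from the pool, which is the source of the small tilt relative to the broadcast process.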
We assume that S l and S l+1 , where S K+1 ≡ S 1 , are adjacent in the sense that the right endpoint of S l , which is either v(e j l ) or a(e j l ), and the left endpoint of S l+1 , which is either v(e i l+1 ) or a(e i l+1 ), lie in the same free component. Also, we denote the length of S l by L l for 1 ≤ l ≤ K. Then by definition of the boundary-transversing length, we have that ∑ K l=1 L l = 2t holds. In particular, we have that K ≤ 2t. Observe that the cycle C is 'almost' determined by the boundary segments (S l ) l≤K . Namely, since all cyclic free components have at most one cycle, there are at most 2 paths from the right endpoint of S l to the left endpoint of S l+1 . Therefore, if we fix the configuration of boundary segments (S l ) l≤K , there are at most 2 K ≤ 2 2t corresponding self-avoiding cycles whose boundary-transversing length is 2t. Now suppose that e i is contained in a boundary segment. Then, by its definition, either v(e i ) is frozen or a(e i ) is separating (both statements hold when v(e i ), a(e i ) are not endpoints). Without loss of generality, suppose v(e i ) is frozen. Then, the revealed spin neighborhood of v(e i ) must be τ = (τ l ) l≤d ∈ {R, B} d . Since we have min τ ∈{R,B,S} B(τ ) ≳ k 1, the number of edges with spin τ l is Ω k (n) for every l ≤ d. Thus, the probability of having an edge e i = (v(e i ), a(e i )) under u.a.r. matching of the half-edges with the same spins is at most O k (n −1 ). Therefore, the probability of containing a specific configuration of a boundary segment S is O k (n −L(S) ), where L(S) denotes the length of S.
Moreover, note that the number of ways to choose the variables and clauses involved in (S_l)_{1≤l≤K} is at most O_{k,t}(n^{2t}(log n)^K), which can be argued as follows. The number of ways to choose the variables and clauses involved in S_1 is O_k(n^{L_1+1}). Then, since the left endpoint of S_2 must lie in the same free component as the right endpoint of S_1, and the largest free component has O_k(log n) variables and clauses (cf. ξ ∈ E_{1/4}), there are at most O_k(n^{L_2} log n) ways to choose the variables and clauses involved in S_2. By repeating this and noting that the right endpoint of S_K must lie in the same free component as the left endpoint of S_1, the total number is O_{k,t}(n^{2t}(log n)^K).
To conclude, we have shown that the probability of containing the boundary segments (S_l)_{l≤K} is at most O_k(n^{-2t}), and that the number of ways to choose (S_l)_{l≤K} is at most O_{k,t}(n^{2t}(log n)^K). Also, K ≤ 2t and there are at most 2^{2K} self-avoiding cycles whose boundary-transversing length is 2t. Therefore, it follows that E_ξ[N^b_cyc[2t; G, x]] ≲_{k,t} (log n)^{2t} ≪ n^{1/4} holds uniformly over ξ = (B, {p_f}_{f∈F}) such that {p_f}_{f∈F} ∈ E_{1/4}, p_f = 0 if f is multi-cyclic, and ∥B − B⋆∥_1 ≤ n^{-1/3}, which concludes the proof.
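The three estimates above can be combined in a single display. The following is only a sketch of the bookkeeping: the constant c_t (a bound, depending only on t, on the number of self-avoiding cycles compatible with a fixed configuration of boundary segments) and the explicit sum over the segment lengths L_1, ..., L_K are notational assumptions introduced here for illustration and do not appear in the original argument.

```latex
\begin{align*}
\mathbb{E}_{\xi}\big[N^{b}_{\mathrm{cyc}}[2t; G, x]\big]
  &\le \sum_{K=1}^{2t}\;
     \sum_{\substack{L_1,\dots,L_K \ge 1 \\ L_1+\cdots+L_K = 2t}}
     \underbrace{c_t}_{\text{cycles per configuration}}
     \cdot \underbrace{O_{k,t}\big(n^{2t}(\log n)^{K}\big)}_{\text{choices of } (S_l)_{l \le K}}
     \cdot \underbrace{O_{k}\big(n^{-2t}\big)}_{\text{probability}} \\
  &\lesssim_{k,t} \sum_{K=1}^{2t} (\log n)^{K}
   \;\lesssim_{k,t}\; (\log n)^{2t} \;\ll\; n^{1/4}.
\end{align*}
```

The number of ways to split 2t into K positive lengths depends only on t, so it is absorbed into the implicit constant; the powers of n cancel exactly, which is why only the logarithmic factor survives.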
In fact, the following lemma shows the relation between the frozen and message configurations. We refer to [33], Lemma 2.7 for its proof. It turns out that ṁ[⋆], m̂[⋆] can be arbitrary measures for our purpose, and hence we assume that they are uniform measures on {0, 1}.
The equations (57) and (58) are known as the belief propagation equations. For a detailed explanation, we refer to Section 2 of [33], where the same notions are introduced, or to Chapter 14 of [19] for more fundamental background. From these quantities, we define the following local weights. Note that if σ = S, then φ(σ, σ) = 2 for any σ. The rest of the details explaining the compatibility of φ and Φ can be found in [33], Section 2.4. Then, the formula for the cluster size we have seen in Lemma A.4 works the same for the coloring configuration. That is, we have size(x; G) = w^lit_G(σ). Among the valid frozen configurations, we can ignore the contribution from the configurations with too many free or red colors, as observed in the following lemma.

Figure 1: Figure adapted from

Figure 2: T_{d,k,t} for d = k = 3 and t = 2. Variables and clauses are drawn as circular and square nodes, respectively. The boundary half-edges in ∂T_{d,k,t} are highlighted in blue.

The boundary profile of (G, x) is the tuple B[G, x] ≡ (Ḃ, B̂, B̄) defined as follows. Let σ ∈ C^E be the component coloring corresponding to x in Definition 2.3. Then, Ḃ, B̂, and B̄ are respectively measures on {R, B}^d, {R, B, S}^k and {R, B, S} defined as ) , e ∈ E(T_{d,k,t}), and L

Definition 2.13. For C > 0, let Ξ_C ≡ Ξ_{C,n} be the set of coloring profiles ξ that satisfy both ∥ξ − ξ⋆∥_□ ≤ C/√n and E_ξ N_cyc(2t; G) ≤ C, where N_cyc(2t; G) denotes the number of cycles in G of length at most 2t.

Definition 3.3. For a free component f, let b_f(B_x) for x ∈ {0, 1} count the number of boundary half-edges e ∈ ∂f such that the spin-label equals x. Thus, b_f(B_0) + b_f(B_1) = |∂f| holds. Further denote b_f(S) = |∂f|. We define the vector b_f ∈ N^3 associated with f ∈ F as b
(a) For τ ≡ (τ_1, ..., τ_d) ∈ {R, B, S}^d, consider n Ḃ(τ_1, ..., τ_d) variables and assign to each of them the spin τ_i on its i'th half-edge, 1 ≤ i ≤ d. For τ ∈ Ċ_f, consider n · ṁ(τ) · p_{f(τ)} variables and assign to each of them the spin τ_{π(i)} on its i'th half-edge, 1 ≤ i ≤ d, where π ∈ S_d is a u.a.r. permutation. The total number of variables considered is n by compatibility (cf. Definition 3.3). Then, permute the locations of the considered variables u.a.r., i.e. assign a random order to the n variables.
(b) Similarly, for τ ∈ {R, B, S}^k, consider m B̂(τ) clauses with neighboring spins τ, and for τ ∈ Ĉ_f, consider n · m(τ) · p_{f(τ)} clauses whose neighboring spins are a u.a.r. permutation of τ. Then, permute the locations of the considered clauses u.a.r.
(c) Subsequently, match the half-edges adjacent to variables with the half-edges adjacent to clauses, among those that have the same spin, uniformly at random.
(d) Finally, we draw the literals: for each clause a ∈ F, independently draw L δa ∈ {0

x]. Thus, for the purpose of proving Proposition 4.1, we may assume that the spins τ ∈ {R, B, S}^d and τ ∈ {R, B, S}^k are permuted u.a.r. like the spins τ ∈ Ċ_f and τ ∈ Ĉ_f in Steps (a) and (b) of Observation 4.5.

Remark 4.7. Note that Observation 4.5 and Remark 4.6 show that

(a) Consider n variables with d half-edges hanging and m clauses with k half-edges hanging. Each half-edge is assigned a color σ ∈ C by Steps (a), (b) in Observation 4.5 and Remark 4.6. Then, pick a variable v uniformly at random among the n variables.
(b) Let (e_1, ..., e_d) = δv be the half-edges adjacent to v. Match e_1 with a half-edge hanging on a clause, chosen u.a.r. among those which have the same color σ_{e_1}. Similarly, match e_2 with a half-edge chosen u.a.r. among those which have the same color and have not been matched, i.e. e_2 cannot be matched with the half-edge that e_1 has been matched with. Repeat the same procedure for e_3, ..., e_d, i.e. match them sequentially u.a.r. among those which have the same color and have not been matched previously.
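The sequential matching in step (b) can be illustrated with a short Python sketch. This is a toy model under stated assumptions, not the construction used in the proof: the function name `match_half_edges_of_variable`, the representation of a clause half-edge as a `(clause_id, slot, color)` tuple, and the string colors are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def match_half_edges_of_variable(variable_colors, clause_half_edges, rng):
    """Sequentially match the half-edges e_1, ..., e_d of one variable.

    variable_colors: colors of e_1, ..., e_d, in order (hypothetical encoding).
    clause_half_edges: list of (clause_id, slot, color) tuples, one per
        clause half-edge (hypothetical encoding).
    rng: a random.Random instance.

    Returns the list of clause half-edges matched to e_1, ..., e_d.
    """
    # Pool of still-unmatched clause half-edges, grouped by color.
    unmatched = defaultdict(list)
    for half_edge in clause_half_edges:
        _clause_id, _slot, color = half_edge
        unmatched[color].append(half_edge)

    matched = []
    for color in variable_colors:
        candidates = unmatched[color]
        if not candidates:
            raise ValueError(f"no unmatched clause half-edge with color {color!r}")
        # Uniform choice among unmatched half-edges of the same color;
        # swap-and-pop removes the chosen one so it cannot be reused.
        pick = rng.randrange(len(candidates))
        candidates[pick], candidates[-1] = candidates[-1], candidates[pick]
        matched.append(candidates.pop())
    return matched
```

Removing each chosen half-edge from the pool is what enforces the constraint that e_2 cannot be matched with the half-edge already used by e_1, and likewise for e_3, ..., e_d.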