Locality Bounds for Sampling Hamming Slices

Spurred by the influential work of Viola (SIAM Journal on Computing 2012), the past decade has witnessed an active line of research into the complexity of (approximately) sampling distributions, in contrast to the traditional focus on the complexity of computing functions. We build upon and make explicit earlier implicit results of Viola to provide superconstant lower bounds on the locality of Boolean functions approximately sampling the uniform distribution over binary strings of particular Hamming weights, both exactly and modulo an integer, answering questions of Viola (SIAM Journal on Computing 2012) and Filmus, Leigh, Riazanov, and Sokolov (RANDOM 2023). Applications to data structure lower bounds and quantum-classical separations are discussed.


INTRODUCTION
Historically, complexity theory has been dominated by research determining the complexity of computing particular functions. Following the seminal work of Viola [26] (with preliminary ideas appearing in [1,13]), the past decade has seen a rise in the study of the complexity of sampling particular distributions [3,6,9,11,14,22,26,29-31]. In this setting, the goal is to construct a circuit whose input is an infinite string of unbiased, independent random bits and whose output is a distribution close in total variation distance to a specified distribution. As a standard motivating example, consider the parity function. Håstad's flagship result [16] shows that AC0 circuits need exponentially many gates to compute parity. However, one can easily sample pairs of the form (y, parity(y)), where each output bit depends on just two input bits: output (x_1 ⊕ x_2, x_2 ⊕ x_3, ..., x_{n-1} ⊕ x_n, x_n ⊕ x_1) on input x_1, x_2, ..., x_n [2,4]. In fact, AC0 circuits can even sample (y, g(y)) for more complicated functions g, such as inner product [17] and symmetric functions [26].
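As a quick illustration (ours, not part of the paper's argument), the following Python sketch implements the 2-local parity sampler above and exhaustively verifies its output distribution for a small input length; the function name is hypothetical.

```python
import itertools
from collections import Counter

def local_parity_sampler(x):
    """2-local map: on input x_1,...,x_n output
    (x_1^x_2, x_2^x_3, ..., x_{n-1}^x_n, x_n^x_1).
    The first n-1 output bits are uniform, and the last one telescopes
    to their parity, so this samples (y, parity(y)) exactly."""
    n = len(x)
    return tuple(x[i] ^ x[i + 1] for i in range(n - 1)) + (x[n - 1] ^ x[0],)

# Exhaustive check for n = 4: the map is 2-to-1 onto {(y, parity(y))}.
counts = Counter(local_parity_sampler(x) for x in itertools.product([0, 1], repeat=4))
assert all(out[-1] == out[0] ^ out[1] ^ out[2] for out in counts)
assert len(counts) == 8 and set(counts.values()) == {2}
```

The map is 2-to-1 because flipping all input bits leaves every output bit unchanged.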
Let f : {0,1}^m → {0,1}^n be a Boolean function with n output bits. Additionally, let U_m be the uniform distribution over {0,1}^m and f(U_m) be the output distribution of f given a uniform input. We say f is d-local if every output bit of f depends on at most d input bits. For constant d, this captures NC0 circuits; more generally, d-local functions encompass circuits of depth log(d) and bounded fan-in. In his influential paper, Viola [26] considered the problem of determining locality lower bounds for Boolean functions f where f(U_m) approximates one of the following distributions:
• D_{n,k}, the uniform distribution over n-bit binary strings of Hamming weight k; and
• M_{n,q}, the uniform distribution over n-bit binary strings whose Hamming weight is divisible by q, for some prime q = Θ(log(n)).
In particular, an Ω(log(n)) locality lower bound was proved for both distributions, conditional on f's input domain being sufficiently small (m = (1 + o(1)) · n). Additionally, Viola gave results without this restriction, but with a worse distance bound decaying exponentially in the locality d. Towards a full locality-distance trade-off, Viola asked whether lower bounds could be obtained with strong error bounds and no input length restriction. Later, the similar bound Ω(log(n/k)) was discovered for D_{n,k} by Filmus, Leigh, Riazanov, and Sokolov [11], answering the question for small k. To complement this result, Viola proved an Ω(log(n)) bound in the case of non-dyadic k/n [31]. However, Viola's question in the general regime remained open.
More recently, building on Viola's results on the distribution M_{n,q}, Watts and Parham [32] proved an input-independent separation between QNC0 and NC0 circuits of restricted domain size. These domain conditions boil down to the exact domain conditions of the locality bounds for M_{n,q} in Viola's analysis, rendering their result only a partial separation.

Our Results
We resolve Viola's question in the affirmative by providing nontrivial locality lower bounds on D_{n,k} in the k = Θ(n) regime, as well as for M_{n,q}, with no restrictions on the domain size.
Formally, we say a distribution P is ε-far (resp., ε-close) from a distribution Q if the total variation distance between them is at least ε (resp., at most ε). Our results provide a wide range of trade-offs between locality and distance. For readability, we present them here with some particular parameter choices, but see https://arxiv.org/abs/2402.14278 for their full generality and proofs. We remark that the above theorems hold in a stronger setting where f is fed with an arbitrary binary product distribution as input. Taking advantage of the fact that the input consists of genuinely unbiased coins, we prove the following bound.
where n is a multiple of 3. A related bound appears in [31] (see Footnote 3), which for sufficiently large n eclipses Theorem 1.3 (as well as Theorem 1.1 and Theorem 1.2 when k/n is non-dyadic). However, the proof in [31] notes that its square-root term can be optimized, so the exact trade-off between the bounds depends on the extent to which this optimization is possible. Given these similar bounds, we view our primary contribution here to be the generality of our statements, as they carry no dyadicity restrictions.

Towards lower bounds for the distribution M_{n,q}, we introduce the following notation. Let q be an integer and let Λ ⊆ Z/qZ be a non-empty set, where Z/qZ = {0, 1, ..., q − 1}. We define the distribution D_{q,Λ} to be the uniform distribution over x ∈ {0,1}^n conditioned on the Hamming weight of x lying in Λ modulo q.

The above theorem (as well as a version with adaptivity) appears with a bound stated in terms of Λ_even and Λ_odd, where Λ_even is the set of even numbers in Λ and Λ_odd is the set of odd numbers in Λ.
We highlight that Theorem 1.4 is essentially tight for any choice of q and Λ, and we include a more thorough discussion in the full version of our paper.
1.1.1 Data Structure Lower Bounds. It is an active line of research to determine optimal bounds for succinct data structures: structures that store their data close to the information-theoretic limit while still including sufficient redundancy to allow for efficient and meaningful queries [3,12,19,22,23,25,26,30]. We will focus on the following setting of a binary alphabet and bit probes.
Definition 1.5 (Dictionary Problem). Let H ⊆ {0,1}^n and s, t ∈ N. The dictionary problem of H with parameters s and t asks for a pair of algorithms A and B such that the following holds:
• Given an arbitrary x ∈ H, A produces a data structure str_x ∈ {0,1}^s.
• Given access to some str ∈ {0,1}^s, for every query i ∈ [n], B produces an answer b_i ∈ {0,1} using t (adaptive / non-adaptive) bit probes (i.e., bits read) to str.
• When str = str_x, we have b_i = x_i for all i ∈ [n].
We remark that this setting is static, in contrast to the dynamic setting where the data structure needs to support updates to the underlying input x. Since the static model is weaker, proving static lower bounds has traditionally been much more difficult than proving dynamic ones. The locality-distance trade-off provides a useful tool for establishing trade-offs between the parameters of the static dictionary problem.
Claim 1.6 ([26, Claim 1.8]). Suppose we can solve the dictionary problem of H ⊆ {0,1}^n with parameters s, t and non-adaptive (resp., adaptive) queries. Then there exists a t-local (resp., (2^t − 1)-local) function from {0,1}^s to {0,1}^n mapping str_x to x for every x ∈ H.

Combining Claim 1.6 with our results, we obtain a number of lower bounds. Here we highlight the most interesting ones; interested readers are encouraged to instantiate more on their own.

Corollary 1.7 (Via Theorem 1.4). Let H = {x ∈ {0,1}^n : |x| ≡ 0 (mod q)}, where q ≥ 3 is an odd constant. The dictionary problem of H needs either s = n bits of storage or t = ω(1) bit probes per query.
Note that the information-theoretic limit is ⌈log(|H|)⌉ = n − ⌊log(q)⌋. On the other hand, the trivial data structure that simply stores x in n bits can support every query with a single bit probe. Hence Corollary 1.7 shows that the only efficient data structure using constantly many probes is the trivial one. We can obtain a similar sharp trade-off for even moduli, adjusted to reflect Theorem 1.4's different quantitative behavior for odd and even moduli.
Corollary 1.8 (Via Theorem 1.4). Let H = {x ∈ {0,1}^n : |x| ≡ 0 (mod q)}, where q ≥ 3 is an even constant. The dictionary problem of H needs either s = n bits of storage or t = ω(1) bit probes per query.
In the case of the Hamming slice H = {x ∈ {0,1}^n : |x| = k} where k/n is non-dyadic, the results of [25] are superior to those we can obtain via our techniques. Specifically, they show that the dictionary problem of H with t (adaptive) queries requires at least log(|H|) + n/2^{O(t)} − log(n) bits of storage. However, our generality allows us to prove trade-offs in the dyadic setting.
Corollary 1.9 (Via Theorem 1.2). Let H = {x ∈ {0,1}^n : |x| = n/2}, where n is a multiple of two. The dictionary problem of H needs either s = n − O(1) bits of storage or t = ω(1) bit probes per query.

The previous best result in the setting of Corollary 1.9 is [26], which gives an s = n − 0.01 · log(n) versus t = Ω(log(n)) trade-off. Our Corollary 1.9 improves the storage bound to optimal (ignoring the hidden constant in the O(1)) at the cost of a worse bit-probe bound. It remains an interesting question whether one can get the best of the two results, further improving our bit-probe bound to Ω(log(n)) without weakening the n − O(1) storage bound.
Meanwhile, we note that it is impossible to get an n-vs-ω(1) trade-off as in Corollary 1.7 and Corollary 1.8. Here is a simple data structure for the H in Corollary 1.9 using s = n − 1 bits of storage and t = 2 bit probes per query: for each i ∈ [n − 1], store the prefix sum (parity) str_x[i] = x_1 ⊕ ... ⊕ x_i. For i = n, the prefix sum is precisely n/2 mod 2, which we do not need to store. Then every query x_i can be answered by XORing the prefix sums at positions i and i − 1. It would be interesting to determine whether a similar structure exists with fewer than n − 1 bits of storage while maintaining O(1) bit probes per query.
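To make the two-probe structure concrete, here is a small Python sketch of the prefix-parity dictionary for the weight-n/2 slice (function names are ours; this is an illustration, not the paper's formal construction).

```python
def encode(x):
    """Store the first n-1 prefix parities str_x[i] = x_1 ^ ... ^ x_{i+1};
    the full parity equals (n/2) mod 2 and need not be stored."""
    n = len(x)
    assert n % 2 == 0 and sum(x) == n // 2    # promise: Hamming weight n/2
    str_x, acc = [], 0
    for i in range(n - 1):
        acc ^= x[i]
        str_x.append(acc)
    return str_x                              # n - 1 bits of storage

def query(str_x, i, n):
    """Recover x_i with at most 2 bit probes into str_x."""
    full = (n // 2) % 2                       # known for free from the promise
    left = 0 if i == 0 else str_x[i - 1]      # first probe
    right = full if i == n - 1 else str_x[i]  # second probe (or the free bit)
    return left ^ right

x = [1, 0, 0, 1, 1, 0]                        # weight 3 = 6/2
s = encode(x)
assert len(s) == 5
assert [query(s, i, 6) for i in range(6)] == x
```

Each query XORs two adjacent prefix parities, with the boundary cases handled by the constants 0 and (n/2) mod 2.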
The results compared here are by no means a complete list of data structure lower bounds for the dictionary problem. In particular, there are many results on cell probe lower bounds (e.g., [23]), the dynamic dictionary setting (e.g., [20]), and other natural choices of H (e.g., [25]). We refer interested readers to [26] for a detailed discussion.
1.1.2 Input-Independent Quantum-Classical Separation. A driving research direction in quantum computing is exhibiting separations between quantum and classical complexity. In the theme of our paper, we consider the problem of devising distributions that quantum circuits can efficiently sample but classical circuits cannot. Note that such a separation does not rely on a particular input. Instead, the quantum circuit is fed with a fixed initial state (ideally |0^n⟩), and each qubit is measured in the computational basis at the end to produce the desired distribution over {0,1}^n. Meanwhile, a classical circuit, which has n output bits, has access to unbiased coins and aims to reproduce the distribution.
The problem of establishing such an input-independent separation between the circuit classes QNC0 and NC0 was first proposed by Bravyi, Gosset, and König [5], and was later found to be connected to the complexity of quantum states [32] as well. Using ideas from [26], Watts and Parham [32] gave a family of distributions over {0,1}^n that constant-depth quantum circuits can produce within distance 1/6 + o(1), but from which any NC0 circuit's output is at least (1/2 − o(1))-far, assuming the total number of random bits the NC0 circuit may use is (1 + o(1)) · n. The exact distributions are variants of the M_{n,q} distribution above.
However, an ideal separation result should have no restriction on the number of classical random bits, as well as a maximal quantum-classical distance gap of 1 − o(1) or even 1 − 2^{−Ω(n)}. Towards this goal, [11] suggested determining locality lower bounds for M_{n,q} and related distributions. Without diving into detail, we remark that our Theorem 1.4 resolves this open problem, and we can lift the domain size assumption in [32, Theorem 5] while still preserving the separation.
Aside from directly improving the previous analysis, we note that there is a simpler distribution that produces an optimal separation. Let U_{1/3} be the 1/3-biased distribution over n bits, where each bit is independently set to 1 with probability 1/3.
Corollary 1.11. There exists a distribution that QNC0 circuits of depth one can sample without error, but which any NC0 circuit can only sample with near-maximal error (quantified by Theorem 1.10).

We suspect this corollary may be folklore, especially after a reviewer pointed out that a bound similar to Theorem 1.10 (and stronger in at least some parameter regimes) is implicit in [25,31]. However, it does not seem to explicitly appear in the literature, so we hope our statement will be beneficial to future researchers.
Remark 1.12. The quantum-classical separation obtained in Corollary 1.11 may seem a bit dishonest, as it takes advantage of precision issues arising from the classical binary representation. One may desire a separation where the quantum circuit is also restricted to "binary operations" to rule out distributions like the 1/3-biased one. One natural candidate is Clifford circuits, where non-Clifford gates are not allowed. However, the sampling task there can sometimes be reduced to the search task with a constant-depth overhead [15, Section F], and the latter is trivial in the input-independent setting. Hence one must be careful in formulating such a restriction on the quantum circuit.
A different way to compensate for the precision issue is to give NC0 circuits access to arbitrary binary product distributions. Then NC0 circuits can certainly generate U_{1/3} by simply receiving 1/3-biased coins. We remark that our proof of Theorem 1.4 still holds in this setting. Thus, even granting NC0 this extra power, we obtain a separation by combining Theorem 1.4 and [32, Theorem 5]. One caveat here is that the QNC0 circuit needs to start with the GHZ state. If it is forced to take |0^n⟩ as the initial state, one may want to prove locality lower bounds (without the domain assumption) for a more complicated distribution designed in [32, Theorem 3]. Due to its similarity with the M_{n,q} distribution, we believe our techniques can be applied there, and we leave this as future work.

Future Directions
Beyond considering specific distributions, Filmus, Leigh, Riazanov, and Sokolov [11] conjectured a classification of when NC0 circuits can approximately sample D_Λ, where D_Λ is the uniform distribution over binary strings with Hamming weights in Λ. They hypothesized that if f is O(1)-local and f(U) is ε-close to D_Λ, then f(U) is O(ε)-close to D_{Λ′} for Λ′ from a short list of structured candidates. Our Theorem 1.1, combined with their main results, rules out all singleton Λ′ other than {0} and {n}. In addition, our Theorem 1.4 rules out all q-periodic Λ′ for 3 ≤ q ≤ n^{1/2−o(1)}. With a number of new ideas, in an upcoming paper [18] we resolve this conjecture (and a strengthening of it) affirmatively.
One question we were not able to resolve concerns the quantitative bounds we derive. While our distance bounds are asymptotically optimal when the locality is constant, the locality-distance trade-offs deteriorate quickly once the locality becomes superconstant. We believe our trade-offs can be further improved. However, in the full version of our paper we give examples showing the tightness of the parts of our analysis that create this inevitable blowup, which suggests that new ideas may be needed for substantial improvements.

PROOF OVERVIEW
Let f be a d-local function with n output bits, and let D be a distribution over {0,1}^n. The goal is to prove that f, fed with uniform inputs, cannot generate a distribution close to D. The general recipe for establishing such a bound is as follows: (1) First, we consider a simpler setting where not only does every output bit of f depend on few input bits, but also every input bit of f influences few output bits.
In this case, we can find many output bits that depend on disjoint sets of input bits. Now if the desired distribution D has long-range correlations (e.g., the Hamming weight must equal k), we would expect a large error, since these output bits are independent and cannot coordinate with each other. (2) Then, based on the error bound established in the first step, we aim to reduce the general case, where some popular input bits may influence many output bits, to the structured case above.
At this step, we prove certain graph elimination results, showing that the desired structure from the first step can be obtained after deleting a few input bits.
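As a toy illustration of this deletion step (ours; the paper's actual elimination arguments are far more delicate), the following Python sketch removes high-degree input vertices from the bipartite dependency graph and then greedily collects output vertices whose surviving inputs are pairwise disjoint. The function name and the specific greedy rule are our own choices.

```python
from collections import Counter

def eliminate_and_select(left_adj, degree_cap):
    """Delete right vertices (input bits) of degree >= degree_cap, then greedily
    collect left vertices (output bits) whose surviving input sets are pairwise
    disjoint, i.e. non-connected output bits."""
    right_deg = Counter(r for nbrs in left_adj for r in nbrs)
    deleted = {r for r, deg in right_deg.items() if deg >= degree_cap}
    chosen, used = [], set()
    for v, nbrs in enumerate(left_adj):
        survivors = set(nbrs) - deleted
        if survivors and survivors.isdisjoint(used):
            chosen.append(v)
            used |= survivors
    return deleted, chosen

# One popular input (right vertex 0) feeds every output, so no two outputs
# are independent; deleting it leaves each output with a private input.
left_adj = [[0, i + 1] for i in range(6)]
deleted, chosen = eliminate_and_select(left_adj, degree_cap=3)
assert deleted == {0} and chosen == list(range(6))
```

In the paper's setting, deleting an input bit corresponds to conditioning on its value, which is why the number of deletions must be kept small.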
The above description is an oversimplification of our analysis, and for each of our results in Subsection 1.1 we face different issues, which we elaborate on below. For convenience and simplicity, we hide minor factors when stating bounds. The framework of viewing f as a convex combination of specific, easier-to-handle restrictions was largely developed in [26,30] and applied in [11]. Thus, our primary contributions are the specific choices of structure we reduce to and the corresponding technical analysis.
We first consider the toy example D = U_{1/3}, the 1/3-biased distribution. The idea here works equally well for any p-biased distribution with non-dyadic p. Observe that 1/3 can only be approximated up to error ≈ 2^{−d} using integer multiples of 2^{−d}. Therefore the marginal distribution of every output bit of f is doomed to be 2^{−d}-far from a 1/3-biased coin. Since total variation distance can only decrease under taking marginals, this already implies a 2^{−d} lower bound.
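The granularity claim can be checked directly: a d-local output bit fires on some subset of its at most 2^d equally likely inputs, so its bias is an integer multiple of 2^{−d}, and the best achievable distance to 1/3 turns out to be exactly 1/(3 · 2^d). A small Python check (ours; the helper name is hypothetical):

```python
from fractions import Fraction

def best_dyadic_error(d):
    """A d-local output bit fires on some subset of its 2^d equally likely
    inputs, so its bias is a/2^d; return the closest achievable distance to 1/3."""
    third = Fraction(1, 3)
    return min(abs(Fraction(a, 2 ** d) - third) for a in range(2 ** d + 1))

# The best error is exactly 1/(3 * 2^d), matching the 2^{-d} granularity bound.
for d in range(1, 10):
    assert best_dyadic_error(d) == Fraction(1, 3 * 2 ** d)
```

The exact constant 1/3 is irrelevant; any non-dyadic bias yields an error of order 2^{−d}.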
To further boost this to 1 − o(1), we first assume that we can find t non-connected output bits, i.e., output bits that do not depend on common input bits, which means they are independent. Since each of these output bits incurs a 2^{−d} error, intuitively the errors should accumulate, and we prove that this is indeed the case.
We briefly sketch the proof: each weak distance bound implies an event E_i that happens noticeably more often under P|_{i} than under Q|_{i}. Then by independence and standard concentration, the number of events E_i that occur under P is typically larger than the number under Q by roughly t·ε/2, thus establishing the bound. Applied here, each P|_{i} corresponds to a selected output bit, and each Q|_{i} is a 1/3-biased coin.
Hence we get a 1 − exp(−Ω(t · 2^{−2d})) bound. Now back to reality, we may not immediately find non-connected output bits, since the degrees of input bits can be unbounded. For example, there could be one input bit on which all outputs depend, so that no two output bits are independent. However, conditioning on this one bit removes the shared dependency and fixes the problem, at the cost of changing the distribution by a factor of 2. Since the distance bound above is sufficiently strong, we can indeed pay some loss to condition on input bits.
In particular, we show that a convex combination of distributions P_1, ..., P_N is (1 − O(N·δ))-far from a distribution Q, provided that each P_i is (1 − δ)-far from Q.

Lemma 2.2. Let P_1, ..., P_N and Q be distributions, and let δ be a value such that ∥P_i − Q∥_TV ≥ 1 − δ for each i ∈ [N]. Then for any distribution P that is a convex combination of P_1, ..., P_N, we have ∥P − Q∥_TV ≥ 1 − (N + 1)·δ.

This is proved as follows: each distance bound implies an event E_i happening with probability at least 1 − δ in P_i but at most δ in Q. Their disjunction inherits the 1 − δ probability in any convex combination of the P_i's, but still happens with probability at most N·δ in Q by the union bound, which gives the desired statement. Applied here, each P_i corresponds to the distribution of the output bits conditioned on a specific assignment of c input bits, so N = 2^c, and the conditioning adds a 2^c overhead on top of the distance bound for the distribution after each conditioning. Given this observation and the 1 − exp(−Ω(t · 2^{−2d})) bound above, we can afford to delete roughly t · 2^{−2d} input bits, as long as we can find t non-connected output bits afterwards.
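A toy numeric check of this union-bound argument (ours; the distributions are artificial): four point masses, each 0.9-far from the uniform distribution on ten points, mix into a distribution that is still at least 1 − (N + 1)·δ = 0.5-far.

```python
from fractions import Fraction

def tv(p, q):
    """Total variation distance between finite distributions given as dicts."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, Fraction(0)) - q.get(k, Fraction(0))) for k in keys) / 2

Q = {k: Fraction(1, 10) for k in range(10)}     # uniform on {0,...,9}
Ps = [{i: Fraction(1)} for i in range(4)]       # point masses, each (1 - 1/10)-far from Q
delta, N = Fraction(1, 10), len(Ps)
assert all(tv(P, Q) >= 1 - delta for P in Ps)

P_mix = {i: Fraction(1, 4) for i in range(4)}   # the uniform convex combination
assert tv(P_mix, Q) >= 1 - (N + 1) * delta      # union-bound guarantee: >= 1/2
assert tv(P_mix, Q) == Fraction(3, 5)           # exact value in this toy case
```

Exact rational arithmetic avoids any floating-point slack in the comparisons.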
At this point, the problem is graph-theoretic: given a bipartite graph where each left vertex (representing an output bit) has degree bounded by d, we are allowed to delete a few right vertices (representing input bits) to get many non-connected left vertices, where we say two left vertices are non-connected if they are not both adjacent to the same right vertex. More precisely, we are allowed to delete at most t · 2^{−2d} right vertices to get at least t non-connected left vertices. In addition, we would like to maximize t, since the final bound will be roughly 1 − exp(−Ω(t · 2^{−2d})). It turns out that this can be achieved with t = n/2^{O(d)}, which explains our bounds in Theorem 1.10. The starting point of the proof is the following naive attempt: if we remove all right vertices of degree at least ℓ, then we obtain a bipartite graph with left degree at most d and right degree less than ℓ, which readily gives n/(d·ℓ) non-connected left vertices. Hence if the desired bound does not hold, the number of right vertices of degree at least ℓ must exceed the deletion budget for every ℓ in the relevant range. Summing these counts over all ℓ up to a threshold exponential in d, the right-hand side behaves like a harmonic series and exceeds d·n, whereas the left-hand side is still upper bounded by the total number of edges, which is at most d·n. This forms a contradiction. By analyzing more carefully, we can improve the threshold, and the result turns out to be sharp, as witnessed by an explicit construction.

A simpler case of the single Hamming slice D = D_{n,k} is still when k/n has precision issues; think of k = n/3 for now. Then every bit of D_{n,n/3} is supposed to be 1/3-biased, whereas every output bit of the produced distribution is still 2^{−d}-far from it.
While we largely follow the analysis above, the caveat here is that being far from U_{1/3} does not imply being far from D_{n,n/3}, as the distance between U_{1/3} and D_{n,n/3} is itself 1 − Θ(1/√n). More precisely, this issue arises when we try to aggregate the errors from independent output bits. In the previous argument, we compared the P_i's (representing the true output bits) with Q_i = U_{1/3} (representing the desired marginal distributions), then showed that the weak individual errors can be boosted to 1 − o(1) between their product distributions; the issue kicks in because the product of the Q_i's is the 1/3-biased distribution instead of (and actually far from) D_{n,n/3}.
To get around this, we strengthen the above argument to use the Q_i's as a proxy between the actual distribution and the desired distribution. Notice that, despite being far in the total variation metric, U_{1/3} is close to D_{n,n/3} in the pointwise multiplicative error sense. More formally, every string in the support of D_{n,n/3} has density 1/C(n, n/3) under D_{n,n/3}, where C(n, n/3) is the binomial coefficient, and density (1/3)^{n/3} · (2/3)^{2n/3} under U_{1/3}.
These two quantities are only off by a Θ(√n) multiplicative factor, which means every event of probability at most ε under U_{1/3} will have probability at most O(√n · ε) under D_{n,n/3}. Therefore we can modify the previous analysis to exhibit a distinguishing event of small probability under D_{n,n/3}, thus establishing a strong distance bound between the actual and desired distributions. Since later in the graph-theoretic task we will set t ≈ n/2^{O(d)}, this poly(n) loss is affordable when d is not particularly large.
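The √n relation between the two point masses is easy to verify numerically; the helper below (ours, with a hypothetical name) computes the ratio and confirms it decays like a constant times 1/√n.

```python
from math import comb, sqrt

def mass_ratio(n):
    """Point mass of a fixed weight-n/3 string under U_{1/3}, divided by its
    point mass 1/C(n, n/3) under D_{n,n/3}."""
    assert n % 3 == 0
    k = n // 3
    u_mass = (1 / 3) ** k * (2 / 3) ** (n - k)
    d_mass = 1 / comb(n, k)
    return u_mass / d_mass

# The ratio decays like a constant times 1/sqrt(n), so an event of probability
# eps under U_{1/3} has probability O(sqrt(n) * eps) under D_{n,n/3}.
for n in (300, 600, 900):
    assert 0.7 < mass_ratio(n) * sqrt(n) < 1.0
```

The ratio equals Pr[Binomial(n, 1/3) = n/3], which is Θ(1/√n) by Stirling's approximation.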
The above analysis works well when individual output bits have an inevitable error against the marginal of the desired distribution. As such, we need new ideas to establish lower bounds in the general case. For simplicity let us focus on the k = n/2 case, as the analysis generalizes to any k that is not too close to 0 or n. If we can find t independent output bits that are θ-far from being unbiased, then we can use the same argument to boost them to a 1 − exp(−Ω(t·θ²)) bound. Otherwise we need to exploit the long-range correlation of D_{n,n/2}, namely that the Hamming weight must sum to exactly n/2. One possible exploitation is through anticoncentration inequalities, which have played an important role in the analysis of similar problems [6,26]. In particular, if there are t independent output bits that are actually unbiased, then by Littlewood-Offord anticoncentration [10,21], they cannot sum to any particular value with probability more than O(1/√t), which seemingly means the distribution is still (1 − O(1/√t))-far from D_{n,n/2}. The issue with this argument is that the independent output bits can correlate with many other output bits, which might be able to force the total Hamming weight to a fixed value. For example, one can consider the construction (x_1, ¬x_1, x_2, ¬x_2, ..., x_{n/2}, ¬x_{n/2}), where we have n/2 independent bits but the total sum is always n/2 and every individual bit is unbiased.
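This obstruction is easy to verify concretely: the Python sketch below (name ours) builds the paired construction, which copies each input bit next to its negation, and confirms that the total weight is forced while every single output bit stays unbiased.

```python
import itertools

def paired_sampler(x):
    """1-local map: duplicate each input bit as the pair (x_i, 1 - x_i).
    The total Hamming weight is forced to n/2, even though every single
    output bit is unbiased and the odd-position bits are independent."""
    out = []
    for b in x:
        out += [b, 1 - b]
    return out

# Exhaustive check over all 2^4 inputs (so n/2 = 4, n = 8 output bits).
outs = [paired_sampler(x) for x in itertools.product([0, 1], repeat=4)]
assert all(sum(o) == 4 for o in outs)              # weight is always n/2
for j in range(8):                                 # each output bit is unbiased
    assert sum(o[j] for o in outs) == len(outs) // 2
```

The anticoncentration attempt fails here precisely because each independent bit is paired with a correlated partner outside the selected set.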
To address this problem, we need to take into account the neighborhood of each output bit. Define the neighborhood N(i) of an output bit i as the set of output bits that depend on some input bit that i also depends on. We will exploit a key tension between two facts about the distribution of N(i). Firstly, every small neighborhood should be nearly unbiased, since the marginals of D_{n,n/2} restricted to any small number of bits are 1/poly(n)-close to the uniform distribution over those bits. Secondly, resampling the input bits on which i depends should not change the Hamming weight of the output (and thus does not change the Hamming weight of N(i), since output bits outside N(i) are untouched). However, since the output bit i depends only on these inputs, the second property implies that the distribution of the Hamming weight of N(i) conditioned on i = 0 is the same as that conditioned on i = 1, which contradicts the first property. Note this argument has no issue with the above construction.
Let θ be a parameter to be optimized later. We classify each neighborhood as Type-1 if it is θ-far from being unbiased, and as Type-2 if it is θ-close to unbiased coins. Mimicking the previous analysis, we say two neighborhoods N(i), N(j) are non-connected if all pairs (i′, j′) ∈ N(i) × N(j) are non-connected. Thus by the same argument, if we have many non-connected neighborhoods of Type-1, then our distribution is far from D_{n,n/2}.

Lemma 2.4. Assume there are r ≥ 1 non-connected Type-1 neighborhoods. Then the output distribution is (1 − exp(−Ω(r·θ²)))-far from D_{n,n/2}.

Now suppose we have r non-connected neighborhoods of Type-2, each of size at most b. We would like to use anticoncentration inequalities to argue that with high probability the Hamming weight does not sum up to n/2. Assume the neighborhoods are N(1), ..., N(r), and let I(i) be the set of input bits on which the i-th output bit depends. We fix all the input bits outside I(1) ∪ ... ∪ I(r) to a restriction ρ. Then all the output bits outside N(1) ∪ ... ∪ N(r) are fixed, and moreover, the neighborhoods N(1), ..., N(r) become independent of each other. At this point, if the Hamming sum of each N(i) is still not fixed, we can apply a Littlewood-Offord-type anticoncentration result: there exists a universal constant c > 0 such that, under a mild non-degeneracy condition, the sum of r independent, non-fixed random variables hits any particular value x ∈ R with probability at most c/√r.
To this end, we use the property that N(i) is Type-2, i.e., it is roughly unbiased under a random input. Say N(i) has size b. Then under a uniformly random input, the Hamming sum of N(i) is distributed like a binomial distribution over b coins. If we resample the input bits in I(i), with probability one half the i-th output bit is flipped, whereas the Hamming sum of N(i) minus the i-th bit remains distributed like a binomial distribution over b − 1 coins. This implies that such an experiment changes the Hamming sum of N(i) with probability roughly 1/√b − θ, where the 1/√b comes from the total variation distance between a binomial distribution over b − 1 coins and its shift, and the θ accounts for the error between the actual distribution of N(i) and unbiased coins. Meanwhile, since ρ does not touch I(i), we cannot change the Hamming sum by simply resampling I(i) if the Hamming sum is already fixed by ρ. Hence as long as θ is sufficiently below 1/√b, we conclude that the Hamming sum of N(i) is not fixed under a random (and thus a typical) ρ. Since these neighborhoods are independent, by standard concentration many neighborhoods enjoy this property simultaneously for a typical ρ. Then we can apply anticoncentration and obtain a bound of roughly 1 − 1/√r.

Finally we need to handle the graph-theoretic task: given a bipartite graph with left degree at most d, show that we can obtain r non-connected left neighborhoods (representing the neighborhoods of output bits) of size at most b by removing a small number of right vertices. The left neighborhood of a left vertex is the set of left vertices reachable from it within two edges. Two left neighborhoods are non-connected if they do not connect to common right vertices. In addition, we aim to maximize r and minimize b, since the final distance bound will be roughly 1 − 1/√r minus terms decaying in r/b. This task is significantly more challenging than the previous one, as now we need to eliminate the dependency among the neighborhoods too. Consequently, we only get a bound with tower-type dependence on d. That is, we show that the problem can be solved with r = n/tow_2(d) and b = tow_2(d), where tow_2 denotes a tower of twos.
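The 1/√b ingredient is elementary to verify: for the binomial distribution over m fair coins, the total variation distance to its shift by one telescopes to exactly the central mass C(m, ⌊m/2⌋)/2^m ≈ √(2/(πm)). A small Python check (ours; the helper name is hypothetical):

```python
from math import comb, sqrt, pi

def tv_binomial_shift(m):
    """TV distance between Binomial(m, 1/2) and the same distribution shifted by +1."""
    p = [comb(m, k) / 2 ** m for k in range(m + 1)]
    q = [0.0] + p          # masses of the shifted variable on {0, ..., m+1}
    p = p + [0.0]
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

for m in (100, 400, 1600):
    # telescoping over the unimodal sequence gives exactly the central mass
    assert abs(tv_binomial_shift(m) - comb(m, m // 2) / 2 ** m) < 1e-12
    # which by Stirling is ~ sqrt(2 / (pi * m))
    assert abs(tv_binomial_shift(m) - sqrt(2 / (pi * m))) < 0.01
```

The telescoping identity holds because the binomial masses increase to their maximum and then decrease, so the absolute differences sum to twice the peak.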
Perhaps surprisingly, this tower-type dependency is in fact necessary, as seen via an explicit construction. Here we briefly sketch the proof. As before, assume towards contradiction that the claim is false. Then we follow the previous approach and argue that we will have too many right vertices of large degree, which implies the following structural result: once we have removed a certain number of right vertices from the graph, we can additionally remove a comparatively small number of right vertices to shave many edges from the graph. Then we arrive at a contradiction: the graph has at most d·n edges and thus can support the elimination process only a bounded number of times; however, repeating the process i times only removes roughly n/log^{(i)}(n) right vertices in total, which means the elimination process should continue as long as the iterated logarithm log^{(i)}(n) remains large.

2.4 Hamming Slices of Weight 0 Modulo 3.
We note that the analysis for D_{n,n/2} also works for the union of multiple Hamming slices, since the main place where we use n/2 is that it is one fixed value and thus admits a 1/√r bound via anticoncentration. Beyond a single slice, the 1/√r bound simply scales with the number of slices. Nevertheless, this does not go beyond √r slices. Here we demonstrate that our framework is robust enough to handle Ω(n) periodic slices.
For simplicity, we consider the case where D equals the uniform distribution over n-bit binary strings with Hamming weight 0 modulo 3. Note that this distribution consists of roughly n/3 Hamming slices and has almost-unbiased marginals. We follow the proof for D_{n,n/2}. Let θ be a parameter measuring the distance between the marginal distribution of a neighborhood and unbiased coins. As before, we classify each neighborhood as Type-1 if this distance is at least θ, and as Type-2 otherwise. Once we have r non-connected neighborhoods of Type-1, we readily get a 1 − exp(−Ω(r·θ²)) distance bound by the same argument.
On the other hand, if we have r non-connected Type-2 neighborhoods of size at most b, then we use anticoncentration inequalities (in fact, a local limit theorem) to show that with noticeable probability the Hamming weight is not 0 modulo 3. Recall that in the single Hamming slice case, we argue that a Type-2 neighborhood N(i) is not fixed after a typical restriction ρ which does not touch I(i) (the input bits on which the i-th output bit depends). This is proved via a thought experiment where we resample the input bits in I(i) and compare the binomial distribution with its shift. Here we need a similar statement that a Type-2 neighborhood is not fixed modulo 3 after a typical restriction ρ. The only difference is that now we need to compare the binomial-modulo-3 distribution with its shift. Since 3 does not divide any power of 2, the binomial-modulo-3 distribution can never be uniform over {0, 1, 2}. In fact, by granularity, it is 2^{−b}-far from its shift, which means that the Hamming sum modulo 3 of N(i) is typically not fixed as long as θ ≤ 2^{−b}/2. Then using the following local limit theorem (an almost tight Littlewood-Offord-type anticoncentration result) on the additive group modulo 3, we obtain that under a typical ρ, the Hamming sum modulo 3 is roughly uniform over {0, 1, 2}, and thus hits any particular value with probability 1/3 + o(1).

Lemma 2.7. Let q ≥ 3 be an integer, and let X_1, ..., X_r be independent random variables in Z. For each i ∈ [r] and ℓ ≥ 1, define α_{i,ℓ} = max_{c ∈ Z} Pr[X_i ≡ c (mod ℓ)]. Under a suitable boundedness assumption on the α_{i,ℓ}'s, the sum X_1 + ... + X_r modulo q lands in any Λ ⊆ Z/qZ with probability close to |Λ|/q, up to error terms depending on Λ_even = {even numbers in Λ} and Λ_odd = {odd numbers in Λ}; we defer the precise quantitative form to the full version.
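The granularity claim for modulus 3 can be verified numerically: the distribution of Binomial(b, 1/2) mod 3 is exponentially close to uniform on {0, 1, 2} but never equal to its shift. A small Python check (ours; helper names are hypothetical):

```python
from math import comb

def binomial_mod3(b):
    """Distribution of Binomial(b, 1/2) modulo 3, as a list of three masses."""
    masses = [0.0, 0.0, 0.0]
    for k in range(b + 1):
        masses[k % 3] += comb(b, k) / 2 ** b
    return masses

def tv(p, q):
    return sum(abs(x - y) for x, y in zip(p, q)) / 2

for b in range(1, 20):
    p = binomial_mod3(b)
    shifted = [p[2], p[0], p[1]]            # distribution of (X + 1) mod 3
    assert tv(p, shifted) > 0               # never shift-invariant, hence never uniform
    assert max(abs(x - 1 / 3) for x in p) <= 1 / 2 ** b   # yet exponentially close
```

By contrast, Binomial(b, 1/2) mod 2 is exactly uniform for every b ≥ 1, which is why even moduli behave differently.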
By the same reasoning, we seek the above structure at the cost of removing a small number of input bits, while simultaneously maximizing r and minimizing b. It turns out that this is still manageable, with a tower-type loss on b, via a similar graph elimination argument. Finally, we mention that the local limit theorem used for analyzing Type-2 neighborhoods holds for every modulus, including 2. However, the comparison between the binomial-modulo-q distribution and its shift can only be carried out for moduli q ≥ 3, because the binomial-modulo-2 distribution is exactly uniform over {0, 1}. Thus for even q (i.e., q not coprime with 2), there is an additional contributing factor (cf. Lemma 2.7), which results in a different bound for even q in Theorem 1.4.

More General Input Distributions.
Now we briefly discuss how to modify our analysis to prove similar lower bounds when the input distribution changes from unbiased coins to general product distributions. While this is impossible for the 1/3-biased distribution (or any p-biased distribution in general), it works in the Hamming slice setting. Since it is standard for a Boolean circuit to take unbiased coins as input, we focus on this case and leave the more general treatment as an exercise for interested readers.
Recall that our analysis starts with a simpler setting where we can find many small non-connected neighborhoods. In this case, we prove distance lower bounds by comparing the marginal distributions of these non-connected neighborhoods with the desired marginal distribution (unbiased or 1/3-biased coins). Then we classify them into Type-1 and Type-2 and argue the final distance bound separately. The analysis in this part has nothing to do with the input distribution, since the only property we need is the non-connectivity of output neighborhoods in the input-output relation, which generalizes trivially when we view the input "bits" as taking values in a larger alphabet. Then we reduce the general setting to the above simpler setting by removing a few input bits. This part is also oblivious to the alphabet of the input, as it works in a purely graph-theoretic sense where the input-output dependency is defined abstractly, regardless of the alphabet.
The only problematic part is where we put the above two steps together (Lemma 2.2). There, we have to pay a union bound over all possible conditionings (or equivalently, the number of different distributions after conditioning), since the true output distribution is a convex combination of them. If the alphabet of the input is Σ and we need to remove c input bits, we will need to set N = |Σ|^c. To compensate for this loss, the graph-theoretic problem needs to be slightly reformulated, but it remains manageable if |Σ| is a constant or even slightly superconstant. For example, in the setting of D = D_{n,n/2}, previously we needed to obtain r non-connected neighborhoods of size b after removing a certain number of input bits; now we need to obtain the same structure after removing a log(|Σ|)-factor fewer input "bits".
Finally, we remark that an extremely general result, where the bounds have no restriction on the alphabet, is simply not true. One can use a 1-local function to sample any distribution if the input alphabet is large enough to include all possible outcomes and the input is distributed exactly as the desired distribution.

Lemma 2.1. Let P and Q be distributions over an n-dimensional product space, and let S ⊆ [n] be a non-empty set of size t. Assume that
• P|_S and Q|_S are two product distributions, and
• ∥P|_{i} − Q|_{i}∥_TV ≥ ε holds for all i ∈ S.
Then ∥P|_S − Q|_S∥_TV ≥ 1 − exp(−Ω(t·ε²)).

2.2 The Hamming Slice of Weight n/3.

Now we move on to the single Hamming slice case D = D_{n,n/3}, the uniform distribution over n-bit binary strings of Hamming weight n/3.

Proposition 2.8. There exists a set I ⊆ [m] of input bits such that any fixing of the input bits in I reduces f to a d-local function with at least r non-connected neighborhoods of size at most b, where |I| ≤ n·2^{−(28d−18)}, r ≥ n/tow_2(16d), and b ≤ tow_2(16d).
Naively, we would need to fix far too many input bits to get the above structure. This seems too stringent and impossible, even without considering the undesirable loss in the final bound. Instead, we observe that the distance bound from the second case actually tells us more: it is proved in the stronger sense that our output distribution hits any point in the support of D_{n,n/2} with probability at most 1/√r. Hence we can refine the previous analysis and Lemma 2.2 as follows: any convex combination of P_1, ..., P_N is (1 − N·δ_1 − N·δ_2)-far from Q, provided that each P_i is either (1 − δ_1)-far from Q, or hits the support of Q with probability at most δ_2. The proof is not much different; we simply merge the event "not hitting the support" into the previous union bound. Therefore we can now remove sufficiently many input bits.