A New Information Complexity Measure for Multi-pass Streaming with Applications

We introduce a new notion of information complexity for multi-pass streaming problems and use it to resolve several important questions in data streams. In the coin problem, one sees a stream of $n$ i.i.d. uniform bits and one would like to compute the majority with constant advantage. We show that any constant-pass algorithm must use $\Omega(\log n)$ bits of memory, significantly extending an earlier $\Omega(\log n)$ bit lower bound for single-pass algorithms of Braverman-Garg-Woodruff (FOCS, 2020). This also gives the first $\Omega(\log n)$ bit lower bound for the problem of approximating a counter up to a constant factor in worst-case turnstile streams for more than one pass. In the needle problem, one either sees a stream of $n$ i.i.d. uniform samples from a domain $[t]$, or there is a randomly chosen needle $\alpha \in[t]$ for which each item independently is chosen to equal $\alpha$ with probability $p$, and is otherwise uniformly random in $[t]$. The problem of distinguishing these two cases is central to understanding the space complexity of the frequency moment estimation problem in random order streams. We show tight multi-pass space bounds for this problem for every $p<1/\sqrt{n \log^3 n}$, resolving an open question of Lovett and Zhang (FOCS, 2023); even for $1$-pass our bounds are new. To show optimality, we improve both lower and upper bounds from existing results. Our information complexity framework significantly extends the toolkit for proving multi-pass streaming lower bounds, and we give a number of additional streaming applications of our lower bound techniques, including multi-pass lower bounds for $\ell_p$-norm estimation, $\ell_p$-point query and heavy hitters, and compressed sensing problems.


INTRODUCTION
Streaming problems with stochastic inputs have been widely studied in the streaming community [6,9,13,14,16,20,27,28,37], and have applications to diverse areas including learning theory [12,17,47,48] and cryptography [18,19,31,50]. In this setting, one sees a stream of i.i.d. samples from some underlying distribution. As the samples are i.i.d., this is a special case of the well-studied random order streaming model. In this paper, we consider streaming problems (with stochastic inputs) that are allowed multiple passes over their input. Surprisingly, even the most basic problems in data streams are not resolved in the stochastic setting. We discuss two such problems below.
Coin Problem. If the stream is $X_1, X_2, \ldots, X_n$, with each $X_i$ independently and uniformly drawn from $\{-1, 1\}$, then the coin problem is to compute the sum of these bits up to additive error $O(\sqrt{n})$, which gives a non-trivial advantage for estimating the majority. This can be solved trivially in $O(\log n)$ bits of memory by storing the sum of input bits, and was recently shown to require $\Omega(\log n)$ bits of memory in [9]. However, if we allow two or more passes over the stream, the only known lower bound is a trivial $\Omega(1)$ bits.
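For concreteness, here is a minimal simulation of the coin problem together with the trivial exact-sum algorithm; all names and constants in this sketch are ours, purely for illustration.

```python
import random

def coin_stream(n, seed=0):
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(n)]

def trivial_majority(stream):
    # Exact running sum: the state always lies in [-n, n], so it fits in
    # O(log n) bits, matching the upper bound discussed above.
    s = 0
    for b in stream:
        s += b
    return 1 if s >= 0 else -1

# Any estimate of the sum within additive error O(sqrt(n)) already gives a
# constant advantage for majority, since the sum itself is Theta(sqrt(n))
# in magnitude with constant probability.
print(trivial_majority(coin_stream(10**5)))
```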
Naturally, the coin problem is closely related to the fundamental question of counting the number of elements in a data stream: maintain a counter $N$ in a stream under a sequence of updates of the form $N \leftarrow N + 1$ or $N \leftarrow N - 1$, which is arguably the most basic question one could ask. More generally, one would like to approximate $N$ up to a constant multiplicative factor. One can solve this exactly using $\lceil \log_2 n \rceil$ bits of memory, where $n$ is the length of the stream. This bound is tight for deterministic algorithms even if one allows a large $\mathrm{poly}(n)$ approximation factor [3]. It is also tight for constant-factor-approximation randomized 1-pass algorithms, via a reduction from the augmented indexing problem [36]. In fact, the lower bound of [9] for the coin problem implies that the randomized 1-pass lower bound holds even when the algorithm succeeds with high probability over an input of i.i.d. coin flips. Despite our understanding of 1-pass algorithms, we are not aware of an $\Omega(\log n)$ lower bound for this basic problem of approximate counting, even for worst-case streams, once two or more passes are allowed. For $\ell$-pass streaming algorithms, a weaker lower bound of $\Omega((\log n)^{1/\ell})$ can be derived using communication complexity lower bounds for the Greater-Than function¹ [40,51]. There is work on approximate counting lower bounds in [2], which analyzes a single linear sketch and therefore only applies to single-pass streaming algorithms. It also requires streams that are at least exponentially long in the value of the final count, despite the count being $O(\sqrt{n})$ in absolute value at any time during the stream.
Needle Problem. Here the goal is to distinguish between streams $e_1, \ldots, e_n$ sampled from one of two possible underlying distributions, where the domain size is $t = \Omega(n)$: • Uniform distribution $D_0$: each $e_i$ is picked independently and uniformly at random from the domain $[t]$, and • Needle distribution $D_1$: the distribution first uniformly samples an element $\alpha \in [t]$ (we call it the needle). Then, each item $e_i$ independently equals $\alpha$ with probability $p$, and otherwise is sampled uniformly from $[t]$.
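A minimal sketch of the two distributions, together with the collision-based baseline discussed in the next paragraph; the window size w below is a tunable assumption.

```python
import random

def sample_stream(n, t, p=0.0, seed=0):
    """p = 0 gives D0 (uniform); p > 0 plants a needle as in D1."""
    rng = random.Random(seed)
    needle = rng.randrange(t)
    return [needle if rng.random() < p else rng.randrange(t) for _ in range(n)]

def window_collision(stream, w):
    """Remember the last w items and report any repeat.

    Under D1 a duplicate tends to appear once w * n * p**2 is large; the
    memory is w words, i.e. O(w log t) bits."""
    recent, seen = [], set()
    for x in stream:
        if x in seen:
            return True                   # a collision: likely the needle
        recent.append(x)
        seen.add(x)
        if len(recent) > w:
            seen.discard(recent.pop(0))
    return False

print(window_collision(sample_stream(10**5, 10**12, p=0.01, seed=1), w=2000))  # True
print(window_collision(sample_stream(10**5, 10**12, p=0.0, seed=2), w=2000))   # False
```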
Lovett and Zhang [37] showed a lower bound of $\Omega\left(\frac{1}{p^2 n \log n}\right)$ bits for any constant number of passes to distinguish $D_0$ from $D_1$, and left it as an open problem to improve this bound. Using an algorithm for frequency moment estimation, the work of Braverman et al. [11] shows that for $p = 1/n^{2/3}$, there is a single-pass upper bound of $O\left(\frac{1}{p^2 n}\right)$ bits. For larger $p$, the best known algorithm is to store the previous $\frac{1}{p^2 n}$ items in a stream, using $O\left(\frac{\log t}{p^2 n}\right)$ bits, and check for a collision. Thus, for every $p$ there is a gap of at least $\log n$ in known upper and lower bounds, and for the important case of $p > \Omega(1/n^{2/3})$, the gap is $\Theta(\log^2 n)$. We note that the work of Lovett and Zhang also requires $t = \Omega(n^2)$ in its lower bounds², which limits its applications to frequency moment estimation.

¹ Briefly, given inputs $x, y \in [n^{0.1}]$ to the two-player communication problem for the Greater-Than function, Alice adds $x \cdot n^{0.9}$ 1s to the stream and Bob adds $y \cdot n^{0.9}$ −1s. Determining the sign of $x - y$, or estimating $x \cdot n^{0.9} - y \cdot n^{0.9}$ up to additive error $n^{0.9}$, requires $\Omega((\log n)^{1/\ell})$ randomized $\ell$-round communication, since it solves Greater-Than.

The needle and coin problems are related to each other if $p = \Theta\left(\frac{1}{\sqrt{n}}\right)$ and when the algorithm has access to a random string that is not counted towards the space complexity³. One can randomly hash each universe element in a stream in the needle problem to $\{-1, 1\}$: under the needle distribution $D_1$, the absolute value of the sum of bits is likely to be an additive $\Theta(\sqrt{n})$ larger than in the uniform distribution $D_0$, and thus the coin problem is at least as hard as the needle problem. However, the needle problem for $p = \Theta\left(\frac{1}{\sqrt{n}}\right)$ could be strictly easier than the coin problem. We stress that the coin and needle problems are arguably two of the most fundamental problems in data streams. Indeed, a vast body of work has considered estimating the $k$-th frequency moment $F_k = \sum_{i=1}^{t} |f_i|^k$, for an underlying $t$-dimensional vector $f$ undergoing positive and negative updates to its coordinates. If $k = 1$, this problem is at least as hard as the coin problem. The lower bound of [9] thus gave the first $\Omega(\log n)$ lower bound for single-pass $\ell_2$-estimation in the bounded deletion data stream model [32], even when one does not charge the streaming algorithm for storing its randomness. As the lower bound of [9] was actually an information cost lower bound, it gave rise to the first direct sum theorem for solving multiple copies of the $\ell_2$-estimation problem, showing that solving $k$ copies, each with constant probability, requires $\Omega(k \log n)$ bits in this data stream model. Lower bounds for the coin problem were shown to imply additional lower bounds in random order and bounded deletion models for problems such as point query and heavy hitters; see [9] for details. For $k > 2$, if we set $p = \Theta(1/n^{1-1/k})$ in the needle problem, then it is not hard to see that $F_k$ differs by a constant factor between distributions $D_0$ and $D_1$ with large probability. In this case, the lower bound of [37] gives an $\Omega\left(\frac{n^{1-2/k}}{\log n}\right)$ lower bound for frequency moment estimation in the random order insertion-only data stream model for any constant number of passes, provided $t = \Omega(n^2)$. There is also a long sequence of prior work on this problem [6,13,16,39], which obtains polynomially worse lower bounds (though does not require $t = \Omega(n^2)$). The single-pass arbitrary order stream $O(n^{1-2/k})$ upper bound of [11] for $k > 3$ matches this up to a logarithmic factor, and a central question in data streams is to remove this logarithmic factor, as well as the requirement that $t = \Omega(n^2)$.
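The hashing reduction from the needle problem to the coin problem described above can be sketched as follows; the lazily materialized sign table plays the role of the uncounted random string.

```python
import math
import random

def needle_stream(n, t, p, seed=0):
    rng = random.Random(seed)
    needle = rng.randrange(t)
    return [needle if rng.random() < p else rng.randrange(t) for _ in range(n)]

def coin_view(stream, seed=1):
    """Hash each universe element to {-1,+1}; repeats of the needle share a sign."""
    rng, signs = random.Random(seed), {}
    out = []
    for x in stream:
        if x not in signs:
            signs[x] = rng.choice((-1, 1))
        out.append(signs[x])
    return out

n, t = 10**5, 10**12
p = 4 / math.sqrt(n)                     # p = Theta(1/sqrt(n))
print(abs(sum(coin_view(needle_stream(n, t, 0.0, seed=2)))))  # noise, O(sqrt(n))
print(abs(sum(coin_view(needle_stream(n, t, p, seed=3)))))    # bias ~ p*n = 4*sqrt(n)
```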

Our Contributions
We give a new multi-pass notion of information complexity that gives a unified approach to obtaining lower bounds for the coin and needle problems for any number of passes. We note that the measure of information we use is a generalization of the notion of information complexity for $\ell = 1$ pass given in [9]. Namely, we define the $\ell$-pass information complexity notion by
$$IC_\mu(M) = \sum_{i=1}^{n} I\left(M_{(1,i)}, \ldots, M_{(\ell,i)};\, X_i \,\middle|\, M_{(1,i-1)}, \ldots, M_{(\ell,i-1)}\right),$$
where $(X_1, \ldots, X_n) \sim \mu$ and $M_{(j,i)}$ represents the $i$th memory state in the $j$th pass. We will set $\mu$ to be the uniform distribution over $\{-1,1\}^n$ in the coin problem and the uniform distribution $D_0$ in the needle problem. When $\mu$ is clear from the context, we will drop it from the notation and write $IC(M)$. The primary challenge in establishing lower bounds for multi-pass streaming algorithms arises from the fact that the streaming data loses its independence when multiple passes are used. To mitigate this, the idea of $IC(M)$ is to capture some residual independence by carefully fixing some memory states in the previous pass. To make this notion useful, we show that $IC(M)$ is upper bounded by $2s\ell$, where $s$ is the space of the algorithm. Note that this multi-pass information complexity notion applies to any streaming problem, as long as it is defined on a product distribution. As we will see in the paper, this notion is useful for proving multi-pass streaming lower bounds via various approaches, such as round elimination as well as reductions to communication problems.
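To make the definition concrete, the following toy computation evaluates the one-pass specialization $\sum_{i=1}^n I(M_i; X_i \mid M_{i-1})$ exactly for a small saturating-counter algorithm; the algorithm and parameters are ours, purely for illustration.

```python
from collections import defaultdict
from itertools import product
from math import log2

def step(state, bit, clip=2):
    # A toy one-pass algorithm: a saturating counter, so the memory is O(1) bits.
    return max(-clip, min(clip, state + bit))

def cond_mi(joint):
    """I(M_i ; X_i | M_{i-1}) from a table joint[(m_prev, x, m_next)] -> prob."""
    p_prev, p_px, p_pm = defaultdict(float), defaultdict(float), defaultdict(float)
    for (mp, x, mn), pr in joint.items():
        p_prev[mp] += pr
        p_px[(mp, x)] += pr
        p_pm[(mp, mn)] += pr
    return sum(pr * log2(pr * p_prev[mp] / (p_px[(mp, x)] * p_pm[(mp, mn)]))
               for (mp, x, mn), pr in joint.items() if pr > 0)

def one_pass_ic(n=10):
    """Exact IC = sum_i I(M_i ; X_i | M_{i-1}) over uniform {-1,1}^n inputs."""
    total = 0.0
    for i in range(n):
        joint = defaultdict(float)
        for prefix in product((-1, 1), repeat=i + 1):
            m = 0
            for b in prefix[:-1]:
                m = step(m, b)
            joint[(m, prefix[-1], step(m, prefix[-1]))] += 0.5 ** (i + 1)
        total += cond_mi(joint)
    return total

print(one_pass_ic())  # a few bits: the clipped counter leaks little about each X_i
```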
Just as the measure of information in [9] was crucial for 1-pass applications, such as the amortized complexity of approximate counting in insertion streams [1], we will show that our notions have a number of important applications and can be used to obtain multi-pass lower bounds for both the coin and needle problems. We will define and motivate our information complexity notion more below, but we first describe its applications.
1.1.1 The Coin Problem. We give tight lower bounds on the information complexity of the coin problem, significantly extending the results of [9] for the 1-pass setting to the multi-pass setting. We then give a new multi-pass direct sum theorem for solving multiple copies, and use it for streaming applications.
Multi-Pass Coin Problem. We give the first non-trivial multi-pass lower bound for the coin problem.

Theorem 1.1 (Multi-Pass Coin Problem). Given a stream of $n$ i.i.d. uniformly random bits, any $\ell$-pass streaming algorithm which outputs the majority of these bits with probability $1 - \delta$ for a small enough constant $\delta > 0$ requires $\Omega\left(\frac{\log n}{\ell}\right)$ bits of memory.
Theorem 1.1 is a significant strengthening of the main result in [9], which held only for $\ell = 1$ pass. Although the work of [10] allows for a larger bias on its coins, it also held only for $\ell = 1$ pass.
As discussed before, we can interpret the coins as updates in $\{-1, 1\}$ to a counter $N$ initialized to $0$. Adjoining a prefix of $C\sqrt{n}$ 1s to the stream for a large enough constant $C > 0$, we have, by bounds on the maximum deviation of a 1-dimensional random walk, that $N$ will be non-negative at all points during the stream, which corresponds to the strict turnstile streaming model, where one can only delete previously inserted items. We also have that the final value of $N$ will deviate from its expectation $C\sqrt{n}$ by an additive $\Omega(\sqrt{n})$ with constant probability, that is, it is anti-concentrated. Consequently, Theorem 1.1 implies the following.

Theorem 1.2 (Approximate Counting). Any $\ell$-pass strict turnstile streaming algorithm which counts the number of insertions minus deletions in a stream of length $n$ up to a small enough constant multiplicative factor and with probability at least $1 - \delta$ for a small enough constant $\delta > 0$ requires $\Omega\left(\frac{\log n}{\ell}\right)$ bits of memory.
By a constant multiplicative factor, we mean outputting a number $N'$ for which $(1-\epsilon)N \leq N' \leq (1+\epsilon)N$ for a constant $\epsilon > 0$. For insertion-only streams, where no deletions of items are allowed, non-trivial algorithms based on Morris counters achieve $O(\log \log n)$ bits [21,25,44,49]. We rule out any non-trivial algorithm for strict turnstile streams for any constant number of passes.
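For contrast, the insertion-only algorithms cited above build on Morris counters; a minimal version, without the accuracy tuning of [21,25,44,49], looks as follows.

```python
import random

def morris_count(stream_len, seed=0):
    """Morris counter: keep only c ~ log2(count), i.e. O(log log n) bits of state."""
    rng = random.Random(seed)
    c = 0
    for _ in range(stream_len):
        if rng.random() < 2.0 ** (-c):    # increment with probability 2^-c
            c += 1
    return 2 ** c - 1                     # E[2^c - 1] equals the true count

print(morris_count(10**6))                # a constant-factor estimate of 10^6
```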
To obtain further applications, we first show a direct sum theorem for multi-pass coin problems.

Theorem 1.3 ($k$-Coins Problem). Suppose $\ell \leq c \log n$ and $k \leq n^{c}$ for a sufficiently small constant $c > 0$. Given $k$ independent streams, each of $n$ i.i.d. uniformly random bits, any $\ell$-pass streaming algorithm which outputs a data structure such that, with probability $1 - \delta$ for a small enough constant $\delta > 0$ over the input, the algorithm's randomness, and a uniformly random $i \in [k]$, the data structure outputs the majority bit of the $i$-th stream, requires $\Omega\left(\frac{k \log n}{\ell}\right)$ bits of memory.
As an example application of this theorem, in [41] the following problem was studied for a real number $p \in [0, 2]$: given $k$ vectors, each undergoing a separate stream of updates, estimate the $\ell_p$-norm of each vector⁴. We will refer to this problem as Multi-$\ell_p$-Estimation. The best upper bound is $O(k \log n)$, which follows just by solving each instance independently with constant probability using $O(\log n)$ bits [36]. An $\Omega(k \log\log n + \log n)$ randomized lower bound follows for any $O(1)$-pass streaming algorithm by standard arguments⁵. We note that if we do not charge the streaming algorithm for its randomness, then the $O(1)$-pass lower bound for Multi-$\ell_p$-Estimation is an even weaker $\Omega(k \log \log n)$.
By using Theorem 1.3 and having each vector in the Multi-$\ell_p$-Estimation problem correspond to a single counter, we can show the following.

Theorem 1.4 (Multi-$\ell_p$-Estimation). Suppose $\ell \leq c \log n$ and $k \leq n^{c}$ for a sufficiently small constant $c > 0$. Any $\ell$-pass streaming algorithm which solves the Multi-$\ell_p$-Estimation Problem on $k$ instances of a stream of $n$ updates for each vector, solving each $\ell_p$-norm estimation problem up to a small enough constant factor with probability $1 - \delta$ for a sufficiently small constant $\delta > 0$, requires $\Omega\left(\frac{k \log n}{\ell}\right)$ bits of memory.
Another important streaming question we consider is the $\ell_2$-Point Query Problem: given an underlying $n$-dimensional vector $x \in \{-\mathrm{poly}(n), \ldots, \mathrm{poly}(n)\}^n$ that undergoes a sequence of positive and negative additive updates to its coordinates, for each $i \in \{1, \ldots, n\}$, one should output $x_i$ up to an additive error $\epsilon \|x\|_2$ with constant probability. Related to this question is the $\ell_2$-Heavy Hitters Problem, which asks to output a set $S$ which (1) contains all indices $i$ for which $x_i^2 \geq \epsilon^2 \|x\|_2^2$, and (2) does not contain any index $i$ for which $x_i^2 \leq \frac{\epsilon^2}{2} \|x\|_2^2$. Further, for all $i \in S$, one should output an estimate $\tilde{x}_i$ with $|\tilde{x}_i - x_i| \leq \epsilon \|x\|_2$. In [9], for both of these problems an $\Omega(\epsilon^{-2} \log n)$ memory lower bound was shown for single-pass algorithms on length-$n$ streams, which improved the previous best known $\Omega(\epsilon^{-2} \log d)$ lower bounds for $d$-dimensional input vectors when the stream length is much larger than the dimension of the input vectors. Notably, the lower bounds in [9] only hold for single-pass algorithms.
By having each coordinate of an underlying vector correspond to a counter, we can also use Theorem 1.3 to obtain lower bounds for the $\ell_2$-Point Query Problem and the $\ell_2$-Heavy Hitters Problem. Here we also use that the Euclidean norm of the underlying vector is concentrated.
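For contrast with the lower bound, a standard single-pass CountSketch-style data structure achieves additive $\epsilon\|x\|_2$ point queries with $O(\epsilon^{-2} \log n)$ bits; this is the textbook technique, not the construction of [9], and the widths and repetitions below are illustrative assumptions.

```python
import random
from statistics import median

class CountSketch:
    """Point queries with additive error ~ ||x||_2 / sqrt(width) per row;
    the median over `rows` independent rows boosts the success probability."""
    def __init__(self, width, rows, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.tables = [[0.0] * width for _ in range(rows)]
        self.seeds = [(rng.getrandbits(30), rng.getrandbits(30)) for _ in range(rows)]

    def _bucket_sign(self, r, i):
        a, b = self.seeds[r]
        return hash((a, i)) % self.width, 1 if hash((b, i)) % 2 == 0 else -1

    def update(self, i, delta):           # turnstile update x_i += delta
        for r in range(len(self.tables)):
            h, s = self._bucket_sign(r, i)
            self.tables[r][h] += s * delta

    def query(self, i):                   # estimate of x_i
        ests = []
        for r in range(len(self.tables)):
            h, s = self._bucket_sign(r, i)
            ests.append(s * self.tables[r][h])
        return median(ests)

cs = CountSketch(width=400, rows=5)       # width ~ 1/eps^2, rows ~ log(1/delta)
cs.update(7, 100); cs.update(7, -20)
for j in range(1000):
    cs.update(j + 10, 1)
print(cs.query(7))                        # close to 80
```

Reporting every index whose queried value is at least $\epsilon\|x\|_2$ (with the norm estimated separately, e.g., by an AMS sketch) then yields the $\ell_2$-heavy hitters.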

Theorem 1.5 ($\ell_2$-Point Query and $\ell_2$-Heavy Hitters). Suppose $\ell \leq c \log n$ and $\epsilon^{-2} \leq n^{c}$ for a sufficiently small constant $c > 0$. Any $\ell$-pass streaming algorithm which, with probability $1 - \delta$ for a sufficiently small constant $\delta > 0$, solves the $\ell_2$-Point Query Problem or the $\ell_2$-Heavy Hitters Problem on a vector $x \in \{-\mathrm{poly}(n), \ldots, \mathrm{poly}(n)\}^n$ in a stream of $n$ updates, requires at least $\Omega\left(\frac{\epsilon^{-2} \log n}{\ell}\right)$ bits of memory.
Our $\Omega(\epsilon^{-2} \log n)$ bit lower bound for the $\ell_2$-Heavy Hitters Problem can be applied to the Sparse Recovery Problem in compressed sensing (see, e.g., [24,45]), which involves an input vector $x \in \{-\mathrm{poly}(n), \ldots, \mathrm{poly}(n)\}^n$ in a stream, and asks to output a $k$-sparse vector $\hat{x}$ for which
$$\|\hat{x} - x\|_2 \leq \Delta \cdot \|x - x_k\|_2, \quad (1)$$
where $x_k$ is $x$ with all but the top $k$ coordinates set to $0$. Here $\Delta > 1$ is any fixed constant.
A standard parameter of sparse recovery is the Signal-to-Noise Ratio (SNR), which is defined to be $\|x_k\|_2^2 / \|x\|_2^2$. The SNR is at most $1$, and if it is $1$, there is a trivial $\Omega(k(\log(n/k) + \log n))$ bit lower bound. Indeed, since the guarantee of (1) has multiplicative error, we must have $\hat{x} = x_k = x$ in this case, and it takes $\Omega(k(\log(n/k) + \log n))$ bits to encode, for each of the $k$ non-zero locations in $x$, its location and its value. However, when the SNR is a constant bounded away from $1$, this encoding argument no longer applies. Indeed, while one can show an $\Omega(k \log(n/k))$ bit lower bound to encode the identities of the locations, each of their values can now be approximated up to a small multiplicative constant, and so encoding their values requires only $O(k \log \log n)$ bits. While an $\Omega(k + \log \log n)$ measurement lower bound is known for multi-pass streaming algorithms [46] for constant SNR bounded away from $1$, perhaps surprisingly, in the data stream model an $\Omega(k \log n)$ bit lower bound for streams of length $n$ and SNR bounded away from $1$ was unknown. As our lower bound for the $\ell_2$-Heavy Hitters Problem only requires recovering a large constant fraction of the $\ell_2$-heavy hitters, all of which are comparable in magnitude in our hard instance, and the Euclidean norm is concentrated, we in fact obtain a lower bound for the Sparse Recovery Problem even if the SNR is a constant bounded away from $1$. We note that there is an $O(\log \log n)$-pass streaming algorithm which uses $O(k \log n \log \log n)$ bits of memory to solve the sparse recovery problem for any SNR; see [43], which builds upon [29] (see the text after the proof of Theorem 3.7 in [29] on how to obtain an exactly $k$-sparse output). Our lower bound is thus tight up to $\mathrm{poly}(\log \log n)$ factors.
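A small checker for the guarantee (1) and the SNR as defined above; the helper names are hypothetical.

```python
import math

def l2(v):
    return math.sqrt(sum(a * a for a in v))

def top_k_split(x, k):
    """Return (x_k, x - x_k): the top-k coordinates by magnitude, and the tail."""
    keep = set(sorted(range(len(x)), key=lambda i: -abs(x[i]))[:k])
    xk = [x[i] if i in keep else 0 for i in range(len(x))]
    return xk, [a - b for a, b in zip(x, xk)]

def recovery_ok(xhat, x, k, Delta):
    _, tail = top_k_split(x, k)
    return l2([a - b for a, b in zip(xhat, x)]) <= Delta * l2(tail)  # guarantee (1)

def snr(x, k):
    xk, _ = top_k_split(x, k)
    return (l2(xk) / l2(x)) ** 2       # equals 1 iff x is exactly k-sparse

x = [10, -9, 8, 1, -1, 1, 0, 1]
print(snr(x, 3), recovery_ok([10, -9, 8, 0, 0, 0, 0, 0], x, 3, Delta=1.1))
```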
Theorem 1.6 (Sparse Recovery). Suppose $\ell \leq c \log n$ and $k \leq n^{c}$ for a sufficiently small constant $c > 0$. Any $\ell$-pass streaming algorithm which, with probability $1 - \delta$ for $\delta > 0$ a small constant, solves the Sparse Recovery Problem for constant SNR in $(0, 1)$, requires $\Omega\left(\frac{k \log n}{\ell}\right)$ bits of memory.
1.1.2 The Needle Problem. Lovett and Zhang [37] recently showed the following lower bound for the needle problem.

Theorem 1.7 ([37]). Any $\ell$-pass streaming algorithm $M$ which distinguishes between the uniform and needle distributions with high probability, where $p$ denotes the needle probability, $n$ the stream length, and $s$ the space, satisfies $s \cdot \ell \cdot p^2 \cdot n \cdot \log(n) = \Omega(1)$, provided the domain size is $t = \Omega(n^2)$.
While this lower bound is nearly tight, it was conjectured by [6,13,16,37] that the additional $\log(n)$ term can be removed, and it also was plausible that the $t = \Omega(n^2)$ restriction could be removed. This conjecture is for good reason: for $t = \Theta(n)$ and $p \ll 1/n^{2/3}$ and a single pass, an upper bound for estimating frequency moments of [11] shows that $s \cdot p^2 \cdot n = O(1)$. Indeed, the upper bound of [11] shows how to estimate $F_k = \sum_{i=1}^{t} f_i^k$ up to an arbitrarily small but fixed constant factor in $O(n^{1-2/k})$ bits of memory and a single pass, for any $k > 3$. Notice that in distribution $D_0$, we could choose a proper $t = \Theta(n)$ such that $F_k = n + o(n)$ with high probability. On the other hand, for distribution $D_1$, we have that $F_k > (p \cdot n)^k$, and so if $p = \Theta(1/n^{1-1/k})$, these two distributions can be distinguished by the algorithm of [11]. In this case the conjecture would say $s = \Omega\left(\frac{1}{p^2 n}\right) = \Omega(n^{1-2/k})$, which matches the space upper bound of [11].
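A quick numeric check of this reduction, with $t = \Theta(n)$ and $p = \Theta(1/n^{1-1/k})$; the constants here are ad hoc.

```python
import random
from collections import Counter

def fk(stream, k):
    return sum(c ** k for c in Counter(stream).values())

def needle_stream(n, t, p, seed=0):
    rng = random.Random(seed)
    needle = rng.randrange(t)
    return [needle if rng.random() < p else rng.randrange(t) for _ in range(n)]

n, k = 200_000, 3
t = 4 * n                                # t = Theta(n)
p = 2.0 / n ** (1 - 1 / k)               # the needle appears ~ p*n = 2*n^{1/k} times
print(fk(needle_stream(n, t, 0.0, seed=1), k))  # concentrated around its mean
print(fk(needle_stream(n, t, p, seed=2), k))    # extra (p*n)^k = 8n contribution
```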
We resolve this conjecture. As a consequence, several other streaming lower bounds mentioned by [16,37,38] can be improved automatically. Our results also imply that the frequency moment estimation problem for $k > 2$ is as hard in the random order model as in the arbitrary order model (both are $\Omega(n^{1-2/k})$).

Theorem 1.8 (Needle Lower Bound). Any $\ell$-pass streaming algorithm $M$ with space $s$ that distinguishes $D_0$ and $D_1$ with high probability satisfies $s \cdot \ell \cdot p^2 \cdot n = \Omega(1)$, where $p$ denotes the needle probability and $n \leq t/100$ denotes the number of samples.
If we apply the algorithm of [11] to the needle problem via the reduction discussed above, we can conclude that our lower bound for the needle problem is tight when $p \ll 1/n^{2/3}$. However, we further improve the upper bound by giving a new algorithm, and show the following.

Theorem 1.9 (Improved Upper Bound). There exists a one-pass streaming algorithm that distinguishes $D_0$ and $D_1$ with high probability and uses $O\left(\frac{1}{p^2 n}\right)$ bits of space when $p \leq \frac{1}{\sqrt{n \log^3 n}}$.
Our upper bound improves upon [11], and shows that our lower bound for the needle problem is indeed tight for any $p \leq \frac{1}{\sqrt{n \log^3 n}}$. In fact, for $p \geq \frac{1}{\sqrt{n}}$, consider the $n$-player communication problem where each player has a stream item and the players speak one at a time from left to right in the message passing model (see, e.g., [33] for upper bounds for a number of problems in this model); we show that the problem can be solved by having each player send at most $O((\log \log n)(\log \log \log n))$ bits to the next player. This is not quite a streaming algorithm, as the players need to know their identity, but it would be a streaming algorithm if we also allow for a clock, so that we know the number $i$ of the $i$-th stream update, for each $i$.

Theorem 1.10 (Communication Upper Bound). There exists an $n$-player one-round communication protocol that distinguishes $D_0$ and $D_1$ with high probability, in which each player uses at most $O((\log \log n)(\log \log \log n))$ bits of space, for any $p \geq \frac{1}{\sqrt{n}}$.
Theorem 1.10 shows that for $p = \frac{1}{\sqrt{n}}$, the needle problem is strictly easier than the coin problem, and thus the above-mentioned algorithm for the needle problem that first reduces to the coin problem is suboptimal. Indeed, in the same communication model, or for streaming algorithms with a clock, our $\Omega\left(\frac{\log n}{\ell}\right)$ lower bound in Theorem 1.1 applies. Thus, for a constant number of passes, the coin problem requires $\Omega(\log n)$ bits of memory, whereas the needle problem with $p = \frac{1}{\sqrt{n}}$ can be solved with $O((\log \log n)(\log \log \log n))$ bits of memory, showing that there is a separation between the two problems in the $n$-player communication model.
Remark. Note that our algorithm not only applies to the needle problem, but can also be adapted to a more general setting where the needle is randomly ordered while the non-needle items may appear in an arbitrary order, subject to some constraints.
Our improved lower bound for the needle problem can be used to obtain optimal lower bounds in the random order model for arguably the most studied problem in the data stream literature, namely, approximating the frequency moments. Starting with the work of Alon, Matias, and Szegedy [4], there has been a huge body of work on approximating the frequency moments in arbitrary order streams; see, e.g., [5,7,8,15,22,23,30,34,42] and the references therein. As mentioned above, Braverman et al. [11] gave an upper bound of $O(n^{1-2/k})$ for constant-factor approximation for all $k > 3$, which is optimal for arbitrary order insertion streams.
A number of works have also studied the frequency moment estimation problem in randomly ordered streams. While the $O(n^{1-2/k})$ bit upper bound of [11] still holds, we did not have a matching lower bound. Chakrabarti, Cormode, and McGregor [13] gave the first non-trivial $\Omega(n^{1-3/k})$ lower bound. A follow-up paper by Andoni et al. [6] improved this lower bound to $\Omega(n^{1-2.5/k})$. Recently, a lower bound of $\Omega(n^{1-2/k}/\log n)$ was shown by Lovett and Zhang [37], provided $t = \Omega(n^2)$. Since a stream of i.i.d. samples is automatically randomly ordered, our Theorem 1.8 resolves this long line of work, giving an $\Omega(n^{1-2/k})$ lower bound. We thus improve the lower bound of [37] by a logarithmic factor, and also remove the requirement that $t = \Omega(n^2)$. The application to frequency moments follows by applying our theorem with $t = \Theta(n)$ and $p = 1/n^{1-1/k}$ and arguing that the needle problem gives rise to a constant factor gap in the value of the $k$-th frequency moment in the two cases. We note that the work of [26] claimed to obtain an $\Omega(n^{1-2/k})$ lower bound for frequency moment estimation in a random order stream, but was later retracted due to an error which has been pointed out in multiple places; e.g., [39] retracts its lower bounds and points out the error⁶ in [26].
There are other problems related to frequency moment estimation for which we also obtain improved lower bounds, such as frequency moment estimation of sub-sampled streams. McGregor et al. [38] studied streaming problems in a model where the stream comes in so rapidly that one can only see each element independently with a certain probability. Our Theorem 1.8 gives an optimal lower bound for this problem as well, via the reduction in [38]. Another example concerns stochastic streaming problems such as collision probability estimation, studied by Crouch et al. [16]. They provided several lower bounds based on the needle lower bound of [6]. Our Theorem 1.8 automatically improves their lower bounds via the same reductions.

TECHNICAL OVERVIEW
In this section, we give a brief overview of our proofs for both the coin problem and the needle problem.

Properties of New Multi-Pass IC Notion
In this section, we show some important properties of our IC notion. In addition to $IC_\mu(M)$, we obtain another natural expression for an information measure for $\ell$-pass algorithms $M$, as follows.

Definition 2.1. Let $M$ be an $\ell$-pass streaming algorithm whose input $X_1, \ldots, X_n$ follows a product distribution $\mu$. Then we define the following information complexity notion:
$$\widetilde{IC}_\mu(M) = \sum_{j=1}^{\ell} \sum_{i=1}^{n} I\left(M_{(j,i)};\, X_i \,\middle|\, M_{(1,i)}, \ldots, M_{(j-1,i)}, M_{(1,i-1)}, \ldots, M_{(\ell,i-1)}\right).$$

One can see that the notion above shares some similarities with $IC_\mu(M)$, while the difference is that $\widetilde{IC}_\mu(M)$ further divides the information costs into smaller components. In our paper, we show properties which provide an upper bound for both notions and relate them to each other; in particular, when $X_1, \ldots, X_n$ are drawn from a product distribution $\mu$, both notions are upper bounded by $2s\ell$, where $s$ is the space of $M$. Note that our information complexity notions can be applied to any other problem, as long as it is defined on a product distribution.

Multi-Pass Lower Bound for the Coin Problem
In this section, we assume that $X_1, \ldots, X_n$ are drawn from the uniform distribution $\mu$ over $\{-1,1\}^n$; we will drop $\mu$ for the rest of the section. We show an $\Omega(\log n)$ lower bound (Theorem 2.3) on $IC(M, \mu)$ for any $\ell$-pass algorithm $M$ that solves the coin problem (or computes the majority of the input bits with a large enough constant advantage).
We will prove our $\ell$-pass lower bound for computing majority (or approximating the sum) on the uniform distribution by reducing it to the one-pass lower bound proven by [9], which is stated as follows.

Theorem 2.2 (Theorem 1.4 of [9]). Given a stream of uniform $\{-1,1\}$ bits $X_1, \ldots, X_n$, let $O$ be a one-pass algorithm that uses private randomness. For all $\epsilon > c_0 n^{-1/20}$, there exists $\delta \geq c_1 \epsilon^5$ (for a small enough constant $c_1 > 0$ and a large enough constant $c_0 > 0$), such that if
$$\mathbb{E}\left[\mathrm{Var}\left(\sum_{i=1}^{n} X_i \,\middle|\, O_n\right)\right] \leq (1 - \epsilon)\, n,$$
then
$$\sum_{i=1}^{n} I\left(O_n;\, X_i \,\middle|\, O_{i-1}\right) \geq \delta \log n.$$
Here, $O_i$ represents the memory state of the one-pass algorithm $O$ after reading $i$ input elements.
In other words, if the output of algorithm $O$ reduces the variance of the sum⁷, then $O$ needs to have high information cost.

Construction of O.
To prove Theorem 2.3, we develop a new simulation technique for proving multi-pass streaming lower bounds for the coin problem. Given a stream $X$ of $n$ i.i.d. uniform bits, let $M$ be an $\ell$-pass algorithm that goes over $X$ $\ell$ times in order and computes the majority; that is, the expected variance of the sum of the input bits conditioned on the output of the last pass is a constant factor less than the maximum value $n$. Informally, for ease of discussion, we refer to this variance reduction as $M$ approximating $S = \sum_{i=1}^n X_i$ up to an additive error⁸ of $\sqrt{n}$. Using $M$, we construct a one-pass algorithm $O$ which, given a stream of $n$ i.i.d. uniform bits $X$, also approximates $S$ up to an additive error of $\sim \sqrt{n}$. Let $M^j$ represent the random variable for the output (or memory state) at the end of the $j$-th pass of $M$ ($j \in [\ell]$). $O$ executes the $\ell$ passes of $M$ in parallel. Before reading the input bits $X_1, \ldots, X_n$, $O$ samples memory states at the end of the first $\ell - 1$ passes from the joint distribution on $(M^0, \ldots, M^{\ell-1})$. $O$ then modifies the given input $X$ to $X'$ such that the parallel execution of the first $\ell - 1$ passes of the algorithm $M$ on $X'_1, \ldots, X'_n$ ends in the sampled memory states. $O$ also maintains an approximation of the modification, that is, of $\sum_{i=1}^n (X'_i - X_i)$; this helps $O$ compute $S$ as long as $M$ computes $S' = \sum_{i=1}^n X'_i$ after $\ell$ passes. As we want $O$ to have information cost comparable to that of $M$, the approximation of the modification should take low memory⁹. The key observation that makes such an approximation possible is the following: since the KL divergence of the distribution of $X$, conditioned on reaching memory states $M^0, \ldots, M^{\ell-1}$, from the uniform distribution is bounded by the entropy of $(M^0, \ldots, M^{\ell-1})$ (which we assume to be $\ll n$), algorithm $O$ does not need to modify $X$ (which has a uniform distribution) drastically. Still, we cannot afford to store the modification exactly; however, a cruder approximation suffices, and it can be computed using low memory.
As described above, algorithm O has two components: (1) imitate the $\ell$ passes of $M$ simultaneously, and (2) maintain an approximation of the modification of the input $X$ to a valid input $X'$ for the first $\ell - 1$ passes of $M$. To formally describe algorithm O (in Section 2.2.3), we first state these two components separately as algorithms Im (in Section 2.2.1) and Apr (in Section 2.2.2), respectively.

Single-Pass Algorithm Im
Imitating the $\ell$ Passes of $M$. Recall that $M$ is an $\ell$-pass algorithm that runs on a stream of $n$ i.i.d. uniform $\{-1,1\}$ bits $X_1, X_2, \ldots, X_n$. We describe algorithm Im in Algorithm 1. [Algorithm 1, paraphrased: Im first samples end-of-pass memory states $(M'^0, M'^1, \ldots, M'^{\ell-1})$ and stores them for the entire algorithm; these serve as the starting memory states $M'_{(j,0)}$ for the $\ell$ passes of $M$. Then, at each time-step $i$, Im modifies the input bit $X_i$ to a bit $X'_i$ (keeping $X'_i = X_i$ with high probability), samples the states $(M'_{(1,i)}, \ldots, M'_{(\ell,i)})$ from the joint distribution of $(M_{(1,i)}, M_{(2,i)}, \ldots, M_{(\ell,i)})$ conditioned on the previously sampled states, and executes the $i$-th time-step of all $\ell$ passes of $M$.] Let Im$_i$ (where $i \in [n]$) represent the random variable for the memory state of Algorithm Im after reading $i$ input bits, and let Im$_0$ be a random variable for the starting memory state of the algorithm. The input to the algorithm Im is drawn from the uniform distribution on $\{-1,1\}^n$.

Low Information Approximation Algorithm Apr. We develop Apr for the general problem of approximating the sum of $n$ elements, each in $\{-1, 0, 1\}$. The problem is as follows: given parameters $\eta > 0$ and $T > \sqrt{n}$, and a stream of $n$ elements $e_1, \ldots, e_n \in \{-1, 0, 1\}$ jointly drawn from a distribution $D$ (such that $\mathbb{E}_{e \sim D}\left[\sum_{i=1}^{n} 1_{e_i \neq 0}\right] \leq T$), the aim is to output $\sum_{i=1}^{n} e_i$ up to an additive error of $\eta\sqrt{T}$; we formalize the error guarantee in Theorem 2.4. Additionally, we establish that the streaming algorithm Apr (described in Algorithm 2) has low information cost: the memory state at each time-step has low entropy, that is, $\forall i \in [n]$, $H(\mathrm{Apr}_i) \sim 2 \log \sqrt{T}$. Note that the exact computation of $\sum_{i=1}^{n} e_i$ requires $\log n$ memory. Informally, Apr samples each $e_i$ with probability $\sim 1/\eta^2$ and maintains the sum of the sampled elements using a counter $\Delta$. It is easy to see that $\Delta \cdot \eta^2$ is an approximation of $\sum_{i=1}^{n} e_i$ (with additive error $\eta\sqrt{T}$) as long as the number of non-zero elements is $O(T)$. As $T$ is an upper bound only on the expectation of $\sum_{i=1}^{n} 1_{e_i \neq 0}$, the algorithm Apr needs to find another way to approximate the sum whenever $\sum_{i=1}^{n} 1_{e_i \neq 0} \gg T$. For this, Apr maintains two more counters $c$ and $\Gamma$, where $c$ counts the number of elements sampled into the sum $\Delta$, and $\Gamma$ stores the sum exactly once the counter $c$ becomes too large. Apr is formally described in Algorithm 2.
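A minimal Python rendering of this informal description, with a sampling probability q and threshold cap standing in for the paper's $1/\eta^2$ and its fallback threshold; both are stand-in assumptions.

```python
import random

def apr(stream, q, cap, seed=0):
    """Sketch of Apr: approximate the sum of a {-1,0,1} stream.

    Delta sums a q-sample of the nonzero elements and c counts the samples,
    so Delta/q estimates the sum.  If c exceeds `cap` (far more nonzeros
    than the expected bound T), switch to exact counting from the current
    estimate, mirroring the fallback counter Gamma of Algorithm 2."""
    rng = random.Random(seed)
    delta, c, gamma, exact = 0, 0, 0, False
    for e in stream:
        if exact:
            gamma += e
        elif e != 0 and rng.random() < q:
            delta += e
            c += 1
            if c > cap:
                gamma, exact = round(delta / q), True
    return gamma if exact else delta / q

print(apr([0] * 9000 + [1] * 900 + [-1] * 100, q=0.25, cap=500))  # ~ 800
```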
[Algorithm 2 (Apr), paraphrased: on an input stream $e_1, \ldots, e_n$ drawn from a joint distribution $D$ on $\{-1,0,1\}^n$, and given parameters $\eta$ and $T$, Apr maintains the counters $\Delta$, $c$, and $\Gamma$ as described above, and outputs an estimate of $\sum_{i=1}^{n} e_i$ within additive error $\eta\sqrt{T}$ with probability at least $1 - 1/3$ over its private randomness $R^{\mathrm{Apr}}$.] In Theorems 2.4 and 2.5, we establish the approximation and information cost guarantees for the algorithm Apr. Before stating them, we note that Apr uses private randomness at Step 7 of the algorithm, and we define $R^{\mathrm{Apr}}_i$ to be a $\mathrm{Ber}(1/\eta^2)$ random variable for all $i \in [n]$.
Theorem 2.4. For all $T > 4\sqrt{n}$ and $\eta \geq 1$, the output of Algorithm 2 (Apr) on every input stream $e \in \{-1,0,1\}^n$ is within an additive error $\eta\sqrt{T}$ of $\sum_{i=1}^{n} e_i$ with probability at least $1 - 1/3$ over the private randomness $R \sim R^{\mathrm{Apr}}$.

Theorem 2.5. For every distribution $D$ such that $\mathbb{E}_{e \sim D}\left[\sum_{i=1}^{n} 1_{e_i \neq 0}\right] \leq T$, the memory states of Algorithm 2 (Apr) satisfy: $\forall i \in \{0, \ldots, n\}$, $H(\mathrm{Apr}_i) \leq 40 + 6 \log\log n + 2 \log \sqrt{T}$.

Here, Apr$_i$ denotes the random variable for Apr's memory state after reading $i$ input elements; it depends on the input $e$, as well as the private randomness $R \sim R^{\mathrm{Apr}}$ used by the algorithm. The random variables $X, X'$ are as defined in Subsection 2.2.1, where $X$ is drawn from the uniform distribution on $\{-1,1\}^n$ and $X'$ corresponds to the value $X'$ in Algorithm 3. Let $D$ be the joint distribution generated by Algorithm 3 on the inputs to Apr, that is, the joint distribution on $(X_1 - X'_1, X_2 - X'_2, \ldots, X_n - X'_n)$. Let $e_i$ be the random variable for the $i$th input element to Apr, that is, $e_i = X_i - X'_i$. Let $\{M'_{(j,i)}\}_{j \in [\ell], i \in \{0,\ldots,n\}}$ and $M'^{<\ell}$ be random variables as defined in Subsection 2.2.1 (these are random variables for the corresponding values that appear in Algorithm 3). For the approximation algorithm Apr, we show (Claim 2.6) that
$$\mathbb{E}_{e \sim D}\left[\sum_{i=1}^{n} 1_{e_i \neq 0}\right] \leq c \cdot H(M^{<\ell}).$$
Informally, we relate the probability of modification at step $i$ to the information that the end memory states have about $X_i$, conditioned on the previous memory states. The claim follows from the fact that the sum of this information over $i$ is bounded by the entropy of the end states. Note that the above claim is tight if an end state encodes a subset of the input bits exactly.

[Algorithm 3: the single-pass algorithm $O$, which uses the $\ell$-pass algorithm $M$ to compute majority. Input: a stream of $n$ i.i.d. uniform $\{-1,1\}$ bits $X_1, \ldots, X_n$; given parameters $\eta > 0$ and $T > \sqrt{n}$; goal: approximate $\sum_{i=1}^{n} X_i$.]

2.2.3 Single-Pass Algorithm O for Computing Majority Using $\ell$-Pass Algorithm M. After introducing the two components, we now describe the one-pass algorithm $O$ that approximates the sum almost as well as the $\ell$-pass algorithm $M$, while having information cost similar to that of $M$. $O$ runs Algorithm 1 (Im) to imitate the $\ell$ passes of $M$; Im modifies the input bit $X_i$ at the $i$-th time-step to the bit $X'_i$. In parallel, $O$ runs Algorithm 2 (Apr) on the modification: the $i$-th input element to Apr is $(X_i - X'_i) \in \{-1, 0, 1\}$. After reading $X_i$, $O$ runs the $i$th time-steps of algorithms Im and Apr (the input to Apr is generated on the fly), and stores the $i$th memory states of both algorithms. While describing $O$ formally in Algorithm 3 (where the parameters $\eta$ and $T$ are decided later), we restate algorithm Im and use Apr as a black box. As in Subsection 2.2.2, $R^{\mathrm{Apr}}_i$ represents the private randomness used by algorithm Apr at the $i$th time-step, and Apr$_i$ represents the random variable for the $i$th memory state ($r_i$ and $a_i$ represent their instantiations). The input to Apr is denoted by $e$. Let $\mathrm{Apr}^i$ ($i \in [n]$) denote the $i$th transition function for algorithm Apr, that is, $a_i = \mathrm{Apr}^i(a_{i-1}, e_i, r_i)$; let $\mathrm{Apr}^i(a_{i-1}, e_i)$ denote the random variable for the $i$th memory state when the $i$th input element is $e_i$, the $(i-1)$th memory state is $a_{i-1}$, and the private randomness is drawn from $R^{\mathrm{Apr}}_i$. The distribution $D$ is defined at Step 3, and $\{X'_i\}_{i \in [n]}$ denotes the random variable for the value $X'_i$ in Step 5 of Algorithm 1; these distributions depend on the joint distribution on $(X, X')$. Claim 2.6 suggests a value for the parameter $T$ with which Algorithm 3 should run Algorithm Apr, so as to use the approximation guarantee of Theorem 2.4. Let $O$ be Algorithm 3 with parameters $T = 10\, c \cdot H(M^{<\ell})$ and $\eta = \sqrt{n \cdot (1+\epsilon)/T}$. We prove the following lemmas regarding the information cost and the output of algorithm $O$; see the full version for detailed proofs. Lemma 2.7 follows from a careful disentanglement of the information costs of the subroutines Im and Apr used in Algorithm O.
Intuitively, Lemma 2.8 shows that if the output of the $\ell$-pass algorithm $M$ gives information about $S = \sum_{i=1}^n X_i$ (measured by the reduction in variance), then the output of the one-pass algorithm $O$ also gives information about the sum of the input stream to $O$. The former guarantee implies that the output of $O$ contains information about $S' = \sum_{i=1}^n X'_i$ (the sum of the modified input); as $O$ stores an approximation of $\sum_{i=1}^n (X_i - X'_i)$, this implies that it also has information about $S$. All that remains to show is that the approximation of the modification has an additive error of at most $O(\sqrt{n})$, with high probability. For this, we use $\mathbb{E}_{e \sim D}\left[\sum_{i=1}^{n} 1_{e_i \neq 0}\right] \leq c \cdot H(M^{<\ell})$ and the $\ell_2$ approximation guarantee for Apr from Theorem 2.4.

Solving Multiple Instances of the Coin Problem.
We generalize our multi-pass streaming lower bounds to solving multiple instances of the coin problem simultaneously. Informally, given $k$ interleaved input streams generated by $n$ i.i.d. uniform bits each, the goal of a multi-pass streaming algorithm is to output the majority of an arbitrary stream at the end of $\ell$ passes. We show that any $\ell$-pass streaming algorithm that solves the $k$-Coins Problem requires $\Omega\left(\frac{k \log n}{\ell}\right)$ bits of memory (for $k \leq n^c$). As in the single coin case, we reduce the multiple coin case to the analogous result for one-pass streaming algorithms proven by [9]. We simulate the multi-pass algorithm for the $k$-Coins Problem using a one-pass algorithm that maintains approximations for modifying each input stream to a valid stream for the $\ell$-pass algorithm. For the generalization, we utilize the fact that the single coin simulation works even when the output of the first pass has $\mathrm{poly}(n)$ entropy; for the $k$-Coins Problem, we work with memories as large as $k \log n$.

Multi-pass Streaming Lower Bound for the Needle Problem
Since we have shown that $IC(M, D_0)$ is upper bounded by $2s\ell$, it suffices to give an $\Omega\left(\frac{1}{p^2 n}\right)$ lower bound on $IC(M, D_0)$, which we formally present in Lemma 2.9. In the following, we give the intuition behind Lemma 2.9.
In the needle problem, we use the notion $IC(M, D_0)$ as defined before, where $D_0$ stands for the uniform distribution. For simplicity, we write $IC(M)$ in the needle problem; it is easily distinguished from the notion $IC(M)$ used in the coin problem (where $\mu$ is uniform over $\{-1,1\}^n$).

Lemma 2.9. In the needle problem, if an $\ell$-pass streaming algorithm $M$ distinguishes between $D_0$ and $D_1$ with high probability, then we have $IC(M) = \Omega\left(\frac{1}{p^2 n}\right)$.
Let us first consider the special case where $p = 1/2$. A useful observation is that the needle problem with $p = 1/2$ is very similar to the MostlyDISJ communication problem [35]. Viewing the needle problem with $p = 1/2$ as a multiparty communication problem (which we name MostlyEq), we have the following definition.

Definition 2.10 ($n$-party MostlyEq problem). There are $n$ parties in the communication problem, where the $i$-th party holds an integer $x_i \in [t]$. We promise that $(x_1, \ldots, x_n)$ is sampled from one of the following distributions: (1) Uniform distribution (denoted by $U$): each $x_i$ is sampled from $[t]$ independently and uniformly.
(2) Mostly equal distribution (denoted by $E$): first uniformly sample an element (the needle) $\alpha$ from $[t]$. Then each $x_i$ independently equals $\alpha$ with probability $1/2$, and is uniform otherwise. The goal of the players is to distinguish which case it is.
The MostlyEq problem and the needle problem with $p = 1/2$ are closely related. For the MostlyEq problem, we prove an information complexity lower bound, formalized by the following theorem.

Theorem 2.11. Any communication protocol $\Pi$ that distinguishes $U$ from $E$ with failure probability at most a small constant satisfies $I(\Pi; X) = \Omega(1)$, where $X = (x_1, \ldots, x_n)$ is the input. In other words, the information complexity of $\Pi$ is $\Omega(1)$.

Here, the failure probability of a protocol $\Pi$ is defined as the maximum of $\Pr_{X \sim U}[\Pi(X) \text{ outputs ``mostly equal''}]$ and $\Pr_{X \sim E}[\Pi(X) \text{ outputs ``uniform''}]$.
Here, the failure probability for a protocol Π is de ned by By a standard reduction to MostlyEq (constructing a communication protocol by simulation streaming algorithm), we know the mutual information I(M; ) between M = (M ( , ) ) ∈ [ ], ∈ [ ] and input = ( 1 , • • • , ) is also Ω(1).Then, we can prove that (M) ≥ I(M, ) ≥ Ω(1) with information theory calculations.Now, let us consider general ≤ 1/2.We rst use decompose the needle problem into many local needle problems by rede ning the sampling process of 1 as follows: ( (3) For each ∉ , the th streaming sample is uniformly random.(4) For each ∈ , the th streaming sample equals to with probability 1/2 and uniformly random otherwise.It is easy to see that the data stream sampled by the process above follows 1 .Thus, solving the needle problem with general is equivalent to solving the needle problem with = 1/2 hiding in a secret location .Then, we de ne local needle distribution as the distribution 1 condition on that the set sampled in Step (1) equals , and de ne local needle problem as distinguishing between and 0 within a small error.Since the streaming algorithm M does not know , if M solves the needle problem for general , M must distinguish between and 0 for at least a constant faction of .
If $M$ distinguishes $D_S$ from $D_0$ for a constant fraction of the sets $S$, then for each such $S$ the memory states of $M$ carry $\Omega(1)$ bits of information about the samples in $S$; this comes from viewing the states of the streaming algorithm as the transcripts of communication protocols and applying the lower bound for MostlyEq. In addition, further information theory calculations show that these local guarantees together imply $IC(M) = \Omega\left(\frac{1}{p^2 n}\right)$, which proves Lemma 2.9.

Combined with Theorem 1.9, the remaining gap between our upper and lower bounds exists only in the range $p > \frac{1}{\sqrt{n \log^3 n}}$.

