Streaming Algorithms with Few State Changes

In this paper, we study streaming algorithms that minimize the number of changes made to their internal state (i.e., memory contents). While the design of streaming algorithms typically focuses on minimizing space and update time, these metrics fail to capture the asymmetric costs, inherent in modern hardware and database systems, of reading versus writing to memory. In fact, most streaming algorithms write to their memory on every update, which is undesirable when writing is significantly more expensive than reading. This raises the question of whether streaming algorithms with small space and number of memory writes are possible. We first demonstrate that, for the fundamental $F_p$ moment estimation problem with $p\ge 1$, any streaming algorithm that achieves a constant factor approximation must make $\Omega(n^{1-1/p})$ internal state changes, regardless of how much space it uses. Perhaps surprisingly, we show that this lower bound can be matched by an algorithm that also has near-optimal space complexity. Specifically, we give a $(1+\varepsilon)$-approximation algorithm for $F_p$ moment estimation that uses a near-optimal $\widetilde{\mathcal{O}}_\varepsilon(n^{1-1/p})$ number of state changes, while simultaneously achieving near-optimal space, i.e., for $p\in[1,2]$, our algorithm uses $\text{poly}\left(\log n,\frac{1}{\varepsilon}\right)$ bits of space, while for $p>2$, the algorithm uses $\widetilde{\mathcal{O}}_\varepsilon(n^{1-2/p})$ space. We similarly design streaming algorithms that are simultaneously near-optimal in both space complexity and the number of state changes for the heavy-hitters problem, sparse support recovery, and entropy estimation. Our results demonstrate that an optimal number of state changes can be achieved without sacrificing space complexity.


Introduction
The streaming model of computation is a central paradigm for computing statistics for datasets that are too large to store. Examples of such datasets include internet traffic logs, IoT sensor networks, financial transaction data, database logs, and scientific data streams (such as huge experiments in particle physics, genomics, and astronomy). In the one-pass streaming model, updates to an underlying dataset are processed by an algorithm one at a time, and the goal is to approximate, collect, or compute some statistic of the dataset while using space that is sublinear in the size of the dataset (see [BBD+02, M+05] for surveys).
Formally, an insertion-only data stream is modeled by a sequence of updates $u_1, \ldots, u_m$, each of the form $u_t \in [n]$ for $t \in [m]$, where $[n] = \{1, \ldots, n\}$ is the universe size. The updates implicitly define an underlying frequency vector $f \in \mathbb{R}^n$ by $f_i = |\{t \mid u_t = i\}|$, so that the value of each coordinate of the frequency vector is the number of occurrences of that coordinate's identity in the data stream.
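For concreteness, the frequency vector can be materialized offline by a short computation; this is illustrative only, since the streaming setting forbids storing $f$ explicitly:

```python
# Illustrative only: materializing the frequency vector f from an
# insertion-only stream u_1, ..., u_m over the universe [n].
# (A streaming algorithm cannot afford this; f is the object being estimated.)
from collections import Counter

def frequency_vector(stream, n):
    """Return f with f[i-1] = number of occurrences of item i in the stream."""
    counts = Counter(stream)
    return [counts.get(i, 0) for i in range(1, n + 1)]

# Stream 3, 1, 3, 2, 3 over universe [4] yields f = (1, 1, 3, 0).
```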
One of the most fundamental problems in the streaming literature is to compute a $(1+\varepsilon)$-approximation of the $F_p$ moment of the underlying frequency vector $f$, defined by $F_p(f) = (f_1)^p + \ldots + (f_n)^p$, where $\varepsilon > 0$ is an accuracy parameter. The frequency moment estimation problem has been the focus of more than two decades of study in the streaming model [AMS99, BJKS04, Woo04, IW05, Ind06, Li08, KNW10, LW13, BDN17, BVWY18, GW18, JW18a, JW19, WZ21a, WZ21b, ABJ+22, BJWY22]. In particular, $F_p$-estimation is used for $p = 0.25$ and $p = 0.5$ in mining tabular data [CIKM02], for $p = 1$ in network traffic monitoring [FKSV02] and dynamic earth-mover distance approximation [Ind04], and for $p = 2$ in estimating join and self-join sizes [AGMS02] and in detecting network anomalies [TZ04]. Note that a $(1+\varepsilon)$-approximation to the $F_p$ moment also gives a $(1 + \mathcal{O}(\varepsilon))$-approximation to the $L_p$ norm of $f$, defined by $\|f\|_p = (F_p(f))^{1/p}$. Another fundamental streaming problem is to compute $L_p$-heavy hitters: given a threshold parameter $\varepsilon \in (0, 1]$, the $L_p$-heavy hitters problem is to output a list $L$ containing all $j \in [n]$ such that $f_j \ge \varepsilon \cdot \|f\|_p$, and no $j \in [n]$ with $f_j < \frac{\varepsilon}{2} \cdot \|f\|_p$. The heavy-hitters problem is used for answering iceberg queries [FSG+98] in database systems and for finding elephant flows and spam prevention in network traffic monitoring [BEFK17], and perhaps has an even more extensive history than the $F_p$ moment estimation problem [MG82, BM91, CCF04, CM05, MAA05, MM12, Ind13, BCIW16, LNNT16, BCI+17, BGL+18, LNW18, INW22, BLMZ23, LT23].
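The two definitions above can likewise be illustrated with a short offline reference computation (again not a streaming algorithm, only a specification of the quantities being estimated):

```python
# Offline reference computations of the F_p moment and the L_p-heavy hitters.

def fp_moment(f, p):
    """F_p(f) = sum_i f_i^p."""
    return sum(x ** p for x in f)

def lp_heavy_hitters(f, p, eps):
    """Items j (1-indexed) with f_j >= eps * ||f||_p."""
    lp_norm = fp_moment(f, p) ** (1.0 / p)
    return [j + 1 for j, x in enumerate(f) if x >= eps * lp_norm]

# For f = (1, 1, 3, 0): F_2 = 1 + 1 + 9 = 11, so ||f||_2 = sqrt(11) ~ 3.32,
# and item 3 is the only 0.5-heavy hitter.
```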
The primary goal of algorithmic design for the streaming model is to minimize the space and update time of the algorithm. However, the generic per-update processing time fails to capture the nuanced reality of many modern database and hardware systems, where the type of update made on a time step matters significantly for the real-world performance of the algorithm. Specifically, it is typically the case that updates that only require reads of the memory contents of the algorithm are significantly faster than updates that also modify the memory of the algorithm, i.e., writes. Thus, while many streaming problems are well understood in terms of their space and update time, little is known about their write complexity: namely, the number of state changes made over the course of the stream.
In this paper, we propose the number of state changes of a streaming algorithm as a complexity-theoretic parameter of interest, and we make the case for its importance as a central object of study, in addition to the space and update time of an algorithm. While there is significant practical motivation for algorithms that update their internal state infrequently (see Section 1.1 for a discussion), from a theoretical perspective it is not clear that having few state changes is even possible. Specifically, most known streaming algorithms write to their memory contents on every update of the stream. Moreover, even if algorithms using fewer state changes existed, such algorithms would not be useful if they required significantly more space than prior algorithms that do not minimize the number of state changes. Therefore, we ask the following question: Is it possible to design streaming algorithms that make few updates to their internal state in addition to being space-efficient?
Our main contribution is to answer this question positively. Specifically, we demonstrate that there exist algorithms that are simultaneously near-optimal in their space and state-change complexity. In particular, achieving few state changes does not require paying extra in space complexity.
Challenges with deterministic algorithms. From a theoretical perspective, minimizing the number of internal state changes immediately rules out a large class of deterministic streaming algorithms. For example, counting the stream length can be performed deterministically by simply repeatedly updating a counter over the course of a stream. Similarly, $L_1$-heavy hitters can be tracked deterministically and in sublinear space using the well-known Misra-Gries data structure [MG82]. Another example is the common merge-and-reduce technique for clustering in the streaming model, which can be implemented deterministically if the coreset construction in each reduce step is deterministic. Other problems, such as maintaining Frequent Directions [GLPW16] or $L_2$ regression in the row arrival model [BDM+20, BFL+23], also admit highly non-trivial deterministic algorithms that use sublinear space. However, these approaches all update the algorithm upon each stream update and thus seem inherently at odds with achieving a sublinear number of internal state changes over the course of the stream.
Relationship with sampling. On the other hand, sampling-based algorithms inherently seem useful for minimizing the number of internal state changes. There are a number of problems that admit sublinear-size coresets based on importance sampling, such as clustering [BFLR19, BFL+21, CAWZ23], graph sparsification [AG09, BHM+21], linear regression [CMP20], $L_p$ regression [WY23], and low-rank approximation [BDM+20]. These algorithms generally assign some value quantifying the "importance" of each stream update as it arrives and then sample the update with probability proportional to the importance. Thus, if there are few additional operations, then the number of internal state changes can be as small as the overall data structure maintained by the streaming algorithm. On the other hand, it is not known whether space-optimal sampling algorithms exist for a number of other problems that admit sublinear-space streaming algorithms, such as $F_p$ estimation, $L_p$-heavy hitters, distinct elements, and entropy estimation. Hence, a natural question is whether there exist space-optimal streaming algorithms for all of these problems that also incur a small number of internal state changes.

Our Contributions
In this work, we initiate the study of streaming algorithms that minimize state changes, and we demonstrate the existence of algorithms that achieve optimal or near-optimal space bounds while simultaneously making an optimal or near-optimal number of internal state changes.

Heavy-hitters.
We first consider the $L_p$-heavy hitters problem, where the goal is to output an estimate $\widehat{f}_j$ of the frequency $f_j$ of every item $j \in [n]$, given an input accuracy parameter $\varepsilon \in (0, 1)$. Accurate estimation of the heavy-hitter frequencies is important for many other streaming problems, such as moment estimation, $L_p$ sampling [AKO11, JW18b, JWZ22], cascaded norms [JW09, JLL+20], and others. Note that under such a guarantee, along with a 2-approximation of $\|f\|_p$, we can automatically output a list that contains all $j \in [n]$ with $f_j \ge \varepsilon \cdot \|f\|_p$ and no $j \in [n]$ with $f_j < \frac{\varepsilon}{2} \cdot \|f\|_p$. We defer discussion of how to obtain a 2-approximation to $\|f\|_p$ for the moment and instead focus on the additive error guarantee for all $f_j$. Our main result for the heavy hitters problem is the following:

Theorem 1.1. Given a constant $p \ge 1$, there exists a one-pass insertion-only streaming algorithm that makes $\mathcal{O}(n^{1-1/p}) \cdot \mathrm{poly}\left(\log(nm), \frac{1}{\varepsilon}\right)$ internal state changes and solves the $L_p$-heavy hitters problem, i.e., it outputs a frequency vector $\widehat{f}$ such that $|\widehat{f}_j - f_j| \le \varepsilon \cdot \|f\|_p$ for all $j \in [n]$, with high probability. For $p \in [1, 2]$, the algorithm uses $\mathrm{poly}\left(\frac{1}{\varepsilon}\right) \cdot \mathrm{polylog}(mn)$ bits of space, while for $p > 2$, the algorithm uses $\widetilde{\mathcal{O}}\left(\frac{1}{\varepsilon^{4+4p}} \, n^{1-2/p}\right)$ bits of space.
We next give a lower bound showing that any algorithm solving the $L_p$-heavy hitters problem requires $\Omega(n^{1-1/p})$ state updates.
Theorem 1.2. Let $\varepsilon \in (0, 1)$ be a constant and $p \ge 1$. Any algorithm that solves the $L_p$-heavy hitters problem with threshold $\varepsilon$ with probability at least $\frac{2}{3}$ requires at least $\frac{1}{2\varepsilon} n^{1-1/p}$ state updates.
Together, Theorem 1.1 and Theorem 1.2 show that we achieve a near-optimal number of internal state changes. Furthermore, [BIPW10, JST11] showed that for any $p > 0$, the $L_p$-heavy hitters problem requires $\Omega\left(\frac{1}{\varepsilon^p} \log n\right)$ words of space, while [BJKS04, Gro09, Jay09] showed that for $p > 2$, and even for constant $\varepsilon > 0$, the $L_p$-heavy hitters problem requires $\Omega(n^{1-2/p})$ words of space. Therefore, Theorem 1.1 is near-optimal for all $p \ge 1$, in both the number of internal state updates and the memory usage.

Moment estimation.
We then consider the $F_p$ moment estimation problem, where the goal is to output an estimate of $F_p(f) = (f_1)^p + \ldots + (f_n)^p$ for a frequency vector $f \in \mathbb{R}^n$ implicitly defined through an insertion-only stream. Our main result is the following:

Theorem 1.3. Given a constant $p \ge 1$, there exists a one-pass insertion-only streaming algorithm that makes $\widetilde{\mathcal{O}}(n^{1-1/p})$ internal state changes and outputs an estimate $\widehat{F}_p$ such that $(1-\varepsilon) \cdot F_p \le \widehat{F}_p \le (1+\varepsilon) \cdot F_p$ with high probability.

We next give a lower bound showing that any algorithm achieving a $(2 - \Omega(1))$-approximation to $F_p$ requires $\Omega(n^{1-1/p})$ state updates.

Theorem 1.4. Let $p \ge 1$. Any algorithm that outputs a $(2 - \Omega(1))$-approximation to $F_p$ with probability at least $\frac{2}{3}$ must make $\Omega(n^{1-1/p})$ state updates.
Theorem 1.4 shows that our algorithm in Theorem 1.3 achieves a near-optimal number of internal state changes. Moreover, it is known that any one-pass insertion-only streaming algorithm that achieves a $(1+\varepsilon)$-approximation to the $F_p$ moment estimation problem requires $\Omega\left(\frac{1}{\varepsilon^2} + \log n\right)$ bits of space [AMS99, Woo04] for $p \in [1, 2]$ and $\Omega\left(\frac{1}{\varepsilon^2} n^{1-2/p}\right)$ bits of space [WZ21a] for $p > 2$; thus Theorem 1.3 is also near-optimal in terms of space for all $p \ge 1$.

Technical Overview
Heavy-hitters. We first describe our algorithm for $L_p$-heavy hitters using near-optimal space and a near-optimal number of internal state changes. For ease of discussion, let us assume that $F_p = \Theta_\varepsilon(n)$, so that the goal becomes to identify the coordinates $j \in [n]$ with $f_j \ge \varepsilon \cdot n^{1/p}$, given an input accuracy parameter $\varepsilon \in (0, 1)$.
We first define a subroutine SampleAndHold based on sampling a number of items into a reservoir $Q$. As we observe updates in the stream, we sometimes update the contents of $Q$, and we sometimes observe that some stream updates are to coordinates that are being held by the reservoir. For the items that have a large number of stream updates while they are being held by the reservoir, we create separate counters for these items.
We first describe the intuition for $p > 2$. We create a reservoir $Q$ of size $\kappa = \mathcal{O}_\varepsilon(n^{1-2/p})$ and sample each item of the stream into the reservoir $Q$ with probability roughly $\Theta_\varepsilon\left(\frac{1}{n^{1/p}}\right)$. Note that at some point we may attempt to sample an item of the stream into the reservoir $Q$ when the latter is already full. In this case, we choose a uniformly random item of $Q$ to be replaced by the item corresponding to the stream update.
Our algorithm also checks each stream update to see if it matches an item in the reservoir; if there is a match, we create an explicit counter tracking the frequency of the item. In other words, if $j \in [n]$ arrives as a stream update and $j \in Q$ is in the reservoir, then our algorithm SampleAndHold creates a separate counter for $j$ to count the number of subsequent instances of $j$.
Now for a heavy hitter $j \in [n]$, we have $f_j \ge \varepsilon \cdot n^{1/p}$, and since the sampling probability is $\Theta_\varepsilon\left(\frac{1}{n^{1/p}}\right)$, we can show that $j$ will likely be sampled into our reservoir $Q$ at some point. In fact, since the reservoir $Q$ has size $\kappa = \mathcal{O}_\varepsilon(n^{1-2/p})$, in expectation $j$ will be retained by the reservoir for roughly $\Omega_\varepsilon(n^{1-1/p})$ stream updates before it is possibly replaced by some other item. Moreover, since $f_j \ge \varepsilon \cdot n^{1/p}$, we should expect another instance of $j$ to arrive within $\mathcal{O}_\varepsilon(n^{1-1/p})$ additional stream updates, where the expectation is taken over the randomness of the sampled positions, under the assumption that $F_p = \Theta(n)$, which also implies that the stream length is at most $\mathcal{O}(n)$. Therefore, our algorithm will create a counter tracking $j$ before $j$ is removed from the reservoir $Q$. Furthermore, we can show that the counter for $j$ is likely created sufficiently early in the stream to provide a $(1+\varepsilon)$-approximation to the frequency $f_j$ of $j$. Then, to decrease the number of internal state updates, we use Morris counters to approximate the frequency of subsequent updates for each tracked item.
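The expected hold time claimed above can be derived in one line, as a heuristic calculation under the simplification that the reservoir stays full. With sampling probability $q = \Theta_\varepsilon(n^{-1/p})$ and reservoir size $\kappa = \mathcal{O}_\varepsilon(n^{1-2/p})$, a given stream update evicts the held item $j$ only if the update is sampled and then lands on $j$'s slot:

```latex
\Pr[\text{a fixed update evicts } j] = \frac{q}{\kappa}
  \quad\Longrightarrow\quad
\mathbb{E}[\text{updates before } j \text{ is evicted}] = \frac{\kappa}{q}
  = \Theta_\varepsilon\!\left(n^{1-2/p} \cdot n^{1/p}\right)
  = \Theta_\varepsilon\!\left(n^{1-1/p}\right).
```

Since a heavy hitter with $f_j \ge \varepsilon \cdot n^{1/p}$ recurs on average every $n/f_j \le n^{1-1/p}/\varepsilon$ updates in a stream of length $\mathcal{O}(n)$, the next instance of $j$ likely arrives while $j$ is still held.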
Counter maintenance for heavy hitters. The main issue with this approach is that too many counters can be created. As a simple example, consider the case when all occurrences of each coordinate arrive together, one after the other. Although this is a simple case for counting heavy hitters, our algorithm will create counters for almost any item that it samples; although our reservoir uses space $\mathcal{O}_\varepsilon(n^{1-2/p})$, the total number of items sampled over the course of the stream is $\mathcal{O}_\varepsilon(n^{1-1/p})$, and thus the number of created counters can also be $\mathcal{O}_\varepsilon(n^{1-1/p})$, which would be too much space. We thus create an additional mechanism that removes half of the counters each time the number of counters becomes too large, i.e., exceeds $\mathcal{O}_\varepsilon(n^{1-2/p})$.
The natural approach would be to remove half of the counters with the smallest tracked frequencies. However, one could imagine a setting where $m = \Theta(n)$ and the $p$-th moment of the stream is also $\Theta(n)$, but there exists a heavy hitter with frequency $\Theta(n^{1/p})$ that appears once every $\Theta(n^{1-1/p})$ updates. In this case, even if the heavy hitter is sampled, its corresponding counter will be too low to survive against the other counters in the counter maintenance stage. Therefore, we instead remove the counters with the smallest tracked frequencies within each set of counters that have been maintained for a number of steps between $2^z$ and $2^{z+1}$, for each integer $z \ge 0$, which overcomes the per-counter analysis in similar algorithms based on sampling [BO13, BKSV14].
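A minimal sketch of this age-grouped pruning rule follows; the function name and data layout are ours, chosen for illustration. Counters are bucketed by the dyadic age class $[2^z, 2^{z+1})$ of their creation time, and only the smaller half within each bucket is discarded:

```python
import math

def prune_by_age_group(counters, now):
    """counters: dict item -> (created_time, approx_count).
    Within each dyadic age class [2^z, 2^(z+1)), keep the half of the counters
    with the largest approximate counts and drop the rest."""
    buckets = {}
    for item, (created, cnt) in counters.items():
        age = max(1, now - created)
        z = int(math.log2(age))        # dyadic age class of this counter
        buckets.setdefault(z, []).append((cnt, item))
    survivors = {}
    for group in buckets.values():
        group.sort(reverse=True)       # largest approximate counts first
        for cnt, item in group[: (len(group) + 1) // 2]:
            survivors[item] = counters[item]
    return survivors
```

Note that an old counter with a moderate count is compared only against similarly old counters, so a slowly accumulating heavy hitter is never displaced by a burst of freshly created counters.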
Since our algorithm samples each item of the stream of length $\mathcal{O}(n)$ with probability $\frac{1}{\mathcal{O}_\varepsilon(n^{1/p})}$, we expect our reservoir to have $\Omega_\varepsilon(n^{1-1/p})$ internal state changes. On the other hand, the counters can increment each time another instance of the tracked item arrives. To that end, we replace each counter with an approximate counter that has a small number of internal state changes. In particular, by using Morris counters as mentioned above, each counter changes its state at most $\mathrm{poly}\left(\log n, \frac{1}{\varepsilon}, \log \frac{1}{\delta}\right)$ times over the course of the stream. Therefore, the total number of internal state changes is $\widetilde{\mathcal{O}}_\varepsilon(n^{1-1/p})$ while the total space used is $\mathcal{O}_\varepsilon(n^{1-2/p})$.
For $p \in [1, 2]$, we instead give the reservoir $Q$ a total of $\kappa = \mathrm{poly}_\varepsilon(\log n)$ size, so that the total space is $\mathrm{poly}_\varepsilon(\log n)$ while the total number of internal state changes remains $\widetilde{\mathcal{O}}_\varepsilon(n^{1-1/p})$.
Removing moment assumptions. To remove the assumption that $F_p = \mathcal{O}_\varepsilon(n)$, we note that if each element of the stream of length $m$ is sampled with probability $q < 1$, then the expected number of sampled items is $qm$, but the $p$-th power of the expected number of sampled items is $(qm)^p$. Although this is not the $p$-th moment of the stream, we can nevertheless expect the $F_p$ moment of the stream to decrease at a faster rate than the number of sampled items. Thus we create $L = \mathcal{O}(\log(nm))$ substreams so that for each $\ell \in [L]$, we subsample each stream update of $[m]$ with probability $\frac{1}{2^{\ell-1}}$. For one of these substreams $J_\ell$, we will have $F_p(J_\ell) = \mathcal{O}_\varepsilon(n)$. We show that we can estimate the frequency of the heavy hitters in the substream $J_\ell$ and then rescale by the inverse sampling rate to achieve a $(1+\varepsilon)$-approximation to the frequency of the heavy hitters in the original stream.
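A sketch of this update-subsampling step is below (our own illustrative code; for simplicity the substreams are sampled independently, whereas an implementation would typically nest them so that $J_{\ell+1} \subseteq J_\ell$):

```python
import random

def subsample_streams(stream, L, seed=0):
    """Route each update into substream ell in {1, ..., L} independently with
    probability 2^-(ell-1). Substream 1 is the full stream; each subsequent
    substream is roughly half as dense, so its F_p moment shrinks faster
    than its length."""
    rng = random.Random(seed)
    substreams = [[] for _ in range(L)]
    for u in stream:
        for ell in range(L):           # ell = 0 here corresponds to level 1
            if rng.random() < 2.0 ** (-ell):
                substreams[ell].append(u)
    return substreams
```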
It then remains to identify the correct substream $\ell$ such that $F_p(J_\ell) = \mathcal{O}_\varepsilon(n)$. A natural approach would be to approximate the moment of each substream in order to identify such a substream. However, it turns out that our $F_p$ moment estimation algorithm will ultimately use our heavy hitter algorithm as a subroutine. Furthermore, other $F_p$ moment estimation algorithms, e.g., [AMS99, Ind06, Li08, GW18], use a number of internal state changes that is linear in the stream length, and it is unclear how to adapt these algorithms to decrease the number of internal state changes. Instead, we note that with high probability, the frequency of each heavy hitter estimated by our algorithm can only be an underestimate. This is because if we initialize counters throughout the stream to track the heavy hitters, then our counters might miss some stream updates to the heavy hitters, but it is not possible to overcount the frequency of each heavy hitter, i.e., we cannot count stream updates that do not exist. Moreover, this statement remains true, up to a $(1+\varepsilon)$ factor, when we use approximate counters. Therefore, it suffices to use the maximum estimate of the frequency for each heavy hitter across all the substreams. We can then use standard probability-boosting techniques to simultaneously and accurately estimate all $L_p$-heavy hitters.
Moment estimation. Given our algorithm for finding $(1+\varepsilon)$-approximations to the frequencies of the $L_p$-heavy hitters, we now adapt a standard subsampling framework [IW05] to reduce the $F_p$ approximation problem to the problem of finding the $L_p$-heavy hitters. The framework has subsequently been used in a number of different applications, e.g., [WZ12, BBC+17, LSW18, BWZ21, WZ21a, MWZ22, BMWZ23], and has the following intuition.
For ease of discussion, consider the level set $\Gamma_i$ consisting of the coordinates $j \in [n]$ with $f_j \in [2^{i-1}, 2^i)$ for each $i$, though we remark that for technical reasons, we shall ultimately define the level sets in a slightly different manner. Because the level sets partition the universe $[n]$, if we define the contribution $C_i := \sum_{j \in \Gamma_i} (f_j)^p$ of a level set $\Gamma_i$ to be the sum of the contributions of all of its coordinates, then we can decompose the moment $F_p$ into the sum of the contributions of the level sets, $F_p = \sum_i C_i$. Moreover, it suffices to accurately estimate the contributions of the "significant" level sets, i.e., the level sets whose contribution is at least a $\mathrm{poly}\left(\varepsilon, \frac{1}{\log(nm)}\right)$ fraction of the $F_p$ moment, and crudely estimate the contributions of the insignificant level sets.
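The level-set decomposition can be verified offline with a short computation (using the simple power-of-two boundaries described above, not the technically adjusted level sets):

```python
def level_set_contributions(f, p):
    """Partition coordinates into level sets Gamma_i = {j : 2^(i-1) <= f_j < 2^i}
    and return each set's contribution C_i = sum_{j in Gamma_i} f_j^p.
    The contributions sum back to F_p exactly."""
    contrib = {}
    for x in f:
        if x == 0:
            continue
        i = x.bit_length()            # the unique i with 2^(i-1) <= x < 2^i
        contrib[i] = contrib.get(i, 0) + x ** p
    return contrib

# For f = (1, 1, 3, 8) and p = 2: C_1 = 2, C_2 = 9, C_4 = 64, summing to F_2 = 75.
```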
[IW05] observed that the contributions of the significant level sets can be estimated by approximating the frequencies of the heavy hitters in substreams induced by subsampling the universe at exponentially smaller rates. We emphasize that whereas we previously subsampled updates of the stream $[m]$ for heavy hitters, we now subsample elements of the universe $[n]$. That is, we create $L = \mathcal{O}(\log(nm))$ substreams so that for each $\ell \in [L]$, we subsample each element of the universe $[n]$ into the substream with probability $\frac{1}{2^{\ell-1}}$. For example, a single item with frequency $F_p^{1/p}$ will be a heavy hitter in the original stream, which is also the stream induced by $\ell = 1$. On the other hand, if there are $n$ items with frequency $(F_p/n)^{1/p}$, then they will be $L_p$-heavy hitters at a subsampling level where, in expectation, roughly $\Theta\left(\frac{1}{\varepsilon^p}\right)$ coordinates of the universe survive the subsampling. [IW05] notes that $(1+\varepsilon)$-approximations to the contributions of the surviving heavy hitters can then be rescaled inversely by the sampling rate to obtain good approximations of the contributions of each significant level set. The same procedure also achieves crude approximations for the contributions of insignificant level sets, which overall suffices for a $(1+\varepsilon)$-approximation to the $F_p$ moment.
The key advantage of adapting this framework over other $F_p$ estimation algorithms, e.g., [AMS99, Ind06, Li08, GW18], is that we can then use our heavy hitter algorithm FullSampleAndHold to provide $(1+\varepsilon)$-approximations to the heavy hitters in each substream while guaranteeing a small number of internal state changes.

Additional Intuition and Comparison with Previous Algorithms
There are a number of differences between our algorithm and the sample-and-hold approach of [EV02]. First, once [EV02] samples an item, a counter is initialized and maintained indefinitely for that item. By comparison, our algorithm samples more items than the total space allocated to the algorithm, so we must carefully delete a number of sampled items. In particular, it is not correct to simply delete the sampled items with the smallest counters. Second, [EV02] updates a counter each time a subsequent instance of the sample arrives. Because our paper is focused on a small number of internal state changes, our algorithm cannot afford such a large number of updates. Instead, we maintain approximate counters that sacrifice accuracy in exchange for a smaller number of internal state changes. We show that the loss in accuracy can be tolerated when choosing which samples to delete.
Another possible point of comparison is the precision sampling technique of [AKO11], which is a linear sketch; although it has the advantage of being able to handle insertion-deletion streams, it must be updated upon each stream arrival, resulting in a linear number of internal state changes. Similarly, a number of popular heavy-hitter algorithms, such as Misra-Gries [MG82], CountMin [CM05], CountSketch [CCF04], and SpaceSaving [MAA05], make a linear number of internal state changes. By comparison, our sample-and-hold approach results in a sublinear number of internal state changes.
Finally, several previous algorithms are also based on sampling a number of items throughout the stream, temporarily maintaining counters for those items, and then keeping only the items that are globally heavy, e.g., [BO13, BKSV14]. It is known that these algorithms suffer a bottleneck at $p = 3$, i.e., they cannot identify the $L_p$-heavy hitters for $p < 3$. The following counterexample shows why these algorithms cannot identify the $L_2$-heavy hitters and illustrates a fundamental difference from our algorithm.
Suppose the stream consists of $\sqrt{n}$ blocks of $\sqrt{n}$ updates. Among these updates, there are $\sqrt{n}$ items with frequency $n^{1/4}$, which we call pseudo-heavy. There is a single item with frequency $\sqrt{n}$, which is the heavy hitter. The remaining items each have frequency 1 and are called light. Note that the second moment of the stream is $\Theta(n)$, so that only the item with frequency $\sqrt{n}$ is a heavy hitter for constant $\varepsilon < 1$.
Let $S = \{1, 2, \ldots, n^{1/4}\}$ and suppose for each $w \in S$, block $w$ is a special block that consists of $n^{1/4}$ different pseudo-heavy items, each with frequency $n^{1/4}$. Let $T = S + x$ for $x \in \{1, 2, \ldots, n^{1/8}\}$, so that $T$ consists of the $n^{1/8}$ blocks after each special block. Each block in $T$ consists of $n^{1/8}$ instances of the heavy hitter, along with $\sqrt{n} - n^{1/8}$ light items. The remaining blocks all consist of light items.
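The moment accounting in this construction can be checked numerically; the following sketch (our own, taking $n$ to be a perfect fourth power) tallies the $F_2$ contribution of each class of items:

```python
def second_moment_breakdown(n):
    """F_2 contributions of the frequency profile above: sqrt(n) pseudo-heavy
    items of frequency n^(1/4), one heavy hitter of frequency sqrt(n), and the
    remaining stream mass as frequency-1 light items. Returns the three
    contributions; each is Theta(n), so F_2 = Theta(n) overall."""
    s, q = round(n ** 0.5), round(n ** 0.25)
    pseudo = s * q ** 2        # sqrt(n) items, each contributing (n^(1/4))^2
    heavy = s ** 2             # the single heavy hitter contributes n
    light = s * s - s * q - s  # leftover updates, each with frequency 1
    return pseudo, heavy, light
```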
Observe that without dynamic maintenance of counters at different scales, in each special block we will sample $\mathrm{polylog}(n)$ pseudo-heavy items whose counters each reach about $\widetilde{\mathcal{O}}(n^{1/4})$. But then each time the heavy hitter is sampled, its counter will not exceed those of the pseudo-heavy items before it is deleted during counter maintenance, because the heavy hitter has only $n^{1/8}$ instances in its block. Thus, with high probability, the heavy hitter will never be found; this is an issue with previously existing sampling-based algorithms, e.g., [BO13, BKSV14].
Our algorithm overcomes this challenge by performing maintenance only on counters that have been initialized for a similar amount of time. Thus, in the previous example, the counter for the heavy hitter will not be deleted, because it is not compared to the counters for the pseudo-heavy items until the heavy hitter has sufficiently high frequency. By comparison, existing algorithms will retain counters for the pseudo-heavy items, because they locally look "larger", at the expense of the true heavy hitter.

Table 1 (columns: Reference, State Changes, Setting): Summary of our results compared to existing results for a stream of length $m$ on a universe of size $n$. We emphasize that reporting $L_2$ heavy-hitters includes the $L_1$ heavy-hitters. All algorithms use near-optimal space.

Preliminaries
We model the internal state of a streaming algorithm $\mathcal{A}$ after the $t$-th update by its memory contents $\sigma_t$, and we set $x_t = 1$ if $\sigma_t \neq \sigma_{t-1}$ and $x_t = 0$ otherwise, where we use the convention $\sigma_0 = \emptyset$. Then we say the total number of internal state changes by the algorithm $\mathcal{A}$ is $\sum_{t=1}^m x_t$. We also require the following definition of Morris counters to provide approximate counting.

Theorem 1.5 (Morris counters). [Mor78, NY22] There exists an insertion-only streaming algorithm (Morris counter) that uses $\mathcal{O}\left(\log\log n + \log\frac{1}{\varepsilon} + \log\log\frac{1}{\delta}\right)$ bits of space and outputs a $(1+\varepsilon)$-approximation to the frequency of an item $i$, with probability at least $1 - \delta$. Moreover, the algorithm is updated at most $\mathrm{poly}\left(\log n, \frac{1}{\varepsilon}, \log\frac{1}{\delta}\right)$ times over the course of the stream.
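For intuition, a minimal Python sketch of the classic base-2 Morris counter follows. The $(1+\varepsilon)$-accuracy and state-change guarantees of Theorem 1.5 require the refined variant of [NY22], which this sketch does not implement; it only illustrates why the state changes rarely:

```python
import random

class MorrisCounter:
    """Classic Morris approximate counter: stores only the exponent X, so the
    space is O(log log n) bits, and the estimate of the true count is 2^X - 1,
    which is unbiased. The state changes only when X is incremented, which
    happens O(log n) times in expectation over n updates."""

    def __init__(self, seed=None):
        self.x = 0
        self.rng = random.Random(seed)

    def update(self):
        # Increment X with probability 2^-X; otherwise the state is unchanged.
        if self.rng.random() < 2.0 ** (-self.x):
            self.x += 1

    def estimate(self):
        return 2 ** self.x - 1
```

Averaging several independent copies reduces the variance, which is how the failure probability $\delta$ and accuracy $\varepsilon$ of Theorem 1.5 are obtained in the refined constructions.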

Heavy Hitters
In this section, we first describe our algorithm for identifying and accurately approximating the frequencies of the L p -heavy hitters.

Sample and Hold
A crucial subroutine for our $F_p$ estimation algorithm is the accurate estimation of heavy hitters. In this section, we first describe such a subroutine, SampleAndHold, for approximating the frequencies of the $L_p$-heavy hitters under the assumption that $F_p$ is not too large. We now describe our algorithm for the case $p \ge 2$. Our algorithm creates a reservoir $Q$ of size $\kappa = \mathcal{O}_\varepsilon(n^{1-2/p})$ and samples each item of the stream into the reservoir $Q$ with probability roughly $\Theta_\varepsilon\left(\frac{1}{n^{1/p}}\right)$. If the reservoir $Q$ is full when an item of the stream is sampled, then a uniformly random item of $Q$ is replaced with the stream update. Thus, if the stream has length $n$, then we will incur $\mathcal{O}_\varepsilon(n^{1-1/p})$ internal state changes due to the sampling. For stream length $m$, we set the sampling probability to be roughly $\frac{\mathcal{O}_\varepsilon(n^{1-1/p})}{m}$.
Our algorithm also checks each stream update to see if it matches an item in the reservoir and creates a counter for the item if there is a match. In other words, if $j \in [n]$ arrives as a stream update and $j \in Q$ is in the reservoir, then our algorithm SampleAndHold creates a separate counter for $j$ to count the number of subsequent instances of $j$. In addition, each time the number of counters becomes too large, i.e., exceeds $\mathcal{O}_\varepsilon(n^{1-2/p})$, we remove half of the counters that have been maintained for a time between $2^z$ and $2^{z+1}$, for each integer $z > 0$. In particular, we remove the counters with the smallest tracked frequencies within each of these groups. To reduce the number of internal state changes, we use Morris counters rather than exact counters for each item.
For $p \in [1, 2]$, we set $\kappa = \mathrm{poly}_\varepsilon(\log n)$ to be the size of the reservoir $Q$, so that the total space is $\mathrm{poly}_\varepsilon(\log n)$ while the total number of internal state changes remains $\widetilde{\mathcal{O}}_\varepsilon(n^{1-1/p})$. Our algorithm SampleAndHold appears in Algorithm 1.
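As a companion to the formal description in Algorithm 1 below, the following simplified Python sketch mirrors the steps above for the case $p \ge 2$. It is our own illustration, not the paper's implementation: exact counters stand in for Morris counters, and the constants (reservoir size, sampling rate, pruning threshold) are illustrative:

```python
import random

def sample_and_hold(stream, n, p, seed=0):
    """Simplified SampleAndHold sketch for p >= 2 (illustrative constants;
    exact counters stand in for Morris counters). Returns estimated
    frequencies for the items whose counters survive."""
    rng = random.Random(seed)
    m = len(stream)
    kappa = max(16, round(n ** (1 - 2.0 / p)))      # reservoir size ~ n^(1-2/p)
    sample_p = min(1.0, n ** (1 - 1.0 / p) / m)     # ~ n^(1-1/p) samples total
    reservoir = []                                   # items currently held
    counters = {}                                    # item -> (created_time, count)
    cap = 4 * kappa                                  # "too many counters" threshold
    for t, j in enumerate(stream):
        if j in counters:                            # tracked item seen again
            created, cnt = counters[j]
            counters[j] = (created, cnt + 1)         # a Morris update in the paper
        elif j in reservoir:                         # match: start holding j
            counters[j] = (t, 1)
        if rng.random() < sample_p:                  # sample j into the reservoir
            if len(reservoir) < kappa:
                reservoir.append(j)
            else:
                reservoir[rng.randrange(kappa)] = j  # evict a uniform slot
        if len(counters) > cap:                      # age-grouped pruning
            counters = _prune_by_age(counters, t)
    return {item: cnt for item, (_, cnt) in counters.items()}

def _prune_by_age(counters, now):
    """Within each dyadic age class, keep the half with the largest counts."""
    buckets = {}
    for item, (created, cnt) in counters.items():
        z = max(1, now - created).bit_length()
        buckets.setdefault(z, []).append((cnt, item))
    kept = {}
    for group in buckets.values():
        group.sort(reverse=True)
        for cnt, item in group[: (len(group) + 1) // 2]:
            kept[item] = counters[item]
    return kept
```

On a stream where one item recurs frequently, the sketch typically samples that item early, holds it, and accumulates a counter close to its true frequency, while one-off light items never trigger counter creation.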

Algorithm 1 SampleAndHold
Input: Stream $s_1, \ldots, s_m$ of items from $[n]$, accuracy parameter $\varepsilon \in (0, 1)$, $p \ge 1$
Output: Accurate estimates of the $L_p$-heavy hitter frequencies
For each stream update $s_t$: if there is an active Morris counter for $s_t$, update it; otherwise, if $s_t$ matches an item held in the reservoir $Q$, initialize a Morris counter for $s_t$. With the prescribed sampling probability, sample $s_t$ into $Q$, replacing a uniformly random item of $Q$ if $Q$ is full. If there exist $k$ active Morris counters initialized between time $t - 2^z$ and $t - 2^{z+1}$ for integer $z > 0$, ▷ too many counters, retain the $\frac{k}{2}$ counters initialized between time $t - 2^z$ and $t - 2^{z+1}$, for each integer $z > 0$, with the largest approximate frequencies. Finally, return the estimated frequencies given by the Morris counters.

We first analyze how many additional counters are available at a time when a heavy hitter $j$ is sampled.
Lemma 2.1. Let $k \in [200 p \kappa \log^2(nm), 202 p \kappa \log^2(nm)]$ be chosen uniformly at random. Let $v$ be the last time that $j$ is sampled by the algorithm. Then with probability at least $1 - \frac{1}{50 p \log^2(nm)}$, there are at most $k - \kappa$ counters at time $v$.
Proof. Consider any fixing of the random samples of the algorithm and the random choices of $k$ before time $v$. Let $T_1 < T_2 < \ldots$ be the sequence of times when the counters are reset. Note that since $k \in [200 p \kappa \log^2(nm), 202 p \kappa \log^2(nm)]$, each time the counters are reset, between $100 p \kappa \log^2(nm)$ and $101 p \kappa \log^2(nm)$ counters are newly allocated. Note that the sequence could be empty, in which case our claim is vacuously true. For each $T_i$, consider the times $\mathcal{T}_i$ at which additional counters would be created after $T_i$ is fixed if there were no limit on the number of counters. Moreover, let $u_i$ be the choice of $k$ at time $T_i$. Note that the first $100 p \kappa \log^2(nm)$ of the times in $\mathcal{T}_i$ are independent of the choice of $u_i$, while later times in $\mathcal{T}_i$ may not actually be sampled due to the choice of $u_i$. Let $T_w$ be the first time for which $v$ appears in the first $101 p \kappa \log^2(nm)$ terms of $\mathcal{T}_w$. Then with probability at most $\frac{1}{100 p \log^2(mn)}$, the choice of $u_w$ will be within $\kappa$ indices after $v$ in the sequence $\mathcal{T}_w$. On the other hand, the choice of $u_w$ could cause $T_{w+1}$ to be before $v$, e.g., if $v$ is the $(101 p \kappa \log^2(nm))$-th term and $u_w = 100 p \kappa \log^2(nm)$, in which case the same argument shows that with probability at most $\frac{1}{100 p \log^2(mn)}$, the choice of $u_{w+1}$ will be within $\kappa$ indices after $v$ in the sequence $\mathcal{T}_{w+1}$. Note that since $u_w + u_{w+1} \ge 200 p \kappa \log^2(nm)$, $v$ must appear before $T_{w+2}$. Therefore, by a union bound, with probability at least $1 - \frac{1}{50 p \log^2(nm)}$, there are at most $k - \kappa$ counters at time $v$.
We next upper bound how many additional counters are created between the time a heavy hitter j is sampled until it becomes too large to delete.

Lemma 2.2. Suppose $m \ge n$ and $(f_j)^p \ge \frac{\varepsilon^2 \cdot F_p}{2^{10} \gamma \log^2(nm)}$. Let $J = \{t \in [m] \mid u_t = j\}$, let $u \in J$ be chosen uniformly at random, and suppose that the algorithm samples $j$ at time $u$. Then for $p \in [1, 2)$, over the choice of $u$ and the internal randomness of the algorithm, the probability that fewer than $\kappa_1 = \mathcal{O}\left(\frac{\log^{11+3p}(mn)}{\varepsilon^{4+4p}}\right)$ new counters are generated after $u$ and before $\frac{\varepsilon^4 \cdot F_p}{2^{10} \gamma \log^2(nm)}$ additional instances of $j$ arrive is at least $1 - \frac{1}{100 p \log(nm)}$. Similarly, for $p \ge 2$, over the choice of $u$ and the internal randomness of the algorithm, the probability that fewer than $\kappa_2 = \mathcal{O}\left(\frac{n^{1-2/p} \log^{11+3p}(mn)}{\varepsilon^{4+4p}}\right)$ new counters are generated after $u$ and before $\frac{\varepsilon^2 F_p^{1/p}}{2^{14} \gamma^2 \log^4(nm)}$ additional instances of $j$ arrive is at least $1 - \frac{1}{100 p \log(nm)}$.

Proof. We define $B_1, \ldots, B_\alpha$ to be blocks that partition the stream, so that the $i$-th block includes the items of the stream after the $(i-1)$-th instance of $j$, up to and including the $i$-th instance of $j$. We therefore have $\alpha = f_j + 1$.
Observe that since $(f_j)^p \ge \frac{\varepsilon^2 \cdot F_p}{2^{10}\gamma\log^2(nm)}$, we have $f_j = \beta X$ for some $\beta > 1$. Since $|W_\ell| \le \frac{F_p}{2^{p\ell}}$, the expected number of unique items in $W_\ell$ contained in a block is at most $\frac{F_p}{2^{p\ell}\alpha}$, conditioned on any fixing of indices that are sampled. Thus in a block, the conditional expectation of the number of stream updates that correspond to items in $W_\ell$ is at most $\frac{F_p}{2^{(p-1)\ell}\alpha}$, and so the expected number of items in $W_\ell$ that are sampled in a block is at most $\frac{\varrho F_p}{2^{(p-1)\ell}\alpha}$. Therefore, the expected number of retained items in the previous and following $2^i$ blocks, for $i \le \ell$, is at most $\frac{2^{i+1}\varrho F_p}{2^{(p-1)\ell}\alpha}$. For $i > \ell$, note that since $W_\ell$ only contains elements of frequency at most $2^\ell$, no element of $W_\ell$ contributes additional retained counters once the window grows beyond $2^i > 2^\ell$ blocks.
For $p \in [1, 2)$, note that $2^\ell \le F_p^{1/p}$. Therefore, after $2^i$ blocks, the expected number of retained items in $W_\ell$ across the $2^i$ blocks before and after $u$ is at most $\frac{2^{i+1}\varrho F_p}{2^{(p-1)\ell}\alpha}$. By Markov's inequality, with probability $1 - \frac{1}{100p\log^3(nm)}$ over a random time $u$, the number of counters for the items with frequency in $W_\ell$ across the $2^i$ blocks before and after $u$ does not exceed its budget. Thus, by a union bound over all $i = O(\log m)$ and $L = O(p\log(nm))$ choices of $\ell$, with probability at least $1 - \frac{1}{100p\log(nm)}$, at most $\kappa_1$ new counters are generated after $u$. For $p \ge 2$, with $X = \frac{\varepsilon^2 F_p^{1/p}}{2^{14}\gamma^2\log^4(nm)}$, the expected number of retained items across $2^i$ blocks is again within the corresponding budget. By Markov's inequality, with probability $1 - \frac{1}{100p\log^3(nm)}$ over a random time $u$, the same holds for the number of counters for the items with frequency in $W_\ell$ across the $2^i$ blocks after $u$. Moreover, observe that the number of counters generated within $2^z$ timesteps is certainly at most the number of counters generated within $2^z$ blocks. Thus, by a union bound over all $z = O(\log m)$ and $L = O(p\log(nm))$ choices of $\ell$, with probability at least $1 - \frac{1}{100p\log(nm)}$, at most $\kappa_2$ new counters are generated after $u$. By Lemma 2.1 and Lemma 2.2, we have:

Lemma 2.3. With probability at least $1 - \frac{1}{100p\log(nm)}$, the counters are not reset between the time at which $j$ is sampled and the time at which $\frac{\varepsilon^4 \cdot F_p}{2^{10}\gamma\log^2(nm)}$ occurrences of $j$ arrive after it is sampled.
We now claim that a heavy hitter j will be sampled early enough to obtain a good approximation to its overall frequency.

Lemma 2.4. Suppose $(f_j)^p \ge \frac{\varepsilon^2 \cdot F_p}{2^7\gamma\log^2(nm)}$. Then with probability at least 0.99, SampleAndHold outputs $\widehat{f}_j$ that is a $(1+\varepsilon)$-approximation to $f_j$.

Proof. Note that for the purposes of analysis, we can assume $m \ge n$, since otherwise if $m < n$, then SampleAndHold essentially redefines $n$ to be the number of unique items in the induced stream by setting $\varrho$ and $\kappa$ appropriately, even though the overall universe can be larger. By assumption, we have $(f_j)^p \ge \frac{\varepsilon^2 \cdot F_p}{2^7\gamma\log^2(nm)}$ for $p \ge 1$ and $\varepsilon \in (0, 1)$. Let $T$ be the set of the first $\frac{\varepsilon}{16\log(nm)}$ fraction of occurrences of $j$.
We claim that with probability at least $\frac{2}{3}$, a Morris counter for $j$ will be created as the stream passes through $T$. Indeed, observe that since each item of the stream is sampled with probability at least $\frac{2\varepsilon^2}{(F_p)^{1/p}}$, with high probability an index in $T$ is sampled. By Lemma 2.3, the index will not be removed by the counters resetting. Since $T$ is the set of the first $\frac{\varepsilon}{16\log(nm)}$ fraction of occurrences of $j$, the Morris counter is used for at least $\left(1 - \frac{\varepsilon}{16\log(nm)}\right) f_j$ occurrences of $j$. We use Morris counters with multiplicative accuracy $1 + O\left(\frac{\varepsilon}{\log(nm)}\right)$. Hence by Theorem 1.5, we obtain an output $\widehat{f}_j$ that is a $(1+\varepsilon)$-approximation to $f_j$. Finally, for the purposes of completeness, we remark that an item that is not a heavy hitter cannot be reported by our algorithm, because our algorithm only counts the number of instances of tracked items and thus always reports underestimates of each item. Underestimates of items that are not heavy hitters will be far from the threshold for heavy hitters, and thus those items will not be reported.
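The accuracy/state-change trade-off of a tuned Morris counter can be illustrated with a base-$(1+a)$ variant; the sketch below is the textbook construction with illustrative parameters, not the paper's exact Theorem 1.5 counter.

```python
import random

def tuned_morris(n_increments, a, seed=0):
    """Base-(1+a) Morris counter: the stored state x changes only when a
    probabilistic increment succeeds, so n increments cause only roughly
    O(log(a * n) / log(1 + a)) internal state changes rather than n."""
    rng = random.Random(seed)
    x, state_changes = 0, 0
    for _ in range(n_increments):
        if rng.random() < (1 + a) ** (-x):   # increment x with prob (1+a)^-x
            x += 1
            state_changes += 1
    estimate = ((1 + a) ** x - 1) / a        # unbiased estimate of the count
    return estimate, state_changes

est, changes = tuned_morris(100_000, a=0.1)
# The counter processes 100000 increments while changing state only on the
# order of a hundred times; smaller a gives better multiplicative accuracy.
```

Smaller bases sharpen the multiplicative error at the cost of more state changes, which is the knob the analysis above tunes to accuracy $1 + O(\varepsilon/\log(nm))$.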

Full Sample and Hold
We now address certain shortcomings of Lemma 2.4: namely, the assumption that $(f_j)^p \ge \frac{\varepsilon^2 \cdot F_p}{2^7\gamma\log^2(nm)}$, and the fact that Lemma 2.4 only provides constant success probability for each heavy hitter $j \in [n]$, while there can be many such heavy hitters.

4: Let $m^{(r)}_x$ be the length of the stream $J^{(r)}_x$
5: Run SampleAndHold$^{(r)}_x$ on $J^{(r)}_x$
6: Let $\widehat{f}^{(r,x)}_j$ be the estimated frequency for $j$ by SampleAndHold$^{(r)}_x$

We first show that subsampling allows us to find a substream that satisfies the required assumptions. We can then boost the probability of success for estimating the frequency of each heavy hitter using a standard median-of-means argument.

Lemma 2.5. Let $j \in [n]$ be an item with $(f_j)^p \ge \frac{\varepsilon^2 \cdot F_p}{2^{10}\gamma\log^2(nm)}$. Then with high probability, the median estimate $\widehat{f}_j$ is a $(1+\varepsilon)$-approximation to $f_j$.

Proof. Consider a fixed $j \in [n]$ with $(f_j)^p \ge \frac{\varepsilon^2 \cdot F_p}{2^{10}\gamma\log^2(nm)}$. Let $q > 0$ be the integer such that $\frac{f_j}{2^q} \in \left[\frac{400}{\varepsilon^2}, \frac{800}{\varepsilon^2}\right]$. Let $A_j$ be the random variable denoting the number of occurrences of $j$ in $J_q$, and note that over the randomness of the sampling, $\mathbb{E}[A_j] = \frac{f_j}{2^q}$. By Chebyshev's inequality, the number of occurrences of $j$ in $J^{(r)}_q$ is a $(1+\varepsilon)$-approximation of $\frac{f_j}{2^q}$ with probability at least 0.9. We similarly have that, with probability at least 0.9, the length $m^{(r)}_q$ concentrates around its expectation; then by a Chernoff bound, for any $y \in [n]$, the number of occurrences of $y$ in $J^{(r)}_q$ is at most $\xi$ times its expectation for some constant $\xi > 1$, with high probability. Thus by a union bound, with high probability the substream satisfies the assumptions of Lemma 2.4 for some sufficiently large constant $\xi' > 1$. Hence with probability at least 0.99, $\widehat{f}^{(r,q)}_j$ is a $(1+\varepsilon)$-approximation to the number of occurrences of $j$ in $J^{(r)}_q$, and thus $\widehat{f}^{(r,q)}_j \le \frac{2 \cdot 800^p}{\varepsilon^{2p}} \cdot m^{(r)}_q$ with probability at least 0.7. Observe that since $(f_j)^p \ge \frac{\varepsilon^2 \cdot F_p}{2^{10}\gamma\log^2(nm)}$, for any $y \in [n]$, the expected number of occurrences of $y$ in $J^{(r)}_q$ is at most $\xi$ times that of $j$, for some sufficiently large constant $\xi > 1$. Hence by standard Chernoff bounds, the median of the estimates across the independent repetitions is a $(1+\varepsilon)$-approximation with high probability. Moreover, observe that (1) for any stream with subsampling rate $\frac{1}{2^x} > \frac{1}{2^q}$, we similarly have that the number of occurrences of $j$ in $J_x$ is a $(1+\varepsilon)$-approximation of $\frac{f_j}{2^x}$ with high probability, and (2) SampleAndHold cannot overestimate the frequency of $j$. Thus the reported estimate is a $(1+\varepsilon)$-approximation to $f_j$, and hence $(\widehat{f}_j)^p$ is a $(1+O(\varepsilon))$-approximation to $(f_j)^p$, with high probability.
Putting things together, we obtain Theorem 1.1.
Theorem 1.1. Given a constant $p \ge 1$, there exists a one-pass insertion-only streaming algorithm that has $O\left(n^{1-1/p} \cdot \operatorname{poly}\left(\log(nm), \frac{1}{\varepsilon}\right)\right)$ internal state changes and solves the $L_p$-heavy hitter problem, i.e., it outputs a frequency vector $\widehat{f}$ whose entries are $(1+\varepsilon)$-approximations of the frequencies of the heavy hitters. For $p \in [1, 2]$, the algorithm uses $\frac{1}{\varepsilon^{4+4p}} \cdot \operatorname{polylog}(mn)$ bits of space, while for $p > 2$, the algorithm uses $\widetilde{O}\left(\frac{1}{\varepsilon^{4+4p}} n^{1-2/p}\right)$ bits of space.
Proof. Correctness of the algorithm for the heavy hitters follows from Lemma 2.5. For the items that are not heavy hitters, observe that by a standard concentration argument, with high probability across the independent instances, the median of the number of sampled items after rescaling will not surpass the threshold.
We next analyze the space complexity. For $p \in [1, 2]$, only $O\left(\frac{\log^{11+3p}(mn)}{\varepsilon^{4+4p}}\right)$ counters are stored, while for $p > 2$, only $\widetilde{O}\left(\frac{1}{\varepsilon^{4+4p}} n^{1-2/p}\right)$ counters are stored. Hence, the space complexity follows. It remains to analyze the number of internal state changes. The internal state can change each time an item is sampled. Since each item of the stream is sampled with probability $\varrho = \frac{\gamma^2 n^{1-1/p}\log^4(nm)}{\varepsilon^2 m}$, with high probability the total number of internal state changes is $O(\varrho m) = O\left(\frac{\gamma^2 n^{1-1/p}\log^4(nm)}{\varepsilon^2}\right)$.

$F_p$ Estimation
In this section, we present insertion-only streaming algorithms for $F_p$ estimation with a small number of internal state changes. We first observe that an $F_p$ estimation algorithm by [JW19] achieves a small number of internal state changes for $p < 1$. We then build upon our $L_p$-heavy hitter algorithm to achieve an $F_p$ estimation algorithm with a small number of internal state changes for $p > 1$.

$F_p$ Estimation, $p < 1$
As a warm-up, we first show that the $F_p$ estimation streaming algorithm of [JW19] uses a small number of internal state changes for $p < 1$. We first recall the following definition of the $p$-stable distribution:

Definition 3.1 ($p$-stable distribution [Zol89]). For $0 < p \le 2$, the $p$-stable distribution $\mathcal{D}_p$ exists and satisfies $\sum_{i=1}^n Z_i x_i \sim \|x\|_p \cdot Z$ for $Z, Z_1, \ldots, Z_n \sim \mathcal{D}_p$ and any vector $x \in \mathbb{R}^n$.
A standard method [Nol03] for generating $p$-stable random variables is to first generate $\theta \sim \mathrm{Uni}\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$ and $r \sim \mathrm{Uni}([0, 1])$ and then set $X = \frac{\sin(p\theta)}{\cos^{1/p}(\theta)} \cdot \left(\frac{\cos((1-p)\theta)}{\ln(1/r)}\right)^{(1-p)/p}$. The $F_p$ estimation streaming algorithm of [JW19] first generates a sketch matrix $D \in \mathbb{R}^{k \times n}$, where $k = O\left(\frac{1}{\varepsilon^2}\right)$ and each entry of $D$ is generated from the $p$-stable distribution. Observe that $D$ can be viewed as $k$ vectors $D^{(1)}, \ldots, D^{(k)} \in \mathbb{R}^n$ of $p$-stable random variables. For $i \in [k]$, suppose we maintained $\langle D^{(i)}, x \rangle$, where $x$ is the frequency vector induced by the stream. Then it is known [Ind06] that, with constant probability, the median of these inner products is a $(1+\varepsilon)$-approximation to $F_p$.
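A minimal sketch of this sampling step, using the standard Chambers-Mallows-Stuck form for symmetric $p$-stable variables (a textbook formula, not code from the paper):

```python
import math
import random

def p_stable(p, rng):
    """One sample from the symmetric p-stable distribution via the standard
    Chambers-Mallows-Stuck method: theta ~ Uni(-pi/2, pi/2), r ~ Uni(0, 1)."""
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    w = -math.log(1.0 - rng.random())        # = ln(1/r) for r ~ Uni((0, 1])
    return (math.sin(p * theta) / math.cos(theta) ** (1.0 / p)) * \
           (math.cos((1.0 - p) * theta) / w) ** ((1.0 - p) / p)

rng = random.Random(0)
# For p = 1 the formula collapses to tan(theta), a standard Cauchy variable,
# whose absolute value has median 1, so the empirical median is near 1.
samples = sorted(abs(p_stable(1.0, rng)) for _ in range(100_000))
median = samples[50_000]
```

For $p = 2$ the same formula yields (scaled) Gaussian samples, matching the stability property in Definition 3.1.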
[JW19] notes that each vector $D^{(i)}$ can be further decomposed into a vector $D^{(i,+)}$ containing the positive entries of $D^{(i)}$ and a vector $D^{(i,-)}$ containing the negative entries of $D^{(i)}$. Since $D^{(i)} = D^{(i,+)} + D^{(i,-)}$, it suffices to maintain $\langle D^{(i,+)}, x \rangle$ and $\langle D^{(i,-)}, x \rangle$ for each $i \in [k]$. For insertion-only streams, all entries of $x$ are non-negative, and so the inner products $\langle D^{(i,+)}, x \rangle$ and $\langle D^{(i,-)}, x \rangle$ are both monotonic over the course of the stream, which permits the application of Morris counters. Thus the algorithm of [JW19] instead uses Morris counters to approximately compute $\langle D^{(i,+)}, x \rangle$ and $\langle D^{(i,-)}, x \rangle$ to within a $(1 + O(\varepsilon))$-multiplicative factor. The key technical point is that [JW19] shows that, for $p < 1$, $(1 + O(\varepsilon))$-multiplicative approximations to $\langle D^{(i,+)}, x \rangle$ and $\langle D^{(i,-)}, x \rangle$ suffice to achieve a $(1+\varepsilon)$-approximation to $\langle D^{(i)}, x \rangle$. The main gain is that, by using Morris counters to approximate $\langle D^{(i,+)}, x \rangle$ and $\langle D^{(i,-)}, x \rangle$, not only is the overall space usage improved for the purposes of [JW19], but also, for our purposes, the number of internal state updates is much smaller.
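The monotonicity of the two split inner products can be checked on a toy example (the sketch row and updates below are arbitrary illustrations, not the paper's construction):

```python
d = [0.7, -1.3, 2.1, -0.2]                     # one sketch row D^(i)
d_pos = [max(v, 0.0) for v in d]               # D^(i,+): positive entries
d_neg = [min(v, 0.0) for v in d]               # D^(i,-): negative entries

x = [0, 0, 0, 0]                               # frequency vector
ip_pos = 0.0                                   # <D^(i,+), x>, nondecreasing
ip_neg = 0.0                                   # -<D^(i,-), x>, nondecreasing
for u in [0, 2, 2, 1, 3, 0]:                   # insertion-only updates
    x[u] += 1
    ip_pos += d_pos[u]
    ip_neg += -d_neg[u]
# Both running sums only ever grow, so each can be tracked by a Morris
# counter; the sketch value is recovered as their difference.
assert abs((ip_pos - ip_neg) - sum(di * xi for di, xi in zip(d, x))) < 1e-9
```

Since insertions only add non-negative mass, each tracked sum is monotone even though $\langle D^{(i)}, x \rangle$ itself can decrease.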
As an additional technical caveat, [JW19] notes that the sketching matrix $D$ cannot be stored in the allotted memory. Instead, [JW19] notes that by using the log-cosine estimator [KNW10] instead of the median estimator, the entries of $D$ can be generated using $O\left(\frac{\log(1/\varepsilon)}{\log\log(1/\varepsilon)}\right)$-wise independence, so that storing the randomness used to generate $D$ only requires $O\left(\frac{\log(1/\varepsilon)}{\log\log(1/\varepsilon)} \cdot \log n\right)$ bits of space.

$F_p$ Estimation, $p > 1$
In this section, we present our $F_p$ approximation algorithm for insertion-only streams, which has only $\widetilde{O}_\varepsilon(n^{1-1/p})$ internal state updates for $p > 1$.
We first define the level sets of the F p moment, as well as the contribution of each level set.
Definition 3.3 (Level sets and contribution). Let $\widetilde{F}_p$ be the power of two such that $F_p \le \widetilde{F}_p < 2F_p$. For each $\ell \ge 1$, we define the level set $\Gamma_\ell = \left\{i \in [n] : \frac{\widetilde{F}_p}{2^\ell} \le (f_i)^p < \frac{\widetilde{F}_p}{2^{\ell-1}}\right\}$. We define the contribution $C_\ell$ and the fractional contribution $\phi_\ell$ of level set $\Gamma_\ell$ to be $C_\ell := \sum_{i \in \Gamma_\ell} (f_i)^p$ and $\phi_\ell := \frac{C_\ell}{F_p}$. For an accuracy parameter $\varepsilon$ and a stream of length $m$, we say that a level set $\Gamma_\ell$ is significant if its fractional contribution $\phi_\ell$ is at least $\frac{\varepsilon}{2p\log(nm)}$. Otherwise, we say the level set is insignificant.
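Definition 3.3 can be computed exactly offline; a small illustrative sketch (the stream below is an arbitrary example):

```python
from collections import Counter

def level_sets(stream, p):
    """Exact offline computation of the level sets of Definition 3.3:
    level l holds items i with F~_p / 2^l <= f_i^p < F~_p / 2^(l-1),
    where F~_p is the power of two with F_p <= F~_p < 2 F_p."""
    freq = Counter(stream)
    fp = sum(f ** p for f in freq.values())   # F_p of the stream
    tilde = 1
    while tilde < fp:                         # round F_p up to a power of two
        tilde *= 2
    levels = {}
    for i, f in freq.items():
        l = 1
        while f ** p < tilde / (2 ** l):      # find l with F~_p/2^l <= f_i^p
            l += 1
        levels.setdefault(l, []).append(i)
    contrib = {l: sum(freq[i] ** p for i in g) for l, g in levels.items()}
    return levels, contrib, fp

stream = [7, 7, 7, 7, 3, 5]                   # f_7 = 4, f_3 = f_5 = 1
levels, contrib, fp = level_sets(stream, p=2)
```

The contributions of all level sets always sum to $F_p$ exactly, which is what the streaming algorithm approximates level by level.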
Our algorithm follows the framework introduced by [IW05] and subsequently used in a number of different applications, e.g., [WZ12, BBC+17, LSW18, BWZ21, WZ21a, MWZ22, BMWZ23], and has the following intuition. We estimate the contributions of the significant level sets by approximating the frequencies of the heavy hitters for substreams induced by subsampling the universe at exponentially smaller rates. Specifically, we create $L = O(\log n)$ substreams, where for each $\ell \in [L]$, we subsample each element of the universe $[n]$ into the substream with probability $\frac{1}{2^{\ell-1}}$. We rescale $(1+\varepsilon)$-approximations to the contributions of the surviving heavy hitters by the inverse of the sampling rate to obtain good approximations of the contributions of each significant level set.
To guarantee a small number of internal state changes, we use our heavy hitter algorithm FullSampleAndHold to provide (1 + ε)-approximations to the heavy hitters in each substream, thereby obtaining good approximations to the contributions of each significant level set.Our algorithm appears in full in Algorithm 3.
We note the following corollary of Lemma 2.5.
Lemma 3.4. Suppose $(f_j)^p \ge \frac{\varepsilon^2 \cdot F_p(I^{(r)}_\ell)}{2^7\gamma\log^2(nm)}$. Then with probability at least $\frac{9}{10}$, $H^{(r)}_\ell$ outputs $\widehat{f}_j$ such that $(\widehat{f}_j)^p$ is a $\left(1 + \frac{\varepsilon}{8\log(nm)}\right)$-approximation to $(f_j)^p$.

Proof. The proof follows from Lemma 2.5 and the fact that FullSampleAndHold is only run on the substream induced by $I^{(r)}_\ell$.
Lemma 3.5. Let $\varepsilon \in (0, 1)$, let $\Gamma_i$ be a fixed level set, and let $\ell := \max\left(1, i - \log\frac{\gamma\log^2(nm)}{\varepsilon^2}\right)$. Let $\widehat{f}_j$ be the outputs of the Morris counters at level $i$, let $M$ be the power of two such that $m^p \le M < 2m^p$, and let $S^{(r)}_i$ be the set of ordered pairs $(j, \widehat{f}_j)$ retained at level $i$. Then with probability at least $\frac{9}{10}$, $(\widehat{f}_j)^p$ is a $\left(1 + \frac{\varepsilon}{8\log(nm)}\right)$-approximation to $(f_j)^p$.

Proof. We consider casework on whether $i - \log\frac{\gamma\log^2(nm)}{\varepsilon^2} > 1$. This corresponds, informally speaking, to whether the frequencies $(f_j)^p$ in a significant level set are large or not. If the frequencies are large, then it suffices to estimate them using our sampling-based algorithm. However, if the frequencies are not large, then subsampling must first be performed before we can estimate the frequencies using our sampling-based algorithm. First suppose $i - \log\frac{\gamma\log^2(nm)}{\varepsilon^2} \le 1$. Then $\ell = \max\left(1, i - \log\frac{\gamma\log^2(nm)}{\varepsilon^2}\right) = 1$, so we consider the outputs of the Morris counters $H^{(r)}_1$. By Lemma 3.4, with probability at least $\frac{9}{10}$, $(\widehat{f}_j)^p$ is a $\left(1 + \frac{\varepsilon}{8\log(nm)}\right)$-approximation to $(f_j)^p$, as desired.
For the other case, suppose $i - \log\frac{\gamma\log^2(nm)}{\varepsilon^2} > 1$, and therefore $\ell = i - \log\frac{\gamma\log^2(nm)}{\varepsilon^2}$. Conditioning on the event $E_2$, the subsampled substream satisfies the hypothesis of Lemma 3.4. Therefore by Lemma 3.4, with probability at least $\frac{9}{10}$, $(\widehat{f}_j)^p$ is a $\left(1 + \frac{\varepsilon}{8\log(nm)}\right)$-approximation to $(f_j)^p$, as desired.
We now justify the approximation guarantees of our algorithm.
Lemma 3.6. $\Pr\left[\left|\widehat{F}_p - F_p\right| \le \varepsilon \cdot F_p\right] \ge \frac{2}{3}$.

Proof. We would like to show that for each level set $\Gamma_i$, we accurately estimate its contribution $C_i$. On the other hand, $\widehat{C}_i$ is a scaled sum over the items $j$ whose estimated frequency $(\widehat{f}_j)^p$ falls into the range of $\Gamma_i$, so $j$ could be misclassified into contributing to an adjacent level set. Thus we first consider an idealized process where $j$ is correctly classified across all level sets and show that, in this idealized process, we achieve a $(1 + O(\varepsilon))$-approximation to $F_p$. We then argue that because we choose $\lambda$ uniformly at random, only a small number of coordinates will be misclassified, and so our approximation guarantee will only slightly degrade, but remain a $(1+\varepsilon)$-approximation to $F_p$.
Idealized process. We first show that in a setting where $(\widehat{f}_j)^p$ is correctly classified for all $j$, then for a fixed level set $\Gamma_i$, the estimate $\widehat{C}_i$ concentrates around $C_i$. Let $E_1$ be the event that the number of items of $\Gamma_i$ surviving the subsampling is close to its expectation, and let $E_2$ be the event that $F_p(I^{(r)}_i)$ is not much larger than its expectation. Conditioned on $E_1$, $E_2$, and $j \in I^{(r)}_i$, let $E_3$ be the event that the estimate of Lemma 3.5 succeeds, so that $\Pr[E_3] \ge \frac{9}{10}$ by Lemma 3.5. We define $\widehat{C}_i$ to be the rescaled sum of the estimated contributions $(\widehat{f}_j)^p$ of the surviving items classified into $\Gamma_i$. The expectation of $\widehat{C}_i$ matches $C_i$, where we recall that $C_i$ denotes the contribution of level set $\Gamma_i$, and its variance can be bounded. Thus by Chebyshev's inequality, $\widehat{C}_i$ is a $(1 + O(\varepsilon))$-approximation to $C_i$ with good probability. To analyze the probability of the events $E_1$ and $E_2$ occurring, note that in level $\ell$, each item is sampled with probability $2^{-\ell+1}$. Hence $E_1$ and $E_2$ each hold with high probability, and by a union bound over the $p\log(nm)$ level sets, all level-set estimates succeed simultaneously with good probability.

Randomized boundaries. Given a fixed $r \in [R]$, we say that an item $j \in [n]$ is misclassified if there exists a level set $\Gamma_i$ such that $(f_j)^p$ falls into the range of $\Gamma_i$ but the estimate $(\widehat{f}_j)^p$ falls outside it. By Lemma 3.4, conditioned on $E_3$, this guarantee holds independently of the choice of $\lambda$. Since $\lambda \in \left[\frac{1}{2}, 1\right]$ is chosen uniformly at random, the probability that $j \in [n]$ is misclassified is at most $\frac{\varepsilon}{2\log(nm)}$. Furthermore, if $j \in [n]$ is misclassified, then it can only be classified into either level set $\Gamma_{i+1}$ or level set $\Gamma_{i-1}$, because $(\widehat{f}_j)^p$ is a $\left(1 + \frac{\varepsilon}{8\log(nm)}\right)$-approximation to $(f_j)^p$. Thus, a misclassified index induces at most $2(f_j)^p$ additive error to the contribution of level set $\Gamma_i$. In expectation across all $j \in [n]$, the total additive error due to misclassification is at most $F_p \cdot \frac{\varepsilon}{2\log(nm)}$. Therefore by Markov's inequality, for sufficiently large $n$ and $m$, the total additive error due to misclassification is at most $\frac{\varepsilon}{2} \cdot F_p$ with probability at least 0.999. Hence in total, $\left|\widehat{F}_p - F_p\right| \le \varepsilon \cdot F_p$ with probability at least $\frac{2}{3}$.

Putting things together, we give the full guarantees of our $F_p$ estimation algorithm in Theorem 1.3.
Proof. The space bound follows from the fact that for $p \in [1, 2]$, only $O\left(\frac{\log^{11+3p}(mn)}{\varepsilon^{4+4p}}\right)$ counters are stored, while for $p > 2$, only $\widetilde{O}\left(\frac{1}{\varepsilon^{4+4p}} n^{1-2/p}\right)$ counters are stored. The internal state can change each time an item is sampled. Since each item of the stream is sampled with probability $\varrho = \frac{\gamma^2 n^{1-1/p}\log^4(nm)}{\varepsilon^2 m}$, with high probability the total number of internal state changes is $O(\varrho m) = O\left(\frac{\gamma^2 n^{1-1/p}\log^4(nm)}{\varepsilon^2}\right)$. Finally, for correctness, we have by Lemma 3.6 that $\Pr\left[\left|\widehat{F}_p - F_p\right| \le \varepsilon \cdot F_p\right] \ge \frac{2}{3}$.

Entropy Estimation
In this section, we describe how to estimate the entropy of a stream using a small number of internal state changes. Recall that for a frequency vector $f \in \mathbb{R}^n$, the Shannon entropy of $f$ is defined by $H(f) = -\sum_{i=1}^n f_i \log f_i$. Observe that any algorithm that obtains a $(1 + O(\varepsilon))$-multiplicative approximation to the function $h(f) = 2^{H(f)}$ also obtains an $O(\varepsilon)$-additive approximation of the Shannon entropy $H(f)$, and vice versa. Hence, to obtain an additive $\varepsilon$-approximation to the Shannon entropy, we describe how to obtain a multiplicative $(1+\varepsilon)$-approximation to $h(f) = 2^{H(f)}$.
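The equivalence between a multiplicative approximation of $h(f) = 2^{H(f)}$ and an additive approximation of $H(f)$ is simply a property of the logarithm; a quick numerical check (the entropy value below is an arbitrary illustration):

```python
import math

eps = 0.01
H_true = 3.7                     # hypothetical entropy value
h = 2.0 ** H_true                # h(f) = 2^H(f)

# Any (1 +/- eps)-multiplicative approximation of h yields an O(eps)-additive
# approximation of H, since |log2(1 +/- eps)| <= 2 * eps for eps <= 1/2.
for h_hat in ((1 - eps) * h, (1 + eps) * h):
    H_hat = math.log2(h_hat)
    assert abs(H_hat - H_true) <= 2 * eps
```

The converse direction follows the same way by exponentiating an additive error.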

Lemma 3.7 ([HNO08]). Given an accuracy parameter $\varepsilon > 0$, let $k = \log\frac{1}{\varepsilon} + \log\log m$ and $\varepsilon' = \frac{\varepsilon}{12(k+1)^3 \log m}$. Then there exists an efficiently computable set $\{p_0, \ldots, p_k\}$ such that $p_i \in (0, 2)$ for all $i$, as well as an efficiently computable deterministic function that recovers an additive $\varepsilon$-approximation to the entropy from $(1+\varepsilon')$-approximations to the moments $F_{p_0}, \ldots, F_{p_k}$.

Section 3.3 in [HNO08] describes how to compute the set $\{p_0, \ldots, p_k\}$ in Lemma 3.7 as follows. We define $\ell = \frac{1}{2(k+1)\log m}$ and the function $g(z) = \frac{\ell\left(k^2(z-1)+1\right)}{2k^2+1}$. For each $p_i$, we set $p_i = 1 + g(\cos(i\pi/k))$, which can be efficiently computed. Thus, the set $\{p_0, \ldots, p_k\}$ in Lemma 3.7 can be efficiently computed as pre-processing, assuming that $n$ and $m$ are known a priori. Let $P(x)$ be the degree-$k$ polynomial interpolated at the points $p_0, \ldots, p_k$, so that $P(p_i) = F_{p_i}(f)$ for each $0 \le i \le k$, where $F_{p_i}(f)$ is the $(p_i)$-th moment of the frequency vector $f$. [HNO08] then showed that a multiplicative $(1 + O(\varepsilon))$-approximation to $h(f) = 2^{H(f)}$ can be computed from $2^{P(0)}$, and moreover, a $(1 + O(\varepsilon))$-approximation to $2^{P(0)}$ can be computed from $(1 + O(\varepsilon))$-approximations to $F_{p_0}(f), \ldots, F_{p_k}(f)$. Thus by Lemma 3.7 and Theorem 1.3, we have:

Theorem 3.8. Given an accuracy parameter $\varepsilon \in (0, 1)$, as well as the universe size $n$ and the stream length $m = \operatorname{poly}(n)$, there exists a one-pass insertion-only streaming algorithm that has $\widetilde{O}(\sqrt{n})$ internal state changes, uses $O\left(\log^{O(1)}(mn)\right)$ bits of space, and outputs $\widehat{H}$ such that $\left|\widehat{H} - H\right| \le \varepsilon$, where $H$ is the Shannon entropy of the stream.
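The final step evaluates, at $x = 0$, the polynomial interpolating the moment values; a generic Lagrange-interpolation sketch (the sample points below are illustrative, not the actual $\{p_0, \ldots, p_k\}$):

```python
def interpolate_at_zero(xs, ys):
    """Evaluate at 0 the unique degree-(len(xs)-1) polynomial P with
    P(xs[i]) = ys[i], using the Lagrange form."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (0.0 - xj) / (xi - xj)   # Lagrange basis at x = 0
        total += term
    return total

# Sanity check on a known polynomial q(x) = x^2 + 1: interpolating its
# values at three points recovers q(0) = 1.
xs = [1.0, 2.0, 3.0]
ys = [x * x + 1.0 for x in xs]
p0 = interpolate_at_zero(xs, ys)
```

In the entropy application, the $y$-values would instead be the (approximate) moments $F_{p_i}(f)$, and the clustering of the $p_i$ near 1 controls the extrapolation error to $x = 0$.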

Lower Bound
In this section, we describe our lower bound showing that any streaming algorithm achieving a $(2 - \Omega(1))$-approximation to $F_p$ requires at least $\frac{1}{2}n^{1-1/p}$ state updates, regardless of the memory allocated to the algorithm. The proof of Theorem 1.2 is similar. The main idea is that we create two streams $S_1$ and $S_2$ of length $O(n)$ that look similar everywhere except for a random contiguous block $B$ of $n^{1/p}$ updates. In $B$, the first stream $S_1$ has the same item repeated $n^{1/p}$ times, while the second stream $S_2$ has $n^{1/p}$ distinct items each appearing once. The remaining $n - n^{1/p}$ stream updates of $S_1$ and $S_2$ are additional distinct items that each appear once, so that $F_p(S_1) \ge (2 - o(1)) \cdot F_p(S_2)$ and $F_p(S_2) = \Omega(n)$. Any algorithm $A$ that achieves a $(2 - \Omega(1))$-approximation to $F_p$ must distinguish between $S_1$ and $S_2$, and thus $A$ must perform some action on $B$. However, $B$ has size $n^{1/p}$ and a random location within the stream, so $A$ must perform $\Omega(n^{1-1/p})$ state updates.
Theorem 1.4. Let $\varepsilon \in (0, 1)$ be a constant and $p \ge 1$. Any algorithm that achieves a $(2 - \varepsilon)$-approximation to $F_p$ with probability at least $\frac{2}{3}$ requires at least $\frac{1}{2}n^{1-1/p}$ state updates.

Proof. Consider the two following possible input streams. For the stream $S_1$ of length $n$ on universe $[n]$, we choose a random contiguous block $B$ of $n^{1/p}$ stream updates and set them all equal to the same random universe item $i \in [n]$. We randomly choose the remaining $n - n^{1/p}$ updates in the stream so that they are all distinct and none of them are equal to $i$. Note that by construction, we have $F_p(S_1) = (n - n^{1/p}) + (n^{1/p})^p = 2n - n^{1/p}$. For the stream $S_2$ of length $n$ on universe $[n]$, we choose $S_2$ to be a random permutation of $[n]$, so that $F_p(S_2) = n$.
For fixed $\varepsilon \in (0, 1)$, let $A$ be an algorithm that achieves a $(2 - \varepsilon)$-approximation to $F_p$ with probability at least $\frac{2}{3}$, while using fewer than $\frac{1}{2}n^{1-1/p}$ state updates. We claim that with probability $\frac{1}{2}$, $A$ must have the same internal state before and after $B$ in the stream $S_1$. Note that we can view each stream update as $(i, j)$, where $i \in [n]$ is the index of the stream update and $j \in [n]$ is the identity of the universe item. Observe that for a random stream update $i \in [n]$, a random universe item $j \in [n]$ alters the state of $A$ with probability at most $\frac{1}{2n^{1/p}}$, since otherwise for a random stream, the expected number of state changes would be larger than $\frac{n^{1-1/p}}{2}$, which would be a contradiction. Since both the choice of $B$ and the element $j \in [n]$ that is repeated $n^{1/p}$ times are chosen uniformly at random, the expected number of state changes of the streaming algorithm on the block $B$ is at most $\frac{n^{1/p}}{2n^{1/p}} = \frac{1}{2}$. Therefore, with probability at least $\frac{1}{2}$, the streaming algorithm's state is the same before and after the block $B$.
Moreover, the same argument applies to $S_2$, and therefore with probability at least $\frac{1}{2}$, the streaming algorithm cannot distinguish between $S_1$ and $S_2$ if its internal state changes fewer than $\frac{n^{1-1/p}}{2}$ times.
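The construction of the two hard streams can be sketched directly (hypothetical helper, with $n$ chosen so that $n^{1/p}$ is an integer):

```python
import random
from collections import Counter

def f_p(stream, p):
    """Exact F_p moment of a stream: sum of p-th powers of frequencies."""
    return sum(c ** p for c in Counter(stream).values())

def build_hard_streams(n, p, seed=0):
    """S1: one random item repeated n^(1/p) times in a random contiguous
    block B, all other updates distinct; S2: a random permutation of [n]."""
    rng = random.Random(seed)
    b = round(n ** (1.0 / p))            # block length n^(1/p)
    items = list(range(n))
    rng.shuffle(items)
    star, rest = items[0], items[1:]     # star is the repeated item i
    body = rest[: n - b]                 # n - b distinct non-star updates
    start = rng.randrange(n - b + 1)     # random location of block B
    s1 = body[:start] + [star] * b + body[start:]
    s2 = list(range(n))
    rng.shuffle(s2)
    return s1, s2

s1, s2 = build_hard_streams(n=16, p=2)
# F_p(S1) = (n - n^(1/p)) + (n^(1/p))^p = 2n - n^(1/p); F_p(S2) = n.
assert f_p(s1, 2) == 2 * 16 - 4 and f_p(s2, 2) == 16
```

Since the streams differ only inside $B$, any constant-factor separation of $F_p(S_1)$ from $F_p(S_2)$ forces the algorithm to react inside the randomly placed block.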
We now prove Theorem 1.2 using a similar approach.
Theorem 1.2. Let $\varepsilon \in (0, 1)$ be a constant and $p \ge 1$. Any algorithm that solves the $L_p$-heavy hitters problem with threshold $\varepsilon$ with probability at least $\frac{2}{3}$ requires at least $\frac{1}{2\varepsilon}n^{1-1/p}$ state updates.

Proof. Consider the two input streams $S_1$ and $S_2$ defined as follows. We define the stream $S_1$ to have length $n$ on a universe of size $n$. Similar to before, we choose a random contiguous block $B$ of $\varepsilon \cdot n^{1/p}$ stream updates and set them all equal to the same random universe item $i \in [n]$. We randomly choose the remaining $n - \varepsilon \cdot n^{1/p}$ updates in the stream so that they are all distinct and none of them are equal to $i$. Note that by construction, we have $F_p(S_1) = (n - \varepsilon \cdot n^{1/p}) + (\varepsilon \cdot n^{1/p})^p \le 2n$. Therefore, the item replicated $\varepsilon \cdot n^{1/p}$ times in block $B$ is an $\frac{\varepsilon}{2}$-heavy hitter with respect to $L_p$.
We also define the stream $S_2$ to have length $n$ on universe $[n]$. As before, we choose $S_2$ to be a random permutation of $[n]$, so that $F_p(S_2) = n$.
For fixed $\varepsilon \in (0, 1)$, let $A$ be an algorithm that solves the $\varepsilon$-heavy hitter problem with respect to $L_p$ with probability at least $\frac{2}{3}$, while using fewer than $\frac{1}{2\varepsilon}n^{1-1/p}$ state updates. We claim that with probability $\frac{1}{2}$, $A$ must have the same internal state before and after $B$ in the stream $S_1$. Observe that each stream update can be viewed as $(i, j)$, where $i \in [n]$ is the index of the stream update and $j \in [n]$ is the identity of the universe item. For a random stream update $i \in [n]$, the probability that a random universe item $j \in [n]$ alters the state of $A$ is at most $\frac{1}{2\varepsilon \cdot n^{1/p}}$, since otherwise for a random stream, the expected number of state changes would be larger than $\frac{n^{1-1/p}}{2\varepsilon}$, which would be a contradiction. Because both the choice of $B$ and the element $j \in [n]$ that is repeated $\varepsilon \cdot n^{1/p}$ times are chosen uniformly at random, the expected number of state changes of the streaming algorithm on the block $B$ is at most $\frac{\varepsilon \cdot n^{1/p}}{2\varepsilon \cdot n^{1/p}} = \frac{1}{2}$. Hence, the streaming algorithm's state is the same before and after the block $B$ with probability at least $\frac{1}{2}$. However, the same argument applies to $S_2$. Thus with probability at least $\frac{1}{2}$, the streaming algorithm cannot distinguish between $S_1$ and $S_2$ if its internal state changes fewer than $\frac{n^{1-1/p}}{2\varepsilon}$ times. Therefore, any algorithm that solves the $L_p$-heavy hitter problem with threshold $\varepsilon$ with probability at least $\frac{2}{3}$ requires at least $\frac{1}{2\varepsilon}n^{1-1/p}$ state updates.
We use $[n]$ to denote the set $\{1, 2, \ldots, n\}$ for an integer $n > 0$. We write $\operatorname{poly}(n)$ to denote a fixed univariate polynomial in $n$ and similarly, $\operatorname{poly}(n_1, \ldots, n_k)$ to denote a fixed multivariate polynomial in $n_1, \ldots, n_k$. We use $\widetilde{O}(f(n_1, \ldots, n_k))$ for a function $f(n_1, \ldots, n_k)$ to denote $f(n_1, \ldots, n_k) \cdot \operatorname{poly}(\log f(n_1, \ldots, n_k))$. For a vector $f \in \mathbb{R}^n$, we use $f_i$ for $i \in [n]$ to denote the $i$-th coordinate of $f$.

Model. In our setting, an insertion-only stream $S$ consists of a sequence of updates $u_1, \ldots, u_m$. In general, we do not require $m$ to be known in advance, though in some cases, we achieve algorithmic improvements when a constant-factor upper bound on $m$ is known in advance; we explicitly clarify the setting in these cases. For each $t \in [m]$, we have $u_t \in [n]$, where without loss of generality, we use $[n]$ to represent an upper bound on the universe, which we assume to be known in advance. The stream $S$ defines a frequency vector $f \in \mathbb{R}^n$ by $f_i = |\{t \mid u_t = i\}|$, so that for each $i \in [n]$, the $i$-th value of the frequency vector is how often $i$ appears in the data stream $S$. Observe that through a simple reduction, this model suffices to capture the more general case where some coordinate is increased by some amount in $\{1, \ldots, M\}$ for some integer $M > 0$ in each update. For a fixed algorithm $A$, suppose that at each time $t \in [m]$, the algorithm $A$ maintains a memory state $\sigma_t$. Let $|\sigma_t|$ denote the size of the memory state, in words of space, where each word of space is assumed to use $O(\log n + \log m)$ bits. We say the algorithm $A$ uses $s$ words of space, or equivalently $O(s(\log n + \log m))$ bits of space, if $\max_{t \in [m]} |\sigma_t| \le s$. For each $t \in [m]$, let $X_t$ be the indicator random variable for whether the algorithm $A$ changed its memory state at time $t$, i.e., $X_t = 1$ if $\sigma_t \ne \sigma_{t-1}$ and $X_t = 0$ otherwise.

SampleAndHold (excerpt):
else if there exists $i \in [k]$ with $q_i = s_t$ then ▷ item is in the reservoir
13: Start a Morris counter for $s_t$ ▷ hold a counter for the item
14: else
15: Pick $\mu_t \in [0, 1]$ uniformly at random
16: if $\mu_t < \varrho$ then ▷ with probability $\varrho$
17: Pick $i \in [k]$ uniformly at random

Algorithm 3 (excerpt):
Input: Stream $s_1, \ldots, s_m$ of items from $[n]$, accuracy parameter $\varepsilon \in (0, 1)$, $p \ge 1$
Output: $(1+\varepsilon)$-approximation to $F_p$
1: $L = O(p \log(nm))$, $R = O(\log\log n)$, $\gamma = 2^{20} p$
2: for $t = 1$ to $t = m$ do
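The sample-and-hold loop in the pseudocode above can be sketched as follows. This is an illustrative simplification under assumed semantics: a fixed-size reservoir with smallest-counter eviction stands in for the paper's counter-reset scheme, and MorrisCounter is the single-level textbook variant rather than the tuned counter of Theorem 1.5.

```python
import random

class MorrisCounter:
    """Textbook Morris counter: stores only an exponent x, so most
    increments do not change the internal state."""
    def __init__(self, rng):
        self.rng, self.x = rng, 0
    def increment(self):
        if self.rng.random() < 2.0 ** -self.x:
            self.x += 1                       # state changes with prob 2^-x
    def estimate(self):
        return 2.0 ** self.x - 1.0            # unbiased estimate of the count

def sample_and_hold(stream, k, rho, seed=0):
    """Each update is held with probability rho; a held item keeps a Morris
    counter for its later occurrences. When the size-k reservoir is full,
    the smallest counter is evicted, so heavy items grow too large to delete."""
    rng = random.Random(seed)
    reservoir = {}                            # item -> MorrisCounter
    for s in stream:
        if s in reservoir:                    # item is in the reservoir
            reservoir[s].increment()
        elif rng.random() < rho:              # sample the item with prob rho
            if len(reservoir) >= k:
                victim = min(reservoir, key=lambda j: reservoir[j].x)
                del reservoir[victim]
            reservoir[s] = MorrisCounter(rng)
    return {j: c.estimate() for j, c in reservoir.items()}

estimates = sample_and_hold([7] * 1000 + list(range(100)), k=10, rho=0.5)
```

On this toy stream, the heavy item 7 is sampled early, its counter grows, and later singleton items are evicted in its place, mirroring the intuition that a held heavy hitter survives to the end of the stream.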