Balanced Allocations in Batches: The Tower of Two Choices

In the balanced allocation framework, the goal is to allocate m balls into n bins so as to minimize the gap (the difference between the maximum and the average load). The One-Choice process allocates each ball to a bin sampled independently and uniformly at random. The Two-Choice process allocates balls sequentially, and each ball is placed in the least loaded of two sampled bins. Finally, the (1+β)-process mixes these processes: each ball is allocated using Two-Choice with probability β ∈ (0, 1), and using One-Choice otherwise. Despite Two-Choice being optimal in the sequential setting, it has been observed in practice that it does not perform well in a parallel environment, where load information may be outdated. Following [BCEFN12], we study such a parallel setting where balls are allocated in batches of size b, and balls within the same batch are allocated with the same strategy and based on the same load information. For small batch sizes b ∈ [n, n log n], it was shown in [LS22c] that Two-Choice achieves an asymptotically optimal gap among all allocation processes with two (or any constant number of) samples. In this work, we focus on larger batch sizes b ∈ [n log n, n³]. It was proved in [LS22a] that Two-Choice leads to a gap of Θ(b/n). As our main result, we prove that the gap reduces to O(√((b/n) · log n)) if one runs the (1+β)-process with an appropriately chosen β (in fact, this result holds for a larger class of processes). This not only confirms the phenomenon that Two-Choice is not the best process in this setting (leading to the formation of "towers" over previously light bins), but also shows that mixing two processes (One-Choice and Two-Choice) yields a process whose gap is asymptotically smaller than that of both. We also derive a matching lower bound of Ω(√((b/n) · log n)) for any allocation process, which demonstrates that the above (1+β)-process is asymptotically optimal.
Our analysis also works in the presence of randomly weighted balls, and also implies exponential tails for the number of bins above a certain load value.


Introduction
Sequential balanced allocations. In the sequential balanced allocations framework, there are m tasks (balls) to be allocated into n servers (bins). It is well-known that allocating the balls into bins sampled uniformly at random (a.k.a. One-Choice) leads w.h.p. to a maximum load of Θ(log n/log log n) for m = n, and to a gap (maximum load minus average load) of Θ(√((m/n) · log n)) for m ≥ n log n.
An improvement over One-Choice is the d-Choice process [3, 6, 15], where each ball is allocated to the least loaded of d bins sampled uniformly at random. For any m ≥ n, this process achieves w.h.p. a gap of log log n/log d + Θ(1), i.e., a gap that does not depend on m. For d = 2, this great improvement is known as the "power of two choices" (see also the surveys [28, 38] for more details). Despite the simplistic nature of the balanced allocation framework, the Two-Choice process has had a significant impact on practical applications such as load balancing and distributed storage systems, which was also acknowledged by the "ACM Paris Kanellakis Theory and Practice Award 2020" [2] (see also Applications below).
Several variants of Two-Choice have been studied. Of particular importance to this work is the (1+β)-process, where each ball is allocated using Two-Choice with probability β ∈ (0, 1] and One-Choice otherwise. Mitzenmacher [26, Section 4.4.1] introduced this process as a model of Two-Choice with erroneous comparisons. Peres, Talwar and Wieder [33] showed that for β := β(n) < 1, it achieves w.h.p. a Θ((log n)/β) gap (see also [20]), which becomes worse for smaller β, but still remains independent of m. The (1+β)-process has also been applied to the analysis of Two-Choice in the popular graphical setting [4, 16, 33], where bins are organized as vertices in a graph, and each ball is allocated to the less loaded of the two endpoints of an edge sampled uniformly at random.
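As an illustration of the sequential processes discussed so far, here is a minimal simulation sketch (function names and parameters are our own, hypothetical choices; the paper proves its bounds analytically rather than by simulation). Setting beta = 0 recovers One-Choice and beta = 1 recovers Two-Choice:

```python
import random

def allocate(n, m, beta, rng):
    """Allocate m balls into n bins with the (1+beta)-process.

    Each ball is a Two-Choice step with probability beta,
    and a One-Choice step otherwise. Returns the load vector."""
    loads = [0] * n
    for _ in range(m):
        i = rng.randrange(n)
        if rng.random() < beta:
            # Two-Choice step: place in the lesser loaded of two samples.
            j = rng.randrange(n)
            if loads[j] < loads[i]:
                i = j
        loads[i] += 1
    return loads

def gap(loads):
    # Gap = maximum load minus average load.
    return max(loads) - sum(loads) / len(loads)

rng = random.Random(42)
loads = allocate(n=100, m=10_000, beta=0.5, rng=rng)
```

Ties are broken in favor of the first sample here for simplicity; the processes in the paper break ties randomly.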
Another variant of Two-Choice that has received some attention recently is the family of Two-Thinning processes [12,13], where the ball is allocated to the second sample only if the first one does not meet a certain criterion, e.g., based on a threshold on its load or a quantile on its rank.
It should be noted that the analyses of all these processes strongly rely on the fact that the load information of each bin is updated after each allocation. In effect, this means balls can only be allocated sequentially, which is a downside in distributed and parallel environments.
Outdated information settings. In this work, we demonstrate that in outdated information settings, by choosing an appropriately small β, the (1+β)-process achieves the asymptotically optimal gap among a large class of processes, including not only Two-Choice (and One-Choice), but even adaptive processes that may allocate with a different scheme after each batch. This confirms earlier empirical observations that the performance of the Two-Choice process deteriorates under outdated information and delays [8, 14, 27, 31, 37].
Berenbrink, Czumaj, Englert, Friedetzky and Nagel [5] introduced the b-Batched setting, where balls are allocated in batches of size b. That means that in every batch the b balls are allocated in parallel, as the decision where to allocate a ball only depends on the load configuration before that batch of balls arrived. For b = n, they proved that Two-Choice achieves w.h.p. an O(log n) gap. This bound was recently improved to Θ(log n/log log n) in [22], and in the same work, it was shown that Two-Choice has a gap that matches the maximum load of One-Choice for b balls, for any batch size b ∈ [n · e^{−log^{Θ(1)} n}, n log n], and so it is asymptotically optimal. In contrast, for b ≥ n log n, Two-Choice (and a family of other processes) have w.h.p. a Θ(b/n) gap [20], a bound which was shown to hold even in the presence of weights and on some graphs. This analysis also demonstrates that increasing d in the d-Choice process does not always improve the gap, which is in sharp contrast to the sequential setting. In [22], a more powerful setting, τ-Delay, was studied for the Two-Choice process, where an adversary can choose to report, for each of the bins, any load from the last τ steps. For τ = b, b-Batched is a special instance of τ-Delay, and for any τ ≤ n log n, the same asymptotic bounds were shown to hold.
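A minimal sketch of the b-Batched setting: as in the definition above, every decision within a batch is made against a snapshot of the loads taken when the batch started (names are illustrative, not from the paper):

```python
import random

def batched_two_choice(n, num_batches, b, rng):
    """b-Batched Two-Choice: within a batch, all comparisons use the
    load vector as it was at the beginning of the batch."""
    loads = [0] * n
    for _ in range(num_batches):
        snapshot = list(loads)  # outdated load information for this batch
        for _ in range(b):
            i1, i2 = rng.randrange(n), rng.randrange(n)
            # The comparison uses the snapshot, not the current loads,
            # so all b allocations could be performed in parallel.
            dest = i1 if snapshot[i1] <= snapshot[i2] else i2
            loads[dest] += 1
    return loads

rng = random.Random(0)
n, b = 50, 50 * 40
loads = batched_two_choice(n, num_batches=10, b=b, rng=rng)
```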
Outdated information settings have also been studied in the queuing setting [1, 14, 18, 27, 37]. In particular, Mitzenmacher [27] studied the corresponding version of the b-Batched setting, called the bulletin board model with periodic updates, showing that some processes requiring centralized coordination can outperform Two-Choice, but no explicit rigorous bounds were proven. This shortcoming of Two-Choice was characterized as herd behavior, meaning that some of the initially lighter bins receive disproportionately many balls, turning them into heavy bins. In another empirical study, Dahlin [8] also observed the herd behavior and suggested similar centralized strategies to improve upon d-Choice. Regarding the identification of optimal processes, Whitt [37] remarks: "We have shown that several natural selection rules are not optimal in various situations, but we have not identified any optimal rules. Identifying optimal rules in these situations would obviously be interesting, but appears to be difficult. Moreover, knowing an optimal rule might not be so useful because the optimal rule may be very complicated."
Applications. Recently, several distributed low-latency schedulers, including Sparrow [31], Eagle [9], Hawk [10], Peacock [17], Pigeon [36] and Tarcil [11], have used variants of the Two-Choice process. In [31], with regards to the implementation of Sparrow, the authors state: "The power of two choices suffers from two remaining performance problems: first, server queue length is a poor indicator of wait time, and second, due to messaging delays, multiple schedulers sampling in parallel may experience race conditions."
Similar observations have been made in the context of distributed stream processing [29, 30] and load balancers [24]. These studies suggest that batch sizes b = Ω(n log n), for which Two-Choice is no longer optimal, are relevant to real-world applications.
Weighted settings. Several works study balanced allocation processes with weights [7, 20, 33, 35]. We will focus on weights sampled independently from probability distributions with bounded moment generating functions, as in [20] and [33]; this includes the geometric, exponential and Poisson distributions.
Our results. In this work, we prove that a family of processes satisfying a mild technical condition achieves the asymptotically optimal gap of O(√((b/n) · log n)) in the weighted b-Batched setting for b ∈ [2n log n, n³], leading to roughly a quadratic improvement over the gap of the Two-Choice process. This family of processes includes the (1+β)-process, a process that can be easily implemented in a decentralized manner, and we demonstrate that by setting β = Θ(√((n/b) · log n)) we attain this asymptotically optimal gap.
We also provide lower bounds establishing the tightness of our upper bounds. Interestingly, the lower bound of Ω(√((b/n) · log n)) applies to a much more powerful class of allocation processes, where the allocation rule can be arbitrarily tailored at the beginning of each batch.
The intuition for these optimal processes relates to the herd behavior observed in [27] and [8]. For the d-Choice process, the maximum probability of allocating to a bin is max_{i∈[n]} p_i ≈ d/n. This means that, for example, in Two-Choice in a batch of b balls there are some bins that receive ≈ 2b/n balls, and so a gap of ≈ b/n arises. This becomes worse as d grows. To avoid this, we will investigate processes where max_{i∈[n]} p_i = (1 + o(1))/n, which means that in expectation no bin receives too many balls in any particular batch. For example, the (1+β)-process has max_{i∈[n]} p_i ≈ (1+β)/n, which means that this mixing of One-Choice steps with Two-Choice steps circumvents the herd behavior. See Fig. 1.1 for a visualization of how (1+β) achieves a more balanced distribution than Two-Choice over one batch, and Fig. 1.2 for how the gaps of different processes grow with larger max_{i∈[n]} p_i. The asymptotic gap bounds of the One-Choice, Two-Choice and (1+β) processes in the b-Batched setting are summarized in Table 1.3. Our results also imply bounds for the shape of the load vector (see Remark 4.3). Our analysis also applies in the presence of randomly weighted balls, and implies exponential tails for the number of bins above a certain load value.
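The herd effect is already visible in expectation. Using the standard probability allocation vectors p_i = (2i−1)/n² for Two-Choice and its mixture with the uniform vector for (1+β) (both recalled in Section 2.2), the expected number of balls that a single batch of size b sends to the most likely bin is b · max_{i} p_i, i.e., ≈ 2b/n for Two-Choice but only ≈ (1+β) · b/n for the (1+β)-process:

```python
def two_choice_vector(n):
    # p_i = (2i - 1) / n^2 for the i-th heaviest bin, i = 1..n.
    return [(2 * i - 1) / n**2 for i in range(1, n + 1)]

def one_plus_beta_vector(n, beta):
    # Mixture of Two-Choice (prob. beta) and One-Choice (prob. 1 - beta).
    return [beta * p + (1 - beta) / n for p in two_choice_vector(n)]

n = 1000
b = 100 * n       # batch size well above n log n
beta = 0.1
max_tc = b * max(two_choice_vector(n))           # ~ 2b/n balls in expectation
max_pb = b * max(one_plus_beta_vector(n, beta))  # ~ (1 + beta) * b/n balls
```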
Our techniques. Our techniques build on and refine those in [20], making use of the hyperbolic cosine potential function [33] and variants thereof. More specifically, a slightly weaker version of our tight upper bound is based on [20, Theorem 3.1] and a refinement of [20, Lemma 4.1]. For our tight gap bound, our approach uses an interplay between two hyperbolic cosine potential functions to prove concentration, and then an exponential potential with a larger smoothing parameter to deduce the refined gap. A similar method was used in [20, Section 5], but one crucial novelty here is that we consider allocation processes whose probability allocation vector has a small ℓ∞-distance from the uniform distribution. We believe that relating and comparing different allocation processes based on their ℓ∞-distance (or other metrics) could be a promising avenue for future work. This can also be seen as a natural relaxation of the majorization technique, which has been the dominant tool for relating different allocation processes [21, 33].
Organization. In Section 2, we introduce the basic notation for balanced allocations, and define the processes and settings that we will be working with. In particular, in Section 2.3 we define general conditions on the probability allocation vector used by the processes, under which our upper bounds on the gap apply. In Section 3, we prove the O(√(b/n) · log n) bound on the gap for a family of processes in the weighted b-Batched setting. In Section 4, we perform a refined analysis and improve this bound to O(√((b/n) · log n)). In Section 5, we show that this gap is asymptotically optimal, and in Section 6, we present some empirical results on the gap of some specific processes. Finally, in Section 7, we summarize the results and conclude with some open problems.

[Table 1.3: columns "Process", "Gap in Sequential Setting", "Gap in b-Batched Setting", "Batch Size".] For the sake of simplicity, we focus on the setting with unit weights and only list results for (1+β). Among all these processes, One-Choice produces the worst gap in both settings, even though its gap does not change between the b-Batched and the sequential setting. For Two-Choice, the gap becomes Θ(b/n) in the b-Batched setting with b = Ω(n log n), whereas for (1+β) the gap is improved to Θ(√((b/n) · log n)) (for a suitable β).

Notation, Processes and Settings
In this section, we introduce notation, processes and settings used throughout this work.

Basic Notation
We consider the allocation of m balls into n bins, which are labeled [n] := {1, 2, ..., n}. For the moment, the m balls are unweighted (or equivalently, all balls have weight 1). For any step t ≥ 0, x^t is the n-dimensional load vector, where x_i^t is the number of balls allocated to bin i in the first t allocations. In particular, x_i^0 = 0 for every i ∈ [n]. Finally, the gap is defined as Gap(t) := max_{i∈[n]} x_i^t − t/n.
It will also be convenient to sort the load vector x. To this end, let x̃^t := x^t − (t/n) · (1, ..., 1) be the normalized load vector. Then, relabel the bins such that y^t is a permutation of x̃^t with y_1^t ≥ y_2^t ≥ ··· ≥ y_n^t. We call a bin i ∈ [n] overloaded if y_i^t ≥ 0, and underloaded otherwise. A probability vector p ∈ R^n is any vector satisfying p_i ≥ 0 for all i ∈ [n] and Σ_{i=1}^n p_i = 1. Following [33], many allocation processes can be described by a (possibly time-dependent) probability allocation vector p^t, which is the probability vector with p_i^t being the probability of allocating a ball to the i-th heaviest bin.
By F^t we denote the filtration of the process up to step t, which in particular reveals the load vector x^t.

Processes
We start with a formal description of the One-Choice process.
One-Choice Process:
Iteration: For each t ≥ 0, sample one bin i independently and uniformly at random. Then update: x_i^{t+1} = x_i^t + 1.
We continue with a formal description of the Two-Choice process.
Two-Choice Process:
Iteration: For each t ≥ 0, sample two bins i_1 and i_2, independently and uniformly at random. Let i ∈ {i_1, i_2} be such that x_i^t = min{x_{i_1}^t, x_{i_2}^t}, breaking ties randomly. Then update: x_i^{t+1} = x_i^t + 1.
It is immediate that the probability allocation vector of Two-Choice is p_i = (2i − 1)/n² for each i ∈ [n]. Following [33], we recall the definition of the (1+β)-process, which interpolates between One-Choice and Two-Choice:
(1+β)-Process:
Parameter: A mixing factor β ∈ (0, 1].
Iteration: For each t ≥ 0, with probability β sample two bins i_1 and i_2 independently and uniformly at random, and let i ∈ {i_1, i_2} be such that x_i^t = min{x_{i_1}^t, x_{i_2}^t}, breaking ties randomly; otherwise sample a single bin i uniformly at random. Then update: x_i^{t+1} = x_i^t + 1.
In other words, at each step the (1+β)-process allocates the ball following the Two-Choice rule with probability β, and otherwise allocates the ball following the One-Choice rule. Therefore, its probability allocation vector is given by p_i = β · (2i − 1)/n² + (1 − β)/n for each i ∈ [n]. Recall that in [33] (and [20]), it was shown that Gap(m) = O((log n)/β) for any m ≥ n and β ∈ (0, 1]; so in particular, this gap (bound) does not grow with m.
The next process is another relaxation of Two-Choice.

Quantile(δ) Process:
Parameter: A quantile δ ∈ {1/n, 2/n, ..., 1}.
Iteration: For each t ≥ 0, sample two bins i_1 and i_2, independently and uniformly at random. If i_1 is not among the δn heaviest bins, then update x_{i_1}^{t+1} = x_{i_1}^t + 1; otherwise, update x_{i_2}^{t+1} = x_{i_2}^t + 1.
Note that the Quantile(δ) process can be implemented as a two-phase procedure: first probe bin i_1 and place the ball there if i_1 is not among the δn heaviest bins; otherwise, take a second sample i_2 and place the ball there. Since we only need to know whether a bin's rank is above or below a value, the response by a bin can be encoded as a single bit (at the cost of knowing the rank of each bin). The probability allocation vector of Quantile(δ) is given by p_i = δ/n for i ≤ δn, and p_i = (1 + δ)/n for i > δn.
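The two-phase implementation above can be sketched as follows (a hypothetical sketch; `sorted_ranks` maps each bin to its rank, 1 being the heaviest, and the allocation vector by rank follows from the two-phase description):

```python
import random

def quantile_delta_step(sorted_ranks, delta, n, rng):
    """One allocation of Quantile(delta): keep the first sample unless it
    ranks among the delta*n heaviest bins; otherwise take a second sample
    and place the ball there unconditionally."""
    i1 = rng.randrange(n)
    if sorted_ranks[i1] > delta * n:  # the single bit of feedback: "light"
        return i1
    return rng.randrange(n)

def quantile_vector(n, delta):
    # Allocation probability by rank: delta/n for the delta*n heaviest
    # bins, (1 + delta)/n for the remaining bins.
    k = round(delta * n)
    return [delta / n] * k + [(1 + delta) / n] * (n - k)

p = quantile_vector(100, 0.5)
dest = quantile_delta_step([1, 2, 3, 4], 0.5, 4, random.Random(0))
```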

Conditions on Probability Vectors
In [20], the weighted b-Batched setting was analyzed for probability allocation vectors satisfying the following two conditions. The first condition says that the process has a small ε/n bias to place away from overloaded bins and towards underloaded bins; the second condition says that no bin has too high a probability of being allocated.
• Condition C1: There exist a constant quantile δ ∈ (0, 1) and a (not necessarily constant) ε ∈ (0, 1), such that for any 1 ≤ k ≤ δn the prefix sums satisfy Σ_{i=1}^k p_i ≤ (1 − ε) · k/n, and similarly for any δn + 1 ≤ k ≤ n the suffix sums satisfy Σ_{i=k}^n p_i ≥ (1 + ε) · (n − k + 1)/n.
• Condition C2: There exists C > 1 such that max_{i∈[n]} p_i ≤ C/n.
In the same paper [20, Proposition 7.4], it was shown that any process with max_{i∈[n]} p_i ≥ (1 + ε)/n for ε = Ω(1) also has Gap(m) = Ω(b/n) for any b = Ω(n log n). Therefore, to improve on this asymptotic gap bound, we have to consider processes with max_{i∈[n]} p_i = (1 + o(1))/n. In our analysis in Sections 3 and 4, we will make use of the following condition, based on the ℓ∞-distance between the probability allocation vector p and the uniform distribution (i.e., One-Choice):
• Condition C3: There exists C > 1 such that max_{i∈[n]} |p_i − 1/n| ≤ (C − 1)/n.
Note that this condition implies condition C2 for the same C > 1, but unlike C2 it imposes both an upper and a lower bound on the p_i's. It is easy to verify that the (1+β)-process satisfies all three conditions.
Lemma 2.1. For any β ∈ (0, 1], the (1+β)-process satisfies condition C1 with δ = 1/4 and ε = β/2, and condition C3 (and hence C2) with C = 1 + β.
Proof. Recall that for the (1+β)-process, the probability allocation vector satisfies p_i = β · (2i − 1)/n² + (1 − β)/n for each i ∈ [n]. We will first show that C1 holds with δ = 1/4 and ε = β/2. For any 1 ≤ k ≤ δn, since p is non-decreasing, the prefix sums satisfy Σ_{i=1}^k p_i = (1 − β) · k/n + β · k²/n² ≤ (1 − β/2) · k/n. Similarly, for any δn + 1 ≤ k ≤ n, the suffix sums satisfy Σ_{i=k}^n p_i = (1 − β) · (n − k + 1)/n + β · (n² − (k − 1)²)/n² ≥ (1 + ε) · (n − k + 1)/n. For C3, note that |p_i − 1/n| = β · |2i − 1 − n|/n² ≤ β/n for every i ∈ [n]. Note that, in contrast to Two-Choice, which satisfies C3 only for C = 2 − 1/n, by choosing β small enough we can make the probability allocation vector arbitrarily close to uniform.
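A quick numerical sanity check of the ℓ∞ condition for the (1+β)-process, assuming C3 takes the form max_i |p_i − 1/n| ≤ (C − 1)/n with C = 1 + β (function names are ours):

```python
def one_plus_beta_vector(n, beta):
    # p_i = beta*(2i - 1)/n^2 + (1 - beta)/n, for i = 1..n (sorted by load).
    return [beta * (2 * i - 1) / n**2 + (1 - beta) / n
            for i in range(1, n + 1)]

def linf_distance_to_uniform(p):
    # l_infinity distance between p and the One-Choice (uniform) vector.
    n = len(p)
    return max(abs(pi - 1 / n) for pi in p)

n, beta = 200, 0.3
p = one_plus_beta_vector(n, beta)
# C3 with C = 1 + beta: every entry within (C - 1)/n = beta/n of uniform.
```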
We also note that for any process P satisfying condition C3 for some C > 1, we can define a process P′ satisfying condition C3 for any C′ ∈ (1, C), by mixing the probability allocation vector of P with that of One-Choice, taking a step of P with probability η = (C′ − 1)/(C − 1). For instance, the Quantile(1/2) process satisfies condition C3 for C = 3/2. Therefore, mixing Quantile(1/2) with One-Choice, taking the Quantile(1/2) step with probability η ∈ [0, 1], gives a probability allocation vector satisfying the following condition.
Observation 2.2. The process obtained by mixing Quantile(1/2) with One-Choice, taking the Quantile(1/2) step with probability η ∈ [0, 1], satisfies condition C3 with C = 1 + η/2.
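The mixing construction can be sketched numerically; here η is interpreted as the probability of taking a step of the original process (a Quantile(1/2) step), the rest being One-Choice steps, which shrinks the ℓ∞ distance to uniform by a factor of η:

```python
def quantile_half_vector(n):
    # Quantile(1/2): 1/(2n) for the n/2 heaviest bins, 3/(2n) for the rest.
    return [1 / (2 * n)] * (n // 2) + [3 / (2 * n)] * (n - n // 2)

def mix_with_one_choice(p, eta):
    # Take a step of the original process with probability eta,
    # and a One-Choice step otherwise.
    n = len(p)
    return [eta * pi + (1 - eta) / n for pi in p]

n, eta = 100, 0.2
q = mix_with_one_choice(quantile_half_vector(n), eta)
```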

Weighted and Batched Settings
As in [20], we now extend the definitions of Section 2.1 and Section 2.2 to weighted balls, and later to the batched setting. To this end, let w^t ≥ 0 be the weight of the t-th ball to be allocated, for t ≥ 1. By W^t we denote the total weight of all balls allocated in the first t ≥ 0 allocations, so W^t := Σ_{i=1}^n x_i^t = Σ_{s=1}^t w^s. The normalized loads are x̃_i^t := x_i^t − W^t/n, and with y^t being again the decreasingly sorted, normalized load vector, we have Gap(t) = y_1^t. The weight of each ball will be drawn independently from a fixed distribution W over [0, ∞). Following [33], we assume that the distribution W satisfies (i) E[W] = 1, and (ii) E[e^{ζW}] < ∞ for some constant ζ > 0. Specific examples of distributions satisfying the above conditions (after scaling) are the geometric, exponential, binomial and Poisson distributions.
In the analysis we will be using the following property (see also [33]) and refer to these distributions as Finite-MGF(ζ) (or Finite-MGF(S)):

Lemma 2.3 ([20, Lemma 2.4]). There exists S := S(ζ) ≥ max{1, 1/ζ}, such that for any γ ∈ (0, min{ζ/2, 1}) and any κ ∈ [−1, 1], E[e^{γ·κ·W}] ≤ 1 + γ·κ + S·γ²·κ².
We will now describe the allocation of weighted balls into bins using a batch size of b ≥ n. For the sake of concreteness, let us first describe the b-Batched setting when the allocation is done using Two-Choice. For a given batch consisting of b consecutive balls, each ball of the batch performs the following. First, it samples two bins i_1 and i_2 independently and uniformly at random, and compares the loads the two bins had at the beginning of the batch (let us denote the bin with the smaller load by i_min). Secondly, a weight is sampled from the distribution W. Then a ball of this weight is added to bin i_min. Recall that since the load information is only updated at the beginning of the batch, all allocations of the b balls within the same batch can be performed in parallel.
In the following, we will use a more general framework, where the process of sampling (one or more) bins and then deciding where to allocate the ball is described by a probability allocation vector p over the n bins (Section 2.1). Also, for the analysis it will be convenient to focus on the normalized and sorted load vector y, which is why the definition below is based on y rather than the actual load vector x.
b-Batched Setting with Weights
Parameters: Batch size b ≥ n, probability allocation vector p, weight distribution W.
Iteration: For each batch starting at step t (a multiple of b):
1. Sample b bins i_{t+1}, i_{t+2}, ..., i_{t+b} independently according to p.
2. Sample b weights w^{t+1}, w^{t+2}, ..., w^{t+b} from W.
3. Update for each bin i ∈ [n]: z_i^{t+b} := y_i^t + Σ_{s=t+1}^{t+b} w^s · 1_{i_s = i} − (1/n) · Σ_{s=t+1}^{t+b} w^s.
4. Let y^{t+b} be the vector z^{t+b}, sorted decreasingly.
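One iteration of the framework above can be sketched as follows (an illustrative sketch: we use an exponential distribution of mean 1 as an example of a Finite-MGF weight distribution, and sample bins from p via its cumulative sums):

```python
import bisect
import random
from itertools import accumulate

def batched_step(y, p_cum, b, rng):
    """One batch of the weighted b-Batched setting on the normalized,
    sorted load vector y; p_cum holds the cumulative sums of the
    allocation vector p, which stays fixed for the whole batch."""
    n = len(y)
    z = list(y)
    total_weight = 0.0
    for _ in range(b):
        # Sample a bin from p by inverting its CDF.
        i = min(bisect.bisect_left(p_cum, rng.random()), n - 1)
        w = rng.expovariate(1.0)  # example weight from W (mean 1)
        z[i] += w
        total_weight += w
    # Renormalize by the average weight added, then re-sort decreasingly.
    avg = total_weight / n
    return sorted((zi - avg for zi in z), reverse=True)

p = [1 / 10] * 10  # One-Choice allocation vector as an example
y_next = batched_step([0.0] * 10, list(accumulate(p)), b=100,
                      rng=random.Random(1))
```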
We also look at the version of the processes that performs random tie-breaking between bins of the same load. For b = 1, this makes no observable difference to the process, but over multiple steps, this effectively averages out the probability over the (possibly many) bins that have the same load. This would, for instance, correspond to Two-Choice randomly deciding between the two sampled bins if they have the same load. In particular, if p is the original probability allocation vector, then the one with random tie-breaking is p(y^t) (for t being the beginning of the batch), where
p_i(y^t) := (Σ_{j∈[n]: y_j^t = y_i^t} p_j) / |{j ∈ [n]: y_j^t = y_i^t}|.    (2.1)
b-Batched Setting with Weights and Random Tie-Breaking
Parameters: Batch size b ≥ n, probability allocation vector p, weight distribution W.
Iteration: For each batch starting at step t (a multiple of b):
1. Let p̃ := p(y^t) be the probability allocation vector accounting for random tie-breaking.
2. Sample b bins i_{t+1}, i_{t+2}, ..., i_{t+b} independently according to p̃.
3. Sample b weights w^{t+1}, w^{t+2}, ..., w^{t+b} from W.
4. Update for each bin i ∈ [n]: z_i^{t+b} := y_i^t + Σ_{s=t+1}^{t+b} w^s · 1_{i_s = i} − (1/n) · Σ_{s=t+1}^{t+b} w^s.
5. Let y^{t+b} be the vector z^{t+b}, sorted decreasingly.
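The averaging behind the tie-breaking vector p(y^t) can be sketched directly (names are ours): probabilities are averaged within each group of bins sharing the same sorted load:

```python
def tie_breaking_vector(y, p):
    """Average the allocation probabilities over groups of bins that
    share the same (sorted, normalized) load, per the random
    tie-breaking rule sketched in (2.1)."""
    n = len(y)
    q = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end < n and y[end] == y[start]:
            end += 1
        avg = sum(p[start:end]) / (end - start)
        for i in range(start, end):
            q[i] = avg
        start = end
    return q

y = [2, 1, 1, 0]
p = [0.1, 0.2, 0.3, 0.4]
q = tie_breaking_vector(y, p)  # bins 2 and 3 share load 1 -> both get 0.25
```

Note that averaging can only decrease the maximum entry of p, which is why tie-breaking preserves condition C3 (cf. Remark 3.5).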
3 Warm-up: An O(√(b/n) · log n) Upper Bound on the Gap
In this section, we will refine the analysis of [20, Section 4] to prove an O(√(b/n) · log n) bound on the gap for a family of processes. This will also be used as a starting point for the analysis in Section 4 to obtain the tighter bound. The main theorem that we prove in this section, Theorem 3.1, states this gap bound for any process satisfying the conditions of Section 2.3 (for suitable parameters) in the weighted b-Batched setting.
In particular, by choosing β = Θ(√(n/b)) we get a process that is asymptotically better than Two-Choice, and which is within just a √(log n) multiplicative factor of the optimal gap bound proven for unit weights in Section 5.
The analysis is based on the hyperbolic cosine potential, which is defined for a smoothing parameter γ > 0 as
Γ^t := Γ^t(γ) := Σ_{i=1}^n e^{γ·y_i^t} + Σ_{i=1}^n e^{−γ·y_i^t}.    (3.1)
We also decompose Γ^t by defining the overloaded potential Φ^t := Σ_{i=1}^n e^{γ·y_i^t} and the underloaded potential Ψ^t := Σ_{i=1}^n e^{−γ·y_i^t}, so that Γ^t = Φ^t + Ψ^t. Further, we use the shorthands ∆Φ^{t+1} := Φ^{t+1} − Φ^t, ∆Ψ^{t+1} := Ψ^{t+1} − Ψ^t and ∆Γ^{t+1} := Γ^{t+1} − Γ^t to denote the changes in the potentials over one step. We will make use of the following drift theorem shown in [20]. Note that in the statement of the theorem, rounds could consist of multiple single-step allocations, and in that case p^t is not necessarily the probability allocation vector, but it could be a probability vector giving an estimate of the "average number of balls" allocated to a bin.
Theorem 3.3 (cf. [20, Theorem 3.1]). Consider any allocation process P and a probability vector p^t satisfying condition C1 for some constant δ ∈ (0, 1) and some ε ∈ (0, 1) at every round t ≥ 0. Further assume that there exist K > 0, γ ∈ (0, min{1, εδ/(8K)}) and R > 0, such that for any round t ≥ 0, process P satisfies expected drop inequalities for the potentials Φ := Φ(γ) and Ψ := Ψ(γ) (with additive term R). Then, there exists a constant c := c(δ) > 0, such that for Γ := Γ(γ) and any round t ≥ 0, E[Γ^t] ≤ c·R·n.
Now we will show that any process satisfying condition C3 also satisfies the preconditions of Theorem 3.3 for the expected change of the potential functions Φ and Ψ over one batch.
Lemma 3.4. Consider any allocation process with probability allocation vector p^t satisfying condition C3 for some C ∈ (1, 1.9) at every step t ≥ 0.
Further, consider the weighted b-Batched setting with weights from a Finite-MGF(S) distribution with constant S ≥ 1 and a batch size b ≥ n. Consider an arbitrary bin i ∈ [n]. Define the binary vector Z ∈ {0, 1}^b, where Z_j indicates whether the j-th ball of the batch was allocated to bin i. The expected change of the overloaded potential Φ_i^t of the bin can be written as Φ_i^t times a factor, which we upper bound using in (a) that the weights are independent given F^t, in (b) Lemma 2.3 twice, with κ = 1 − 1/n and with κ = −1/n respectively (and that (1 − 1/n)² ≤ 1), in (c) the binomial theorem, and in (d) that p_i ≤ 2/n by condition C3 for C ∈ (1, 1.9). We then show that the resulting expression is at most 1, which holds using in (a) that 1 + v ≤ e^v for any v, and in (b) that e^v ≤ 1 + v + v² for v ≤ 1.75, together with (3.8). Similarly, for the underloaded potential Ψ^t and any bin i ∈ [n], we upper bound the factor of Ψ_i^t: analogously to (3.8), and using in (a) that 1 + v ≤ e^v for any v, in (b) that e^v ≤ 1 + v + v² for v ≤ 1.75, and (3.12). Having verified the preconditions for Theorem 3.3, we are now ready to prove the bound on the gap for this family of processes.
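The elementary inequality e^v ≤ 1 + v + v², used in steps (b) above for v ≤ 1.75, can be checked numerically; it holds for all v ≤ 1.75 but already fails at v = 1.8:

```python
import math

def quadratic_bound_holds(v):
    # Checks e^v <= 1 + v + v^2 at a single point.
    return math.exp(v) <= 1 + v + v * v

# Scan a fine grid of v in [-5, 1.75].
grid = [i / 1000 for i in range(-5000, 1751)]
all_hold = all(quadratic_bound_holds(v) for v in grid)
```

(For v ≤ 0 the bound follows from e^v ≤ 1 + v + v²/2; the constant 1.75 leaves a little slack before the crossover just below v ≈ 1.8.)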
Remark 3.5. The same upper bound in Theorem 3.1 also holds for processes with random tie-breaking. The reason for this is that (i) averaging probabilities in (2.1) can only reduce the maximum entry (and increase the minimum) of the allocation vector p^t, i.e., max_{i∈[n]} p_i^t(x^t) ≤ max_{i∈[n]} p_i, so it still satisfies condition C3, and (ii) moving probability between bins i, j with x_i^t = x_j^t (and thus Φ_i^t = Φ_j^t and Ψ_i^t = Ψ_j^t) leaves the aggregate upper bounds unchanged.
Hence, by Theorem 3.3, there exists a constant c := c(δ) > 0 such that for any step m ≥ 0 which is a multiple of b, E[Γ^m] = O(n). Therefore, by Markov's inequality, Γ^m ≤ (8c/δ) · n³ holds w.h.p. To conclude the claim, note that when Γ^m ≤ (8c/δ) · n³ holds, then also Gap(m) ≤ (1/γ) · log((8c/δ) · n³) = O(√(b/n) · log n).
Theorem 4.1. Consider any process with probability allocation vector p^t satisfying, at every step t ≥ 0, condition C1 for constant δ ∈ (0, 1) and ε, as well as condition C3 for C = 1 + ε. Then, there exists a constant κ := κ(δ, S) > 0, such that for any step m ≥ 0 being a multiple of b, w.h.p. Gap(m) ≤ κ · √((b/n) · log n).
There are two key steps in the proof:
Step 1: Similarly to the analysis in [21], we will use two instances of the hyperbolic cosine potential (defined in (3.1)) in order to show that it is concentrated at O(n). More specifically, we will be using Γ_1 := Γ_1(γ_1) with the smoothing parameter γ_1 := (δ/(40S)) · √(n/(b log n)), and Γ_2 := Γ_2(γ_2) with γ_2 := γ_1/(8·30), i.e., with a smoothing parameter which is a large constant factor smaller than γ_1. So, in particular, Γ_2^t ≤ Γ_1^t at any step t ≥ 0. In the following lemma, proven in Section 4.1, we show that w.h.p. Γ_2 = O(n) for any log³ n consecutive batches.
The proof follows the interplay between the two hyperbolic cosine potentials, in that conditioning on Γ_1^t = poly(n) (which follows w.h.p. by the analysis in Section 3) implies that |∆Γ_2^{t+1}| ≤ n^{1/4} · √((n/b) · log n) (Lemma 4.5 (ii)). This in turn allows us to apply a bounded difference inequality to prove concentration for Γ_2. In contrast to [21] and [22], here we need a slightly different concentration inequality, Theorem A.6 (also used in [20]), as in a single batch the load of a bin may change by a large amount (with small probability). The complete proof is given in Section 4.1.
Step 2: Consider an arbitrary step s = t + j·b where {Γ_2^s ≤ cn} holds. Then the number of bins i with load y_i^s at least z is at most Γ_2^s · e^{−γ_2·z} ≤ cn · e^{−γ_2·z}, which for a suitable z = O(√((b/n) · log n)) is at most δn. With this in mind, we define the following potential function for any step t ≥ 0, which only takes into account bins that are overloaded by at least z balls:
Λ^t := Σ_{i∈[n]: y_i^t ≥ z} Λ_i^t, where Λ_i^t := e^{λ·(y_i^t − z)} and λ := ε/(4CS) = Θ(√((n/b) · log n)),
and we define Λ_i^t := 0 for the rest of the bins i. This means that when {Γ_2^s ≤ cn} holds, the probability of allocating to one of these bins is p_i^s ≤ (1 − ε)/n, because of condition C1. Hence, the potential drops in expectation over one batch (Lemma 4.9), and this means that w.h.p. Λ^m = poly(n), which implies that Gap(m) = O(√((b/n) · log n)).
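The two potentials can be written down directly (a sketch with illustrative names: y is a normalized load vector, g and lam stand for the smoothing parameters γ and λ, and z is the overload threshold):

```python
import math

def gamma_potential(y, g):
    # Hyperbolic cosine potential (3.1) with smoothing parameter g.
    return sum(math.exp(g * yi) + math.exp(-g * yi) for yi in y)

def lambda_potential(y, lam, z):
    # Only bins overloaded by at least z contribute; all others count 0.
    return sum(math.exp(lam * (yi - z)) for yi in y if yi >= z)

y = [3.0, 1.0, 0.0, -2.0, -2.0]
```

Note that gamma_potential is deterministically at least 2n (each bin contributes at least 2), matching the observation used in the recovery-phase argument below.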

Step 1: Concentration of the Γ Potential
Recall that in Theorem 4.1, we considered the weighted b-Batched setting with any b ∈ [2n log n, n³] and weights sampled independently from a Finite-MGF(S) distribution with constant S ≥ 1, for any allocation process with probability allocation vector p^t satisfying condition C1 for constant δ ∈ (0, 1) and ε ∈ (0, 1), as well as condition C3 for some C > 1, at every step t ≥ 0. The proof of this lemma is similar to the proofs in [20, Section 5] and [21, Section 5], in that we use the interplay between two instances of the hyperbolic cosine potential, Γ_1 := Γ_1(γ_1) and Γ_2 := Γ_2(γ_2), with smoothing parameter γ_2 being a large constant factor smaller than γ_1. More specifically, we will be working with γ_1 := (δ/(40S)) · √(n/(b log n)) and γ_2 := γ_1/(8·30). The rest of this section is organized as follows. In Section 4.1.1, we establish some basic properties of the potentials Γ_1 and Γ_2, and in Section 4.1.2 we use these to show that w.h.p. Γ_2^t = O(n) for at least log³ n batches, completing the proof of Lemma 4.2. Then, in Section 4.2, we complete the proof of Theorem 4.1.

Preliminaries
We define the following event for any step t ≥ 0, which bounds the weights of the balls allocated in the batch:
H^t := {max_{s∈(t, t+b]} w^s ≤ (15/ζ) · log n}.
Further, let x̃^t be the load vector obtained by moving any one ball of the load vector x^t to some other bin. Then, using that γ_2 := γ_1/(8·30) and aggregating over the bins, we get the first claim.
Second statement. Let j ∈ [n] be the bin where the ball was allocated. We consider the following cases for the contribution of a bin i to ∆Γ_{2,i}^t:
Case 1 [i = j and y_j^t ≥ 0]: Since bin j is overloaded, its contribution comes from the increase of its overload term.
Case 2 [i = j and y_j^t < 0]: Similarly, if j is underloaded, its contribution comes from the decrease of its underload term.
Case 3 [i ≠ j and y_i^t ≥ 0]: The contribution of the rest of the bins is due to the change in the average load; this applies to any overloaded bin i ∈ [n] \ {j}.
Case 4 [i ≠ j and y_i^t < 0]: Similarly for any underloaded bin i ∈ [n] \ {j}.
Hence, aggregating over all bins gives the second claim for sufficiently large n.
Third statement. Let i, j ∈ [n] be the differing bins between x^t and x̃^t. Since H^t holds, it follows that w^t ≤ (15/ζ) · log n, so the contributions of bins i and j change by a bounded factor, and the third claim follows.
Next, we will show that E[Γ_2] = O(n), and that when Γ_2 is sufficiently large, it drops in expectation over the next batch.
Lemma 4.6. Consider any process satisfying the conditions in Lemma 4.2. Then, there exists a constant c := c(δ) such that statements (i)–(iv) hold for any step t ≥ 0 being a multiple of b.
Proof. By Theorem 3.3, we get the first statement by setting c := 16c′/δ, for some constant c′ := c′(δ) > 0.
Similarly for the potential Γ_1.
Third statement. Furthermore, by Lemma 3.4 and Theorem 3.3, we also get, for any t ≥ 0, a drop inequality (4.3), and we define the constant c accordingly. When Γ_2^t ≥ cn holds, then (4.3) yields the claimed expected drop.
Fourth statement. Similarly, when Γ_1^t < cn, (4.3) yields the claimed bound.
In the next lemma, we show that w.h.p. Γ_1 is poly(n) for every step in an interval of length 2b log³ n.
Proof. We will start by bounding Γ_1^s at steps s being a multiple of b. Using Lemma 4.6 (i), Markov's inequality and a union bound over 2 log³ n + 1 steps, we obtain a first bound (4.4) for any t ≥ 0. Similarly, using (3.9) in Lemma 3.4, combining and aggregating over the bins, and applying Markov's inequality for any r ∈ [0, b), a union bound over the 2b log³ n ≤ 2n³ log³ n possible steps (since b ≤ n³), for s ∈ [0, 2 log³ n] and r ∈ [0, b), gives a second bound (4.5). Finally, taking the union bound of (4.4) and (4.5), we conclude.
We will now show that w.h.p. there is a step in every b log³ n steps such that the exponential potential Γ_2 becomes O(n). We call this the recovery phase. Let c := c(δ) be the constant defined in Lemma 4.6; the following holds for any step t ≥ 0 being a multiple of b.
Proof. By Lemma 4.6 (ii), using Markov's inequality at step t being a multiple of b, we may assume that Γ_2^t ≤ cn⁹. By Lemma 4.6 (iii), for any step r ≥ 0, the potential drops in expectation while it is large. In order to prove that Γ_2^{t+s·b} is small for some s ∈ [0, log³ n], we define a "killed" potential function for any r ∈ [0, log³ n]; the killed potential satisfies the drop inequality of Lemma 4.6 (iii) unconditionally. Inductively applying this for log³ n batches, using that Γ_2^t ≤ cn⁹, and applying Markov's inequality, we obtain a bound which we combine with (4.6). Due to the definition of Γ_2, at any step t ≥ 0, deterministically Γ_2^t ≥ 2n. So, we conclude that w.p. at least 1 − 2n^{−8}, the potential satisfies Γ_2^{t+s·b} ≤ cn for some s ∈ [0, log³ n], which implies the conclusion.

Completing the Proof of Lemma 4.2
We are now ready to prove Lemma 4.2, using a method of bounded differences with a bad event, namely Theorem A.6 ([19, Theorem 3.3]).
Proof of Lemma 4.2. Our starting point is to apply Lemma 4.8, which proves that there is at least one step t ... Note that if t < b log³ n, then deterministically Γ_2^0 = 2n ≤ cn (which corresponds to ρ = −t/b). We are now going to apply the concentration inequality of Theorem A.6 to each of the batches starting at t + ρ·b, ..., t + (log³ n)·b, and show that the potential remains ≤ cn at the last step of each batch. More specifically, we will show that for any r ∈ [ρ, log³ n], with t_r := t + b·r, ... Within a single batch all allocations are independent, so we apply Theorem A.6, choosing γ_k := 1/b and N := b, which states that for any T > 0 and µ ... By Lemma 4.6 (iv), we have µ ... Hence, for T := n/log² n, since 2n log n ≤ b ≤ n³, we have ... By a union bound of the above events, ... Then, ... where in the last inequality we used (4.9) and the fact that ρ ≥ −log³ n. So, ... Note that for any ρ ∈ [−log³ n, 0], we have A_ρ ∩ K_ρ^{log³ n} ⊆ A. Hence, we conclude by the union bound of (4.10) and (4.11) that ...

4.2 Step 2: Completing the Proof of Theorem 4.1

We will now show that when Γ_2^t = O(n), the stronger potential function Λ^t drops in expectation over the next batch. This will allow us to prove that Λ^m = poly(n) and to deduce that w.h.p. Gap(m) = O(√((b/n) · log n)).
Proof. Consider an arbitrary step t ≥ 0 being a multiple of b, and consider a labeling of the bins so that they are sorted by load. Assuming that {Γ_2^t ≤ cn} holds, the number of bins with load ... For any bin i ∈ [n] with y_i^t ≥ z, we get as in (3.5) (using that λ ≤ 1 and that p satisfies condition C_3 for C ∈ (1, 1.9)), ... Since there are at most δn such bins (i.e., i ≤ δn), p satisfies condition C_1 and the normalized vector y^t is sorted, Lemma A.2 gives the upper bound ... using in (c) that 1 + v ≤ e^v for any v. For the rest of the bins with i > δn, ... Aggregating the contributions over all bins, ...

We define the killed potential Λ̃, with Λ̃^{t_0} := Λ^{t_0} and, for j > 0, ... Since Λ̃^t ≤ Λ^t, by Lemma 4.9 for t = t_0 + j·b we have that ... When E^{t_0+j·b} does not hold, then deterministically Λ̃^{t_0+(j+1)·b} = Λ̃^{t_0+j·b} = 0. Hence, we have the following unconditional drop inequality ... Assuming E^{t_0} holds, we have ... for sufficiently large n. Recalling that γ_2 = Θ(λ · log n), there exists a constant κ_1 > 0 such that ... Applying Lemma A.1 to (4.13) with a := e^{−(λε/(2n))·b} and b := n² for log³ n steps, ... Hence, by (4.12), ... Combining with (4.12), we have ... Finally, {Λ^m ≤ 2n^5} implies that Gap(m) = O(√((b/n) · log n)), so the claim follows. For the case m < b · log³ n, it deterministically holds that Λ^{t_0} ≤ n, which is a stronger starting point in (4.14) for proving that E[Λ̃^m] ≤ 2n^5, which in turn implies the gap bound.

Lower Bounds on the Gap
In this section, we prove two lower bounds of Ω(√((b/n) · log n)) on the gap. Both lower bounds hold even in the case of unit weights.
Observation 5.1. Consider the b-Batched setting with any b ≥ n log n, and assume all balls have unit weights. Then, for any process which uses the same probability allocation vector within each batch, with random tie-breaking, ...

Proof. Any such process behaves exactly like One-Choice in the first batch, so the lower bound follows from the one for One-Choice allocating b balls into n bins (cf. [34] and [23, Lemma A.2]).
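To illustrate the scale of this lower bound, here is a minimal simulation sketch (the helper name `one_choice_gap` and the parameters n = 100, b = 2000 are illustrative, not from the paper): it allocates one batch of b balls via One-Choice and compares the resulting gap against the √((b/n) · log n) scale.

```python
import math
import random

def one_choice_gap(n: int, b: int, seed: int = 0) -> float:
    """Allocate b balls into n bins uniformly at random (One-Choice)
    and return the gap: maximum load minus the average load b/n."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(b):
        loads[rng.randrange(n)] += 1
    return max(loads) - b / n

n, b = 100, 2000  # b >= n log n regime (log 100 ~ 4.6)
gap = one_choice_gap(n, b)
scale = math.sqrt((b / n) * math.log(n))  # Omega(sqrt((b/n) log n)) scale
print(gap, scale)
```

With a fixed seed the run is deterministic, so the measured gap can be compared directly against the theoretical scale across parameter choices.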
The next lower bound is more involved.This bound also applies to processes which are allowed to adjust the probability allocation vector from one batch to another arbitrarily; e.g., the probability for a heavily underloaded bin might be set close to (or even equal to) 1, and similarly, the probability for a heavily overloaded bin might be set close to (or equal to) 0. Additionally, the lower bound below applies to any two consecutive batches, and not only to the end of the first batch as in Observation 5.1.
Theorem 5.2. Consider the b-Batched setting with any b = Ω(n log n) in the unit weights case. Then there is a constant κ > 0 such that for any allocation process (which may adaptively change the probability allocation vector for each batch) it holds for every t ≥ 0 being a multiple of b that ...

Proof. We shall prove a slightly stronger statement: there is no load configuration and no probability allocation vector (depending on F^t) such that the gap is small both before and at the end of an arbitrary batch.
For notational convenience, we prove this statement assuming that t = 0, that x^0 is an arbitrary load vector satisfying Σ_{i∈[n]} x_i^0 = 0 (in other words, we shift time backwards by t steps), and that p = p^0 is the probability allocation vector used by the process. Consider one arbitrary bin j ∈ [n]. Then, ... For a sufficiently large constant C > 0, let us now assume max_{j∈[n]} z_j ≤ (C/2) · √((b/n) · log n); clearly, if this is not the case, we already have a large gap before the next batch.
Next, consider a bin j ∈ [n] with ... Applying a Chernoff bound (Lemma A.4) to x_j^b ∼ Bin(b, p_j) shows that bin j will not contribute to the gap at step b.
Hence, in the remainder of the proof we may assume that for all bins j ∈ [n], ... Let (x̃_i^b)_{i∈[n]} be a load vector where these locations are sampled according to p̃. Clearly, there is a coupling so that x̃_j^b ≤ x_j^b for every j ∈ [n] \ J (since p̃_j ≤ p_j). Further, for any j ∈ J, by a union bound, ... Hence it follows that, for any threshold T > 0, ... Therefore, in the remainder of the proof we will lower bound Pr[max_{j∈[n]\J} x̃_j^b ≥ T] for a suitable value T = Ω(√((b/n) · log n)). We will also use the definition ... Finally, we define ξ = 0.1 as a (sufficiently) small constant.
Case 1: There are at least n − n^ξ bins for which ϕ_j ≥ −C · √(b/n). Since Σ_{i∈[n]} ϕ_i = 0, this implies that there must be at least one bin j with ... Further, using that the median of a Bin(N, q) random variable is either ⌊Nq⌋ or ⌈Nq⌉, we get Pr[...] ≥ 1/2, so with probability at least 1/2 we will have a large gap.

Case 2: There are at least n^ξ bins with ϕ_j < −C · √(b/n); call this set B. We further know that, by the definition of p̃, every bin j ∈ [n] satisfies p̃_j ≥ 1/(2n).

Experimental Results
In this section, we complement our theoretical analysis with some experimental results for the b-Batched setting. In Fig. 6.1, we plot the gap of the (1 + β)-process for various batch sizes and different values of β ∈ (0, 1] (Two-Choice corresponds to β = 1). The plot strongly suggests the existence of an optimal β, which seems to increase as the batch size b grows. In Fig. 6.2, we present the corresponding empirical results for the Quantile process (mixed with One-Choice). As with the (1 + β)-process, the optimal mixing factor η tends to increase as the batch size grows. Quantile with the optimized mixing factor seems to perform slightly worse than the optimized (1 + β)-process. In Fig. 6.3, we plot the gap of Two-Choice, Three-Choice and (1 + β) versus the batch size. For small values of b, the gap of Two-Choice and Three-Choice is small, but it soon grows rapidly, diverging from the asymptotically optimal (1 + β)-processes as predicted by the theoretical analysis. Similar results are observed for weights sampled from an exponential distribution (Fig. 6.4). Finally, in Table 6.5, we show the gap of (1 + β) and Quantile compared to Two-Choice and to One-Choice with b balls (which is the theoretically optimal attainable value), for slightly larger values of n ∈ {10⁴, 10⁵}. For large b, the (1 + β)-process has roughly half the gap of Two-Choice and is close to the theoretically optimal value of One-Choice with m = b balls.
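The experiments above can be reproduced in miniature. The following sketch (function name and parameters are illustrative, not from the paper's code) simulates the b-Batched setting with stale load information and compares Two-Choice (β = 1) against the (1 + β)-process with β = √((n/b) · log n):

```python
import math
import random

def batched_gap(n, b, num_batches, beta, seed=0):
    """Run the b-Batched setting: every ball in a batch is placed using
    the load information frozen at the start of that batch.
    beta = 1.0 recovers Two-Choice; beta < 1 is the (1+beta)-process."""
    rng = random.Random(seed)
    loads = [0] * n
    for _ in range(num_batches):
        stale = loads[:]  # load info available at the start of the batch
        for _ in range(b):
            i, j = rng.randrange(n), rng.randrange(n)
            if rng.random() < beta:  # Two-Choice step on stale loads
                pick = i if stale[i] <= stale[j] else j
            else:                    # One-Choice step
                pick = i
            loads[pick] += 1
    m = num_batches * b
    return max(loads) - m / n

n, b = 50, 2500                           # large batch: b = n^2
beta = math.sqrt((n / b) * math.log(n))   # mixing factor from the analysis
g_two = batched_gap(n, b, 10, beta=1.0)
g_mix = batched_gap(n, b, 10, beta=beta)
print(g_two, g_mix)
```

Even at this small scale, the Two-Choice gap (order b/n = 50) is markedly larger than the (1 + β) gap (order √((b/n) · log n) ≈ 14), matching the qualitative picture of Fig. 6.3.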

Conclusions
In this work, we revisited the outdated information setting of [5], where balls are allocated to bins in batches of size b, using the load information available at the beginning of the batch. We established that, by choosing the mixing factor β carefully as a function of the batch size b, the (1 + β)-process achieves the asymptotically optimal gap for any b ≥ n log n. That is, by having β appropriately small, (1 + β) circumvents the "herd behavior" (as it is called in [27]), where some of the previously underloaded bins are chosen too frequently, turning them into heavily overloaded bins in the next batch. At the same time, β should not be too small, as otherwise the process would be too close to One-Choice.
There are several directions for future work. First, recall that our lower bounds apply to a large class of processes which allocate all balls within the same batch independently. However, there are processes which allocate multiple balls in a coordinated way. For example, the process of Park [32] draws d samples and then places one ball into each of the k least loaded bins. It would be interesting to explore the gap of this type of process in the b-Batched setting. A second avenue is to analyze Two-Thinning processes (and in particular processes that use a fixed load threshold relative to the average) in outdated information settings. An experimental study of threshold processes with outdated information was already conducted in 1989 [25, Figure 8], but no rigorous bounds were proven. A third possibility is to investigate whether the (1 + β) and related processes are superior to Two-Choice in other settings, like the τ-Delay or random noise settings studied in [22]. Finally, one could study settings where the load information of bins is updated at different rates, depending on the specific bin. In such a setting, when deciding between sampled bins, both their reported load estimates and their update rates should be taken into account.

A.1 Auxiliary Probabilistic Claims
For convenience, we add the following well-known inequality for a sequence of random variables, whose expectations are related through a recurrence inequality.
Lemma A.1. Consider a sequence of random variables (X_i)_{i∈N} for which there exist a ∈ (0, 1) and b > 0 such that for every i ≥ 1, ... Then, for every i ≥ 1, ...

Proof. We will prove by induction that for every i ∈ N, ... For i = 0, it trivially holds that E[X_0 | X_0] = X_0. Assuming the induction hypothesis holds for some i ≥ 0, then, since a > 0, ... The claim follows using that Σ_{j=0}^∞ a^j = 1/(1 − a) for a ∈ (0, 1).

For the next lemma, we define, for two n-dimensional vectors x, y, ⟨x, y⟩ := Σ_{i=1}^n x_i · y_i.
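Lemma A.1 can be sanity-checked numerically. The sketch below (the helper name `recurrence_bound` is illustrative) iterates the worst case of the recurrence, X_{i+1} = a·X_i + b, and verifies that it never exceeds the closed-form bound a^i · X_0 + b/(1 − a):

```python
def recurrence_bound(x0: float, a: float, b: float, i: int) -> float:
    """Closed-form bound a^i * x0 + b/(1-a) from Lemma A.1."""
    return a**i * x0 + b / (1 - a)

# Worst case: the recurrence inequality holds with equality.
a, b, x = 0.9, 5.0, 1000.0
for i in range(1, 51):
    x = a * x + b
    # The iterate must stay below the closed-form bound (small float slack).
    assert x <= recurrence_bound(1000.0, a, b, i) + 1e-9
print(x, recurrence_bound(1000.0, a, b, 50))
```

The slack between the two values is exactly b·a^i/(1 − a), which vanishes as i grows, so the bound is tight in the limit.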
Lemma A.2 ([20, Lemma A.7]). Let (p_k)_{k=1}^n and (q_k)_{k=1}^n be two probability vectors and let (c_k)_{k=1}^n be non-negative and non-increasing. If p majorizes q, i.e., Σ_{i=1}^k p_i ≥ Σ_{i=1}^k q_i for all 1 ≤ k ≤ n, then ⟨p, c⟩ ≥ ⟨q, c⟩.
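A quick numerical check of Lemma A.2 (the helpers `majorizes` and `inner` are illustrative): for a probability vector p that majorizes q and a non-negative, non-increasing cost vector c, the inner product ⟨p, c⟩ dominates ⟨q, c⟩.

```python
def inner(x, y):
    """Inner product <x, y> = sum_i x_i * y_i."""
    return sum(a * b for a, b in zip(x, y))

def majorizes(p, q):
    """p majorizes q: every prefix sum of p dominates that of q."""
    sp = sq = 0.0
    for a, b in zip(p, q):
        sp += a
        sq += b
        if sp < sq - 1e-12:
            return False
    return True

# Non-negative, non-increasing cost vector c; p majorizes q.
p = [0.4, 0.3, 0.2, 0.1]
q = [0.25, 0.25, 0.25, 0.25]
c = [4.0, 3.0, 2.0, 1.0]
assert majorizes(p, q)
assert inner(p, c) >= inner(q, c)
print(inner(p, c), inner(q, c))
```

In the proofs this is used to bound the allocation bias: the more concentrated probability vector pays more against the sorted (non-increasing) contribution vector.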
We continue with an "anti-concentration" result, i.e., a lower bound on the probability that a binomial random variable is significantly larger than its expectation.
Lemma A.3. Let m, n be integers such that m ≥ n log n. Further, let p be a probability satisfying p ∈ [1/(2n), 1/2] and let X ∼ Bin(m, p). Then for any constant ξ ∈ (0, 1), there exists a constant κ > 0 such that ...

Proof. Since X ∼ Bin(m, p), we know that ...

A.2 Concentration Inequalities
We now proceed by stating a standard Chernoff bound.

Figure 1.1: The b = 750 balls of the latest batch (shown in red) allocated over the n = 35 bins for Two-Choice (left) and (1 + β) with β = 1/2 (right). Observe that Two-Choice allocates more aggressively to the bins that are lightly loaded at the beginning of the batch, while (1 + β) spreads the allocations more evenly.

Figure 1.2: In the b-Batched setting with large batch size b, the gaps achieved by the processes are ordered by the maximum entry of their probability allocation vector p: Three-Choice with max_{i∈[n]} p_i ≈ 3/n, Two-Choice with max_{i∈[n]} p_i ≈ 2/n, and (1 + β) with max_{i∈[n]} p_i ≈ (1 + β)/n for β = 0.5, β = √((n/b) · log n) and β = (n/b) · log n. See Fig. 6.3 for full details of the experiment.
b-Batched Setting with Weights and Random Tie-Breaking. Parameters: batch size b ≥ n, probability allocation vector p, weight distribution W. Iteration: for each t = 0·b, 1·b, 2·b, ...: ...

Further, consider the weighted b-Batched setting with weights from a Finite-MGF(S) distribution with S ≥ 1 and a batch size b ≥ (2CS/(C − 1)²) · n. Then, there exists a constant k := k(δ) > 0 such that for any step m ≥ 0 being a multiple of b, Pr[max_{i∈[n]} ...

Corollary 3.2. Let b ≥ n log n and consider the weighted b-Batched setting with weights from a Finite-MGF(S) distribution with S ∈ [1, b/(4n)]. Then, there exists a constant k > 0 such that for the (1 + β)-process with β = √(4S · n/b) and for any step m ≥ 0 being a multiple of b, ...

The proof proceeds in a similar manner to [20, Lemma 4.1], but we bound the terms in (3.6) and (3.10) more tightly using the new condition C_3. Compared to the statement of [20, Lemma 4.1], the coefficient of the term γ²n changes from 5C²S² · b²/n to 5(C − 1)² · b²/n. Note that C is replaced by C − 1, which makes a difference when C = 1 + o(1), and that S does not appear, as we have assumed that b ≥ (2CS/(C − 1)²) · n.

Proof. Consider an arbitrary step t ≥ 0 being a multiple of b and, for convenience, let p = p^t. First note that the given assumptions γ ≤ n/(2(C − 1) · b) and b ≥ (2CS/(C − 1)²) · n imply ...

4 Tight Bound: O(√((b/n) · log n)) Gap

In this section, we prove the stronger O(√((b/n) · log n)) bound on the gap for a family of processes in the weighted b-Batched setting (with b ∈ [2n log n, n³]). More specifically, these processes are a subset of the ones analyzed in Section 3 and include the (1 + β)-process with β = √((n/b) · log n), as well as Quantile(1/2) mixed with One-Choice. As we will show in Section 5, these processes achieve the asymptotically optimal bound.

Theorem 4.1. Consider the weighted b-Batched setting with any b ∈ [2n log n, n³] and weights from a Finite-MGF(S) distribution with constant S. Further, let ε = √((n/b) · log n).

using in (a) the Taylor estimate e^v ≤ 1 + 2v (for v ≤ 1), that γ_2 · (15/ζ) · log n ≤ 1 and S ≥ 1, and in (b) inequality (4.2),

Now we are ready to complete the proof of Theorem 4.1.

Proof of Theorem 4.1. First consider the case m ≥ b · log³ n. Let t_0 = m − b · log³ n and let E^t := {Γ_2^t ≤ cn}.

Hence, we set T := b · p̃_j + κ · √(b · p̃_j · log n) and, applying Lemma A.3 (using that p̃_j ≥ 1/(2n)), obtain for any bin j ∈ B that Pr[x̃_j^b ≥ T] ≥ n^{−ξ/2}. Since |B| ≥ n^ξ, the claim follows.

Table 1.3: Overview of the gap bounds in previous works (rows in gray) and the gap bounds derived in this work (rows in green). All gap bounds hold with probability at least 1 − o(1). Lower bounds hold for sufficiently large m.
[20], which means that the weight of the ball sampled in step t is O(log n) (since by assumption ζ > 0 is constant). By a simple Chernoff bound and a union bound, we can deduce that this holds for a poly(n)-long interval.

Lemma 4.4 (cf. [20, Lemma 5.4]). Consider any Finite-MGF(ζ) distribution W with constant ζ > 0. Then, for any steps t_0 ≥ 0 and t_1 ∈ [t_0, t_0 + n³ log³ n], we have that ...

Consider any step t ≥ 0 such that Γ_1^t ≤ 2c̃n^26 and H^t holds. We start by bounding the load of any bin i ∈ [n], using γ_2 := γ_1/(8·30): ... where in the second implication we used that (log(2c̃) + 26 log n)/γ_1 ≤ (27/γ_1) · log n, for sufficiently large n.

First statement. Using (4.1), we bound the contribution of any bin i ∈ [n] to Γ_2^t as follows,