The Power of Two Choices with Load Comparison Errors

In this paper, we analyze the effects of erroneous load comparisons on the performance of the Po2 scheme. Specifically, we consider load-dependent and load-independent errors. In the load-dependent error model, an incoming job is sent to the server with the larger queue length among the two sampled servers with probability $\epsilon$ if the difference in the queue lengths of the two sampled servers is less than or equal to a constant $g$; no error is made if the queue-length difference is higher than $g$. For this type of error, we show that the benefits of the Po2 scheme are retained as long as the system size is sufficiently large and $\lambda$ is sufficiently close to $1$. Furthermore, we show that, unlike the standard Po2 scheme, the Po2 scheme under this type of error can perform worse than the random scheme if $\epsilon>1/2$ and $\lambda$ is sufficiently small. In the load-independent error model, the incoming job is sent to the sampled server with the {\em maximum load} with an error probability of $\epsilon$, independent of the loads of the sampled servers. For this model, we show that the performance benefits of the Po2 scheme are retained only if $\epsilon \leq 1/2$; for $\epsilon>1/2$ we show that the stability region of the system shrinks and the system performs poorly in comparison to the {\em random scheme}.


Introduction
Modern data centres consist of large numbers of parallel servers. Balancing the load across these servers is crucial to ensure better resource utilization and satisfactory quality of service. To distribute the load across the servers uniformly, each incoming job needs to be assigned to an appropriate server in the system. Assigning an incoming job to one of the servers is done by a job dispatcher using a load balancing scheme. For homogeneous systems, it is well known that the Join-the-Shortest-Queue (JSQ) load balancing scheme [1,2], where an incoming job is assigned to the server having the smallest number of ongoing jobs, is optimal in terms of minimizing the average response time of jobs. For heterogeneous systems, variants of the JSQ scheme are known to be asymptotically optimal [3,4]. However, implementing JSQ-type schemes is often difficult in practice as dispatchers only have local views of the system and therefore can access the load information of only a subset of servers [5,6]. A popular alternative to the JSQ scheme, called the Join-the-Idle-Queue scheme [7,8], requires only the knowledge of the idle servers. However, even this scheme suffers from similar implementation issues as the dispatcher needs to store the idle tokens of a large number of servers.
Due to the challenges mentioned above, randomised schemes, such as the celebrated Power-of-d choices (Pod) scheme [9,10], are widely used in practice [11,12]. In the Pod scheme, an arrival is sent to the server with the shortest queue length among a set of d servers sampled uniformly at random. For d = 1, the Pod scheme reduces to the random scheme. For d = 2, it is well known that an exponential improvement in the average response time is achieved in comparison to the random scheme. However, this result requires that the server with the least queue length among the two sampled servers is always correctly identified. In a real system, the queue lengths stored at a dispatcher may get outdated due to infrequent updates from the servers [13,14]. This could result in misidentifying the server currently having the smaller queue length. Another scenario in which an error can occur is when an adversary tries to carry out an attack by misreporting the queue lengths sent from the servers to the dispatcher [15]. The attacker can manipulate the queue lengths in such a way that the dispatcher assigns the job to the server with the maximum load among the two sampled servers. Such erroneous assignments can significantly increase the average response time of jobs and may even cause the system to become unstable. The importance of studying the effect of inaccurate load comparisons on load balancing was highlighted as early as 2001 in a survey paper by Mitzenmacher, Richa and Sitaraman [16]. However, except in the static setting [17,18], where there is a finite pool of jobs, this problem has not been studied.
Our Contributions: This motivates us to consider the effects that comparison errors can have on the performance of the Po2 scheme in the dynamic setting. We consider two types of comparison errors, referred to as load-dependent and load-independent errors. In the load-dependent error model, an "error" is made with probability ǫ ∈ [0, 1] if the difference in the queue lengths of the sampled servers is sufficiently small (less than or equal to a constant g ≥ 0); if this is not the case (i.e., if the queue-length difference is higher than g), then no error occurs. An error, in this context, refers to the event where the job is sent to the server having the larger queue length among the two sampled servers. Hence, in this model, the dispatcher makes an error only when the current queue lengths of the sampled servers are close to each other; this is natural to expect when errors occur primarily due to outdated queue lengths at the dispatcher, as servers having close queue lengths are likely to be more affected by this type of error. To model errors due to adversarial attacks, we consider the load-independent error model in which an error occurs with probability ǫ independent of the current loads of the sampled servers. Clearly, this model of error can have a more drastic impact on the system's performance than the load-dependent error model. Our goal is to characterise the performance of the Po2 scheme under these two error models for a system where there are n unit-rate servers and jobs with exponentially distributed sizes arrive according to a Poisson process with rate nλ (λ < 1).
It is natural to expect that as g and ǫ increase (i.e., as the error rate becomes higher), the performance of the Po2 scheme under the load-dependent error model would deteriorate, eventually resulting in a performance poorer than the random scheme. While this is true for light traffic (small values of λ), we show that, in the heavy traffic limit (λ → 1) and for large system sizes, the performance of the system remains exponentially better than that under the random scheme for all values of g and ǫ. This implies that the benefits of sampling one additional server in the Po2 scheme outweigh the negative impact of comparison errors when the system operates close to its maximum capacity. This result can be explained through the dynamics of the system in the mean-field regime. Specifically, we show that the fixed point of the mean-field has a super-exponential decay of tail probabilities for all values of g and ǫ. While this decay rate is slower than that under the standard Po2 scheme, it is still super-exponential and therefore its benefits in comparison to the exponential decay rate under the random scheme become more prominent in the heavy traffic regime.
For the load-independent error model, we show that the benefits of the Po2 scheme are retained only if the error probability ǫ satisfies ǫ ≤ 1/2. For ǫ > 1/2, we show that the system becomes unstable for arrival rates larger than 1/(2ǫ) and the performance becomes worse than that under the random scheme. This can be intuitively explained by the fact that for ǫ > 1/2 the Po2 scheme chooses the server with the larger queue length more often than the server with the smaller queue length.
From a technical point of view, we make a number of important contributions. First, we derive the stability region of the system for both error models and establish uniform (in the system size n) bounds on the stationary expected queue length per server. These bounds are essential to establish tightness of the stationary measures and interchange of limits in the mean-field regime. The existing results of [19] on JSQ-type load balancing schemes do not apply to our schemes since a job is not always sent to the server with the minimum queue length among the sampled servers. Although the fluid limit results of [20] can be applied to derive stability conditions, they do not yield the uniform bounds essential to establish tightness of the stationary distributions. To obtain both stability conditions and the uniform bounds, we use drifts of suitable Lyapunov functions. However, bounding the drifts of these Lyapunov functions is difficult for our schemes as the schemes compare only a subset of queues at each arrival and do not always choose the shortest queue as the final destination. We develop a generic approach through which the drift can be bounded for any scheme where queue lengths of multiple servers are compared to dispatch each job.
The second important technical contribution is the mean-field analysis of the Po2 scheme under the load-dependent error model. This analysis differs significantly from conventional analysis in that the fixed point of the mean-field in this case does not satisfy any recursion. For such a system, even the existence of the fixed point is not evident. Proving global stability is also more complicated as it relies on induction on the component index. To prove the desired results, we use a new approach that employs bounds on the decay rate of the mean-field process and its monotonicity. We believe that this approach is more broadly applicable to other models where a fixed point does not admit a recursive relationship.

Related Works
In the last two decades, the Pod scheme has emerged as a widely used load balancing scheme due to its promising gains and minimal overhead. It has been studied extensively under various scaling limits and traffic conditions. The mean-field scaling limit for this scheme was first studied for exponential service time distributions in the seminal works [9,10]. Their results were later generalised to general service time distributions in [21,22]. The heavy traffic optimality of the Pod scheme has been established in [23,24]. In [25], the analysis of the Pod scheme has been carried out for the case where the number of choices, d, is allowed to depend on the system size n and d(n) = ω(1). In this work, both the mean-field and Halfin-Whitt regimes are considered. In the mean-field limit, the Pod scheme has been shown to reach the same performance as the JSQ scheme. Recently, the Pod scheme has been analysed for different graph topologies. For example, in [26], the Pod scheme is analyzed for non-bipartite graphs and sufficient conditions on the graph sequence are obtained to match the result on complete graphs in the mean-field limit. For bipartite graphs, the Pod scheme and its variants have been analysed in [27,28]. In all cases, results similar to the complete graph setting have been obtained. For heterogeneous systems, the Pod scheme has been studied in [29,30], where speed-aware versions of the Pod scheme have been shown to yield similar performance benefits.
The above-mentioned results for the Pod scheme assume that on each arrival the dispatcher has accurate knowledge of the queue lengths of the d sampled servers. However, this assumption may not be true in practice due to the issues discussed in the introduction. Recently, in [17], the balls and bins problem was studied under various noisy load comparison models. Here, n balls are placed into n bins sequentially and at each step a ball is placed into a bin from a subset of d bins chosen at random. For the load-dependent error model discussed above, it has been shown that the gap between the maximum and the average load is O( g log(g) log log(n)). This result motivates us to consider the effects of noisy load comparisons on the performance of the Po2 scheme in the dynamic setting where jobs are allowed to leave the system after being served. To the best of our knowledge, this is the first work that studies the Po2 scheme under the erroneous load comparison model in the dynamic setting.

System Model
We consider a system consisting of n parallel servers, each with its own queue of infinite buffer size. Each server is able to process jobs at unit rate. Jobs arrive according to a Poisson process with rate nλ (λ < 1). Each job requires a random amount of work, independent and exponentially distributed with unit mean. The inter-arrival times and job sizes are assumed to be independent of each other. A job dispatcher assigns each incoming job to the queue of a server, where jobs are served according to the First-Come-First-Served (FCFS) scheduling discipline.
The job dispatcher uses the Po2 scheme to assign jobs to the servers. Under the classical Po2 scheme, a job is sent to the server with the minimum queue length among two servers chosen uniformly at random. However, in practice, the server with the smaller queue length may not always be correctly identified, either due to outdated queue-length information at the dispatcher or due to an attacker misreporting the queue lengths sent from the servers to the dispatcher. Motivated by these scenarios, in this paper, we consider the following versions of the Po2 scheme where load comparisons are not always accurate. In the following, an error refers to the event where an arrival is sent to the server with the larger queue length among the two sampled servers.

Load-Dependent Error Model
In this model, an error occurs with probability ǫ ∈ [0, 1] only when the difference in queue lengths of the sampled servers is in the range (0, g] for some constant g ≥ 0. If the queue-length difference is strictly above g, then we assume that no error is made, i.e., the job is sent to the server with the smaller queue length. In case of a tie, we assume that an arbitrary tie-breaking rule based on server indices is used. Without loss of generality (WLOG), we assume that servers are indexed from the index set [n] = {1, 2, ..., n}, and, in case of a tie, the job is sent to the server with the smaller index among the two sampled servers. We refer to the Po2 scheme under this model of error as the Po2-(g, ǫ) scheme.
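The dispatch rule just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the argument order, and the convention of passing in a uniform draw `u` (so that the rule is deterministic to test) are assumptions made here, not part of the model.

```python
def choose_server_load_dependent(i, j, q_i, q_j, g, eps, u):
    """Return the index of the server that receives the job under Po2-(g, eps).

    i, j     : indices of the two sampled servers (i != j)
    q_i, q_j : their current queue lengths
    g        : error window; errors are possible only if 0 < |q_i - q_j| <= g
    eps      : error probability
    u        : a uniform(0, 1) draw supplied by the caller (hypothetical interface)
    """
    if q_i == q_j:
        # Tie: the job goes to the server with the smaller index.
        return min(i, j)
    shorter, longer = (i, j) if q_i < q_j else (j, i)
    if abs(q_i - q_j) <= g and u < eps:
        # Comparison error: the job is sent to the longer queue.
        return longer
    # Either the gap exceeds g or no error occurred.
    return shorter
```

For instance, with queue lengths 3 and 5, an error is possible when g = 2 but not when g = 1, whatever the value of ǫ.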

Load-Independent Error Model
In this model, an error occurs with probability ǫ ∈ [0, 1] independent of the current queue lengths of the sampled servers. More precisely, the incoming job is sent to the server having the higher queue length among the sampled servers with probability ǫ, and with probability 1 − ǫ it is sent to the server with the smaller queue length. Ties are broken in the same way as discussed before. For simplicity, we refer to the Po2 scheme under this model of error as the Po2-ǫ scheme.
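A corresponding sketch for the load-independent rule is given below, together with a small Monte Carlo helper that estimates how often the longer queue is chosen. The function names, the fixed queue lengths used inside the helper, and the seeding convention are illustrative assumptions.

```python
import random


def choose_server_load_independent(i, j, q_i, q_j, eps, u):
    """Return the server index chosen under Po2-eps.

    u is a uniform(0, 1) draw supplied by the caller (hypothetical interface).
    """
    if q_i == q_j:
        # Tie: smaller index, as in the load-dependent model.
        return min(i, j)
    shorter, longer = (i, j) if q_i < q_j else (j, i)
    # The error does not depend on how far apart the queue lengths are.
    return longer if u < eps else shorter


def empirical_error_rate(eps, trials, seed=0):
    """Fraction of trials in which the longer of two unequal queues is chosen."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        # Servers 0 and 1 hold 2 and 5 jobs; choosing server 1 is an error.
        if choose_server_load_independent(0, 1, 2, 5, eps, rng.random()) == 1:
            errors += 1
    return errors / trials
```

Unlike the load-dependent rule, the job can be sent to the longer queue even when the two queue lengths differ by a large amount.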

System State and Notations
To analyze the system under the schemes discussed above, we first introduce Markovian state descriptors of the system. We use two Markovian state descriptors. First, we define the queue-length vector at time t ≥ 0 as $Q^{(n)}(t) = (Q^{(n)}_k(t), k \in [n])$, where $Q^{(n)}_k(t)$ denotes the queue length of the k-th server. Second, we define the tail measure on the queue lengths at time t as $x^{(n)}(t) = (x^{(n)}_i(t), i \geq 1)$, where $x^{(n)}_i(t)$ denotes the fraction of servers with at least i jobs at time t. For completeness, we set $x^{(n)}_i(t) = 1$ for all i ≤ 0 and all t ≥ 0. From the Poisson arrival and the exponential job size assumptions, it is clear that both $Q^{(n)} = (Q^{(n)}(t), t \geq 0)$ and $x^{(n)} = (x^{(n)}(t), t \geq 0)$ are Markov processes. When the system is stable, we denote by $\pi_n$ the unique invariant measure of the process $x^{(n)}$ and we use $x^{(n)}(\infty)$ and $Q^{(n)}(\infty)$ to denote the steady-state values of the processes $x^{(n)}$ and $Q^{(n)}$, respectively. As the load balancing scheme does not distinguish between servers, we have for each i ∈ [n] and each t ∈ [0, ∞]. To define the state space of the process $x^{(n)}$, we first define the space S as S {s = (s i ) : Note that the space S is compact under the norm defined as The process $Q^{(n)}$ takes values in $\mathbb{Z}^n_+$ and the process $x^{(n)}$ takes values in the space $S^{(n)}$ defined as S (n) {s ∈ S : We further define the space S as follows S {s ∈ S : where the ℓ1-norm, denoted by $\|\cdot\|_1$, is defined as $\|s\|_1 \triangleq \sum_{i \geq 1} |s_i|$ for any s ∈ S.
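To make the relation between the two state descriptors concrete, the following sketch computes the tail measure from a queue-length vector. The function name and the truncation parameter `depth` are illustrative assumptions.

```python
def tail_measure(queue_lengths, depth):
    """Return [x_1, ..., x_depth], where x_i is the fraction of servers
    holding at least i jobs."""
    n = len(queue_lengths)
    return [
        sum(1 for q in queue_lengths if q >= i) / n
        for i in range(1, depth + 1)
    ]
```

For the queue-length vector (0, 1, 1, 3) this gives x = (0.75, 0.25, 0.25, 0), and n times the sum of the components recovers the total number of jobs in the system (here 5).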

Main Results and Insights
In this section, we summarise our main results and discuss their consequences. In the following theorem, we characterise the stability region for each of the two schemes discussed above.
Theorem 1. (i) For any g ≥ 0, ǫ ∈ [0, 1], and n ≥ 2, the system under the Po2-(g, ǫ) scheme is stable if and only if λ < 1. Furthermore, for λ < 1, the steady-state average queue length per server is bounded above as in (1). (ii) For any ǫ ∈ [0, 1] and n ≥ 2, the system under the Po2-ǫ scheme is stable if and only if λ < min(1, 1/(2ǫ)). Furthermore, for λ < min(1, 1/(2ǫ)), the steady-state average queue length per server is bounded above as in (2).
In addition to the stability regions, the above theorem gives uniform (in the system size n) bounds on the steady-state mean queue length per server for each scheme. These uniform bounds are crucial in establishing the tightness of stationary measures and justifying the interchange of limits $\lim_{n\to\infty} \lim_{t\to\infty} x^{(n)}(t) = \lim_{t\to\infty} \lim_{n\to\infty} x^{(n)}(t)$, which shows that the mean-field approximation of the steady-state behaviour of the finite system is asymptotically exact.
The bounds in (1) and (2) also help us to compare the performance of the Po2-(g, ǫ) and the Po2-ǫ schemes to that of the random scheme. For example, when ǫ ≤ 1/2, both the upper bounds reduce to λ/(1 − λ), which is the steady-state average queue length per server under the random scheme. This implies that, under both models of error, the Po2 scheme performs better than the random scheme when the error probability ǫ ≤ 1/2. This is intuitive, as, for ǫ ≤ 1/2, an incoming job under the Po2 scheme is sent to the server with the smaller queue length more often than to the server with the larger queue length. For ǫ > 1/2, however, the schemes may perform poorly in comparison to the random scheme (as both bounds become higher than λ/(1 − λ)).
This is numerically verified in Figure 1 and Figure 2 for the Po2-(g, ǫ) scheme and in Figure 3 for the Po2-ǫ scheme. In each of these figures, we plot the steady-state mean response time of jobs as a function of the normalized arrival rate λ. From Figures 1 and 3, we observe that both schemes outperform the random scheme when ǫ ≤ 1/2. For ǫ > 1/2, however, the Po2-ǫ scheme becomes unstable for λ ≥ 1/(2ǫ) and its performance becomes poorer than that of the random scheme for all λ < 1. For the Po2-(g, ǫ) scheme, we observe from Figure 2 that the system is stable for all λ < 1 even when ǫ > 1/2. However, in this case, the performance of the Po2-(g, ǫ) scheme is poorer than that of the random scheme for small values of λ.
The usual approach of proving results similar to the ones stated in Theorem 1 consists of coupling and stochastic comparison with the random scheme. However, this approach does not work here since the random scheme can outperform each of the two schemes when ǫ > 1/2. Instead, we use drifts of suitable Lyapunov functions to prove Theorem 1, which holds for all ǫ ∈ [0, 1]. Establishing bounds on the drifts of Lyapunov functions is difficult for our schemes as only a subset of servers is compared at each arrival instant and the job is not always sent to the sampled server with the minimum queue length. We develop a generic approach through which the required bounds can be obtained for any scheme where queue lengths of multiple servers are compared to dispatch the incoming jobs.
Hence, the process x, defined in Theorem 2.(i), characterises the dynamics of the system in the limit as n → ∞. This will be referred to as the mean-field limit of the system or the mean-field process. The evolution of x, described in (3), can be explained as follows. For the n-th system, the component $x^{(n)}_i(t)$ increases by 1/n when a job joins a server with queue length exactly i − 1. The rate at which this happens is $n\lambda p_{i-1}(x)$, where $p_{i-1}(s)$, for s ∈ S, is the probability that an arrival joins a server with queue length i − 1 when the system is in state s. The expression of $p_{i-1}(s)$ in (4) can be obtained as follows. Under the Po2-(g, ǫ) scheme, a job joins a server with queue length i − 1 under the following scenarios:
(1) One of the two sampled servers has queue length exactly i − 1 and the other sampled server has queue length at least i + g. This occurs with probability $2(s_{i-1} - s_i)s_{i+g}$ and, in this case, the job joins the server with queue length i − 1 with probability 1.
(2) One of the sampled servers has queue length i − 1 and the other sampled server's queue length lies in the range {i − 1 − g, ..., i − 2}. This occurs with probability $2(s_{i-1} - s_i)(s_{i-1-g} - s_{i-1})$ and, in this case, the server with queue length i − 1 is selected with probability ǫ.
(3) One of the sampled servers has queue length i − 1 and the other sampled server's queue length lies in the range {i, ..., i − 1 + g}. This occurs with probability $2(s_{i-1} - s_i)(s_i - s_{i+g})$ and, in this case, the server with queue length i − 1 is selected with probability 1 − ǫ.
(4) Finally, both the sampled servers can have the same queue length i − 1 with probability $(s_{i-1} - s_i)^2$ and, in this case, the job joins a server with queue length i − 1 with probability 1.
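The four scenarios can be combined numerically as in the sketch below. The list-based representation of the state (`s[k - 1]` stores $s_k$, with $s_k = 1$ for k ≤ 0 and $s_k = 0$ beyond the stored range) and the function names are assumptions made for illustration.

```python
def tail(s, k):
    """s_k: fraction of servers with at least k jobs."""
    if k <= 0:
        return 1.0
    if k > len(s):
        return 0.0
    return s[k - 1]


def join_probability(s, i, g, eps):
    """p_{i-1}(s): probability that an arrival joins a server with queue
    length exactly i - 1 under Po2-(g, eps), summing scenarios (1)-(4)."""
    mass = tail(s, i - 1) - tail(s, i)                    # servers with exactly i-1 jobs
    far_longer = tail(s, i + g)                           # scenario (1): other >= i+g
    shorter_within_g = tail(s, i - 1 - g) - tail(s, i - 1)  # scenario (2)
    longer_within_g = tail(s, i) - tail(s, i + g)         # scenario (3)
    return mass * (
        2.0 * far_longer
        + 2.0 * eps * shorter_within_g
        + 2.0 * (1.0 - eps) * longer_within_g
        + mass                                            # scenario (4): both have i-1 jobs
    )
```

Two sanity checks follow from the construction: the probabilities sum to one over all queue lengths, and for g = 0 the expression reduces to $s_{i-1}^2 - s_i^2$, the join probability of the error-free Po2 scheme.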
Combining the above probabilities, we obtain the expression for $p_{i-1}(s)$. Similarly, the component $x^{(n)}_i(t)$ decreases by 1/n when a job leaves a server with queue length i, and this occurs with rate $n(x_i - x_{i+1})$. Hence, the total expected rate of change (drift) in the component $x^{(n)}_i(t)$ is $\lambda p_{i-1}(x^{(n)}(t)) - (x^{(n)}_i(t) - x^{(n)}_{i+1}(t))$. In the limit as n → ∞, this becomes the rate of change of $x_i(t)$.
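As a rough numerical illustration of these dynamics, the sketch below integrates the drift $\lambda p_{i-1}(x) - (x_i - x_{i+1})$ with a forward Euler scheme, starting from the idle state. The truncation depth, step size, time horizon, and function names are assumptions chosen for illustration; this is not the method used in the analysis.

```python
def join_probability(x, i, g, eps):
    """p_{i-1}(x) for Po2-(g, eps); x[k - 1] stores x_k."""
    def tail(k):
        if k <= 0:
            return 1.0
        return x[k - 1] if k <= len(x) else 0.0

    mass = tail(i - 1) - tail(i)
    return mass * (
        2.0 * tail(i + g)
        + 2.0 * eps * (tail(i - 1 - g) - tail(i - 1))
        + 2.0 * (1.0 - eps) * (tail(i) - tail(i + g))
        + mass
    )


def mean_field_drift(x, lam, g, eps):
    """Drift of each component: arrivals into level i minus departures from it."""
    depth = len(x)
    drift = []
    for i in range(1, depth + 1):
        x_next = x[i] if i < depth else 0.0  # truncation: x_k = 0 beyond depth
        drift.append(lam * join_probability(x, i, g, eps) - (x[i - 1] - x_next))
    return drift


def integrate_mean_field(lam, g, eps, depth=30, dt=0.05, horizon=400.0):
    """Forward Euler integration from the idle state (x_i = 0 for i >= 1)."""
    x = [0.0] * depth
    for _ in range(int(horizon / dt)):
        d = mean_field_drift(x, lam, g, eps)
        x = [xi + dt * di for xi, di in zip(x, d)]
    return x
```

For g = 0 the scheme is the error-free Po2 scheme, whose well-known fixed point $x_i = \lambda^{2^i - 1}$ gives a convenient check on the integrator, e.g. `integrate_mean_field(0.7, 0, 0.0, depth=12, horizon=200.0)`. For g > 0 no closed form is available, so the same routine can only be used to explore the tail decay numerically.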
In part (ii) of Theorem 2, we show that, as t → ∞, the mean-field process x converges in ℓ1 to the unique point x* ∈ S at which G(x*) = 0. This point x* is referred to as the fixed point of the mean-field since, starting at this point, the mean-field remains there at all times. Since by part (i) we have $x^{(n)}(t) \to x(t)$ almost surely for each t ≥ 0, the convergence to the fixed point implies $\lim_{t\to\infty} \lim_{n\to\infty} x^{(n)}(t) = \lim_{n\to\infty} \lim_{t\to\infty} x^{(n)}(t) = x^*$, which, in turn, means that the fixed point x* characterises the steady-state behaviour of the system in the limit as n → ∞. In particular, $\lim_{n\to\infty} P(Q^{(n)}_i(\infty) \geq k) = x^*_k$ for each k ≥ 1.
In Theorem 2.(iii) we compare the steady-state mean response time of jobs under the Po2-(g, ǫ) scheme to that under the random scheme when the traffic is high (i.e., λ → 1). Note that (7) implies that $T^{g,\epsilon}_2(\lambda) = O(\log T_1(\lambda))$ as λ → 1. Furthermore, by the previous part of the theorem, the steady-state mean response time of jobs under the Po2-(g, ǫ) scheme converges as n → ∞ to $T^{g,\epsilon}_2(\lambda)$. Hence, this result shows that, when the system is heavily loaded, the mean response time of jobs under the Po2-(g, ǫ) scheme is exponentially smaller than that under the random scheme. This is also verified in Figure 2 for ǫ = 0.8 and g = 100. Note that for such high error rates, the mean response time of jobs under the Po2-(g, ǫ) scheme can be larger than that under the random scheme for low values of λ. However, when λ is close to its maximum value 1, the Po2-(g, ǫ) scheme performs exponentially better than the random scheme for all values of g and ǫ. This implies that the advantage of having an additional choice in the Po2 scheme outweighs the negative impact of the comparison errors when the traffic is high.
The main difficulty in proving Theorem 2 is that the fixed point x* cannot be found in closed form. This is because each component $x^*_k$ in (6) depends not only on the previous components but also on the next g components. This makes it hard to characterise the fixed point; indeed, even the existence of such an x* in S is not evident. This also makes proving the global stability difficult as it uses induction on the component index k. To overcome these difficulties, we use the monotonicity of the mean-field and uniform bounds on its tails. We believe that this new approach is generally applicable to similar systems where the fixed point cannot be found in closed form.
We now present the asymptotic results for the Po2-ǫ scheme in the following theorem.x n (t) − x(t) 1 a.s → 0 where x = (x(t) = (x i (t), i ≥ 1), t ≥ 0) satisfies x(0) = u and for t ≥ 0 and i ≥ 1 Here for each i ≥ 1, F i is the i th component of the function F = (F i , i ≥ 1) : S → R ∞ and, for s ∈ S, p i−1 (s) is defined as We refer to the process x as the mean-field limit of the sequence (x (n) ) n≥1 .
(iii) (Heavy-Traffic Limit): For ǫ ≤ 1/2 and λ < 1, we have where is the limiting (as n → ∞) steady state average response time of jobs under the Po2-ǫ scheme.
In parts (i) and (ii) of Theorem 3, we characterize the mean-field limit x and its fixed point x* under the Po2-ǫ scheme. As before, we show that the fixed point is unique and globally asymptotically stable. In the last part (part (iii)) of Theorem 3, we compare the mean response time of jobs under the Po2-ǫ scheme to that under the random scheme in the limit as n → ∞. Our result indicates that when n is large and λ is close to 1, the steady-state mean response time of jobs under the Po2-ǫ scheme satisfies $T^{\epsilon}_2(\lambda) \approx c_\epsilon \log(T_1(\lambda))$, where $c_\epsilon = 1/\log(2 - 2\epsilon)$. This means that an exponential reduction in the steady-state mean response time is achieved as long as ǫ ≤ 1/2. Hence, the Po2-ǫ scheme retains the benefits of the Po2 scheme as long as ǫ ≤ 1/2.
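The heavy-traffic comparison can be explored numerically with the sketch below. It relies on a recursion that is not quoted from the theorem but derived here from the model definition, and should be read as an assumption: in the mean-field limit an arrival joins a server with at least k − 1 jobs with probability $x_{k-1}^2 + 2\epsilon x_{k-1}(1 - x_{k-1})$, and balancing this against departures gives $x^*_k = \lambda x^*_{k-1}(2\epsilon + (1 - 2\epsilon)x^*_{k-1})$ with $x^*_0 = 1$. The mean response time then follows from Little's law. Function names and the truncation depth are illustrative.

```python
def po2_eps_fixed_point(lam, eps, depth=200):
    """Tail fractions x*_1, ..., x*_depth from the recursion derived above.

    Only meaningful inside the stability region lam < min(1, 1/(2*eps)).
    """
    threshold = 1.0 if eps <= 0.5 else 1.0 / (2.0 * eps)
    if not 0.0 < lam < threshold:
        raise ValueError("lam must lie in (0, min(1, 1/(2*eps)))")
    x, prev = [], 1.0
    for _ in range(depth):
        prev = lam * prev * (2.0 * eps + (1.0 - 2.0 * eps) * prev)
        x.append(prev)
    return x


def mean_response_time(lam, eps, depth=200):
    """Little's law: mean number of jobs per server divided by lam
    (truncated to the first `depth` tail fractions)."""
    return sum(po2_eps_fixed_point(lam, eps, depth)) / lam
```

With ǫ = 0 the recursion gives the error-free Po2 tail $\lambda^{2^k - 1}$, and with ǫ = 1/2 it gives the geometric tail $\lambda^k$ of the random scheme, so that the mean response time equals 1/(1 − λ); values of ǫ between these extremes interpolate between the two.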

Stability and Uniform Bounds
In this section, we find the stability regions for the Po2-(g, ǫ) and the Po2-ǫ schemes and derive uniform bounds on the steady-state queue length per server (Theorem 1) using drifts of appropriate Lyapunov functions. We first develop a general framework to analyse any load balancing scheme that compares the queue lengths of two uniformly sampled servers to dispatch every job. Note that it is easy to generalise this framework further to cases where more than two servers are sampled and the sampling is not necessarily uniform.
For any function $V : \mathbb{Z}^n_+ \to [0, \infty)$, the drift $D_{Q^{(n)}} V$ is defined as the expected rate of change in the value of the function along the trajectory of the process $Q^{(n)}$ given the current state. More precisely, $D_{Q^{(n)}} V(Q) = \sum_{i \in [n]} \left[ r^{+,n}_i(Q) \left( V(Q + e^{(n)}_i) - V(Q) \right) + r^{-,n}_i(Q) \left( V(Q - e^{(n)}_i) - V(Q) \right) \right]$ (12), where $e^{(n)}_i$ denotes the n-dimensional unit vector with one in the i-th position and $r^{\pm,n}_i(Q)$ are the transition rates from the state Q to the states $Q \pm e^{(n)}_i$. According to the Foster-Lyapunov theorem (Proposition D.3 of [31]), to prove the stability or positive recurrence of the process $Q^{(n)}$, it is sufficient to show the existence of at least one function $V : \mathbb{Z}^n_+ \to [0, \infty)$ for which $D_{Q^{(n)}} V(Q) < 0$ for all states Q lying outside a compact subset of the state space. To further obtain uniform bounds on the stationary queue lengths, we use Proposition 1 of [32].
The rate of departure from the i-th server is given by $r^{-,n}_i(Q) = \mathbb{1}(Q_i > 0)$. For any scheme which compares the states of two servers to dispatch the job to one of the servers, we define the class of an arrival as the (unordered) pair (i, j) of servers sampled at the arrival instant. Let C denote the collection of all such classes. Since $|C| = \binom{n}{2}$ and a job is equally likely to belong to any one of these classes, the arrival rate of any class (i, j) ∈ C is $n\lambda / \binom{n}{2} = 2\lambda/(n - 1)$. Hence, we can write the rate of arrival to the i-th server as $r^{+,n}_i(Q) = \frac{2\lambda}{n-1} \sum_{j \neq i} p(Q_i, Q_j)$ (13), where $p(Q_i, Q_j)$ is the probability that a class (i, j) job joins the server i when the queue lengths of servers i and j are $Q_i$ and $Q_j$, respectively. Note that the probability $p(Q_i, Q_j)$ depends on the load balancing scheme used by the dispatcher. The exact expression of $p(Q_i, Q_j)$ for each scheme is given later, but it is important to note that $p(Q_i, Q_j) + p(Q_j, Q_i) = 1$ since a class (i, j) job joins either server i or server j with probability 1.
Now, for the Lyapunov function V used in this analysis, the drift given in (12) simplifies to the expression in (14), where $B(Q) = \sum_{i \in [n]} \mathbb{1}(Q_i > 0)$ represents the number of busy servers when the system is in state Q. Moreover, using (13), the first term in the expression of the drift can be rewritten as in (15). Thus, to obtain the stability region and the uniform bound on the steady-state queue length, we need upper bounds on the per-class terms on the RHS of (15).
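The generic drift computation can be sketched as follows for an arbitrary function V and an arbitrary pairwise comparison rule. The signature `p(i, j, q_i, q_j)`, which passes the server indices explicitly so that index-based tie-breaking can be expressed, and the helper `p_po2_eps` are assumptions made for illustration.

```python
def generator_drift(V, Q, lam, p):
    """Drift of V at state Q: the sum over servers of arrival and departure
    rates times the resulting change in V.

    V   : function mapping a list of queue lengths to a number
    Q   : list of current queue lengths (n >= 2 servers)
    lam : normalised arrival rate (total arrival rate is n * lam)
    p   : p(i, j, q_i, q_j), probability that a class (i, j) job joins server i
    """
    n = len(Q)
    class_rate = 2.0 * lam / (n - 1)  # arrival rate of each unordered pair
    base = V(Q)
    total = 0.0
    for i in range(n):
        arrival = class_rate * sum(
            p(i, j, Q[i], Q[j]) for j in range(n) if j != i
        )
        up = list(Q)
        up[i] += 1
        total += arrival * (V(up) - base)
        if Q[i] > 0:  # unit-rate departure from a busy server
            down = list(Q)
            down[i] -= 1
            total += V(down) - base
    return total


def p_po2_eps(eps):
    """Pairwise rule for Po2-eps with ties broken towards the smaller index."""
    def p(i, j, q_i, q_j):
        if q_i == q_j:
            return 1.0 if i < j else 0.0
        return 1.0 - eps if q_i < q_j else eps
    return p
```

As a check, for the linear function V(Q) = Σ Q_i the drift equals nλ − B(Q) under any comparison rule, because the two join probabilities of every class sum to one. For example, `generator_drift(lambda q: float(sum(q)), [0, 2, 2, 5], 0.8, p_po2_eps(0.7))` evaluates to 4 × 0.8 − 3 = 0.2 up to rounding.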

Po2-(g, ǫ) Scheme
For the Po2-(g, ǫ) scheme, the probability p(Q i , Q j ) for any class (i, j) ∈ C is given by Using the above expression, we obtain the following bound for Po2-(g, ǫ) scheme.
Lemma 4. For g ≥ 0, ǫ ∈ [0, 1], and for any class (i, j) ∈ C, under the Po2-(g, ǫ) scheme, we have Proof.To prove the lemma, we first observe that for any a ≤ b, and The result of the lemma is direct when Q i = Q j .So, we consider the case Q i < Q j .Note that the proof for Q i > Q j is exactly the same with Q i and Q j interchanged.For ǫ ≤ 1/2, using ( 16) and Q i < Q j , we have where the last equality follows because of the assumption (18), we have Next, we note from (16) and which gives Therefore, using and This completes the proof.
Proof of Theorem 1.(i): Using the bound of Lemma 4, the RHS of ( 15) can be bounded as Therefore, using ( 19) in ( 14), we can upper bound the drift D Q (n) V (Q) for the Po2-(g, ǫ) scheme as Now, since B(Q) ≤ n and λ < 1, the drift , and is bounded above by n(λ(2g½(ǫ > 1/2)+ 1) + 1), otherwise.This shows that the system under the Po2-(g, ǫ) scheme is stable for all λ < 1.The necessity of this condition for stability can be established easily by showing that the drift of the Lyapunov function To prove (1), recall from the previous paragraph that Therefore, taking expectation of (20) and using the rate conservation equation E πn [B(Q)] = nλ (which holds in steady-state), we obtain where last equality follows due to the exchangeability of π n .

Po2-ǫ Scheme
For the Po2-ǫ scheme, p(Q i , Q j ) for any class (i, j) ∈ C is given by Using the expression above, we obtain the following bound.
Lemma 5.For ǫ ∈ [0, 1] and for any class (i, j) ∈ C, under the Po2-ǫ, scheme we have Proof.For Q i = Q j the above inequality follows directly.Similar to the proof of Lemma 4, it is sufficient to consider the case Hence, the proof is complete.
Proof of Theorem 1.(ii): Using the bound of Lemma 5, the RHS of ( 15) can be bounded as Therefore, using ( 23) in ( 14), we upper-bound the drift D Q (n) V (Q) for the Po2-ǫ scheme as Since B(Q) ≤ n, the above implies that, for λ < min(1, 1/2ǫ), the drift is strictly negative whenever . This shows that the system under Po2ǫ scheme is stable for all λ < min(1, 1/2ǫ).Furthermore, since sup which proves (2).Next we prove that for λ ≥ min(1, 1 2ǫ ) the system is unstable.For ǫ ≤ 1/2 and λ ≥ 1, the process Q (n) is not positive recurrent.This follows using the same argument as used in the stability proof of the Po2-(g, ǫ) scheme.Now, for ǫ > 1/2 and 2λǫ > 1, we consider the Lyapunov function is the minimum such index.Using (12), the drift of the function V 2 (Q) can be written as From ( 13), we have where the inequality follows from (21) since , the result follows from the Foster-Lyapunov criterion for transience and null recurrence (Theorem 3.3.10 of [33]).

Mean-Field Analysis of the Po2-(g, ǫ) scheme
In this section, we prove the main results for the Po2-(g, ǫ) scheme stated in Theorem 2.

Mean-Field Limit of the Po2-(g, ǫ) Scheme
First, we establish the mean-field limit of the Po2-(g, ǫ) scheme (Theorem 2.(i)) for any g ≥ 0 and ǫ ∈ [0, 1]. Note that under the Po2-(g, ǫ) scheme, the rate of transition of the process $x^{(n)}$ from state $x \in S^{(n)}$ to state $y \in S^{(n)}$ is given by $r^{(n)}_{x,y} = n\lambda p_{i-1}(x)$ if $y = x + e_i/n$, $r^{(n)}_{x,y} = n(x_i - x_{i+1})$ if $y = x - e_i/n$, and $r^{(n)}_{x,y} = 0$ otherwise, where $p_{i-1}(x)$ is as defined in (4) and $e_i$ is the i-th unit vector in $\mathbb{R}^\infty$. Clearly, the above rates satisfy the transition structure for a density-dependent jump Markov chain [9,34]. Furthermore, it is easy to verify that $\sum_{y \in S} r^{(n)}_{x,y} < n(\lambda + 1)$ for all x ∈ S and the function $G : S \to \mathbb{R}^\infty$ is Lipschitz under the ℓ1-norm with a Lipschitz constant of $L^{\epsilon}_{\lambda} = \lambda(16\epsilon + 4) + 2$ (proved in Lemma 6). Hence, using Kurtz's theorem for density-dependent jump Markov processes [35, Chapter 8], we obtain the desired result.
Lemma 6. The function G(x) is Lipschitz under the ℓ1-norm with constant $L^{\epsilon}_{\lambda} = \lambda(16\epsilon + 4) + 2$.
Proof.Note that we can write (4) for any x ∈ S as Now from (26), using the triangle inequality we can write for any x, y ∈ S Therefore, using the above inequality we can write Hence, the result follows from ( 27) and ( 28).

Mean-Field Steady State Behaviour for the Po2-$(g, \epsilon)$ Scheme
We now turn to the proof of Theorem 2.(ii), which shows that the mean-field process $x$ given by (3) has a unique fixed point $x^*$ which satisfies (5) and (6). Moreover, we show that the fixed point $x^*$ is globally stable, i.e., all trajectories of the mean-field process $x$ starting in $S$ converge to $x^*$. For any $u \in S$, let $x(t, u)$ denote the trajectory of the mean-field process starting at state $u$. Further, define $v_k(t, u) = \sum_{i \geq k} x_i(t, u)$ and $v_k(u) = \sum_{i \geq k} u_i$ for each $k \geq 1$. When the context is clear, we shall drop the dependence of the trajectory on the initial state $u$ and on the time $t$.
Lemma 7. Let $g \geq 0$ and $\epsilon \in [0, 1]$. The following statements hold for the process $x$ defined in Theorem 2.
1. If $u \in S$, then, for any $t \geq 0$, we have $x(t, u) \in S$.
2. For any $u, u' \in S$ satisfying $u \leq u'$, we have $x(t, u) \leq x(t, u')$ for all $t \geq 0$, where the inequality $\leq$ is understood component-wise.
Proof. We first note from (3) and (4) that $\dot{x}_i$ admits an upper bound involving the constant $M_{\lambda,\epsilon} = \lambda \max(1, 2\epsilon)$. This implies a bound on $x_i(t)$ for each $i \geq 1$. Using this bound recursively for each $i \geq 1$ and summing over all $i \geq 1$, we obtain a bound on $v_1(x(t))$. This shows that if $v_1(x(0)) < \infty$, then $v_1(x(t)) < \infty$ for all $t$, thus establishing the first part of the lemma.
For the second part, we note that for each $i \geq 1$, $\dot{x}_i = G_i(x)$ is non-decreasing with respect to all $x_k$, $k \neq i$. Hence the result follows from Theorem 5.3 of [36].
The second property stated in Lemma 7 is called the quasi-monotonicity of the process $x$. This property ensures that if the mean-field process starts from the idle initial state, i.e., if $x(0) = e_0 = (1, 0, 0, \ldots)$, then it is monotonically non-decreasing in time, i.e., $x(t_1, e_0) \leq x(t_2, e_0)$ for all $t_2 \geq t_1 \geq 0$. This follows because the state $e_0$ is dominated by any other state in $S$. In particular, $e_0 \leq x(t_2 - t_1, e_0)$. Hence, by quasi-monotonicity, we have $x(t_1, e_0) \leq x(t_1, x(t_2 - t_1, e_0)) = x(t_2, e_0)$.
Furthermore, Lemma 7 guarantees that if $x(0) \in S$ then $x(t) \in S$ for all $t \geq 0$. Hence, by adding (3) for all $i \geq k$ and using the fact that $\|x(t)\|_1 < \infty$, we obtain (31).
Existence of the Fixed Point $x^*$: To prove the existence of the fixed point $x^*$ satisfying (5) and (6), we first show that $\|x(t, e_0)\|_1$ remains uniformly bounded for all $t \geq 0$ (Proposition 8). Note that this is a stronger result than $\|x(t, e_0)\|_1 < \infty$ for each $t \geq 0$, which has already been established in Lemma 7.
Proof of Lemma 9. Since $x_i(t, e_0) \in [0, 1]$ for each $i$ and all $t \geq 0$, and $x_i(t, e_0)$ is monotonically non-decreasing in time, we must have $x_i(t, e_0) \to x^*_i$ as $t \to \infty$ for each $i \geq 1$, for some $x^* = (x^*_i) \in \bar{S}$. We first show that the component-wise limit $x^*$ defined above is also the $\ell_1$ limit of $x(t, e_0)$, which will also imply that $x^* \in S$. To show this, we note from Proposition 8 that the uniform bound on $\sum_{i \geq 1} x_i(t, e_0)$ in (33) implies, by the dominated convergence theorem, that $\lim_{t \to \infty} \sum_{i \geq 1} x_i(t, e_0) = \sum_{i \geq 1} x^*_i$. Since $x(t, e_0) \leq x^*$ component-wise, this shows that $\|x(t, e_0) - x^*\|_1 \to 0$ as $t \to \infty$, and $x^* \in S$.
It now remains to show that $G(x^*) = 0$. Note that the convergence $x(t) \to x^*$ in $\ell_1$ as $t \to \infty$ and the monotonicity of $x(t)$ imply that for any $\delta > 0$ there exists a $t_\delta > 0$ such that the bound (35) holds for all $t \geq t_\delta$. For fixed $h > 0$, consider the time in $[t, t + h]$ at which the continuous function $G_i(x(s))$ attains its minimum value on this compact interval. Bounding $G_i(x^*)$ in terms of the value of $G_i$ at this time, and using (35) together with the fact that the function $G$ is Lipschitz with constant $L^{\epsilon}_{\lambda}$, gives an inequality that is true for any $\delta > 0$. Therefore, by fixing $h > 0$ and letting $\delta \to 0$, we have $G_i(x^*) = 0$ for all $i \geq 1$. Hence, $G(x^*) = 0$. Finally, we obtain (5) by using $\sum_{i \geq 1} G_i(x^*) = 0$ and (6) by using $\sum_{i \geq k} G_i(x^*) = 0$.
Global Stability and Uniqueness of the Fixed Point $x^*$: Now we prove that for any $u \in S$, $x(t, u)$ converges to $x^*$ as $t \to \infty$ in $\ell_1$, where $x^*$ is the limit of $x(t, e_0)$ as defined in Lemma 9. By Proposition 8 and the dominated convergence theorem, it suffices to establish this convergence component-wise. Furthermore, it is sufficient to consider initial points $u \leq x^*$ and $u \geq x^*$ since, by the quasi-monotonicity of $x$, we have $x(t, \min(u, x^*)) \leq x(t, u) \leq x(t, \max(u, x^*))$, where the min and the max are taken component-wise.
We shall now establish the convergence $x_i(t, u) \to x^*_i$ for all $i \geq 1$ by showing (38), which bounds the time integral of the deviation of $x_i(t, u)$ from $x^*_i$ by a finite constant $C_i > 0$ for each $i \geq 1$. To prove (38), we use induction on $i$. For $i = 1$, using (5), we rewrite the integral over $[0, \tau]$ by means of (31) for $k = 1$ and bound it using the fact that $v_1(t, u)$ is uniformly bounded in $t$. Since the RHS is independent of $\tau$, the integral on the left-hand side must be bounded by $v_1(u)$ as $\tau \to \infty$. This shows the base case of the induction.
Now assume that (38) is true for all $i \leq L - 1$. For $i = L$, using (31) and (6), we obtain an inequality that follows from the uniform boundedness of $v_1(t, u)$ in $t$. To complete the proof, we shall now bound each integral term appearing on the RHS. By using the induction hypothesis and the inequalities $a^2 - b^2 \leq 2(a - b)$ and $ab - cd \leq (a - c) + (b - d)$ for $1 \geq a \geq c \geq 0$ and $1 \geq b \geq d \geq 0$, the integrals in the second and the third terms on the RHS can be easily bounded by $2C_{L-1}$ and $C_{L-1} + C_{L-1-g}$, respectively. It now remains to bound the integral in the last term. Note that this term contains $x_i(t)$ for $i \in \{L, L + 1, \ldots, L + g - 1\}$, for which the induction hypothesis does not apply. Hence, to bound the integral we need to bound these terms. We note that by monotonicity we have $x_i(t) \geq x^*_i$ for all $i \geq 1$, which allows the last term to be bounded above by integrals covered by the induction hypothesis. Using the induction hypothesis, we can further bound this by $2\epsilon\lambda \sum_{i=L}^{L+g-1} (C_{i-1-g} + C_{i-g})$. This completes the proof of global stability of $x^*$.
Since all trajectories converge to $x^*$, it must be the unique solution of $G(x) = 0$: starting from any other $y \neq x^*$ satisfying $G(y) = 0$, the trajectory remains at $y$, which contradicts the global stability of $x^*$.
Limit Interchange: Note that Theorem 1.(i) implies that $\pi_n(S) = 1$ for all $n$. Therefore, we have $\pi_n(\bar{S}) = 1$ for all $n$. Since the space $\bar{S}$ is compact, by Prohorov's theorem the sequence $(\pi_n)_n$ has a weakly convergent subsequence, whose limit $\pi^*$ satisfies $\pi^*(\bar{S}) = 1$. Furthermore, since by (1), $E_{\pi_n}\left[\sum_{i \geq 1} x^{(n)}_i(\infty)\right]$ is uniformly bounded in $n$, we have $\pi^*(S) = 1$. Now we prove that the measure $\pi^*$ is a stationary measure of the mean-field process $x$ defined in (3). We know that $\pi_n \Rightarrow \pi^*$ along this subsequence and that the space $S$ is separable. Therefore, Skorokhod's representation theorem implies that, on a suitable probability space, $x^{(n)}(0) \to x(0)$ almost surely, where $x^{(n)}(0) \sim \pi_n$ and $x(0) \sim \pi^*$. Moreover, if $x^{(n)}(0) \sim \pi_n$, then $x^{(n)}(t) \sim \pi_n$ for all $t \geq 0$. Hence, from Theorem 2.(i) it follows that $x(t) \sim \pi^*$ for all $t \geq 0$. This proves that $\pi^*$ is indeed a stationary measure for the mean-field process $x$. Now from the global stability of the fixed point $x^*$, it follows immediately that the stationary measure $\pi^*$ is unique and is equal to $\delta_{x^*}$. Since every weakly convergent subsequence has this same limit, the whole sequence $(\pi_n)_n$ converges weakly to $\delta_{x^*}$. This completes the proof of limit interchange.
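As an informal numerical companion to the limit interchange, the following is a minimal simulation sketch of the finite-$n$ system under the Po2-$(g, \epsilon)$ rule described in the abstract. It is not the authors' code. The function name `simulate_po2_g_eps`, the use of sampling with replacement, unit-rate exponential service, and the uniformized jump chain are all assumptions made for illustration.

```python
import random


def simulate_po2_g_eps(n, lam, g, eps, steps, warmup, seed=0):
    """Estimate the stationary mean queue length per server under Po2-(g, eps).

    Assumptions (not taken from the paper): Poisson arrivals at rate n*lam,
    unit-rate exponential service, two servers sampled with replacement.
    The chain is uniformized at rate n*(lam + 1), so averaging over steps
    estimates the time average.
    """
    rng = random.Random(seed)
    queues = [0] * n
    total_jobs = 0    # current number of jobs in the system
    accumulated = 0   # sum of total_jobs over the post-warm-up steps
    p_arrival = lam / (lam + 1.0)
    for step in range(steps):
        if rng.random() < p_arrival:
            # Arrival: sample two servers and compare their queue lengths.
            a, b = rng.randrange(n), rng.randrange(n)
            short, long_ = (a, b) if queues[a] <= queues[b] else (b, a)
            target = short
            # A comparison error is possible only if the gap is at most g.
            if queues[long_] - queues[short] <= g and rng.random() < eps:
                target = long_
            queues[target] += 1
            total_jobs += 1
        else:
            # Potential departure at a uniformly chosen server.
            s = rng.randrange(n)
            if queues[s] > 0:
                queues[s] -= 1
                total_jobs -= 1
        if step >= warmup:
            accumulated += total_jobs
    return accumulated / ((steps - warmup) * n)
```

With `eps = 0` the rule reduces to the standard Po2 scheme, whose mean-field value of the mean queue length is $\sum_{i \geq 1} \lambda^{2^i - 1}$; with `eps = 0.5` and a very large `g` it reduces to random routing, for which each server behaves as an M/M/1 queue with mean queue length $\lambda/(1-\lambda)$. For moderately large `n` the estimates should be close to these values.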

Mean-Field Analysis of the Po2-$\epsilon$ Scheme
In this section we prove the main result for the Po2-$\epsilon$ scheme stated in Theorem 3.

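Mean-Field Limit of the Po2-$\epsilon$ Scheme
We first establish the mean-field limit of the Po2-$\epsilon$ scheme given in Theorem 3.(i). First note that the rate of transition of the process $x^{(n)}$ from $x \in S^{(n)}$ to $y \in S^{(n)}$ is given by
$$ q^{(n)}_{x,y} = \begin{cases} n\lambda\, p_{i-1}(x), & \text{if } y = x + e_i/n, \\ n\,(x_i - x_{i+1}), & \text{if } y = x - e_i/n, \end{cases} \tag{39} $$
where $p_{i-1}(x) = (1 - \epsilon)\left(x_{i-1}^2 - x_i^2\right) + \epsilon\left((1 - x_i)^2 - (1 - x_{i-1})^2\right)$ is the probability that an arrival joins a server of queue length $i - 1$. The mean-field limit of the Po2-$\epsilon$ scheme is proved using an argument similar to the one used for the Po2-$(g, \epsilon)$ scheme. First we show that the function $F(x)$ for $x \in S$ defined in (8) is Lipschitz under the $\ell_1$-norm.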
Proof. Let $x, y \in S$. Bounding $\|F(x) - F(y)\|_1$ term by term in terms of $\|x - y\|_1$ gives the claim. This completes the proof.
Furthermore, from (39) it is clear that the rate at which jumps occur in $x^{(n)}$ is bounded everywhere, that is,
$$ \sum_{y \in S} q^{(n)}_{x,y} < n(\lambda + 1), \quad \forall\, x \in S. \tag{40} $$
Therefore, using Lemma 10 and (40), we conclude that the conditions of Kurtz's theorem are satisfied. Hence, the mean-field limit stated in Theorem 3.(i) holds. This completes the proof.
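To make the kind of estimate behind Lemma 10 concrete, the following worked bound is a sketch under an assumption: the form of $F_i$ written below is inferred from the coefficients $a_1 = 1 - 2\epsilon$ and $a_2 = 2\epsilon$ of the recursion (10), with the convention $x_0 = y_0 = 1$. It yields one admissible Lipschitz constant, which need not coincide with the constant stated in Lemma 10.

```latex
\begin{align*}
F_i(x) &= \lambda\left[(1-2\epsilon)\left(x_{i-1}^2 - x_i^2\right)
          + 2\epsilon\left(x_{i-1} - x_i\right)\right] - (x_i - x_{i+1}),
          \qquad x_0 = 1, \\
|a^2 - b^2| &= |a - b|\,(a + b) \le 2\,|a - b|
          \quad \text{for } a, b \in [0, 1], \\
\|F(x) - F(y)\|_1
       &\le \lambda\left[\,2\,|1-2\epsilon| \cdot 2\,\|x - y\|_1
          + 2\epsilon \cdot 2\,\|x - y\|_1\right] + 2\,\|x - y\|_1 \\
       &= \left[\lambda\left(4\,|1-2\epsilon| + 4\epsilon\right) + 2\right]\|x - y\|_1 .
\end{align*}
```

The factor $2\,\|x - y\|_1$ in each group arises because every term appears once with index $i - 1$ (or $i + 1$) and once with index $i$, and $x_0 - y_0 = 0$.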

Mean-Field Steady State Behaviour for the Po2-$\epsilon$ Scheme
In this section we prove Theorem 3.(ii), which shows that the system of differential equations defined in (8) has a unique fixed point $x^*$, which satisfies the recursion defined in (10). Moreover, we prove that the fixed point $x^*$ is globally stable and finally establish the interchange of limits.
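For $u \in S$, we define $v_k(t, u) = \sum_{i \geq k} x_i(t, u)$ and $v_k(u) = \sum_{i \geq k} u_i$ for each $k \geq 1$.
Lemma 11. The following statements hold for the process $x$ defined in Theorem 3.
1. If $u \in S$, then, for any $t \geq 0$, we have $x(t, u) \in S$.
2. For any $u, u' \in S$ satisfying $u \leq u'$, we have $x(t, u) \leq x(t, u')$ for all $t \geq 0$, where the inequality $\leq$ is understood component-wise.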

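Proof. For the first part, we bound $\dot{x}_i$ for each $i \geq 1$ as in the proof of Lemma 7. Summing these bounds for all $i \geq 1$ shows that $v_1(x(t)) < \infty$ for all $t \geq 0$ whenever $v_1(x(0)) < \infty$. This completes the proof of the first part.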
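To prove the second part, we need to show that $\frac{dx_i(t)}{dt}$ is non-decreasing in $x_j(t)$ for all $j \neq i$. We know from (8) that
$$ \frac{dx_i(t)}{dt} = \lambda\left[(1 - 2\epsilon)\left(x_{i-1}^2(t) - x_i^2(t)\right) + 2\epsilon\left(x_{i-1}(t) - x_i(t)\right)\right] - \left(x_i(t) - x_{i+1}(t)\right). $$
Therefore, it is clear that the above expression is non-decreasing in $x_{i+1}(t)$. Note that, up to the factor $\lambda$, the derivative with respect to $x_{i-1}(t)$ of the terms involving $x_{i-1}(t)$ in the above expression is $2(1 - 2\epsilon)x_{i-1}(t) + 2\epsilon$, which is clearly non-negative for $\epsilon \leq 1/2$. Moreover, for $\epsilon > 1/2$ we can write this derivative as $2[\epsilon - (2\epsilon - 1)x_{i-1}(t)]$, which is decreasing in $x_{i-1}(t)$ and attains its minimum value $2(1 - \epsilon) \geq 0$ at $x_{i-1}(t) = 1$. Hence, $\frac{dx_i(t)}{dt}$ is also non-decreasing in $x_{i-1}(t)$.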
Fixed Point: For the fixed point $x^*$, we equate (8) to 0 and get
$$ \lambda\left[(1 - 2\epsilon)\left((x^*_{i-1})^2 - (x^*_i)^2\right) + 2\epsilon\left(x^*_{i-1} - x^*_i\right)\right] = x^*_i - x^*_{i+1}, \quad i \geq 1, \tag{41} $$
with $x^*_0 = 1$. Summing (41) for all $i \geq 1$ we get $x^*_1 = \lambda$. Moreover, summing (41) for all $i \geq j$ we obtain
$$ x^*_j = \lambda\left[(1 - 2\epsilon)(x^*_{j-1})^2 + 2\epsilon\, x^*_{j-1}\right], $$
which is the recursion (10).
Global Stability: To prove global stability of the fixed point $x^*$, we use the monotonicity of the mean-field process $x$ shown in Lemma 11. Note that Lemma 11 guarantees that if $x(0) \in S$ then $x(t) \in S$ for all $t \geq 0$. Hence, adding (8) for all $i \geq k$ and using the fact that $\|x(t)\|_1 < \infty$ gives
$$ \frac{dv_k(t, x(0))}{dt} = \lambda\left[(1 - 2\epsilon)\, x_{k-1}^2(t, x(0)) + 2\epsilon\, x_{k-1}(t, x(0))\right] - x_k(t, x(0)). \tag{42} $$
Specifically, for $k = 1$ we have
$$ \frac{dv_1(t, x(0))}{dt} = \lambda - x_1(t, x(0)). \tag{43} $$
Now from the monotonicity property (Lemma 11) of the mean-field process $x$ we have, for any $u \in S$,
$$ x(t, \min(u, x^*)) \leq x(t, u) \leq x(t, \max(u, x^*)), \tag{44} $$
where $\min(u, v)$ with $u, v \in S$ is defined by taking the component-wise minimum, and $\max(u, v)$ is defined analogously. From (44) it is clear that to prove global stability it is enough to prove that the convergence $x(t, x(0)) \to x^*$ holds for initial states satisfying either of the following two conditions: (i) $x(0) \geq x^*$ and (ii) $x(0) \leq x^*$.
To prove that convergence holds for these two initial conditions, we first show that for any solution $x(\cdot, x(0)) \in S$, $v_k(t, x(0))$ is uniformly bounded in $t$ for all $k \geq 1$. Consider the case $x(0) \geq x^*$. From Lemma 11 it follows that for $x(0) \geq x^*$, we have $x(t, x(0)) \geq x^*$ for all $t \geq 0$. Therefore, we can write $x_1(t, x(0)) \geq x^*_1 = \lambda$, where the last equality follows from (10). Hence, from (43) we have $\frac{dv_1(t, x(0))}{dt} \leq 0$, from which it follows that $0 \leq v_1(t, x(0)) \leq v_1(x(0))$ for all $t \geq 0$. Since the sequence $(v_k(t, x(0)))_{k \geq 1}$ is non-increasing, we have $0 \leq v_k(t, x(0)) \leq v_1(x(0))$ for all $k \geq 1$ and for all $t \geq 0$. This proves that $v_k(t, x(0))$ is uniformly bounded in $t$ for each $k \geq 1$ if $x(0) \geq x^*$. Now consider the case $x(0) \leq x^*$. From Lemma 11 it follows that for $x(0) \leq x^*$, we have $x(t, x(0)) \leq x^*$ for all $t \geq 0$. Therefore, we have $v_1(t, x(0)) \leq v_1(x^*)$ for all $t \geq 0$. This shows that $v_k(t, x(0))$ is uniformly bounded in $t$ for each $k \geq 1$ for $x(0) \leq x^*$.
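Since $v_k(t, x(0))$ is uniformly bounded in $t$, the convergence $x_i(t, x(0)) \to x^*_i$ for all $i \geq 1$ will follow from
$$ \int_0^\infty \left(x_i(t, x(0)) - x^*_i\right) dt < \infty \tag{45} $$
for the case $x(0) \geq x^*$ and from
$$ \int_0^\infty \left(x^*_i - x_i(t, x(0))\right) dt < \infty \tag{46} $$
for the case $x(0) \leq x^*$. We now prove (45) to show convergence for the case $x(0) \geq x^*$; the proof of the other case follows similarly. We will use induction starting with $i = 1$. For $i = 1$, we can write the integral in (45), truncated at $\tau$, as
$$ \int_0^\tau \left(x_1(t, x(0)) - x^*_1\right) dt = \int_0^\tau \left(x_1(t, x(0)) - \lambda\right) dt = -\int_0^\tau \frac{dv_1(t, x(0))}{dt}\, dt = v_1(x(0)) - v_1(\tau, x(0)) \leq v_1(x(0)), $$
where the second equality follows from (43) and the inequality follows as $v_1(t, x(0))$ is non-negative and uniformly bounded in $t$. Since the right-hand side is bounded by a constant for all $\tau$ and the integrand is non-negative, the integral on the left-hand side must converge as $\tau \to \infty$. This shows that $x_1(t, x(0)) \to x^*_1$ as $t \to \infty$.
Now assume that (45) is true for all $i \leq L - 1$. For $i = L$ we can write
$$ \int_0^\tau \left(x_L(t, x(0)) - x^*_L\right) dt = -\int_0^\tau \frac{dv_L(t, x(0))}{dt}\, dt + \lambda(1 - 2\epsilon)\int_0^\tau \left(x_{L-1}^2(t, x(0)) - (x^*_{L-1})^2\right) dt + 2\epsilon\lambda \int_0^\tau \left(x_{L-1}(t, x(0)) - x^*_{L-1}\right) dt $$
$$ \leq v_L(x(0)) + \lambda(1 - 2\epsilon)\int_0^\tau \left(x_{L-1}^2(t, x(0)) - (x^*_{L-1})^2\right) dt + 2\epsilon\lambda \int_0^\tau \left(x_{L-1}(t, x(0)) - x^*_{L-1}\right) dt, $$
where the equality follows from (42) for $k = L$ and from (10) for $i = L$. Moreover, the inequality follows as $v_L(t, x(0))$ is non-negative and uniformly bounded in $t$. Note that by the induction hypothesis, and since $x_{L-1}^2 - (x^*_{L-1})^2 \leq 2(x_{L-1} - x^*_{L-1})$, the last two integrals on the right-hand side of the above expression converge as $\tau \to \infty$. Hence, the integral on the left-hand side must also converge, as required.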
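The convergence argument above can be illustrated numerically. The sketch below integrates a truncated version of the Po2-$\epsilon$ mean-field equations with the forward Euler method, starting from the idle state, and compares the result with the fixed-point recursion. The drift used here is an assumption inferred from the coefficients $a_1 = 1 - 2\epsilon$, $a_2 = 2\epsilon$ of the recursion (10); the function names, the truncation level `K`, and the step size are hypothetical choices made for illustration.

```python
def drift(x, lam, eps):
    """Assumed Po2-eps mean-field drift, truncated so that x[K] is treated as 0.

    x[i] stores x_{i+1}; the convention x_0 = 1 is used for the first entry.
    """
    K = len(x)
    out = []
    for i in range(K):
        prev = 1.0 if i == 0 else x[i - 1]
        nxt = x[i + 1] if i + 1 < K else 0.0
        arrivals = lam * ((1 - 2 * eps) * (prev ** 2 - x[i] ** 2)
                          + 2 * eps * (prev - x[i]))
        out.append(arrivals - (x[i] - nxt))
    return out


def fixed_point(lam, eps, K):
    """First K terms of x*_{i+1} = lam*((1-2eps)*x*_i**2 + 2eps*x*_i), x*_0 = 1."""
    xs, prev = [], 1.0
    for _ in range(K):
        prev = lam * ((1 - 2 * eps) * prev ** 2 + 2 * eps * prev)
        xs.append(prev)
    return xs


def euler_from_idle(lam, eps, K=40, dt=0.02, t_end=400.0):
    """Forward-Euler trajectory from the idle state; also reports monotonicity."""
    x = [0.0] * K
    monotone = True
    for _ in range(int(t_end / dt)):
        new_x = [xi + dt * di for xi, di in zip(x, drift(x, lam, eps))]
        # Allow for rounding noise when checking that no component decreases.
        if any(b < a - 1e-12 for a, b in zip(x, new_x)):
            monotone = False
        x = new_x
    return x, monotone
```

For parameters in the stability region, for example `lam = 0.5` and `eps = 0.3`, the trajectory should be component-wise non-decreasing and end close to the output of `fixed_point`, in line with the quasi-monotonicity and global stability established above.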
Limit Interchange: We know from Theorem 1.(ii) that for $\epsilon \in [0, 1]$ and $\lambda < \min(1, 1/(2\epsilon))$ the bound (2) holds. Therefore, using the global stability result and the process convergence result of the Po2-$\epsilon$ scheme, the limit interchange follows using an argument similar to the one used for the Po2-$(g, \epsilon)$ scheme.

Heavy-Traffic Limit of the Po2-$\epsilon$ Scheme
In this section we prove Theorem 3.(iii), which computes the ratio of the average response time of jobs under the Po2-$\epsilon$ scheme to the logarithm of the average response time of jobs under the random scheme as $\lambda \to 1$. We first write the recursion given in (10) as $x^*_{i+1} = \lambda\left[a_1 (x^*_i)^{d_1} + a_2 (x^*_i)^{d_2}\right]$, where $a_1 = 1 - 2\epsilon$, $a_2 = 2\epsilon$, $d_1 = 2$, and $d_2 = 1$. The result now follows from Theorem 4.5 of [37].
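The quantity studied in this section can be evaluated numerically from the recursion with the coefficients $a_1 = 1 - 2\epsilon$, $a_2 = 2\epsilon$, $d_1 = 2$, $d_2 = 1$. The following is a minimal sketch, not the authors' code: the function names and the truncation tolerance are hypothetical, the starting value $x^*_0 = 1$ is assumed, and the mean response time is obtained from Little's law as the mean number of jobs per server divided by $\lambda$. The random scheme is taken to be a collection of independent M/M/1 queues with mean response time $1/(1 - \lambda)$.

```python
import math


def fixed_point_tail(lam, eps, tol=1e-15, max_terms=10**6):
    """Iterate x*_{i+1} = lam*((1-2eps)*x*_i**2 + 2eps*x*_i) from x*_0 = 1.

    Returns [x*_1, x*_2, ...] truncated once a term drops below tol.
    """
    if not (0.0 < lam < min(1.0, 1.0 / (2.0 * eps) if eps > 0 else 1.0)):
        raise ValueError("lam must lie in the stability region (0, min(1, 1/(2 eps)))")
    xs, prev = [], 1.0
    while len(xs) < max_terms:
        prev = lam * ((1 - 2 * eps) * prev ** 2 + 2 * eps * prev)
        if prev < tol:
            break
        xs.append(prev)
    return xs


def mean_response_time(lam, eps):
    """Little's law: mean jobs per server, sum_i x*_i, divided by lam."""
    return sum(fixed_point_tail(lam, eps)) / lam


def ratio_to_log_random(lam, eps):
    """Mean response time under Po2-eps over log of the M/M/1 value 1/(1-lam)."""
    return mean_response_time(lam, eps) / math.log(1.0 / (1.0 - lam))
```

Two sanity checks follow directly from the recursion: for `eps = 0` it gives $x^*_i = \lambda^{2^i - 1}$, and for `eps = 0.5` it gives $x^*_i = \lambda^i$, so that `mean_response_time` reduces to $1/(1 - \lambda)$. Evaluating `ratio_to_log_random` for values of `lam` approaching 1 gives a numerical view of the heavy-traffic ratio.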

Conclusion and Future Directions
In this paper, we analyzed the effects of load comparison errors on the performance of the Po2 scheme. We considered two models of error. For the load-dependent error model, we showed that the Po2 scheme retains its benefits over the random scheme in the heavy-traffic limit $\lambda \to 1$ for all values of $g$ and $\epsilon$. For the load-independent error model, we showed that the Po2 scheme retains its benefits over the random scheme only if the probability of error satisfies $\epsilon \leq 1/2$. We introduced a general framework using Lyapunov functions to prove stability of our schemes. We also used a new approach to establish the mean-field limit results, as the fixed point does not admit a recursive solution.
There are many interesting directions for further research. We have analyzed the performance of the Po2-$(g, \epsilon)$ scheme assuming $g$ to be a constant independent of $n$. It will be interesting to see the effect of varying $g$ as a function of $n$. Another direction is to study the effects of delay in receiving the queue length information at the dispatcher. A more explicit delay-dependent error model can be considered. Here, the challenge will be to analyze the effect of the delay on the performance of the Po2 scheme.

Figure 1: Mean response time of jobs under the Po2-$(g, \epsilon)$ scheme as a function of arrival rate $\lambda$ for $\epsilon = 0.4$, $g = 100$.

Figure 2: Mean response time of jobs under the Po2-$(g, \epsilon)$ scheme and the random scheme as a function of arrival rate $\lambda$ for $\epsilon = 0.8$, $g = 100$.

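Inequalities used in the induction step of the global stability proof for the Po2-$(g, \epsilon)$ scheme: $a^2 - b^2 \leq 2(a - b)$ and $ab - cd \leq (a - c) + (b - d)$ for $1 \geq a \geq c \geq 0$ and $1 \geq b \geq d \geq 0$. With these, the integrals in the second and the third terms on the RHS can be easily bounded by $2C_{L-1}$ and $C_{L-1} + C_{L-1-g}$, respectively.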