Near-Optimal Stochastic Bin-Packing in Large Service Systems with Time-Varying Item Sizes

In modern computing systems, jobs' resource requirements often vary over time. Accounting for this temporal variability during job scheduling is essential for meeting performance goals. However, the theoretical understanding of how to schedule jobs with time-varying resource requirements is limited. Motivated by this gap, we propose a new setting of the stochastic bin-packing problem in service systems that allows for time-varying job resource requirements, also referred to as 'item sizes' in traditional bin-packing terms. In this setting, a job or 'item' must be dispatched to a server or 'bin' upon arrival. Its resource requirement may vary over time while in service, following a Markovian assumption. Once the job's service is complete, it departs from the system. Our goal is to minimize the expected number of active servers, or 'non-empty bins', in steady state. Under our problem formulation, we develop a job dispatch policy, named Join-Requesting-Server (JRS). Broadly, JRS lets each server independently evaluate its current job configuration and decide whether to accept additional jobs, balancing the competing objectives of maximizing throughput and minimizing the risk of resource capacity overruns. The JRS dispatcher then uses these individual evaluations to decide which server each arriving job is dispatched to. The theoretical performance guarantee of JRS holds in the asymptotic regime where the job arrival rate scales linearly with a scaling factor r. We show that JRS achieves an additive optimality gap of O(√r) in the objective value, where the optimal objective value is Θ(r). When specialized to constant job resource requirements, our result improves upon the state-of-the-art o(r) optimality gap. Our technical approach highlights a novel policy conversion framework that reduces the policy design problem to a single-server problem.


Background and Motivation
In modern computing systems, a job often takes the form of a virtual machine (VM) or a container [8, 14]. Such a job comes with a resource requirement, such as a certain number of CPUs and amount of memory, while in service. Each server in the system offers a limited amount of these resources. When a job arrives at the system, the job dispatch policy needs to decide which server the job should be assigned to, given the job's resource requirement and the servers' current job configurations. This job scheduling problem can be approached as a Stochastic Bin-Packing (SBP) problem, where jobs are viewed as items, job resource requirements as item sizes, and servers as bins. A traditional SBP setting considers a finite set of jobs that arrive online but do not depart from the system. The objective is to minimize the number of servers that have jobs on them, or 'non-empty bins', subject to the resource capacities of the servers. SBP, with a rich history in operations research and theoretical computer science [10-12], is a field of continuous developments and advancements [1, 16, 18].
To incorporate job departures into the problem formulation, a setting referred to as stochastic bin-packing in service systems has been proposed recently [17, 32-36]. In this setting, jobs not only arrive but also depart over time. More specifically, jobs are assumed to arrive according to Poisson processes, and each job is assumed to stay in the system for an exponentially distributed service time. The service time of a job remains unknown until the job departs. Before delving further into SBP in service systems, it is worth mentioning that there is a parallel thread of research on the so-called dynamic bin-packing problem that also handles job departures (see, e.g., [6, 9, 20], and references therein), but it is primarily from a worst-case analysis perspective. Additionally, the virtual machine scheduling problem with objectives different from minimizing the number of active servers has also been widely studied (see, e.g., [22-24, 26-29, 39]).
For SBP in service systems, the goal is to design a job dispatch policy π that minimizes the expected number of active servers in steady state, denoted as N(π). The optimality gap of a policy π is defined as N(π) − N(π*), where π* is the optimal policy. Since SBP in service systems aims to model today's large-scale computing systems, the optimality gap of a policy is usually studied in the regime where the total job arrival rate becomes large. As we scale up the total job arrival rate linearly with a scaling factor r, the optimal value N(π*) can be shown to be Θ(r). Therefore, we say a policy is asymptotically optimal if its optimality gap is o(r).
The optimality gap for SBP in service systems has been characterized in the line of work [32-36]. In particular, Stolyar [32] and Stolyar and Zhong [34] propose greedy policies that are asymptotically optimal, but the scheduler that executes these policies needs to know detailed state information, which lies in a high-dimensional space. Stolyar and Zhong [35, 36] later develop policies that use much less state information and achieve Θ(εr) (with an arbitrarily small constant ε) and o(r) optimality gaps, respectively.
While prior work on SBP in service systems has provided substantial insights into scheduling virtual-machine-type jobs, it primarily focuses on job resource requirements that remain fixed over time. However, in modern computing systems, jobs' resource requirements often vary over time [2, 13, 21, 30, 31, 37]. For example, when a job provides user-facing services, the instantaneous requirement on CPUs and memory depends on the service demand, which is subject to fluctuation over time [13, 21]. Time-varying job resource requirements pose significant challenges in optimizing system efficiency, particularly when aiming to minimize the number of active servers and thereby improve server utilization. It is pertinent to note that low utilization has been recognized as a significant obstacle to the continued scaling of today's computing systems.
Motivated by this gap, in this paper, we propose a new setting of stochastic bin-packing in service systems that allows job resource requirements, or 'item sizes', to vary over time.

Problem Formulation: A Simplified Version
We first describe our job model, which features time-varying resource requirements. For ease of exposition, here we present a simplified setting where each job in service can be in one of two phases, L and H, associated with low and high resource requirements, respectively. Our full model, presented in Section 2, allows more than two phases and more than one type of resource. To model the temporal variation in the resource requirement, we assume that each job transitions between the two phases while in service until it is completed, following the continuous-time Markov chain illustrated in Figure 1(a). We use an absorbing state ⊥ to denote that the job is completed. A job can initialize in either phase L or phase H, and such jobs are referred to as type L and type H jobs, respectively.

Figure 1. (a) A simplified version of our job model. Each job in service is in either an L phase or an H phase, associated with low and high resource requirements, respectively. When the job is completed, it is said to be in the state ⊥. The job transitions between the two phases while in service until it is completed, following a continuous-time Markov chain with rates μ_{ii'}, i, i' ∈ {L, H, ⊥}. (b) A system model with an infinite number of identical servers. As soon as a job arrives to the system, it needs to be dispatched to a server to start service immediately. The configuration of each server is the number of jobs in each phase on the server.

Note that the setting where a job's resource requirement does not vary over time is a special case of our job model in which the transition rates between phases are 0. We consider a system with an infinite number of identical servers, illustrated in Figure 1(b). As in existing work on SBP in service systems, we assume jobs arrive according to Poisson processes. In particular, the two types of jobs arrive at the system following two independent Poisson processes with rates Λ_L and Λ_H, respectively; i.e., the interarrival times of type L and type H jobs are i.i.d., following exponential distributions with means 1/Λ_L and 1/Λ_H, respectively. Upon arrival, a job needs to be dispatched to a server according to a dispatch policy, and the job enters service immediately. The goal is to design a policy π to minimize the expected number of active servers (servers currently serving a positive number of jobs) in steady state, denoted as N(π).
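The two-phase job model above can be simulated directly. The sketch below samples a job's total service time from the continuous-time Markov chain; the transition rates are illustrative placeholders (the model only fixes the structure of phases L and H plus an absorbing completion state), and the analytic mean in the comment follows from first-step analysis for these assumed rates.

```python
import random

# Hypothetical transition rates for the two-phase job model of Figure 1(a).
RATES = {
    "L": {"H": 1.0, "DONE": 0.5},  # from L: switch to H, or complete (state ⊥)
    "H": {"L": 2.0, "DONE": 1.0},  # from H: switch to L, or complete
}

def sample_service_time(start_phase, rng):
    """Run one job's phase CTMC until absorption; return its total service time."""
    phase, t = start_phase, 0.0
    while phase != "DONE":
        moves = RATES[phase]
        total_rate = sum(moves.values())
        t += rng.expovariate(total_rate)         # exponential holding time
        u, acc = rng.random() * total_rate, 0.0  # next state w.p. rate / total
        for nxt, rate in moves.items():
            acc += rate
            if u <= acc:
                phase = nxt
                break
    return t

rng = random.Random(0)
n = 20000
mean_time = sum(sample_service_time("L", rng) for _ in range(n)) / n
# First-step analysis for these example rates gives E[service time | type L] = 1.6.
print(mean_time)
```

Note that the sampled service time is phase-type distributed, not exponential, which is precisely what distinguishes this setting from the fixed-requirement SBP model.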
As job resource requirements vary over time, situations can arise where the total job resource requirement on a server exceeds the server's resource capacity, resulting in resource contention. Modern computing systems can tolerate temporary overruns of resource capacity, though they often incur performance degradation or other costs [7, 15]. In our model, we incorporate a rate at which cost accumulates due to resource contention. We first represent the state of a server by its configuration, a vector k = (k_L, k_H), where k_L and k_H are the numbers of jobs in phase L and phase H, respectively. Then a cost rate function h(·) maps a server's configuration to a rate of cost. For example, the cost rate can be proportional to how much the total resource requirement of the jobs on the server exceeds the server's resource capacity. A more general definition of h(·) is given in Section 2. We assume that resource contention neither affects the transition rates in the job model nor prompts jobs to be terminated, which is suitable for application scenarios where the contention level is low and manageable. Let C(π) denote the average expected cost rate per server. Now our bin-packing problem can be formulated as follows:

minimize_π N(π) subject to C(π) ≤ β,    (1)

where β > 0 is a budget for the cost rate of resource contention. We are interested in solving this problem in the asymptotic regime where the arrival rates (Λ_L, Λ_H) scale to infinity [33-36], motivated by the ever-increasing computing demand that drives today's computing systems to be large-scale. Specifically, we assume (Λ_L, Λ_H) = (λ_L r, λ_H r) for some fixed coefficients λ_L and λ_H and a scaling factor r, and we study the asymptotic regime where r increases.
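The excess-over-capacity example of h(·) mentioned above can be made concrete as follows. The per-phase requirements and the server capacity are assumed values for illustration, not parameters from the paper.

```python
# A concrete instance of the cost rate function h(.) for the simplified
# two-phase model: cost accrues at a rate equal to the excess of the total
# resource requirement over the server's capacity (illustrative numbers).
REQ_L, REQ_H = 1.0, 4.0   # resource units required in phases L and H
CAPACITY = 8.0            # server's resource capacity

def h(config):
    """Cost rate for configuration k = (k_L, k_H): overrun beyond capacity."""
    k_L, k_H = config
    total = k_L * REQ_L + k_H * REQ_H
    return max(0.0, total - CAPACITY)

print(h((8, 0)))  # eight L-phase jobs fit exactly: cost rate 0.0
print(h((4, 2)))  # 4*1 + 2*4 = 12 > 8: contention cost accrues at rate 4.0
print(h((0, 0)))  # an empty server incurs no cost
```

Because phase transitions can push a configuration from feasible to overcommitted without any dispatch decision, the cost rate of a server fluctuates over time even when no jobs are added.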

Main Result
Our main result is an asymptotically optimal policy, named Join-Requesting-Server (JRS), for this new setting of SBP in service systems with time-varying job resource requirements. The asymptotic optimality is in the sense that under our proposed policy JRS, the expected number of active servers is at most (1 + O(r^{-0.5})) times the optimal objective value of the optimization problem in (1), while the cost rate incurred is at most (1 + O(r^{-0.5})) · β (i.e., exceeding the budget by at most a diminishing fraction). This asymptotic optimality result translates into an additive optimality gap of O(√r) in the objective value (expected number of active servers), since the optimal objective value can be shown to be Θ(r). This main result is formally presented in Theorem 1.
Our model can be specialized to the traditional setting of SBP in service systems where jobs' resource requirements remain fixed over time. For this specialization, we replace the constraint C(π) ≤ β in the problem formulation (1) with a capacity constraint, which requires the total resource requirement of the jobs on a server to stay within the server's resource capacity. Our proposed policy JRS can then be adapted into one that has an O(√r) optimality gap in the objective value, which improves upon the state-of-the-art o(r) optimality gap. A discussion of the implementation complexity of JRS and how it compares with existing policies for the traditional setting of SBP in service systems is provided in Section 4.3. To be clear, this setting is not a strict special case of the formulation in (1) because β > 0 is required there, but our approach and proof carry over.
From a technical approach perspective, our contribution is a novel approach that decomposes the policy design into two steps: defining a single-server sub-problem, and then converting the solution of the sub-problem into a policy for the original problem. This decomposition not only reduces the complexity of policy design but also makes the analysis tractable. We provide an overview of this approach in Section 1.4.

Approach Overview
To motivate our approach, we ask two questions: How should we design a good dispatch policy for this system? How can we prove that a dispatch policy is asymptotically optimal? Before presenting our answers to these two questions, we first comment on why they are challenging to answer. On the one hand, solving this problem directly via dynamic programming is intractable due to the unbounded state space resulting from the infinitely many servers. Even if we restrict ourselves to the servers that are active, the state space is still prohibitively large. On the other hand, we can consider designing a heuristic policy. However, unlike traditional SBP problems where we can simply seek to pack the servers as compactly as possible, here the question of how many jobs should be put on a server is complicated. The complexity comes from the time-varying resource requirements of jobs, which affect future resource contention.
A policy-conversion framework. We answer the two questions at the same time with a novel policy-conversion framework. The framework has two steps: (1) define the single-server problem, an easy-to-solve, low-dimensional subproblem; (2) convert the optimal policy of the single-server problem into a policy for the original problem. This framework allows us to break the complicated policy design problem into two components. In defining the single-server problem in Step 1, our goal is to quantify the throughput of each server under the resource contention constraint; in the policy conversion in Step 2, our goal is to dispatch jobs optimally based on each single server's characteristics.

Fig. 2. A single-server system with an infinite supply of jobs. A single-server policy decides when to request jobs and how many jobs of each type to request.

As we will show, a careful construction of the single-server problem and the conversion procedure naturally leads to a policy for the original system and a proof of its asymptotic optimality. Below we give a quick overview of how we carry out these two steps and the motivation behind the design choices.
Single-server problem. To define the single-server problem, consider the following setting: suppose that our goal is to maximize the throughput of one specific server while keeping its expected cost rate of resource contention below β; then how should we send jobs to this server? Observe that even though we want to send jobs to the server as frequently as possible, the frequency is fundamentally limited by how fast the server is able to serve jobs and how many jobs can be packed on the server. This motivates us to consider the single-server system illustrated in Figure 2. The system has one server and an infinite supply of jobs of all types, so the server can start the service of any number of new jobs of any type at any time. We say the server requests a job from the infinite supply whenever it starts serving a new job. We assume the same job model and cost model as in the original infinite-server system. The single-server problem aims to find a job-requesting policy that maximizes the throughput (the number of jobs of each type that can be served) along the direction of the arrival rate vector (Λ_L, Λ_H) = (λ_L r, λ_H r), while maintaining the steady-state expected cost rate of resource contention below β.
How is the single-server problem related to the original problem? Let M* be the number such that the total throughput of M* single-server systems under the optimal job-requesting policy is equal to (λ_L r, λ_H r) (assuming M* is an integer for simplicity). Consider the following policy in the original system: let each of the first M* servers in the original system adopt the optimal job-requesting policy and send requests to the dispatcher based on its current configuration. If the requested jobs were to arrive as soon as the dispatcher received the requests, the dispatcher would be able to fulfill the requests immediately. In this case, the first M* servers in the original system would have the same dynamics as M* i.i.d. single-server systems, achieving the largest possible throughput and satisfying the constraint on resource contention. So the original system would achieve the optimal number of active servers.
However, in the actual model, the dispatcher cannot immediately fulfill a job request because jobs arrive stochastically over time. Nevertheless, the dispatcher can still find a suitable way to match each job arrival with the requests. To see this, note that the time points at which the dispatcher receives type i requests result from the superposition of M* = Θ(r) independent point processes, each with average rate λ_i r / M*. As r → ∞, the instantaneous rate of requesting type i jobs concentrates around the arrival rate λ_i r for each i. As a result, most job requests can be fulfilled within a diminishing delay, so most servers in the original system can closely track the optimal single-server dynamics.
A meta-policy, Join-Requesting-Server (JRS), and its asymptotic optimality. Based on the single-server problem and the idea of tracking the optimal single-server dynamics, we propose a meta-policy, Join-Requesting-Server (JRS), which converts a single-server policy σ into a dispatch policy for the original infinite-server system. We say that JRS takes σ as a subroutine. The full definition of JRS is given in Section 4, along with discussions of various practical considerations in its implementation.
We show that the asymptotic performance of JRS is related to that of its subroutine in the sense described in Theorem 3, which we refer to as the conversion theorem. In particular, JRS with the optimal single-server policy (which we refer to as Single-OPT) as the subroutine is asymptotically optimal for the original infinite-server problem as r → ∞.
In order to track the optimal single-server dynamics, JRS uses a more sophisticated mechanism to control the long-term consequences of missing job requests or fulfilling requests with delays. The mechanism involves the auxiliary variables of tokens and virtual jobs, which regularize the process of generating requests and matching arrivals with requests. These auxiliary variables play a crucial role in the proof of Theorem 3 in Section 5, where a novel Stein's method argument is carried out. We discuss the role of tokens and virtual jobs and their necessity at the end of Section 4 and in Section 5.5.
Finally, we comment that our policy-conversion framework can be applicable to other systems with similar structures. Specifically, one can define a suitable single-server problem, solve for its optimal policy, and convert the optimal single-server policy into a policy for the original problem. A similar conversion theorem should hold as long as the servers are weakly coupled in some sense. See Section 6 for a discussion of such systems.
Relation to the mean-field approach. We remark that the mean-field approach often studies the empirical distribution of configurations across all servers [17, 32-36], which can be viewed as a probability distribution of a single server's configuration. However, the mean-field approach is typically used to analyze this empirical distribution under a given policy for the original system. In contrast, our approach solves the single-server problem to design a single-server policy, and then converts it into a policy for the original system with a performance guarantee.

Paper organization
In Section 2, we present the general problem formulation, which generalizes the simplified version in this section. In Section 3, we give a more detailed overview of our main result and approach, with a short proof of Theorem 1 (main result) based on Theorems 2-4 at the end of the section. Section 4 provides a detailed description of our meta-policy, Join-Requesting-Server (JRS), along with discussions of practical considerations in its implementation. In Section 5, we prove the performance guarantee of JRS (Theorem 3) under an irreducibility assumption. The proof for the general case and other proofs are deferred to the appendices. We conclude the paper and discuss future directions in Section 6.

PROBLEM FORMULATION
Job Model. As described in Section 1, we consider a job model where each job in service can be in one of multiple phases, with each phase associated with a different resource requirement. Here the resource requirement can be a multi-dimensional vector, with each coordinate specifying the requirement of one type of resource. To model the temporal variation in the resource requirement, we assume that each job transitions between phases while in service until it is completed. The phase transition process is described by a continuous-time Markov chain on the state space I ∪ {⊥}, where I is the set of phases and ⊥ is the absorbing state that denotes the completion of the job. We call a transition between two phases in I an internal transition, and let μ_{ii'} denote the transition rate from phase i to phase i'; the departure of a job then corresponds to a transition from a phase i ∈ I to ⊥, whose transition rate is denoted as μ_{i⊥}. The phase transitions of different jobs are assumed to be independent of each other.
We classify a job as a type i job if it starts from phase i ∈ I when entering service. Jobs of each type i arrive at the system according to an independent Poisson process with rate Λ_i.

Server Model. We consider an infinite-server system with identical servers. As soon as a job arrives to the system, the job needs to be dispatched to a server to start service immediately. Note that this is always feasible because there are an infinite number of servers in the system. We assume that jobs cannot be preempted or migrated. To describe the state of a server, we define the configuration of a server as an |I|-dimensional vector k = (k_i)_{i∈I}, whose i-th entry k_i is the number of jobs in phase i on the server. Each server has a limit on the total number of jobs in service at the same time. This limit is denoted as m_max and referred to as the service limit. Then the set of feasible server configurations is K = {k : Σ_{i∈I} k_i ≤ m_max}.

System Dynamics. The system state can be represented by the concatenation of the configurations of all servers. Specifically, we index the servers by positive integers and denote the configuration of server ℓ at time t as k_ℓ(t). Then the state of the entire system can be represented by the infinite vector (k_ℓ(t))_{ℓ∈Z+}. Suppose that the system is in state (k_ℓ)_{ℓ∈Z+}. Let e_i be the |I|-dimensional vector whose i-th entry is 1 and all other entries are 0. Then the following state transitions can happen:
• k_ℓ → k_ℓ − e_i + e_{i'}, k_{ℓ'} → k_{ℓ'} for all ℓ' ≠ ℓ: a job on server ℓ transitions from phase i to phase i';
• k_ℓ → k_ℓ − e_i, k_{ℓ'} → k_{ℓ'} for all ℓ' ≠ ℓ: a job on server ℓ departs the system from phase i.
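The configuration space and its service transitions can be enumerated mechanically. The sketch below uses assumed parameters (two phases, m_max = 4); arrivals, which add e_i to a configuration, are governed by the dispatch policy and are not listed.

```python
from itertools import product

# Sketch of the feasible configuration set K and the service transitions of a
# single server, under assumed parameters: |I| = 2 phases and m_max = 4.
NUM_PHASES, M_MAX = 2, 4

# Feasible configurations K = {k : sum_i k_i <= m_max}
K = [k for k in product(range(M_MAX + 1), repeat=NUM_PHASES)
     if sum(k) <= M_MAX]

def service_transitions(k):
    """Configurations reachable from k via one departure or internal transition."""
    out = set()
    for i in range(NUM_PHASES):
        if k[i] == 0:
            continue
        dep = list(k)
        dep[i] -= 1
        out.add(tuple(dep))            # job in phase i departs: k -> k - e_i
        for j in range(NUM_PHASES):
            if j != i:
                mv = list(dep)
                mv[j] += 1
                out.add(tuple(mv))     # internal transition: k -> k - e_i + e_j
    return out

print(len(K))                          # 15 feasible configurations here
print(sorted(service_transitions((1, 1))))
```

Note that every service transition keeps the total job count the same or decreases it, so the service limit can never be violated by service transitions alone; only dispatch decisions can increase the count.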
The specifics of the system dynamics depend on the employed dispatch policy that decides which server to dispatch to when a job arrives.
Active Servers. We are interested in the number of active servers, i.e., servers currently serving a positive number of jobs. Note that given the arrival rates of jobs, the smaller the number of active servers, the better the system is utilized. Let Y_k(t) be the number of servers in configuration k at time t, i.e., Y_k(t) = Σ_{ℓ=1}^∞ 1{k_ℓ(t) = k}. Then the number of active servers can be written as Σ_{k≠0} Y_k(t), where 0 ∈ R^{|I|} is the zero vector.
Cost of Resource Contention. Recall that the cost rate function h(·) maps a server's configuration to a rate of cost. We assume that h(·) can be any function that is Γ-Lipschitz continuous with respect to the ℓ_1 distance for some constant Γ > 0 and satisfies h(0) = 0.
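As a sanity check on these assumptions, the excess-over-capacity cost suggested in Section 1.2 can be verified numerically to satisfy them; the requirements, capacity, and service limit below are assumed values, and Γ = max requirement works because max(0, ·) is 1-Lipschitz.

```python
from itertools import product

# Numerical check that an excess-over-capacity cost is Gamma-Lipschitz in the
# l1 distance with Gamma = max per-phase requirement, and satisfies h(0) = 0.
REQ = (1.0, 4.0)          # per-phase requirements (assumed values)
CAPACITY, M_MAX = 8.0, 6

def h(k):
    total = sum(req * n for req, n in zip(REQ, k))
    return max(0.0, total - CAPACITY)

def l1(k1, k2):
    return sum(abs(a - b) for a, b in zip(k1, k2))

K = [k for k in product(range(M_MAX + 1), repeat=2) if sum(k) <= M_MAX]
GAMMA = max(REQ)

lipschitz_ok = all(abs(h(k1) - h(k2)) <= GAMMA * l1(k1, k2)
                   for k1 in K for k2 in K)
print(lipschitz_ok)       # True
print(h((0, 0)))          # 0.0, as the assumption h(0) = 0 requires
```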
Performance Goal. Our high-level goal is to design dispatch policies that minimize the number of active servers while keeping the cost rate of resource contention within a certain budget. Specifically, we consider policies that are allowed to be randomized and non-Markovian (i.e., policies that can make history-dependent decisions). We further focus on policies that induce a unique stationary distribution on the configuration process {(k_ℓ(t))_{ℓ∈Z+}}, assuming that the configuration process is embedded in a Markov chain that has a unique stationary distribution. We are interested in such policies because the resulting time averages of quantities related to the configurations are equal to the corresponding expectations under the unique stationary distribution, regardless of the initial state. Let π be a policy of interest, (k_ℓ)_{ℓ∈Z+} be a random element that follows the stationary distribution of the system state induced by π, and Y_k be the corresponding number of servers in configuration k in steady state under π. Then the expected number of active servers is given by N(π) = E[Σ_{k≠0} Y_k]. We define the expected cost rate per expected active server as C(π) = E[Σ_{k∈K} h(k) Y_k] / N(π). Now our goal can be formulated as the following optimization problem, referred to as problem P((Λ_i)_{i∈I}, β):

minimize_π N(π) subject to C(π) ≤ β,    (2)

where β is a budget for the cost rate of resource contention.
Asymptotic Optimality. We focus on the asymptotic regime where, for all i ∈ I, the arrival rate is given by Λ_i = λ_i r for some constant coefficient λ_i and a positive scaling factor r → +∞. To define asymptotic optimality, we first define the following notion of approximation to the optimization problem P((Λ_i)_{i∈I}, β) in (2): a policy π is said to be (c_N, c_C)-optimal if N(π) ≤ c_N · N*((Λ_i)_{i∈I}, β) and C(π) ≤ c_C · β, where N*((Λ_i)_{i∈I}, β) is the optimal objective value in (2). Now consider a family of policies π^(r) indexed by the scaling factor r. We say that the policy π^(r) is asymptotically optimal if it is (c_N^(r), c_C^(r))-optimal for the optimization problem P((λ_i r)_{i∈I}, β) with c_N^(r), c_C^(r) → 1 as r → ∞. We will suppress the superscript (r) for simplicity when it is clear from the context.
We note that under any policy π, N(π) = Θ(r). This can be proven using the renowned Little's law [19] in the following way. The total job arrival rate is Θ(r) and the expected time that a job spends in the system is Θ(1), since a job's phase process is unaffected by the dispatch decisions. So by Little's law, the expected number of jobs in the system in steady state is Θ(r). Since each server can accommodate at most a constant number m_max of jobs, the expected number of active servers is Θ(r). Given this, (c_N^(r), c_C^(r))-optimality implies an optimality gap of O((c_N^(r) − 1) · r) in the objective value.
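The Little's-law argument above is just arithmetic; here it is with assumed numbers (arrival-rate coefficient, mean time in system, and service limit are all illustrative).

```python
# Back-of-the-envelope version of the Little's-law argument, with assumed
# numbers: total arrival rate lambda * r, a constant mean time in system T,
# and at most m_max jobs per server.
lam = 2.0       # arrival-rate coefficient (jobs per unit time per unit of r)
r = 1000.0      # scaling factor
T = 1.5         # expected time a job spends in the system, a Theta(1) constant
m_max = 4       # service limit per server

expected_jobs = lam * r * T                   # Little's law: E[#jobs] = rate * T
lower_bound_active = expected_jobs / m_max    # each server holds <= m_max jobs

print(expected_jobs)        # 3000.0 jobs in the system on average
print(lower_bound_active)   # hence at least 750.0 active servers: Theta(r)
```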

Main Result
Our main result, Theorem 1, is the asymptotic optimality of our proposed policy Join-Requesting-Server (JRS) with a subroutine we call Single-OPT, as briefly discussed in Section 1. This asymptotic optimality result implies an O(√r) optimality gap in the expected number of active servers. We defer the detailed descriptions of JRS and Single-OPT to Section 4 and Appendix C. Theorem 1 follows immediately from Theorems 2-4, to be introduced in Section 3.2; a short proof is included at the end of Section 3.2 for clarity.
Theorem 1 (Asymptotic Optimality). Consider a stochastic bin-packing problem in service systems with time-varying job resource requirements. Let the arrival rates be (λ_i r)_{i∈I} and the cost rate budget be β > 0. Then the policy Join-Requesting-Server (JRS) with the subroutine Single-OPT is (1 + O(r^{-0.5}), 1 + O(r^{-0.5}))-optimal. That is, the expected number of active servers under JRS with Single-OPT is at most (1 + O(r^{-0.5})) times the optimal value of the problem P((λ_i r)_{i∈I}, β), while the cost rate incurred is at most (1 + O(r^{-0.5})) · β.
Specialization to Non-Time-Varying Resource Requirements. As mentioned in Section 1, we can specialize this result to the setting where the resource requirement of a job does not vary over time. To do so, we remove the cost constraint in P((λ_i r)_{i∈I}, β) and redefine the set of feasible server configurations, K, to also incorporate hard capacity constraints for each type of resource. The rest of the analysis is almost identical to the analysis for time-varying resource requirements; we omit the details due to the space limit. This specialization results in a policy that is (1 + O(r^{-0.5}))-optimal in the expected number of active servers.

Our Approach
In a nutshell, our approach is to reduce the original optimization problem in an infinite-server system to an optimization problem in a single-server system, which is defined below.
A Single-Server System. Consider a single-server system serving jobs with time-varying resource requirements. The system has an infinite supply of jobs of all types. As a result, the server can request any number of new jobs of any type at any time. Once a job is requested, it immediately enters service.
We represent the server configuration at time t using a vector k(t) = (k_i(t))_{i∈I}, whose i-th entry denotes the number of jobs in phase i. We assume that the single-server system has the same service limit m_max and cost rate function h(·) as a server in the original infinite-server system. Therefore, the server configuration k(t) is also in the set K = {k : Σ_{i∈I} k_i ≤ m_max}, and the cost rate at time t is h(k(t)).
A single-server policy σ determines when and how many jobs of each type to request. We allow the single-server policy to be randomized and assume it is Markovian, i.e., it makes decisions based only on the current configuration. Note that allowing non-Markovian policies does not change the optimal value of the single-server problem that we consider (see Appendix C). Let p ≜ (p(k))_{k∈K} be a stationary distribution of the server configuration under the policy σ, and let k(∞) be a random variable with the distribution p. When we consider a policy σ and its stationary distribution p, we assume that the system is initialized from p. The policy σ together with p defines the request rate of type i jobs, ν_i, which is the expected number of type i jobs requested per unit time in steady state. Note that ν_i is also the throughput of type i jobs, since the system has a finite state space.
We consider the following single-server problem, denoted as P_1((λ_i r)_{i∈I}, β):

minimize_{M, σ, p} M subject to M ν_i = λ_i r for all i ∈ I, E[h(k(∞))] ≤ β.    (3)

The single-server problem can be interpreted as follows. We can think of M as the number of copies of the single-server system under σ needed to support the arrival rates (λ_i r)_{i∈I} in the infinite-server system. Minimizing M is equivalent to maximizing the throughput (ν_i)_{i∈I} of each single-server system while maintaining the proportions (λ_i r)_{i∈I}. We remark that for the problem P_1((λ_i r)_{i∈I}, β), we only need to consider policies that do not depend on the scaling factor r. To see this, we can replace the decision variable M with x ≜ M/r, and the optimization problem can be equivalently formulated as follows, which does not involve r:

minimize_{x, σ, p} x subject to x ν_i = λ_i for all i ∈ I, E[h(k(∞))] ≤ β.

Lower Bound. The single-server problem gives a lower bound for the original problem in (2), as stated in the following theorem. The proof is given in Appendix A.
Theorem 2 (Lower Bound). Consider a stochastic bin-packing problem in service systems with time-varying job resource requirements. Let the arrival rates be (λ_i r)_{i∈I} and the cost rate budget be β > 0. Let N* be the optimal value of the original infinite-server problem in (2), and let M* be the optimal value of the single-server problem P_1((λ_i r)_{i∈I}, β); then N* ≥ M*.
Converting From the Single-Server System to the Infinite-Server System. Having established a lower bound on the infinite-server problem P((λ_i r)_{i∈I}, β) in terms of the optimal value of the single-server problem P_1((λ_i r)_{i∈I}, β), we next focus on finding an asymptotically optimal policy. We will characterize the performance guarantee of a class of policies and then show that the best policy within the class is asymptotically optimal. Specifically, we consider a meta-policy called Join-Requesting-Server (JRS), which converts a Markovian single-server policy σ into an infinite-server policy. We call the policy resulting from the conversion a JRS policy with subroutine σ. Through analyzing the meta-policy JRS, we show that the performance of each JRS policy can be characterized by the performance of its subroutine, as stated in Theorem 3 below. The proof of Theorem 3 under an irreducibility assumption is given in Section 5, and the proof for the full version is given in Appendix B.3.
Theorem 3 (Conversion Theorem). Consider a stochastic bin-packing problem in service systems with time-varying job resource requirements. Let the arrival rates be (λ_i r)_{i∈I} and the cost rate budget be β > 0. Let (N, π, μ) be a feasible solution to the single-server problem P((λ_i r)_{i∈I}, β), where μ is a stationary distribution of the single-server system under π. In addition, assume that the policy π is Markovian. Let the infinite-server policy Φ be JRS with subroutine π. Then under Φ, the expected number of active servers is at most (1 + M r^{−0.5})·N and the expected cost rate is at most (1 + M r^{−0.5})·β, for a constant M independent of r.

Optimal Single-Server Policy. Theorem 3, together with the lower bound in Theorem 2, reduces the infinite-server problem in (2) to the single-server problem P((λ_i r)_{i∈I}, β) in (3). We can obtain the optimal single-server policy, Single-OPT, by solving a linear program, as stated in the theorem below.
Theorem 4 (Optimality of Single-OPT, Informal). There exists a linear program LP((λ_i)_{i∈I}, β) that is equivalent to the single-server problem P((λ_i r)_{i∈I}, β). In particular, we can construct an optimal Markovian policy for P((λ_i r)_{i∈I}, β) from the optimal solution of LP((λ_i)_{i∈I}, β).
Proof of Theorem 4 and details on the construction of the optimal policy are given in Appendix C.
Proof of Theorem 1 Based on Theorems 2-4. Because Single-OPT (along with an optimal stationary distribution) optimally solves P((λ_i r)_{i∈I}, β), it achieves the optimal value N*. Let Φ be JRS with subroutine Single-OPT; then according to Theorem 3, we have C(Φ) ≤ (1 + M r^{−0.5})·N*. Combining this with the lower bound C* ≥ N* from Theorem 2 and the fact that N* = Θ(r) yields the O(√r) additive optimality gap claimed in Theorem 1.

PROPOSED META-POLICY: JOIN-REQUESTING-SERVER (JRS)
In this section, we describe our meta-policy, Join-Requesting-Server (JRS), in full detail. For ease of presentation, we focus on the case where the subroutine policy π for JRS is k_0-irreducible, i.e., under π, there exists a configuration k_0 such that the single-server system can return to k_0 from any other configuration (this is equivalent to assuming that the configuration of the single-server system under policy π forms a unichain). The algorithm for the general case is given in Appendix B.3.

How the Single-Server Policy Requests Jobs
Before going into the definition of JRS, we first take a closer look at how the Markovian single-server policy π requests jobs, to avoid potential ambiguity caused by the fact that a single-server policy can request jobs at any time. Let d_i denote the number of type i jobs requested, and let d ≜ (d_i)_{i∈I}. We say d is feasible if the total number of jobs on the server does not exceed m_max after adding the d jobs. The policy π performs one of the following two types of requests based on the current configuration.
• Reactive requests. A reactive request is triggered by either an internal transition or a departure. The change in the configuration when a reactive request is made can be represented by the diagram k → k′ → k′ + d, where k → k′ is due to the internal transition or departure, and k′ → k′ + d happens because the policy immediately requests d jobs. The policy π specifies a probability distribution over all feasible d when it decides to perform reactive requests for the configuration k′.
• Proactive requests. A proactive request happens on its own, at a finite rate depending on the current configuration of the server. The change of the configuration when a proactive request happens can be represented by the diagram k → k + d. More specifically, suppose the policy π decides to perform proactive requests for a configuration k. Then for each feasible d, the policy π specifies a rate and runs a timer with an exponentially distributed duration with the specified rate. When a timer ticks, the corresponding d jobs are requested. When the configuration changes, all timers are canceled and restarted with new rates based on the new configuration.
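To make the request interface concrete, here is a minimal Python sketch of how a Markovian subroutine could be represented: reactive requests as a probability distribution over feasible batches, and proactive requests as exponential-timer rates. All names and numbers here (`ToySubroutine`, `M_MAX`, the rate 0.5, the uniform distribution) are illustrative assumptions, not part of the paper's specification.

```python
M_MAX = 4                 # per-server job cap, playing the role of m_max
JOB_TYPES = ("a", "b")    # hypothetical job types I

def feasible(config, batch):
    """A request batch d is feasible if the total number of jobs on the
    server does not exceed m_max after adding the d jobs."""
    return sum(config.values()) + sum(batch.values()) <= M_MAX

class ToySubroutine:
    """A toy Markovian single-server policy: maps the current (observed)
    configuration to reactive and proactive request decisions."""

    def reactive_distribution(self, config):
        """Distribution over feasible batches, sampled right after an
        internal transition or departure (diagram k -> k' -> k'+d)."""
        candidates = [{"a": 1, "b": 0}, {"a": 0, "b": 1}, {"a": 0, "b": 0}]
        batches = [d for d in candidates if feasible(config, d)]
        p = 1.0 / len(batches)          # uniform, for illustration only
        return [(d, p) for d in batches]

    def proactive_rates(self, config):
        """Rate of the exponential timer attached to each feasible batch;
        all timers restart whenever the configuration changes."""
        d = {"a": 1, "b": 0}
        return [(d, 0.5)] if feasible(config, d) else []

policy = ToySubroutine()
dist = policy.reactive_distribution({"a": 2, "b": 1})
```

A simulator would sample one batch from `reactive_distribution` after each internal transition or departure, and run one exponential timer per entry of `proactive_rates` otherwise.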

Description of Join-Requesting-Server (JRS)
The inputs of JRS include: (i) the single-server policy π, (ii) the objective value of π in the single-server problem (3), denoted as N_π, and (iii) the transition rates in the job model. We first divide the infinite server pool into two sets based on the server index ℓ. Let N = ⌈N_π⌉. We call servers with index ℓ ≤ N normal servers; we call servers with index ℓ > N backup servers. The normal servers are responsible for serving most of the jobs, while the backup servers are activated only to handle overflow jobs (jobs that are not dispatched to normal servers).
The JRS is specified in two steps.
• Step 1 (Job Requesting on a Normal Server): We let each normal server request jobs using its subroutine, the single-server policy π. The input to the policy π is what we refer to as the observed configuration of the server, which will be further explained below. When π requests d = (d_i)_{i∈I} jobs, d_i type i tokens are generated for each i ∈ I to store the job requests. The server pauses the job requesting process if it already has tokens of any type, and resumes when all the tokens that it generated have been removed.
• Step 2 (Arrival Dispatching):
- Real jobs. When a type i job arrives, the dispatcher chooses a type i token uniformly at random, removes the token, and assigns the job to the corresponding server. When there are no type i tokens, the dispatcher sends the job to an idle backup server.
- Virtual jobs. When the total number of type i tokens throughout the system exceeds the limit U_max = ⌈√r⌉ (called the token limit), a type i virtual arrival is triggered, which causes the dispatcher to choose a type i token uniformly at random, remove the token, and assign a virtual job to the corresponding server. A virtual job has the same transition dynamics as a real job but does not consume physical resources.
The observed configuration of a normal server in Step 1 is the configuration resulting from real jobs and virtual jobs combined. That is, it is a vector whose j-th entry represents the total number of real and virtual jobs in phase j on this server. The observed configuration changes when a new real or virtual job arrival is assigned to the server, or when a real or virtual job on the server has a phase transition or departs. We update the input to the policy π when the observed configuration changes. Whenever the observed configuration changes, the policy π cancels the exponential timers in progress; but a reactive request from the policy π can only be triggered when a real or virtual job on the server has a phase transition or departs.
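The two dispatching rules in Step 2 can be sketched in a few lines of Python. This is a deliberately stripped-down illustration with a hypothetical scaling factor r = 100 and no server-side dynamics; only the token bookkeeping is shown. The token limit follows the ⌈√r⌉ rule from the text.

```python
import math
import random

r = 100                          # illustrative scaling factor
U_MAX = math.ceil(math.sqrt(r))  # token limit U_max = ceil(sqrt(r))

class Dispatcher:
    """Token-based dispatch: tokens[i] lists the normal servers holding
    an unfulfilled type-i request."""

    def __init__(self):
        self.tokens = {}         # job type -> list of server ids
        self.backup_jobs = 0     # overflow jobs sent to backup servers
        self.virtual_jobs = []   # (server, type) pairs of virtual jobs

    def add_token(self, jtype, server):
        pool = self.tokens.setdefault(jtype, [])
        pool.append(server)
        # Token limit exceeded: a virtual arrival removes a uniformly
        # random token and plants a virtual job at that server.
        if len(pool) > U_MAX:
            s = pool.pop(random.randrange(len(pool)))
            self.virtual_jobs.append((s, jtype))

    def on_arrival(self, jtype):
        """Real arrival: consume a uniformly random token of its type,
        or overflow to an idle backup server when none exists."""
        pool = self.tokens.get(jtype, [])
        if pool:
            return pool.pop(random.randrange(len(pool)))
        self.backup_jobs += 1
        return "backup"

random.seed(0)
disp = Dispatcher()
for s in range(U_MAX + 1):       # one request beyond the token limit
    disp.add_token("a", s)
```

After the loop, exactly one token has been converted into a virtual job and U_max tokens remain outstanding; a subsequent type-"a" arrival is matched to a token, while a type-"b" arrival overflows to a backup server.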
Intuition behind JRS. To provide a better understanding of the main design ideas of JRS, here we give an intuitive description of how it works. Broadly, servers generate job requests and store unfulfilled requests as tokens; the dispatcher assigns jobs to servers according to the tokens to fulfill job requests. This is the mechanism for matching job arrivals with requests, which is referenced at the end of Section 1.4. However, rather than matching all tokens with job arrivals, JRS opts to convert some of the tokens into virtual jobs to keep the total number of tokens within an upper limit U_max. By capping the number of tokens, JRS ensures that the job requests generated by each server get fulfilled quickly (either by a real job or a virtual job), and thus the observed configurations of the servers maintain proximity to i.i.d. copies of the single-server systems.
The choice of the token limit U_max = Θ(√r) balances two key considerations. On the one hand, a smaller U_max brings the observed configurations closer to i.i.d. copies of single-server systems. On the other hand, if U_max is overly small, the rate of generating virtual jobs becomes high, and the probability for a job arrival to see no tokens is also high. As a result, the observed configurations, which include both real and virtual jobs, deviate from the real-job configurations. A more in-depth discussion of the role of tokens and virtual jobs and whether they are fundamental is given in Section 5.5.

Practical considerations in implementing Join-Requesting-Server (JRS)
Computational complexity of JRS. The computational complexity of JRS consists of two components: the offline component that computes a single-server policy π and its objective value N_π, and the online component that carries out the two steps of JRS.
The offline component reduces to solving the linear program given in (61) in Appendix C.1, whose number of optimization variables is linear in the number of feasible configurations times the number of phases, i.e., |K| × |I|, on a single server. Admittedly, |K| can be large when a single server has a large quantity of resources and there are many job phases. However, we take the view that a single server is not excessively large and that the system's scale is primarily captured by the scaling factor r. It is therefore advantageous that the computational complexity of this offline component is independent of r.
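To get a rough sense of the scale of |K|: if a configuration simply counts jobs in each of |I| phases subject to a total cap of m_max, then |K| is given by a stars-and-bars count. The sketch below checks the count C(m_max + |I|, |I|) by enumeration on toy numbers; the real set K may be further constrained by resource capacities, so this is only an upper-bound-style illustration.

```python
from itertools import product
from math import comb

def feasible_configs(num_phases, m_max):
    """Enumerate configurations k = (k_1, ..., k_{|I|}) with sum(k) <= m_max."""
    return [k for k in product(range(m_max + 1), repeat=num_phases)
            if sum(k) <= m_max]

# Stars and bars: |K| = C(m_max + |I|, |I|).
K = feasible_configs(3, 5)
assert len(K) == comb(5 + 3, 3)  # 56 configurations
```

The LP then has on the order of |K| × |I| variables, independent of r.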
In the online component, the bulk of the computation lies in job requesting and virtual-job simulation, which can be executed in a distributed fashion on the normal servers. Specifically, each normal server monitors its observed configuration and generates tokens according to the single-server policy π; additionally, when a virtual job is assigned to the server, the server simulates the dynamics of the virtual job, i.e., generates the random variables corresponding to phase transitions and job departure. Backup servers do not need to perform any computation beyond serving jobs.
The scheduler, which stores all the tokens, has two responsibilities in the online component: (i) the scheduler matches each newly arrived job to a token of the same type, chosen uniformly at random, or sends the job to a backup server when there are no tokens of the same type; (ii) the scheduler monitors the number of tokens of each type and assigns virtual jobs when the number of tokens exceeds the limit U_max.
It is informative to compare the computational complexity of JRS with that of existing algorithms designed for the traditional setting of stochastic bin-packing in service systems, where the resource requirements are non-time-varying [17, 32-36]. At a high level, these existing algorithms function as follows: upon the arrival of a job, the scheduler checks the current configurations of all servers and assigns the job to a server whose configuration optimizes certain predefined criteria. Among these, the GRAND algorithm [33, 35, 36] stands out for its simplicity and asymptotic optimality. Under GRAND, the scheduler only needs to identify configurations that can accommodate the incoming job and then randomly assigns the job to one of these feasible servers, along with some idle servers. Compared to JRS, GRAND does not have an offline planning component, and individual servers do not perform computation beyond serving jobs. The scheduler's role in GRAND is slightly more complex than in JRS. Consequently, when considering JRS in settings where job resource requirements are non-time-varying, practitioners should weigh whether the additional computational complexity is warranted.

Model parameter estimation.
A limitation of JRS is its dependency on known model parameters, including job arrival rates and phase transition rates. Such dependency is not present in existing algorithms designed for the setting with non-time-varying resource requirements. In real-world applications, the model parameters can be estimated from workload traces such as [37, 38]. Estimation errors can impact the system's performance, an issue that merits further in-depth investigation in future work. Here, we provide a preliminary result on the performance degradation due to parameter estimation errors. Roughly speaking, suppose that the estimation errors in the job arrival rate coefficients λ_i's and in the phase transition and departure rates are bounded by ε ≥ 0 (along with an insensitivity assumption on the single-server problem). Then if we run JRS with the single-server policy obtained by solving for the optimal single-server policy under the estimated parameters, the resulting JRS is (1 + M_1 ε + M r^{−0.5}, 1 + M_1 ε + M r^{−0.5})-optimal for any ε ≤ ε_max, where M_1, M, and ε_max are positive constants independent of r. The exact statement is given in Proposition 1 in Appendix D, along with a proof.
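As a minimal illustration of where the estimation error ε could come from, the sketch below estimates a single transition rate from simulated exponential holding times via the maximum-likelihood estimator (number of transitions divided by total observed time). The rate value and sample size are made up; nothing here is specific to the traces in [37, 38].

```python
import random

random.seed(1)
true_rate = 2.0    # hypothetical phase-transition rate
n = 10_000         # number of observed holding times

# MLE for an exponential rate: (#transitions) / (total observed time).
total_time = sum(random.expovariate(true_rate) for _ in range(n))
rate_hat = n / total_time

# The relative error plays the role of eps in the
# (1 + M_1*eps + M*r^{-0.5})-optimality bound of Proposition 1.
eps = abs(rate_hat - true_rate) / true_rate
```

With n observations the relative error shrinks at roughly the 1/√n rate, so longer traces translate directly into a smaller ε in the degradation bound.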
Connection to practical algorithms. Recent progress has been made in addressing the issue of low utilization due to time-varying job resource requirements, notably within Google's datacenters, as discussed in [2]. The approach in [2] makes predictions on the future resource requirements of jobs, which leads to a further prediction of the future peak resource requirement on a server if a newly arrived job were to be sent to that server (assuming no future job arrivals). This prediction categorizes each server as either feasible or infeasible for the new job, and this binary outcome is subsequently used by a separate scheduler for job assignment.
Our proposed JRS policy can be viewed as giving more detailed predictions of whether it is suitable for a server to take on new jobs, represented by the tokens. The predictions are optimized by taking into account future job arrivals and the stochastic dynamics of jobs.

PROOF OF THEOREM 3 (CONVERSION THEOREM) ASSUMING IRREDUCIBILITY
In this section, we prove Theorem 3 to establish the performance guarantee of JRS. For ease of presentation, we focus on the case where the subroutine policy π is k_0-irreducible. The proof for the general case is in Appendix B.3. This section is organized as follows. We first provide some preliminaries in Section 5.1. Then we outline the steps and lemmas needed for the proof in Section 5.2. In Section 5.3, we prove Theorem 3 based on the lemmas. In Section 5.4, we prove one of the lemmas, Lemma 2, where we devise a novel approach to employing Stein's method. Finally, in Section 5.5, we discuss the role of tokens and virtual jobs and their necessity from a proof perspective. The proofs of the remaining lemmas presented in this section are given in Appendix B.
To rigorously discuss the approximation of the steady-state random variables, we define some metrics. Recall that K ≜ {k : Σ_{i∈I} k_i ≤ m_max} is the set of feasible single-server configurations. Let K^N ≜ {k_{1:N} : k_ℓ ∈ K, ∀ℓ} be the set of feasible configurations for all normal servers. We use ∥·∥ to denote the ℓ_1 norm in both the space K and the space K^N. For any two random variables U, V ∈ K^N, their closeness is measured in terms of the Wasserstein distance W(U, V) ≜ sup_{h∈Lip(1)} |E[h(U)] − E[h(V)]|, where the supremum is taken over all Lipschitz-1 functions from K^N to R.
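The Wasserstein distance above is defined through a supremum over Lipschitz-1 functions and is hard to evaluate on the high-dimensional space K^N. In the one-dimensional special case, however, it reduces to an integral of the absolute difference between the two CDFs, which the toy sketch below exploits. This is only meant to make the definition concrete on a small example, not to compute the distances appearing in the proofs.

```python
def w1_distance(p, q, points):
    """Wasserstein-1 distance between two pmfs on a common sorted grid,
    via the 1-D duality W1 = integral of |F_p - F_q|."""
    Fp = Fq = 0.0
    total = 0.0
    for x0, x1 in zip(points, points[1:]):
        Fp += p[x0]   # running CDF of p
        Fq += q[x0]   # running CDF of q
        total += abs(Fp - Fq) * (x1 - x0)
    return total

pts = [0, 1, 2]
# Point mass at 0 vs point mass at 2: one unit of mass moves distance 2.
p = {0: 1.0, 1: 0.0, 2: 0.0}
q = {0: 0.0, 1: 0.0, 2: 1.0}
assert w1_distance(p, q, pts) == 2.0
```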

Steps and Lemmas Needed for the Proof of Theorem 3 Assuming Irreducibility
Our goal is to show that the steady-state distribution of the normal servers' real-job configurations X_{1:N} is close to the steady-state distribution of i.i.d. copies of the single-server systems X̂_{1:N} in Wasserstein distance, and that the backup servers are almost empty as the arrival rate gets large. More formally, we aim to show that W(X_{1:N}, X̂_{1:N}) = O(√r) and that the expected number of jobs on backup servers is O(√r) as r → ∞. These two bounds provide the performance guarantee claimed in Theorem 3. Instead of directly looking into the distribution of the real-job configuration X_{1:N}, we show that the distribution of each of the three sums, X_{1:N} + Y_{1:N} + U_{1:N} (real jobs, virtual jobs, and tokens), X_{1:N} + Y_{1:N}, and X_{1:N}, can be approximated by the distribution of X̂_{1:N} in Wasserstein distance. The approximation result for each sum helps us derive the approximation result for the sum with one fewer term. The result that the backup servers are almost empty also follows from these approximations. This sequence of approximations is illustrated in the figure below, where recall that O_ℓ(t) ≜ X_ℓ(t) + Y_ℓ(t) is the observed configuration.
A crucial observation that leads to this stepwise proof is that the process (O_{1:N}(t), U_{1:N}(t)) forms a Markov chain on its own. This is because real jobs and virtual jobs have the same transition dynamics and are indistinguishable to the subroutine when it requests jobs. Moreover, by the construction of JRS, the Markov chain (O_{1:N}(t), U_{1:N}(t)) governs the dynamics of the virtual-job configurations Y_{1:N}(t) and the configurations on backup servers.
Lemma 2. Under the conditions of Theorem 3 and π being k_0-irreducible, we have W(O_{1:N}, X̂_{1:N}) = O(√r).
The key challenge for proving Lemma 2 is that the job dispatching decisions are based on the configurations of all normal servers, which creates dependencies among the transitions of different servers. The key idea that helps us overcome this challenge is to consider the sum O_{1:N} + U_{1:N}, which remains unchanged under job arrivals regardless of dispatching decisions. Observe that O_{1:N} + U_{1:N} has decoupled dynamics across servers because it is only changed by internal phase transitions, departures, and requests of new jobs, which happen independently on each server. This helps us prove W(O_{1:N} + U_{1:N}, X̂_{1:N}) = O(√r), which implies Lemma 2, as argued earlier in the section. Formally, the proof of Lemma 2 makes use of Stein's method (see, e.g., [3-5]) to compare O_{1:N} + U_{1:N} with X̂_{1:N}. Stein's method usually consists of three steps: generator comparison, Stein factor bound, and moment bound. In our case, due to the finiteness of the state space K, we only need the generator comparison and the Stein factor bound. In the generator comparison step, we show that the instantaneous transition rates of O_{1:N} + U_{1:N} approximately match those of X̂_{1:N}; in the Stein factor bound step, we show that the small difference in the transition rates of O_{1:N} + U_{1:N} and X̂_{1:N} does not cause much increase in the overall distance between the distributions. The detailed proof is in Section 5.4.
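The generator-comparison identity at the heart of Stein's method can be checked by hand on a toy pair of two-state chains. In the sketch below, all rates are invented: chain 1 plays the role of the reference system X̂ and chain 2 the role of the perturbed system. Solving the Poisson equation for the reference chain turns the gap between stationary expectations into an expectation of the generator difference.

```python
# Two-state birth-death chains: chain 1 (reference) with up-rate a1,
# down-rate b1; chain 2 with up-rate a2, down-rate b2 (toy values).
a1, b1 = 1.0, 2.0
a2, b2 = 1.2, 2.0
pi1 = (b1 / (a1 + b1), a1 / (a1 + b1))   # stationary dist. of chain 1
pi2 = (b2 / (a2 + b2), a2 / (a2 + b2))   # stationary dist. of chain 2

h = (0.0, 1.0)                            # Lipschitz test function h(x) = x
h_bar1 = pi1[0] * h[0] + pi1[1] * h[1]    # E_pi1[h]

# Poisson equation G1 f = h - E_pi1[h], normalized so that f(0) = 0:
# at state 0 it reads a1 * (f(1) - f(0)) = h(0) - h_bar1.
f = (0.0, (h[0] - h_bar1) / a1)

# Generator of chain 1 applied to f at each state.
G1f = (a1 * (f[1] - f[0]), b1 * (f[0] - f[1]))

# Taking expectations under pi2 and using E_pi2[G2 f] = 0 (stationarity):
#   E_pi2[h] - E_pi1[h] = E_pi2[G1 f] = E_pi2[(G1 - G2) f].
lhs = pi2[0] * h[0] + pi2[1] * h[1] - h_bar1
rhs = pi2[0] * G1f[0] + pi2[1] * G1f[1]
assert abs(lhs - rhs) < 1e-12
```

In the proof of Lemma 2, the same identity is applied at a much larger scale, with the generator difference bounded by a quantity proportional to the expected number of tokens and the analogue of f controlled by the Stein factor bound.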
Step 2. We establish Lemma 3 and Lemma 4 below, which bound the steady-state expected number of virtual jobs and the jobs on backup servers.
Lemma 3. Under the conditions of Theorem 3 and π being k_0-irreducible, for each i ∈ I, the steady-state expected number of virtual jobs of type i is of order O(√r).
Lemma 4. Under the conditions of Theorem 3 and π being k_0-irreducible, for each i ∈ I, the steady-state expected number of type i jobs on backup servers is of order O(√r).
The key idea for proving Lemma 3 and Lemma 4 is that, by the characterization of O_{1:N} in Lemma 2 and the fact that job requests are made based on O_{1:N}, we can show that the rate of requesting jobs is approximately equal to the arrival rate for each job type. Therefore, the number of tokens rarely reaches 0 or U_max. This implies the rarity of virtual jobs and jobs on backup servers. The proofs are provided in Appendix B.2.

Proof of Theorem 3 Assuming Irreducibility Based on Lemmas 1-4.
Proof. First we show that Lemmas 2 and 3 imply the closeness between X_{1:N} and X̂_{1:N}. By Lemma 2, for any h ∈ Lip(1), the gap |E[h(O_{1:N})] − E[h(X̂_{1:N})]| is O(√r); since ∥X_{1:N} − O_{1:N}∥ is at most the total number of virtual jobs, whose expectation is O(√r) by Lemma 3, we obtain the bound in (9). Now we prove (5), the bound on the expected number of active servers, by taking a suitable h in (9). Observe that the expected number of active servers decomposes as in (10), where the last term on the RHS is O(√r) by Lemma 4. To show that the difference between the first two terms on the RHS of (10) is also O(√r), consider the function f_1 that counts the number of non-empty servers in a joint configuration; since |f_1(k_{1:N}) − f_1(k′_{1:N})| ≤ ∥k_{1:N} − k′_{1:N}∥ for any k_{1:N}, k′_{1:N} ∈ K^N, we have f_1 ∈ Lip(1). Applying (9) to f_1 yields the desired O(√r) bound. Recalling the choice of N, we get (5). Similarly, to prove (6), we observe the decomposition in (11). The last term of (11) can be bounded as O(√r) by Lemma 4 and the fact that K is a finite set. To show that the difference between the first two terms on the RHS of (11) is also O(√r), we take the (suitably normalized) cost-rate function f_2; since |f_2(k_{1:N}) − f_2(k′_{1:N})| ≤ ∥k_{1:N} − k′_{1:N}∥ for any k_{1:N}, k′_{1:N} ∈ K^N, we have f_2 ∈ Lip(1). Applying (9) to f_2 yields the bound. Recalling the choice of N, we get (6).
To show (7) and (8), note that N = Θ(r); the claims then follow, where in the last inequality we use the fact that β > 0. This completes the proof. □

More Details on the System and Proof of Lemma 2
To bound the distance between O_{1:N} + U_{1:N} and X̂_{1:N}, we compare the generators of the two processes. Here, writing O_{1:N} + U_{1:N} as the input of functions defined on K^N is valid because O_ℓ + U_ℓ ∈ K for every ℓ, as discussed in Remark 1 below.

5.4.1 More Details on System Dynamics and Generator. To prepare for the proof, we first look into the dynamics of the two systems under study. In particular, we write out the generators of X̂_{1:N}(t) and (O_{1:N}(t), U_{1:N}(t)), which are used in the Stein's method arguments.
We first examine the dynamics of the single-server system under the Markovian policy π. Four types of events change a single-server system's configuration: internal transitions, departures, reactive requests, and proactive requests (see Section 4.1). The change of configuration due to any event can be represented by the diagram k → k′ → k′ + d, where the arrow k → k′ denotes an internal transition or a departure from configuration k to k′ if k ≠ k′; the arrow k′ → k′ + d denotes a reactive request that adds d jobs to the system if k ≠ k′, and denotes a proactive request if k = k′. We call the above change of configuration a transition, and denote its rate by q_{k,(k′,d)}. Let D(k) denote the set of possible (k′, d) pairs in a transition.
We define the total transition rate at configuration k as ρ_k ≜ Σ_{(k′,d)∈D(k)} q_{k,(k′,d)}, and define the maximal transition rate ρ_max ≜ max_{k∈K} ρ_k. Since K is a finite set, we have ρ_max < ∞. Also, observe that the request rate of type i jobs is given by Σ_{k∈K} μ(k) Σ_{(k′,d)∈D(k)} q_{k,(k′,d)} d_i, where μ denotes the stationary distribution of the single-server configuration under policy π.
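As a sanity check on this request-rate formula, consider a hypothetical single-type server with m_max = 1 and invented rates: a busy server departs at rate ν and reactively requests a replacement job with probability p, while an empty server proactively requests a job at rate σ. In steady state, the request rate computed from the stationary distribution must equal the throughput.

```python
# Hypothetical rates for a two-configuration server {empty, busy}.
nu = 1.0     # departure rate from the busy configuration
p = 0.6      # prob. of a reactive request (d = 1) upon departure
sigma = 0.5  # proactive request rate (d = 1) while empty

# Stationary distribution mu from flow balance:
# mu(empty) * sigma = mu(busy) * nu * (1 - p).
mu_empty = nu * (1 - p) / (sigma + nu * (1 - p))
mu_busy = sigma / (sigma + nu * (1 - p))

# Request rate = sum_k mu(k) * sum_{(k',d)} q_{k,(k',d)} * d_i:
# reactive part (rate nu*p from busy) plus proactive part (rate sigma from empty).
request_rate = mu_busy * nu * p + mu_empty * sigma

# In steady state this matches the throughput mu(busy) * nu.
assert abs(request_rate - mu_busy * nu) < 1e-12
```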
In this context, (•, k, •) denotes a joint configuration whose ℓ-th entry is k; in other words, we use • to omit the entries of the function's input that agree with the corresponding entries of O_{1:N} + U_{1:N}.
Remark 1. In (17), although f is only defined on the domain K^N, it is valid to write O_{1:N} + U_{1:N} as its input because we always have O_ℓ + U_ℓ ∈ K, i.e., the total number of real jobs, virtual jobs, and tokens on a normal server never exceeds m_max. To see why this is true, note that the single-server policy π requests jobs only when there are no tokens on the server, and it will not request more than m_max − m jobs if there are already m real and virtual jobs on the server.

Proof of Lemma 2.
Proof. Generator Comparison. For any h ∈ Lip(1), consider the Poisson equation (see, e.g., [3]) that solves for f_h : K^N → R, as given in (18). We let k_{1:N} = O_{1:N} + U_{1:N} in (18) and take the expectation, which results in (19). On the other hand, because (O_{1:N}(t), U_{1:N}(t)) is a finite-state Markov chain, its generator applied to f_h(O_{1:N} + U_{1:N}) has zero expectation in steady state, which gives (20). Subtracting (20) from (19), we get (21). We want to show that the two generators are close so that we can bound the RHS of (21). Plugging the formulas of the generators in (15) and (17) into the RHS of (21), we obtain (22), where in f_h(•, k′, •) − f_h(•, k, •) we omit the entries that agree with O_{1:N} + U_{1:N}. The equality (i) is true because the ℓ-th terms of the two generators are equal whenever U_ℓ = 0. For the inequalities (ii) and (iii), recall that ρ_k is the total transition rate ρ_k ≜ Σ_{(k′,d)∈D(k)} q_{k,(k′,d)} and that ρ_max = max_{k∈K} ρ_k < ∞. Therefore (22) can be further bounded by a quantity proportional to the expected total number of tokens. To prove (12), it remains to show that sup_{k,k′} |f_h(•, k′, •) − f_h(•, k, •)| is of constant order, which is the Stein factor bound (23).
Stein Factor Bound. To prove (23), observe that the f_h(•) given in (24) is a solution to the Poisson equation (18). This allows us to bound the difference of f_h using coupling. Specifically, we define a coupling of two systems, each consisting of N i.i.d. copies of the single-server system under π. The two systems are initialized with configurations (•, k′, •) and (•, k, •) that differ only at the ℓ-th server, where we omit the entries that agree. Let X̂_{1:N,1}(t), X̂_{1:N,2}(t) be the joint configurations of the two systems, which together form 2N i.i.d. copies of the single-server system. As a result, we can specify the couplings (X̂_{ℓ′,1}(t), X̂_{ℓ′,2}(t)) for different ℓ′ separately. For ℓ′ ≠ ℓ, the corresponding servers in the two systems have the same initial configurations, so we can always keep their configurations identical. For the ℓ-th servers, we let them evolve independently following their own dynamics until a stopping time τ_mix when their configurations become the same. After that, we use a coupling that keeps their configurations identical. Under this coupling, it is not hard to see that |f_h(•, k′, •) − f_h(•, k, •)| ≤ E[∫_0^{τ_mix} ∥X̂_{ℓ,1}(t) − X̂_{ℓ,2}(t)∥ dt], as stated in (25), where in the second inequality we have used the fact that h is 1-Lipschitz continuous under the ℓ_1 norm of the space K^N. For each pair k, k′, observe that because π is a k_0-irreducible policy, E[τ_mix] is finite; and because K is a finite set, ∥X̂_{ℓ,1}(t) − X̂_{ℓ,2}(t)∥ is uniformly bounded. All these finite quantities depend only on a single-server system under the policy π, which is independent of r. As a result, the last expression in (25) is of constant order. Moreover, because there are finitely many pairs (k, k′), the supremum sup_{k,k′} |f_h(•, k′, •) − f_h(•, k, •)| is also of constant order, independent of r. This proves the Stein factor bound in (23). Together with the generator comparison, we have proved (12). Because Σ_{ℓ=1}^N Σ_{i∈I} U_{ℓ,i} ≤ |I| U_max = O(√r), the token contribution is O(√r) for any h ∈ Lip(1). Plugging this into (12), we get sup_{h∈Lip(1)} |E[h(O_{1:N})] − E[h(X̂_{1:N})]| = O(√r). This proves Lemma 2. □
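The finiteness of E[τ_mix] used in the Stein factor bound can be illustrated numerically: run two independent copies of a small irreducible chain until their states first coincide. The chain and rates below are invented, and uniformization turns the continuous-time coupling into discrete steps.

```python
import random

random.seed(0)
a, b = 1.0, 2.0  # toy up/down rates of a two-state chain {0, 1}

def meeting_time(x, y):
    """Number of uniformized steps (uniformization rate a + b) until two
    independent copies started in states x and y first coincide."""
    steps = 0
    while x != y:
        steps += 1
        # Each copy leaves its current state w.p. (exit rate)/(a + b) per step.
        x = 1 - x if random.random() < (a if x == 0 else b) / (a + b) else x
        y = 1 - y if random.random() < (a if y == 0 else b) / (a + b) else y
    return steps

times = [meeting_time(0, 1) for _ in range(2000)]
avg = sum(times) / len(times)  # empirical estimate of E[tau_mix] in steps
```

For this toy chain the per-step meeting probability works out to 5/9, so the average sits near 1.8 steps; the point is only that the expectation is a finite constant independent of r.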

Role of Tokens and Virtual Jobs
This section aims to shed light on the role of tokens and virtual jobs in the proposed policy, JRS.
We first outline how we devised the token-and-virtual-job mechanism from the perspective of generators. To begin with, consider the scenario where dispatch decisions are based solely on the real-job configurations (X_ℓ)_{ℓ=1,2,...}. In this case, the transitions of servers' configurations would in general be correlated due to job arrivals, which are dispatched based on the joint configuration of all servers. To break this correlation, let us introduce tokens but not yet set an upper limit on the number of tokens (thus no virtual jobs). Observe that X_{1:N}(t) + U_{1:N}(t) remains unchanged by job arrivals, so tokens remove the correlation brought about by job arrivals. However, because tokens lack internal phase transitions and departures, the transition dynamics of X_{1:N}(t) + U_{1:N}(t) diverge from those of X̂_{1:N}(t) when the number of tokens is large. In other words, although tokens help decouple the transitions on servers, they cannot keep the transitions of X_{1:N}(t) + U_{1:N}(t) close to those of X̂_{1:N}(t). To solve this issue, we finally introduce the mechanism of converting tokens into virtual jobs when the number of tokens is high, where virtual jobs make internal phase transitions and departures just like real jobs. Now, the sum X_{1:N}(t) + Y_{1:N}(t) + U_{1:N}(t) remains unchanged by job arrivals and by the creation of virtual jobs, and its internal phase transitions and job departures are similar to those of X̂_{1:N}(t). More formally, the generators of X_{1:N}(t) + Y_{1:N}(t) + U_{1:N}(t) and X̂_{1:N}(t) are close to each other: their additive difference can be upper bounded by a quantity proportional to the expected number of tokens, as shown in (22). Therefore, by regulating the number of tokens, we can control the difference between the generators of X_{1:N}(t) + Y_{1:N}(t) + U_{1:N}(t) and X̂_{1:N}(t).
Another key design component of JRS is that the subroutine requests jobs based on the observed configurations. This is used in the proof of Lemma 2 to show that the observed configurations X_{1:N} + Y_{1:N} are close to X̂_{1:N}, which consists of N i.i.d. single-server systems. Recall that each single-server system in X̂_{1:N} is designed to have a throughput of λ_i r/N for each job type i ∈ I. Therefore, the proximity between X_{1:N} + Y_{1:N} and X̂_{1:N} ensures that the job request rate mirrors the arrival rate for each job type, regardless of the real-job configurations. The fact that these two rates are approximately equal is important for proving Lemma 3 and Lemma 4. It guarantees that both the rate of generating virtual jobs (when there are too many tokens) and the rate of dispatching jobs to backup servers (when there are no tokens) are appropriate.
A natural follow-up question is whether the usage of tokens and virtual jobs is fundamental or an artifact of our analysis technique. For example, it is unclear whether removing the upper limit on the number of tokens would still yield an asymptotically optimal policy. This is an interesting question to which we do not have a complete answer. The token-and-virtual-job mechanism emerges as a natural choice under our analysis framework. Nevertheless, it is worth noting that our analysis primarily treats each server's configuration as a generic Markov chain, without utilizing many properties specific to the stochastic bin-packing setting. An exception is the proofs in Appendix B.2, where we use the fact that, under our model, each job leaves the system within a constant expected time. It would be interesting to explore more properties of the problem to better understand policy designs without auxiliary state variables like tokens and virtual jobs.

CONCLUSION
In this paper, we study a new setting of stochastic bin-packing in service systems that features time-varying item sizes. Since our formulation is motivated by the problem of virtual-machine scheduling in computing systems, we use the terminology of jobs and servers, where jobs are viewed as items, whose sizes are their resource requirements, and servers are viewed as bins. The time-varying item sizes capture the emerging trend in practice that jobs' resource requirements vary over time. Our goal is to design a job dispatch policy that minimizes the expected number of active servers in steady state, subject to a constraint on resource contention. Our main result is the design of a policy that achieves an optimality gap of O(√r), where r is the scaling factor of the arrival rate. When specialized to the setting where jobs' resource requirements remain fixed over time, this result improves upon the state-of-the-art o(r) optimality gap. Our technical approach highlights a novel policy conversion framework, Join-Requesting-Server, that reduces the policy design problem to one in a single-server system.
There are several potential directions that may be worth further exploration. One direction is to strengthen the optimality result within the current setting. Specifically, it is interesting to investigate: (i) whether it is possible to achieve an optimality gap smaller than Θ(√r); and (ii) whether there exist asymptotically optimal policies whose average cost rate of resource contention satisfies the budget strictly rather than asymptotically.
We are also interested in extending our technique to the optimal control of other systems with similar structures. Intuitively, this technique could be applied to systems with many components that evolve mostly independently but are weakly coupled by certain constraints. Viewing each component as a server, we can define a suitable single-server problem and then design a policy for the original system to track the dynamics of the optimal single-server solution. Below we list several variations of our model that can potentially be analyzed using the proposed technique.
• A model where jobs running on each server are put into a local queue when there are resource contentions. The goal thus becomes finding the optimal trade-off between the number of active servers and the waiting time of the jobs.
• A model that allows each server to have a Markovian state that affects the dynamics of the jobs running on the server.
• A model that allows jobs to migrate to different servers at the cost of migration delays.
• A closed-system model where jobs re-enter the system after completion.
A third possible direction is to tackle the problem when the arrival rates and the parameters in the job model are unknown, as mentioned in Section 4.3. A possible approach is to develop an approximate version of the JRS framework, where the optimal single-server policy and the simulator for the virtual jobs are both learned from data. It is desirable to design such an approximate framework whose performance degrades gracefully as the approximation error increases.

A PROOF OF THEOREM 2 (LOWER BOUND)
Proof. It is sufficient to show that, given an infinite-server policy Φ for the problem in (2), we have C(Φ) ≥ N*. To this end, we will construct a single-server policy π such that the resulting system configuration K(∞) in steady state satisfies the constraints of the single-server problem. Let μ be the distribution of K(∞); then (C(Φ), π, μ) is a feasible solution to the problem P((λ_i r)_{i∈I}, β) in (3). As a result, we have C(Φ) ≥ N*. Note that although π is actually non-Markovian, i.e., it makes decisions based not only on the current configuration but also on the history, N* is still a lower bound on the objective value that π can achieve in P((λ_i r)_{i∈I}, β), as we will show in Appendix C. The construction of the single-server policy π involves simulating an infinite-server system under Φ starting from the empty configuration. At time 0, the policy π randomly chooses the ℓ-th server in the infinite-server system with probability p_ℓ, for ℓ = 1, 2, .... It then requests jobs for the single-server system according to a policy π_ℓ. The key to our policy π_ℓ is to make the single-server system emulate the job assignment at the ℓ-th server of the simulated infinite-server system, but without incurring idleness. We first construct the policy π_ℓ, and then specify the probabilities p_ℓ.
Let us start by introducing some useful notation. Let ℓ() be the single-server system configuration under ℓ at time , and let ℓ() be the configuration of the ℓ-th simulated server in the infinite-server system under . We define a stochastic process {ℓ(),  ≥ 0} as follows: The "max" is well-defined because the integral is continuous in . Intuitively, ℓ() gives the maximum time at which the cumulative busy time of the ℓ-th server equals . Note that {ℓ(),  ≥ 0} is discontinuous only when ℓ() reaches 0; it is therefore right-differentiable, with right derivative equal to 1 at every point.
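To make the busy-time time change concrete, here is a small Python sketch (our own illustration, not part of the proof; the helper name `busy_time_inverse` is hypothetical). Given a server's busy periods, it maps a cumulative busy time s back to the wall-clock time at which that busy time is accumulated; the proof's process takes the *maximum* such time, and the simplification we make at exact boundary points is noted in the docstring.

```python
def busy_time_inverse(busy_intervals, s):
    """Map cumulative busy time s to the wall-clock time where it is reached.

    busy_intervals: sorted, disjoint (start, end) pairs of busy periods.
    The process in the proof takes the *maximum* such wall-clock time; when
    s is exhausted exactly at the end of a busy period, we return that end
    point and do not extend over a following idle gap (a simplification).
    """
    acc = 0.0
    for start, end in busy_intervals:
        length = end - start
        if acc + length >= s:
            # s is reached inside this busy period: interpolate.
            return start + (s - acc)
        acc += length
    raise ValueError("total busy time is less than s")

# Server busy on [0, 2] and [5, 6]: busy time 2.5 is reached at wall-clock
# 5.5, because the idle period [2, 5] contributes no busy time.
assert busy_time_inverse([(0.0, 2.0), (5.0, 6.0)], 2.5) == 5.5
assert busy_time_inverse([(0.0, 2.0), (5.0, 6.0)], 1.5) == 1.5
```

The interpolation step is what "skips the idle periods": idle intervals simply never appear in `busy_intervals`, so they contribute nothing to the accumulated time.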
We construct ℓ and the simulation of the infinite-server system under  in such a way that: That is, we want the single-server system to have the same dynamics as the simulated ℓ-th server, except that it skips the idle periods. To this end, we couple the two systems as follows: (1) When the ℓ-th simulated server ℓ(ℓ()) receives a type  job, we let the single-server system ℓ() request a type  job at time . For each such job, its phase transition process in the ℓ-th simulated server is the same as that in the single-server system. That is, when we observe any internal transition or departure event in ℓ(), we produce the same event at the ℓ-th simulated server ℓ(ℓ()).
(2) The simulations of the rest of the infinite-server system under policy  are driven by independently generated random seeds.
It is not hard to see that the simulated infinite-server system has the same stochastic behavior as an uncoupled system under . Moreover, since we couple all the events that happen in ℓ() and ℓ(ℓ()), together with the facts that ℓ() and ℓ(ℓ()) are piecewise constant and ℓ(0−) = ℓ(ℓ(0−)) = 0, we get (29).
Next, we claim that (29) implies the following relationship between the steady-state cost of the single-server system under ℓ and the steady-state cost of the ℓ-th simulated server in the infinite-server system under : This is because for all  ≠ 0, we have where () and () hold because long-run averages converge to steady-state expectations; () is due to the fact that 1{ℓ()=} for any  ≠ 0; () is due to the fact that 1{ℓ(ℓ())≠0} = 1; and () follows from (29).
Let ℓ be the long-run request rate of type  jobs in the single-server system under ℓ, and let ℓ be the throughput of type  jobs at the ℓ-th simulated server under . Since, by the construction of ℓ, the single-server system requests jobs based on the arrival events of the ℓ-th simulated server, we have With the constructed policies {ℓ, ℓ = 1, 2, . . .}, we are ready to define the policy . We let  choose an index ℓ with probability ℓ at time 0 and then follow ℓ. We set the probability ℓ as where the second inequality uses the fact that where () follows from (30) and (31). Observe that under , (∞) ≠ 0 almost surely; we thus have which proves (27). Moreover, for each  ∈ I the request rate  is given by . This proves (28). By the argument presented at the beginning of the proof, we get () ≥ *. □

B THE REST OF THE PROOFS NEEDED FOR THEOREM 3 (CONVERSION THEOREM)

B.1 Proof of Lemma 1
Proof. We will show that under the JRS policy, the Markov chain for the system state (represented as ((ℓ())ℓ=1,2,..., 1:(), 1:())) has a unique stationary distribution, by first arguing that it is 0-irreducible (here, 0-irreducibility means that the Markov chain has a state that is reachable from all other states; the "0" in "0-irreducible" does not refer to any specific state), and then using the Foster–Lyapunov theorem to show positive recurrence [see, e.g., 25]. Combining 0-irreducibility with positive recurrence, we conclude that the Markov chain under study has a unique stationary distribution.

B.2 Proofs of Lemma 3 and Lemma 4
In this subsection, we prove Lemma 3 and Lemma 4 together. We begin by introducing some notation. We use ℓ ≜ (ℓ, ℓ) to represent the state of the ℓ-th server, and 1: to represent the joint state of the first  servers. We also use the lowercase ℓ, 1: to represent realizations of the corresponding random variables. We denote the total number of type  virtual jobs by  ≜ Σℓ=1 ℓ for  ∈ I, and its realization by . We denote the total number of type  jobs on backup servers by  for  ∈ I, and its realization by . We also denote the total number of type  tokens throughout the system by  ≜ Σℓ=1 ℓ, and its realization by . Our goal can be rewritten as proving E[] = (√) and E[] = (√) for each  ∈ I. We first give an overview of the proof. Observe that in our model, the expected time that a job stays in the system is fixed. As a result, bounding the number of virtual jobs or jobs on backup servers in the system is equivalent to bounding the rate at which they are generated, by Little's Law. By our construction of the policy, the rate of generating those jobs is closely related to the dynamics of the total number of type  tokens ().
To describe the dynamics of (), we first introduce two functions  and : The function  represents the increment in the number of type  virtual jobs due to the event that the total number of type  tokens on the first  servers exceeds the token limit max. The function  corresponds to the increment in the total number of type  jobs on backup servers due to the event that a type  job arrives at the system without seeing a type  token. For a function : (K × K) → R that depends only on the number of type  tokens , the drift can be written as We abuse the notation  here. For ease of exposition, we will simply write  and  to represent (, ) and ().
By construction, the total number of type  tokens {()} is a stochastic process constrained to [0, max]. Note that () increases when some servers request new tokens, and decreases when a real or virtual arrival checks out a token or when some servers have their excess tokens removed. When () is away from the boundaries, the average rate at which it increases is approximately , and the average rate at which () decreases is given by where we have used the approximations ℓ ≈ ℓ, 1{ℓ=0} ≈ 1, and  = ⌈⌉ ≈ .
As {()} randomly moves up and down at approximately the same rate and reflects at the boundaries 0 and max, it behaves like a reflected simple symmetric random walk. Intuitively, the steady-state distribution of  is approximately uniform over [0, max]. Recall that  and  can be non-zero only when () is near the boundaries. Since the length of the interval is max = Θ(√), we can expect  and  to diminish as  → ∞. In the proof, we first establish the relationship between E[], E[] and ,  using Little's Law. Then we derive bounds on  and  by analyzing the drifts of several test functions of . This step is implicitly based on the intuition that  is approximately uniformly distributed over [0, max], with tokens being generated and eliminated at similar rates. Finally, we invoke Lemma 2 to show that the tokens are indeed generated and eliminated at similar rates, which leads to bounds on  and .
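The reflected-random-walk intuition can be checked with a tiny simulation (our own illustration, not from the paper; the function `boundary_fraction` and the parameter values are invented). A symmetric ±1 walk reflected on [0, u_max] spends a fraction of time at the two boundaries that shrinks as the interval widens, which is exactly why the boundary correction terms diminish when the token limit scales like √r.

```python
import random

def boundary_fraction(u_max, steps=200_000, seed=0):
    """Fraction of steps a reflected +-1 random walk spends at 0 or u_max."""
    rng = random.Random(seed)
    x, hits = 0, 0
    for _ in range(steps):
        x += rng.choice((-1, 1))
        x = min(max(x, 0), u_max)   # reflect at the boundaries 0 and u_max
        hits += (x == 0 or x == u_max)
    return hits / steps

# A 10x wider interval makes boundary reflections roughly 10x rarer,
# consistent with a near-uniform stationary distribution over [0, u_max].
small = boundary_fraction(10)
large = boundary_fraction(100)
assert small > large
```

For a uniform stationary distribution, the boundary mass is about 2/(u_max + 1), so widening the interval from 10 to 100 cuts the boundary fraction by roughly an order of magnitude.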
Finally, we make some additional remarks on the notation. First,  and  depend on the total number of type  tokens  and the number of newly requested jobs , although we omit this dependence for ease of exposition. Second, we abuse the notation  and  to also denote the corresponding random variables. We also write ′, as a shorthand for (′,) ∈ (ℓ) when the context is clear.

B.2.1 Proofs of Lemma 3 and Lemma 4.
We are now ready to prove Lemma 3 and Lemma 4.

Proof.
Step 1: Bounding Virtual Jobs and Jobs on Backup Servers using Little's Law. We first apply Little's Law to  and . For each type , let  be the expected time that a type  job stays in the system, and let max = max∈I . Because there are only finitely many types of jobs, and each job spends a finite expected time in the system, max is a finite constant. By Little's Law,

Step 2: Drift Analysis. Equations (36) and (37) suggest that we can derive upper bounds on E[] and E[] by analyzing the following two terms: • the first term, interpreted as the average rate at which () reflects at the boundary max; • E[ • ], interpreted as the average rate at which () reflects at the boundary 0.
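As a sanity check on the Little's Law step (a toy illustration of ours, not the paper's system), the following sketch simulates an M/M/1 FIFO queue and verifies that the time-average number in system L matches the arrival rate times the mean sojourn time W. The function name `mm1_little` and all parameter values are invented for the example.

```python
import random

def mm1_little(lam=0.5, mu=1.0, n_jobs=200_000, seed=1):
    """Simulate an M/M/1 FIFO queue; return (L, lam * W) for comparison."""
    rng = random.Random(seed)
    t, dep_prev = 0.0, 0.0
    arrivals, departures = [], []
    for _ in range(n_jobs):
        t += rng.expovariate(lam)                          # Poisson arrivals
        # FIFO recursion: service starts at max(arrival, previous departure).
        dep_prev = max(t, dep_prev) + rng.expovariate(mu)
        arrivals.append(t)
        departures.append(dep_prev)
    horizon = departures[-1]                               # departures are monotone
    W = sum(d - a for a, d in zip(arrivals, departures)) / n_jobs
    # Time-average number in system = area under the sample path / horizon,
    # and the area equals the sum of all sojourn times.
    L = sum(d - a for a, d in zip(arrivals, departures)) / horizon
    return L, lam * W

L, lamW = mm1_little()
assert abs(L - lamW) / L < 0.05   # L = lambda * W, up to simulation noise
```

With lam = 0.5 and mu = 1.0, theory gives W = 1/(mu − lam) = 2 and L = lam·W = 1, and the two estimates agree up to sampling error.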
We establish the relationships among , , and  by analyzing the drifts of two test functions.
Letting () =  and taking the steady-state expectation of its drift, by (35) and the fact that the drift is zero in steady state, we get (38). Similarly, letting () = 2 and taking the steady-state expectation of its drift, one can verify the equality in (39); readers may refer to the complete calculation at the end of this subsection.
Step 3: Estimating the Terms Obtained from Drift Analysis. We first bound the two terms in (40) and (41) separately, and then invoke (38). The term in (41) is easy to deal with. Observe that the number of jobs requested each time is at most the maximal number of jobs that a server can hold, i.e.,  ≤ max, so where in the second inequality we have used the fact that the total rate is uniformly bounded by max, and the last step uses the facts that  = () and max = Θ(√).
To bound the term in (40), first observe that  ≤ max, which implies that The term on the RHS of the above equation is the expected absolute difference between the rates of generating and eliminating type  tokens, which can be shown to be small relative to . Specifically, we claim that (43) holds. To show (43), first notice that we can remove the indicator 1{ℓ=0} without introducing much error: where the first inequality is due to the triangle inequality, the second inequality is due to the definition of max, and the last inequality holds because Σℓ=1 1{ℓ≠0} ≤ |I| max. It remains to bound the term E|Σℓ=1 Σ′, ℓ,(′,) − |, which can be seen as showing that the rate of generating type  tokens concentrates around the type  jobs' arrival rate . It is natural to think of using some law of large numbers. Unfortunately, Σℓ=1 Σ′, ℓ,(′,) is not a sum of i.i.d. random variables, due to dependencies among the ℓ for different ℓ. As a result, we invoke the Wasserstein distance bound in Lemma 2 to replace ℓ in the above expression with ℓ. We define the function (1:) as We claim that  ∈ Lip(1). For any two 1:,1, 1:,2, where the first inequality is due to the triangle inequality; the second inequality uses the facts that  ≤ max and 1{1,ℓ≠2,ℓ} ≤ ∥1,ℓ − 2,ℓ∥; and the third inequality uses the triangle inequality, the fact that the total rate at a configuration  is bounded by max, and the property of the 1 norm ∥•∥. Therefore,  ∈ Lip(1). The Lipschitz continuity of  allows us to invoke Lemma 2 and get Therefore, Observe that under a Markovian policy, the request rate of type  jobs can be written as Moreover, because Σ′, ℓ,(′,) are i.i.d. for ℓ = 1, . . ., , we have Therefore, combining the arguments above, we get which proves (43). This implies that the term in (40) is also (√).
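The key inequality behind this Lipschitz argument can be checked numerically (an illustration of ours, not part of the proof): by Kantorovich duality, for any 1-Lipschitz test function f, |E f(X) − E f(Y)| is at most the Wasserstein-1 distance between the laws of X and Y. For two empirical samples of equal size, W1 equals the mean absolute difference of the sorted samples; the distributions below are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=10_000)
y = rng.normal(0.3, 1.2, size=10_000)

# Empirical Wasserstein-1 for equal-size samples: sorted (monotone) matching
# is an optimal coupling on the line.
w1 = np.abs(np.sort(x) - np.sort(y)).mean()

f = np.abs                              # |.| is 1-Lipschitz
gap = abs(f(x).mean() - f(y).mean())    # |E f(X) - E f(Y)|
assert gap <= w1 + 1e-12                # duality: the gap never exceeds W1
```

In the proof, the same mechanism is used in reverse: a Wasserstein bound between the coupled and i.i.d. server states (Lemma 2) controls the expectation gap of the 1-Lipschitz function g.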
Combining the bounds on the terms in (40) and (41), we get Finally, we bound E[ • ]. We rearrange the terms in (38) and get By (43), we have We invoke equations (36) and (37) obtained at the beginning of the proof, and conclude that This finishes the proof. □

B.2.2 Deriving the equality in (39). We show the detailed calculation for deriving the equality in (39).
The equality is obtained by considering the drift of the function () = 2, which is zero in steady state. Recall that the drift of () is given by (35). We first calculate ( +  − ) − (), then ( − 1 + ) − (), and finally plug them into (35). The calculation of ( +  − ) − () uses the following property of : This property follows from the definition  = ( +  − max)+. Intuitively,  is the "force" that pushes  back when it hits the boundary max. Using this property, we have The second-to-last equality is due to (48); the rest are algebraic manipulations.
We carry out a similar calculation for ( − 1 + ) − (): where the last equality is due to the property that and the rest are algebraic manipulations. Putting these together and recombining the terms, we get the equality in (39). This finishes the calculation.

B.3 Proof of Conversion Theorem without Assuming Irreducibility
In this subsection, we prove Theorem 3 without assuming 0-irreducibility of the subroutine . Specifically, suppose that we have a Markovian single-server policy  and an initial distribution  over its recurrent classes  for  = 1, 2, . . .; we will construct an infinite-server policy  such that (5) and (6) still hold. The basic idea is to decompose a general Markovian single-server policy  into multiple 0-irreducible Markovian policies, each of which induces one recurrent class and preserves the stationary distribution and the throughput of  on that recurrent class, as stated in Lemma 5 below.
Lemma 5 (Decomposing the Reducible Policy). Let  be a general single-server Markovian policy with recurrent classes  for  = 1, 2, . . ., . Then for each  there exists a Markovian policy  such that • the induced Markov chain is 0-irreducible, with  as its unique recurrent class; • its stationary distribution is the same as the stationary distribution under  starting from a configuration in .
Proof. For each  = 1, 2, . . ., , we define the policy  as follows: when the system has configuration  ∈ , the policy  makes the same decisions as ; when  ∉  and  = 0, the policy starts a timer whose duration follows an exponential distribution with rate 1 and, when the timer ticks, immediately adds 0 jobs of each type for some arbitrary 0 ∈ ; when  ∉  and  ≠ 0, the policy does not request any jobs.
We show that under the new policy ,  is also a recurrent class of the induced Markov chain. Indeed, if the system starts from a configuration  ∈ , it will stay in , since the policy makes the same decisions and the system has the same transitions as under . Because  is a recurrent class under , it remains a recurrent class under .
To show that the Markov chain induced by  is 0-irreducible, observe that starting from any  ∉ , the system state will return to . Specifically: • If the system starts from a configuration  with  ∉  and  ≠ 0, then no new jobs are requested until either  ∈  or  = 0. In the latter case, by the construction of the policy, the system jumps to a configuration in  after the next transition. • If the system starts from a configuration  with  ∉  and  = 0, the system jumps to a configuration in  after the next transition.
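For finite state spaces, the decomposition underlying Lemma 5 can be sketched computationally (our own toy illustration, not the paper's construction): the recurrent classes of a finite Markov chain are exactly the closed strongly connected components of its transition graph, found here with Kosaraju's algorithm.

```python
def recurrent_classes(P):
    """Return the closed SCCs (recurrent classes) of transition matrix P."""
    n = len(P)
    adj = [[j for j in range(n) if P[i][j] > 0] for i in range(n)]
    order, seen = [], [False] * n

    def dfs(u, graph, out):
        # Iterative DFS that appends vertices to `out` in finish order.
        stack = [(u, iter(graph[u]))]
        seen[u] = True
        while stack:
            v, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                stack.pop()
                out.append(v)
            elif not seen[nxt]:
                seen[nxt] = True
                stack.append((nxt, iter(graph[nxt])))

    for u in range(n):               # first pass: finish-time ordering
        if not seen[u]:
            dfs(u, adj, order)
    radj = [[] for _ in range(n)]    # reverse graph
    for i in range(n):
        for j in adj[i]:
            radj[j].append(i)
    seen = [False] * n
    sccs = []
    for u in reversed(order):        # second pass: sweep the reverse graph
        if not seen[u]:
            comp = []
            dfs(u, radj, comp)
            sccs.append(sorted(comp))
    # A communicating class is recurrent iff it is closed (no edge leaves it).
    return [c for c in sccs
            if all(j in set(c) for i in c for j in adj[i])]

# States 0, 1 form one recurrent class; state 2 is transient and feeds into it.
P = [[0.5, 0.5, 0.0],
     [0.4, 0.6, 0.0],
     [0.3, 0.3, 0.4]]
assert recurrent_classes(P) == [[0, 1]]
```

Each closed class found this way would then get its own irreducible policy, as in the proof of Lemma 5.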
The claim that the stationary distribution under  is the same as the stationary distribution under  starting from a configuration in  is immediate: when the system initializes from any configuration in , it stays in , and the transitions are exactly the same under the two policies. □

The JRS policy with a general Markovian single-server policy  as its subroutine is constructed using the 0-irreducible policies  obtained from the decomposition of  and the probabilities . (1) We divide all the servers into  server pools, each with infinitely many servers, and let the -th server pool run the JRS policy with subroutine  (defined in Section 4.2) for each  = 1, . . ., . (2) Whenever a type  job arrives, we route it to the -th infinite-server system with probability To analyze the policy , let  and  be the stationary distribution and throughput of the policy  for  = 1, . . ., . By Lemma 5, we have the following relationships: Based on these relationships, we can prove the general version of the Conversion Theorem that does not require 0-irreducibility.
Proof of Theorem 3. For each  = 1, . . ., , Lemma 5 implies that the policy  and the stationary distribution  form a feasible solution to the single-server problem P(()∈I, ), and the corresponding objective value is . Consider the infinite-server system with arrival rates ()∈I and budget . Since we have proved Theorem 3 for the JRS policy with the 0-irreducible subroutine , it follows that where  is the random variable representing the steady-state number of servers in configuration  in the infinite-server system under the JRS policy with subroutine .
By the construction of , the arrival rate to the -th server pool is equal to Here we use (52) and the relationship between () and (). Similarly, After re-indexing the servers, we get (5) and (6). The bounds on () and (), namely (7) and (8), follow from (5) and (6); they can be verified in the same way as in the proof for the irreducible case, so we omit the argument here. □

C SOLVING THE SINGLE-SERVER PROBLEM
In this section, we show the equivalence of the single-server problem in (3) and a linear program (LP), as stated in Theorem 4. The equivalence is proved in two directions. In Appendix C.1, we first derive the linear program (61) as a relaxation of the single-server problem (3), so that the optimal value of the LP is a lower bound on the optimal value of (3). Then in Appendix C.2, we construct a Markovian single-server policy that achieves the optimal value of the LP, which implies the optimality of the policy.

C.1 Lower Bound via LP Relaxation
In this subsection, we derive an LP relaxation of the optimization problem (3), restated below: Here we allow  to be non-Markovian, i.e., it can make decisions based on the history, but we still require the performance metrics used in the objective and the constraints of (3) to have well-defined steady-state distributions.
Observe that both (∞) and  depend on the stationary distribution , but the constraints in terms of  are implicit. To derive an LP relaxation, we give an explicit characterization of the constraints that must be satisfied by the stationary distribution  induced by any feasible policy .
To do this, we derive a version of the stationary equation in terms of a quantity called the transition frequency. The transition frequency of type  jobs is a function : K → R describing the steady-state frequency of requesting a type  job when the system has configuration . To rigorously define the transition frequency, we first introduce the concept of a nominal transition.
Definition 1 (Nominal Transition). Consider a single-server system under any policy. When the configuration () transitions from  to ′ +  for some , ′ ∈ K, with  = ()∈I new jobs added into service, we decompose the transition by adding intermediate configurations as illustrated below:  first goes to ′ if ′ ≠ , and then jobs of each type are added one by one, where (1, 2, . . ., |I|) is a fixed ordering of the set of phases I. We call each short transition in the diagram a nominal transition.
For 1, 2 ∈ K, we denote by (1, 2, ) the cumulative number of nominal transitions from 1 to 2 during the time interval [0, ]; it is a random variable whose distribution depends on the single-server policy and the initial distribution of configurations.
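As an illustration of Definition 1, the following sketch (our own encoding; dict-based configurations and the helper name `nominal_transitions` are invented) decomposes one transition from k to k′ + e into its chain of nominal transitions: first k → k′ if they differ, then one job of each type at a time, in a fixed type order.

```python
def nominal_transitions(k, k_prime, e, type_order):
    """Return the chain of configurations visited by the nominal transitions.

    k, k_prime: configurations as {type: count} dicts.
    e: {type: number of new jobs added}.
    type_order: the fixed ordering in which job types are added.
    """
    path = [dict(k)]
    if k_prime != k:                 # internal transition or departure first
        path.append(dict(k_prime))
    current = dict(k_prime)
    for d in type_order:             # then add new jobs one by one
        for _ in range(e.get(d, 0)):
            current[d] = current.get(d, 0) + 1
            path.append(dict(current))
    return path

# One type-a job departs, then two type-b jobs are added: four configurations.
k = {"a": 1, "b": 0}
k_prime = {"a": 0, "b": 0}
e = {"b": 2}
path = nominal_transitions(k, k_prime, e, type_order=["a", "b"])
assert path == [{"a": 1, "b": 0}, {"a": 0, "b": 0},
                {"a": 0, "b": 1}, {"a": 0, "b": 2}]
```

Each consecutive pair in `path` is one nominal transition, which is exactly what the counting process N(k1, k2, t) tallies.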
Note that for any  ∈ I and  ∈ K s.
where  and  are column vectors representing (•) and (•), and ,  are matrices that make (59) equivalent to (57). Therefore, the following three conditions are necessary for any tuple (, ()∈I) to be a possible pair of a stationary distribution and transition frequencies under a Markovian policy.
Based on the characterization of stationary  and the  in (60), we can now formulate a linear program LP(()∈I, ). The linear program has decision variables Φ ∈ R,  ∈ R^K, and  ∈ R^K for  ∈ I, where Φ is a factor that scales the throughput of each type of job in the direction of where  is the vector form of the cost rate function ℎ, and 1 is a |K|-dimensional vector with ones in all entries except those with Σ∈I  = max. In addition to the last three constraints on stationarity, the first constraint of LP(()∈I, ) is the resource contention constraint; the second constraint holds because the transition frequency from  to  +  equals the throughput of type  jobs.
By the construction of LP(()∈I, ), any feasible solution of P(()∈I, ) can be converted into a feasible solution of LP(()∈I, ). Let * be the optimal value of (3) and Φ* be the optimal value of (61). Then we have
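To make the LP-relaxation idea concrete, here is a toy two-configuration instance (all numbers and the variable layout are invented for illustration; this is not the paper's LP) solved with an off-the-shelf solver: maximize throughput subject to a stationarity constraint and a budget constraint on expected cost.

```python
import numpy as np
from scipy.optimize import linprog

mu, h1, beta = 1.0, 2.0, 0.8     # service rate, cost rate, budget (assumed)

# Variables: [p0, p1, f] = stationary probs of "empty"/"busy" and the
# request frequency.  linprog minimizes, so negate the throughput objective.
c = np.array([0.0, 0.0, -1.0])
A_eq = np.array([[1.0, 1.0, 0.0],     # p0 + p1 = 1 (probabilities)
                 [0.0, mu, -1.0]])    # stationarity: f = mu * p1
b_eq = np.array([1.0, 0.0])
A_ub = np.array([[0.0, h1, 0.0]])     # budget: h1 * p1 <= beta
b_ub = np.array([beta])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, 1), (0, 1), (0, None)])
assert res.success
# The budget binds: p1 = beta / h1 = 0.4, so the optimal throughput is 0.4.
assert abs(res.x[2] - 0.4) < 1e-6
```

The real LP (61) has one probability variable per configuration in K and one frequency vector per job type, but the structure — stationarity equalities plus a budget inequality, maximizing a throughput scale — is the same.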

C.2 Policy Construction
In this subsection, we describe a procedure for constructing a policy that achieves the lower bound given by the LP relaxation in (61). Specifically, given a feasible solution (, ()∈I) to (61), we define an LP-based policy that requests jobs as follows: • Case 1: When the system enters a configuration  with () ≠ 0, for each  ∈ I, the policy starts a timer whose duration follows an exponential distribution with rate ()/(). The policy requests a type  job when the -th timer ticks. When the configuration changes, all timers are canceled.
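The timer mechanism of Case 1 can be sketched as follows (our own illustration with invented rates and names; `first_timer` is hypothetical): each type gets an independent exponential timer with rate proportional to its transition frequency, and the first timer to tick determines which type is requested, so type d wins with probability proportional to its rate.

```python
import random

def first_timer(freqs, p_k, rng):
    """Return (type, tick_time) for the earliest exponential timer.

    freqs: {type: transition frequency f_d(k)}; each timer has rate
    f_d(k) / p_k, as in Case 1 of the LP-based policy.
    """
    ticks = {d: rng.expovariate(f / p_k) for d, f in freqs.items() if f > 0}
    d = min(ticks, key=ticks.get)
    return d, ticks[d]

rng = random.Random(42)
freqs = {"type_a": 0.3, "type_b": 0.1}   # transition frequencies (assumed)
wins = sum(first_timer(freqs, p_k=0.5, rng=rng)[0] == "type_a"
           for _ in range(20_000))
# Minimum of independent exponentials: type_a wins w.p. 0.3/(0.3+0.1) = 0.75.
assert abs(wins / 20_000 - 0.75) < 0.02
```

Dividing every rate by p(k) means the timers tick faster in rarely visited configurations, which is what makes the long-run request frequencies match the LP variables f_d(k).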
Remark 2. Note that the LP-based policy is defined here from a viewpoint different from that of Section 4.2: here each request adds only one job to the server, and one request can happen immediately after another, while in Section 4.2 each request can add multiple jobs to the server and only one request happens at a time. We refer to the view here as the impulsive view, because multiple requests happening at the same time can be thought of as an infinite request rate. In contrast, we call the view in Section 4.2 the non-impulsive view.
The LP-based policy can alternatively be described using the non-impulsive view if we regard multiple requests happening at the same time as one request that adds multiple jobs to the server. More specifically, each reactive request of the LP-based policy is initiated by an internal transition or departure and consists of one or more requests of Case 2; each proactive request of the LP-based policy consists of one request of Case 1 and possibly several requests of Case 2.
The following lemma characterizes the steady-state behavior of a single-server system under the LP-based policy.
Lemma 7 (Properties of LP-based Policies). Consider a single-server system under the LP-based policy (, ()∈I), where (, ()∈I) is a feasible solution to (61). Then  is a stationary distribution under policy , and ()∈I are the transition frequencies corresponding to .
The proof of Lemma 7 is based on (58), following the same argument as the proof of Lemma 6, together with an induction argument.
Proof. Let (, ()∈I) be a feasible solution to the LP in (61). To show that  and ()∈I are also the actual stationary distribution and transition frequencies, it suffices to show that if the initial distribution follows , i.e., P((0) = ) = (), then under the policy (, ()∈I) we have lim where  is the cumulative number of nominal transitions under the policy (, ()∈I) and the initial distribution .
Our proof is based on the following equation: This equation is a straightforward consequence of (58), following the same argument as the proof of Lemma 6. We prove the following two equations, (66) and (67), by induction on Σ∈I . We first consider the base case, in which Σ∈I  = 0. In this case,  = 0 and we have for all . This reduces (65) to We now distinguish cases based on whether () = 0. If () ≠ 0, then by the definition of our policy, for all , which is (66). Combining the above equation with the stationary equation (57) satisfied by (, ()∈I), we conclude that the RHS of (69) is zero, i.e.,
Because P(() = ) ≥ 0 and P((0) = ) = 0, the LHS of the above expression is non-negative, while the RHS is non-positive. Therefore, both sides are equal to zero, which gives the claim for P(() = ). Having proved the base case, we turn to the induction step. Suppose we have proved (66) and (67) for all  such that Σ∈I  ≤  − 1 for some integer  ≥ 1, and consider  with Σ∈I  = . By the induction hypothesis, we may then repeat the arguments after (68) of the base case verbatim. By induction, we have proved (66) and (67).
Therefore, given the policy and the initial distribution, the distribution of () is stationary, i.e., P(() = ) = () for all  ∈ K and all  ≥ 0. As a result, an analogue of (66) holds for all  ≥ 0: E[(,  + , )] is differentiable with respect to  for all  ∈ K and all  ∈ I. Therefore, the desired limits follow. This completes the proof. □

D PERFORMANCE GUARANTEE OF JOIN-REQUESTING-SERVER WITH AN ESTIMATED MODEL

D.1 Assumptions and result
In this section, we consider the performance of JRS when it is based on an estimated model. We state the performance guarantee in terms of the estimation error, and give a proof sketch by pointing out which parts of the proof of Theorem 3 need to change accordingly.
Consider the setting where the maximal number of jobs on a server max, the set of job phases I, the cost rate function ℎ(•), and the budget  are all known, but we only have estimates of the jobs' arrival rates, internal transition rates, and departure rates.
Specifically, for any , ′ ∈ I, let ′ be the true rate of internal transitions from phase  to phase ′, let ⊥ be the true departure rate from phase , and let  be the true arrival rate of type  jobs. We let the ′'s, ⊥'s, and 's denote the estimated internal transition rates, departure rates, and arrival rates, respectively. We assume that there exists a small positive constant , independent of the scaling factor , such that the following assumptions hold.
Assumption 1 (-accurate estimation). For any , ′ ∈ I,

Assumption 2 (Scaling of the single-server objective value). Consider the single-server problem in (3). Let (, , ) be a solution feasible for the single-server problem with estimated parameters that are -accurate, where  ∈ [0, max) for some constant max > 0. We assume that there exist constants 0 < 1 < 2, independent of  and , such that

Assumption 3 (-insensitivity of the optimal value). Consider the single-server problem in (3). Let * be the optimal value of the single-server problem with estimated parameters that are -accurate, and let *true be the optimal value of the single-server problem with the true parameters. We assume that * ≤ *true + .
We also assume that JRS can accurately simulate the virtual jobs.
Assumption 4 (Accurate simulation). The virtual jobs simulated in JRS follow the true transition dynamics.
Given the above assumptions, we have the following proposition, which states the optimality gap of the JRS policy with estimated model parameters and has a form similar to that of Theorem 3 for JRS under the true model parameters.
Proposition 1 (Optimality gap with model estimation). Consider a stochastic bin-packing problem in service systems with time-varying job resource requirements. Let the infinite-server policy  be JRS with subroutine . Suppose  is specified based on an estimated model satisfying Assumptions 1, 2, 3, and 4, for  ∈ [0, max), where max is a positive constant independent of . Let  be the objective value achieved by  in the single-server problem P(()∈I, ) with estimated parameters. Under , for any initial state, we have where  denotes the steady-state configuration of the single-server system under  with the estimated parameters. If we let  be the optimal policy of the single-server problem with estimated parameters, then for any initial state we have where  is a positive constant independent of . In other words,  is (1 +  + (−0.5), 1 +  + (−0.5))-optimal.
Remark 3. Note that even if the single-server system with estimated parameters is 0-irreducible under , it is hard to guarantee that the original system is 0-irreducible, due to the estimation errors. Consequently, the steady-state performance metrics () and () could depend on the system's initial state. Fortunately, our proof of Theorem 3 does not rely on the uniqueness of the stationary distribution, so we can adapt the proof to show that the inequalities in Proposition 1 hold for any initial state.
Remark 4. Note that Assumption 4 ensures that the real jobs and the virtual jobs on a server are indistinguishable, so that (1:(), 1:()) is still a Markov chain. Recall that ℓ and ℓ denote the observed configuration and tokens of each normal server ℓ, respectively (see Section 5.1). We suspect that Assumption 4 can be removed, since our proof does not rely heavily on (1:(), 1:()) being a Markov chain; however, removing the assumption would require a more careful and notationally heavy analysis. We argue that in practice this assumption is not restrictive, as one can record the traces of jobs that arrived in the past and resample virtual jobs from those traces.

D.2 Lemmas
In this section, we give a proof sketch for Proposition 1 in the case where the single-server system under  with estimated parameters is 0-irreducible. The argument for extending to the general case is essentially the same as in Appendix B.3, so we omit it here. At a high level, the proof of Proposition 1 is similar to that of Theorem 1. In particular, the key steps of the proof are to show that (1:, 1:) = (√) and Σ∞ℓ=+1 Σ∈I ℓ = (√) as  → ∞, where  = ⌈⌉, 1: are i.i.d. copies of the steady-state configurations of the single-server system under  with the estimated parameters, and  is the steady-state configuration of the infinite-server system under . Proposition 1 is based on the three lemmas stated below, and its proof from these lemmas is essentially the same as the argument in Section 5.3.

Lemma 8. Under the conditions of Proposition 1, and assuming the single-server system with estimated parameters under  is 0-irreducible, for any initial state we have (1:, )

These three lemmas are analogous to Lemma 2, Lemma 3, and Lemma 4, respectively. In the rest of the section, we sketch the proofs of the three lemmas above, highlighting the differences from the proofs of their analogues.

D.3 Proof sketch for Lemma 8
Recall from Section 5.4 that the proof of Lemma 2 is based on Stein's method, which compares the generator of the i.i.d. copies of the single-server system with the generator of the infinite-server system. To write down the generators under the estimated model, recall that in the single-server system, each transition can be represented by the diagram  → ′ → ′ + , where the arrow  → ′ denotes an internal transition or a departure if  ≠ ′, and the arrow ′ → ′ +  denotes a job request made right after reaching ′. For any , let () be the set of (′, ) ∈ K² such that ′ +  ∈ K. We define two sets of transition rates as follows: for any  ∈ K and (′, ) ∈ (), • under the estimated parameters and the policy , we let ,(′,) be the rate of the transition  → ′ → ′ + , and let  ≜ Σ′, ,(′,) be the total transition rate at configuration ; • under the true parameters and the policy , we let ,(′,) be the rate of the transition  → ′ → ′ + , and let  ≜ Σ′, ,(′,) be the total transition rate at configuration .
When  ≠ ′ and  = 0, ,(′,) is equal to the rate of going from  to ′ via an internal transition or departure under the estimated job model, while ,(′,) is the corresponding rate under the true job model. Because there are at most max jobs, and by our assumption the estimation error of each job's transition rates is bounded by , the rates ,(′,) and ,(′,) differ by at most max.
When  ≠ ′ and  ≠ 0, ,(′,) is equal to the rate of going from  to ′ via an internal transition or departure, multiplied by the probability of adding  jobs via a reactive request, under the estimated job model and the policy ; ,(′,) uses the true job model instead of the estimated one, but the same policy. Because the rate of going from  to ′ differs by at most max under the two job models, and the probability of adding  jobs after going from  to ′ is the same under the same policy, ,(′,) and ,(′,) differ by at most max. □

In the Stein factor bound step, we need to show that sup This involves analyzing the i.i.d. copies of the single-server system, and uses the fact that the single-server system with estimated parameters is 0-irreducible under . This part of the proof is identical to the corresponding part of the proof of Lemma 2.

D.4 Proof sketch for Lemma 9 and Lemma 10
The proofs of Lemma 9 and Lemma 10 have a similar structure to those of Lemma 3 and Lemma 4.
In the first step, we use Little's Law to bound the expectations of the number of type  virtual jobs, , and the number of type  jobs on backup servers, . We obtain two equations almost the same as (36) and (37), where max is the maximal expected service time of any type of virtual or real job. Because max = (1), it suffices to bound E[Σℓ=1 Σ′, ℓ,(′,) 1{ℓ=0}] and E[ • ]. In the second step, we use the fact that the two Lyapunov functions () =  and () = 2 have zero drift in steady state, where  is the total number of type  tokens. We obtain two equalities similar to (38)