Prophet Inequalities with Cancellation Costs

Most of the literature on online algorithms and sequential decision-making focuses on settings with “irrevocable decisions”, where the algorithm’s decision upon arrival of a new input is set in stone and can never change in the future. One canonical example is the classic prophet inequality problem, where realizations of a sequence of independent random variables X1, X2, … with known distributions are drawn one by one and a decision maker decides when to stop and accept the arriving random variable, with the goal of maximizing the expected value of their pick. We consider “prophet inequalities with recourse” in the linear buyback cost setting, where after accepting a variable Xi, we can still discard Xi later and accept another variable Xj, at a buyback cost of f × Xi. The goal is to maximize the expected net reward, which is the value of the final accepted variable minus the total buyback cost. Our first main result is an optimal prophet inequality in the regime f ≥ 1, where we prove that we can achieve an expected reward of (1+f)/(1+2f) times the expected offline optimum. The problem is still open for 0 < f < 1, and we give some partial results in this regime. In particular, as our second main result, we characterize the asymptotic behavior of the competitive ratio for small f and provide almost matching upper and lower bounds that show a factor of 1 − Θ(f log(1/f)). Our results are obtained by two fundamentally different approaches: one is inspired by various proofs of the classical prophet inequality, while the second is based on combinatorial optimization techniques involving LP duality, flows, and cuts.


Introduction
Consider a monopolist seller tasked with selling a single item to a sequence of arriving buyers. A quintessential problem in online algorithms and mechanism design - with a wide range of applications from selling seats in a concert hall to the multibillion-dollar online display advertising industry - is how to allocate this item in a sequential fashion to the buyers so as to maximize social welfare, that is, to maximize the willingness to pay, or the value, of the allocated buyer. Given distributional knowledge about the sequence of arriving values, a common approach to this problem is to design online algorithms that attain so-called prophet inequalities; the goal there is to evaluate the performance of the online algorithm relative to an "omniscient prophet", who knows the entire sequence of values and simply maximizes social welfare by allocating the item to the highest-value buyer. This fundamental algorithm design question, which can alternatively be described as an optimal stopping problem, has its roots in the classic work of Krengel and Sucheston in the 70s and has since been studied quite extensively in computer science, mathematics, and operations research. This significant line of work on prophet inequalities shows that in many settings of Bayesian allocation and mechanism design (even beyond single-parameter environments), optimal or near-optimal prophet inequalities can be obtained by using simple and elegant take-it-or-leave-it pricing rules. For example, the seminal work of Samuel-Cahn (1984) shows that a single-threshold algorithm can achieve the optimal 0.5-competitive prophet inequality in a single-item setting when the values are independent of each other. Other examples include, but are not limited to, obtaining more powerful prophet inequalities in special cases such as i.i.d. values or independent values arriving in a random order (Hill and Kertz, 1982; Correa et al., 2017; Abolhassani et al., 2017; Esfandiari et al., 2017), and extensions to more general feasibility environments such as multi-units (Hajiaghayi et al., 2007; Yan, 2011; Alaei, 2014; Jiang et al., 2023), matroids (Chawla et al., 2010; Kleinberg and Weinberg, 2012; Anari et al., 2019), matchings (Alaei et al., 2012; Papadimitriou et al., 2021), general downward-closed environments (Rubinstein, 2016), combinatorial valuations (Rubinstein and Singla, 2017), and combinatorial auctions (Feldman et al., 2013; Dutting et al., 2020) (see Lucier (2017) and Correa et al. (2019) for comprehensive surveys). Importantly, one common aspect of all this existing work on prophet inequalities is the assumption that the underlying algorithms may only make irrevocable decisions - once the decision is made for an arriving buyer/random variable, that decision is set in stone and can never be modified in the future. Relaxing this assumption gives more power to an online algorithm, as it can now hedge against possible early commitments to low-valued buyers arriving at the beginning of the sequence.
Motivated by some modern applications of Bayesian online allocation in electronic marketplaces and platforms - in particular, the criticality of maintaining supply efficiency in cloud spot markets and online hotel reservation systems through overbooking - we initiate the study of the single-item prophet inequality problem with (costly) recourse. More formally, we consider the setting in which the decision maker is presented with a sequence of independent random variables X_1, X_2, … one by one (think of them as the sequence of values of arriving buyers), where each X_i is drawn independently from a known distribution F_i. Once X_i is presented to the decision maker, she has the option of accepting this variable or going to the next round to observe the next variable. In the vanilla prophet inequality setting, the decision maker stops once she accepts the first variable. We diverge from this setting by allowing her to accept X_i even after the first allocation, but if the item has already been allocated to a buyer j < i with value X_j, the decision maker must revoke this previous allocation by paying a cost in addition to losing the original reward X_j.
We focus on a simple yet practical and fundamental model for costly recourse known as the linear buyback setting, which has been introduced and studied in the literature on online allocations under adversarial arrivals (Babaioff et al., 2008; Badanidiyuru and Kleinberg, 2009; Ekbatani et al., 2022). In this model, the additional cost of discarding X_j is equal to f · X_j, where f ≥ 0 is called the buyback parameter. The goal is to maximize the expected net reward, defined as the value of the final accepted variable minus the total buyback cost. From a technical perspective, this is a natural model, as the resulting problem is invariant under scaling of the values. The model is also natural from the perspective of applications, as it captures scenarios where the X_i's are the willingness-to-pay of the arriving users of a service, and the decision maker uses "compensation fees" in the form of a fixed percentage of the original transfer paid by the user to take back the resource. One might also imagine scenarios in which the X_i's are a sequence of demand requests (received by a platform during a decision-making horizon) declaring how long they would like to use a service in the future (e.g., renting an item for X_i hours next week). Now, the platform can serve only one request (e.g., because it has only one rental item) and its service is offered at a price of 1 per unit of time (so giving the service to demand X_i generates a revenue of 1 × X_i). The platform also has the option of "outsourcing" the request by offering the user the same service from a third party after paying a price of 1 + f > 1 per unit of time to that third party (e.g., it can borrow the same rental item from a third party to rent to the user for X_i hours). Hence, if it outsources X_j, it needs to pay (1 + f) × X_j. The goal of the platform is to decide to whom it should allocate its own service and to whom it should offer an outsourced service so as to maximize its net profit.
To establish prophet inequalities in the above model, we compare the expected net reward of our algorithms with the prophet benchmark E[max_i X_i]. Let α(f) be defined as the optimal competitive ratio that can be achieved by an online algorithm in our setting. Knowing the sequence of distributions, one can use polynomial-time dynamic programming/backward induction to devise an optimal online policy for our problem that maximizes the expected net reward amongst all online algorithms (note that the state of such a dynamic program is essentially the pair of the time and the value of the previous allocation, and hence it is polynomial-time solvable with proper discretization; we detail this later in Section 2). As a result, α(f) can be defined as the worst-case ratio of the online optimum over the offline optimum. In the special case of f = +∞ (or alternatively, the version in which no buyback is allowed), our problem is essentially the classic prophet inequality problem - therefore, α(f) ∈ [0.5, 1] for any choice of f and lim_{f→+∞} α(f) = 0.5. Intuitively speaking, as f decreases, the performance of the best online algorithm should improve. Furthermore, lim_{f→0} α(f) = 1: in the extreme case of zero buyback cost with f = 0, a simple greedy algorithm, performing a buyback whenever it has a positive gain, achieves the exact offline optimum. We now ask the following research question: Can we characterize the optimal competitive ratio α(f), that is, the worst-case performance ratio of the optimal online algorithm over the offline optimum benchmark, as a function of the buyback parameter f ≥ 0? In particular, (i) can we always obtain a constant competitive ratio α(f) > 0.5 for every fixed f > 0? (ii) can we understand the behavior of α(f) in the asymptotic regime when f → 0?
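As a concrete illustration of the backward-induction policy mentioned above, the following self-contained sketch computes the online optimum for instances with finite-support distributions, together with the brute-force prophet benchmark for comparison. The function names and the (value, probability)-pair representation are our own illustrative choices, not the paper's.

```python
import itertools

def optimal_online_reward(dists, f):
    """Expected net reward of the optimal online policy.

    dists: list of distributions, one per variable, each a list of
           (value, probability) pairs.
    f:     buyback parameter.
    The DP state is the value currently held; Phi_t is represented on
    the (finite) set of values any variable can take, plus 0.
    """
    support = sorted({0.0} | {v for d in dists for v, _ in d})
    n = len(dists)
    # Phi_n(x) = x: at the end we simply keep what we hold.
    phi = {x: x for x in support}
    for t in range(n - 1, -1, -1):
        new_phi = {}
        for x in support:
            exp = 0.0
            for v, p in dists[t]:
                # either keep x, or buy back x (cost f*x) and accept v
                exp += p * max(phi[x], phi[v] - f * x)
            new_phi[x] = exp
        phi = new_phi
    return phi[0.0]

def prophet_value(dists):
    """E[max_i X_i] by brute-force enumeration (fine for tiny instances)."""
    total = 0.0
    for outcome in itertools.product(*dists):
        p = 1.0
        m = 0.0
        for v, pv in outcome:
            p *= pv
            m = max(m, v)
        total += p * m
    return total
```

On the instance with X1 = 1 deterministically and X2 ∈ {2, 0} each with probability 1/2 (the hard instance for f = 1), the online optimum is 1 while the prophet value is 1.5, matching the ratio (1+f)/(1+2f) = 2/3.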

Our results
Our first main result answers the above question affirmatively when the buyback parameter is in the range f ∈ [1, +∞). More specifically, we show that: (informal) Theorem I: For any buyback parameter f ∈ [1, +∞), there exists a polynomial-time online algorithm that achieves a competitive ratio α(f) = (1+f)/(1+2f) against the optimum offline benchmark. Furthermore, there exists an instance of n = 2 two-point random variables such that no online algorithm can obtain a competitive ratio better than (1+f)/(1+2f).
Note that as f becomes smaller, the competitive ratio of the optimal online policy can only improve; therefore, as a simple corollary of the above result, we obtain a constant improvement over 0.5 for any fixed f. Our bad example in the above result is actually quite intuitive (and a natural generalization of the bad example for the classic prophet inequality): consider two random variables where X_1 = 1, and X_2 = 1 + f with probability 1/(1+f) and 0 otherwise. It is easy to verify that every online algorithm obtains value at most 1: if it selects X_1 = 1, it is indifferent between ignoring X_2 or performing a buyback, obtaining an expected reward of 1 in either case; if it ignores X_1, it again obtains an expected reward of (1/(1+f)) · (1+f) = 1. Since the offline optimum is E[max{X_1, X_2}] = (1+2f)/(1+f), it is natural to conjecture that (1+f)/(1+2f) is the optimal answer for every f ≥ 0. Our result above shows that this is indeed the case for f ≥ 1. We also show that the same result holds for every instance with n = 2, that is, with only two variables (regardless of whether f ≥ 1 or 0 ≤ f < 1) (see Appendix C). Furthermore, we show that if the random variable X_max := max_i X_i is bounded within a multiplicative range of 1 + f, then a simple thresholding algorithm with no buyback achieves the ratio (1+f)/(1+2f) for all f ≥ 0 (see Appendix B). However, we show that the bound (1+f)/(1+2f) is actually not achievable for 0 < f < 1 in the general problem. We establish this result by constructing an instance of the problem with n = 3 variables for which no online algorithm can obtain a competitive ratio of (1+f)/(1+2f) for 0 < f < 1; we also conjecture that the bound attained on this instance is tight for n = 3 variables. With the help of a computer-aided search, we also tried to find the worst-case instance for n = 4, and we observed a continuation of the pattern: there seems to exist an instance where, for any choice of 0 < f < 1/3, the competitive ratio is strictly worse than the ratio obtained by the worst-case instance on n = 3 variables.
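The arithmetic behind the two-variable example above can be checked mechanically. The short sketch below (our own illustration; the function name is ours) evaluates both online options and the prophet benchmark directly.

```python
def hard_instance_ratio(f):
    """Competitive ratio of the best online policy on the instance
    X1 = 1; X2 = 1+f with probability 1/(1+f), and 0 otherwise."""
    q = 1.0 / (1.0 + f)
    # Option A: accept X1. When X2 = 1+f arrives, buying back yields
    # (1+f) - f*1 = 1, the same as keeping X1, so the reward is 1 either way.
    take_x1 = q * max(1.0, (1.0 + f) - f) + (1.0 - q) * 1.0
    # Option B: skip X1 and accept X2 whenever it is positive.
    skip_x1 = q * (1.0 + f)
    online = max(take_x1, skip_x1)
    # Prophet benchmark: E[max{X1, X2}] = (1+2f)/(1+f).
    prophet = q * (1.0 + f) + (1.0 - q) * 1.0
    return online / prophet
```

Both online options yield an expected reward of exactly 1 for every f, so the ratio equals (1+f)/(1+2f).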
We leave finishing the investigation of identifying the worst-case instance for any n as an interesting open problem. This investigation suggests that the problem presents new challenges for smaller values of f, since the worst-case instance seems to employ more and more variables as f decreases. Based on this evidence, it may not be feasible to obtain a closed form for the competitive ratio as a function of the buyback parameter f in the entire 0 < f < 1 regime. Despite the fact that we could establish a complete characterization of the optimal competitive ratio for f ∈ [1, +∞), the problem is still open for 0 < f < 1 due to this issue. Therefore, we attempt only to answer weaker questions in this regime. In particular, we ask whether it is possible to characterize the asymptotic behavior of α(f) when f → 0 by establishing (almost) matching upper and lower bounds on the performance of the optimal online policy. This brings us to our second main result. We show that: (informal) Theorem II: For some constant c_0 > 0, for any buyback parameter f ∈ [0, c_0], there is a polynomial-time online algorithm that achieves a competitive ratio α(f) = 1 − O(f log(1/f)) against the optimum offline benchmark. Furthermore, for every f ∈ [0, c_0], there is an instance of n = Θ(log(1/f)) two-point random variables such that no online algorithm can achieve a competitive ratio better than 1 − Ω(f log(1/f)).
Our online policy in the result above is a simple threshold-greedy algorithm that works as follows: depending on the value of f and the set of distributions, it chooses a threshold T. It then performs the first selection the first time an arriving random variable X_i exceeds the threshold T. After that, it accepts any arriving random variable X_i as long as it generates a positive marginal gain, that is, X_i > (1 + f) X_j, where X_j is the previously accepted random variable at time i. We show that this simple algorithm achieves a competitive ratio γ(f), which is asymptotically 1 − O(f log(1/f)) for small values of f > 0, and is always strictly larger than 0.5 for all f ≥ 0. We note that this algorithm is order-oblivious, which means that it only needs to know the set of arriving distributions - similar to the single-threshold algorithm for the classical prophet inequality (Samuel-Cahn, 1984) - and not the order in which they arrive. Another important feature of this algorithm is that it can potentially perform an unbounded number of buybacks, which seems necessary to achieve the asymptotically optimal factor as f → 0.
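A minimal sketch of this policy on a single realized sequence of values (our own illustrative code; the function name and input representation are assumptions, and the choice of T from the distributions is left out):

```python
def threshold_greedy(values, f, T):
    """Run the threshold-greedy policy on one realized sequence.

    Accept the first value above T; afterwards, swap the held value for a
    new arrival v exactly when the marginal gain v - (1+f)*held is positive.
    Returns the net reward: final held value minus total buyback cost.
    """
    held = 0.0
    accepted = False
    cost = 0.0
    for v in values:
        if not accepted:
            if v > T:
                held, accepted = v, True
        elif v > (1.0 + f) * held:  # positive marginal gain after buyback
            cost += f * held        # pay f times the discarded value
            held = v
    return held - cost
```

For example, with the sequence (1.2, 0.4, 3.0), f = 0.5, and T = 1.0, the policy accepts 1.2, skips 0.4, then swaps to 3.0 (since 3.0 > 1.5 · 1.2) at a buyback cost of 0.6, for a net reward of 2.4.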
It is also interesting to compare our asymptotic bound of 1 − Θ(f log(1/f)) with the best known competitive ratios for small values of f in the adversarial setting, without prior stochastic information. In this setting, Babaioff et al. (2008) established the optimal competitive ratio of 1/(1 + 2f + 2√(f(1+f))) for deterministic algorithms, and the follow-up work of Badanidiyuru and Kleinberg (2009) established the optimal competitive ratio of −1/W_{−1}(−1/(e(1+f))), where W_{−1} is the non-principal branch of the Lambert W function. Both of these bounds are asymptotically equal to 1 − Θ(√f), which converges to 1 much more slowly than 1 − Θ(f log(1/f)). This gap can be interpreted as the "value of information" about the arriving distributions in our setting, which turns out to be critical in obtaining the asymptotically optimal competitive ratio.
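To see the gap concretely, one can evaluate the deterministic adversarial ratio numerically and compare the loss 1 − ratio against the f log(1/f) scale (a sketch under our reading of the formulas above; the function name is ours):

```python
import math

def adversarial_det_ratio(f):
    """Optimal deterministic competitive ratio under adversarial arrivals
    (Babaioff et al., 2008): 1 / (1 + 2f + 2*sqrt(f(1+f)))."""
    return 1.0 / (1.0 + 2.0 * f + 2.0 * math.sqrt(f * (1.0 + f)))

for f in (1e-1, 1e-2, 1e-3, 1e-4):
    adversarial_loss = 1.0 - adversarial_det_ratio(f)  # behaves like 2*sqrt(f)
    bayesian_scale = f * math.log(1.0 / f)             # the Theta(f log(1/f)) scale
    print(f"f={f:g}: adversarial loss {adversarial_loss:.4f}, "
          f"f*log(1/f) {bayesian_scale:.5f}")
```

Already at f = 10^−4 the adversarial loss (about 0.02) exceeds the f log(1/f) scale (about 0.0009) by more than a factor of twenty, illustrating how much faster the Bayesian ratio approaches 1.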
Our results are summarized in Figure 1.
Figure 1: Lower and upper bounds on the competitive ratio. Blue and green curves refer to the optimal online policy (for f → 0, we depict an exaggerated version of the 1 − Θ(f log(1/f)) curve to emphasize the qualitative behavior), the red curve is our threshold-greedy algorithm, and the black curves are the optimal bounds in the adversarial setting (Babaioff et al., 2008; Badanidiyuru and Kleinberg, 2009).

Overview of our techniques
Broadly speaking, our results in this paper are obtained using two fundamentally different approaches: our first approach is based on combinatorial optimization techniques involving LP duality, flows, and cuts, while the second approach is inspired by various proofs of the classical prophet inequality.
Flow approach. While it is always possible to set up a nonlinear mathematical program to characterize the worst-case performance of the optimal online policy against the prophet benchmark, studying such a program is not always tractable. We get around this issue by instead formulating a parametric linear program that characterizes the worst-case performance of the optimal online policy, and studying the program under the worst-case choice of these parameters. This turns out to involve using LP duality to encode the problem as a certain generalized flow problem. The generalized flow problem has a special structure which is amenable to explicit solutions; however, discovering this explicit solution (for f ≥ 1) seems to require a rather circuitous route.
In order to set up our parametric linear program, we start by proving that for every instance of our problem, there is a harder instance made up of two-point random variables. This kind of reduction to a discrete, monotonic setting is intuitive because it ensures that information about the maximum value of the random variables we are observing is revealed as slowly as possible, thereby making the problem hard for an online decision-maker who gets to see these variables sequentially.
Formally achieving this reduction depends on being able to carefully analyze the optimal online policy using a Bellman recurrence that describes how much value the policy can obtain starting from a certain point in the middle of the sequence. We demonstrate that the reduction to the discrete and monotone setting formally depends only upon verifying a few special properties of this recurrence (Lemmas 3 and 4).
Once we have an instance in this monotonic form, we can set up a factor-revealing LP to understand the worst-case competitive ratio; however, the probabilities q_t of the two-point variables must remain external parameters, since the constraints that determine the competitive ratio are nonlinear in q_t. LP duality reveals another linear program, which we dub "contention resolution with recourse". This is somewhat analogous to contention resolution LPs, where each variable encodes the probability of being at a certain state during the process and we would like to guarantee a certain amount of allocation coverage to each state; however, it has some new features: think of each state (i, j), i < j, as a directed edge from node i to node j, where this state encodes that the decision maker is at time j and the last variable accepted was at time i. The LP has the form of a generalized flow in this digraph (with "leakage" along each edge related to the buyback parameter). We also have capacities on each edge, related to the final policy being implementable in an online fashion, and the goal of the flow problem is to find a generalized flow that satisfies certain demands at each node i. The task that remains is to find an explicit solution to this LP for every choice of the probabilities q_t. It turns out that in the worst case, it is safe to assume that all the probabilities q_t are small. This allows us to reduce (using standard Riemann sum arguments) the problem of finding a solution to the LP for potentially many different values of q_t to solving a single continuous flow problem. This part of the reduction in our analysis is inspired by known "variable splitting" methods for other prophet-inequality-type problems (e.g., Liu et al. (2021)) that reduce the problem to the case of small random variables, but the particular splitting we employ to make the instance harder is unique to our setting.
Explicitly solving continuous linear programs, such as our continuous generalized flow problem, is typically a challenging task. A key insight in solving our continuous flow problem is that if we believe the optimal competitive ratio in a certain range of f is obtained on a particular small instance (for example, on 2 variables), then the structure of the general solution should mimic the solution for the small example. More precisely, by grouping the variables of a larger instance (or by splitting a continuous interval) into two blocks, we can view the larger instance as a blow-up of the hard instance on 2 variables. We therefore hypothesize that the flow has a block structure, which allows us to make some natural assumptions that reduce the space of possible solutions that we need to examine to find a flow.
After we make these assumptions, one way to prove the existence of the desired flow is to use the max-flow min-cut theorem. We omit this (chronologically our first) proof that a factor of (1+f)/(1+2f) can be obtained for f ≥ 1. Instead, we exploit the intuition obtained from the study of minimum cuts to find a feasible flow explicitly. In the construction of our flows, inspired by the structure obtained at a block level, we posit that certain capacity constraints, related to pairs of random variables that are far enough from each other, are tight. We translate this tightness restriction into certain differential equations that aid in partially constructing the continuous flow, and then use the remaining constraints in our continuous program to construct the final flow.
It turns out, however, that our 2-block solution cannot extend to the regime of f < 1. Indeed, we are able to use the Bellman recurrence and its properties to construct examples for which it is impossible to achieve the factor of (1+f)/(1+2f). In all the examples we construct, the random variables satisfy an important property: if the optimal algorithm accepts a certain variable X_i, then it is indifferent between accepting or rejecting X_{i+1}. While we cannot prove that the worst-case instance on a given number of variables must always have this structure, it seems plausible that this is true. Our examples suggest a sequence of intervals I_k where the optimal factor α(f) is determined by an instance of this type on k variables; however, this remains a conjecture.
Threshold-greedy approach. Our second approach is a natural algorithm motivated by existing proofs of the classical prophet inequality: we set a threshold T and wait for the first variable above T; then, if we are holding a variable X_i and encounter another variable X_j > (1+f) X_i, we buy back and swap X_i for X_j. While this is certainly not the optimal online policy, it has the advantage of being simple and practical, and it requires less prior information than the LP-based algorithm discussed above. Here, we only need to know the distribution of X_max (in fact, certain special quantiles of X_max), and we don't even need to know the individual distributions F_i, their ordering, or even the number of variables n.
The analysis of the threshold-greedy algorithm can be viewed as an extension of the classical prophet inequality, albeit with some significant complications. One of our ideas here is that in the usual proof of the prophet inequality, we really only count gains from events where exactly one variable is above the threshold. Somewhat surprisingly, events where multiple variables appear above T cannot be used to obtain a better prophet inequality; however, these events provide extremely useful gains in our problem with buyback. Roughly speaking, if the number of variables above T is k, we reach the actual maximum of these variables after at most k − 1 buybacks, and the reward we obtain is at least X_max/(1+f)^{k−1}. The number of variables above T can be approximated by a Poisson random variable, which can be decoupled from the actual maximum X_max using the FKG inequality. Finally, we can optimize the performance of the algorithm by carefully choosing the threshold T.
All of these ideas taken together lead to an almost-optimal lower bound of 1 − Θ(f log(1/f)) for small f > 0 (we choose T so that Pr(X_max < T) = f/(1+2f)). Furthermore, a separate analysis shows that this simple algorithm achieves a factor of 1/2 + Ω(1/f) for any fixed f > 1. This is obviously weaker than the optimal factor of (1+f)/(1+2f), but it shows that even limited prior information can be used to obtain an improvement over 1/2 for any buyback parameter.

Further related work
Buyback and recourse in online allocations. The study of the single-item buyback setting under adversarial arrivals originates from the work of Babaioff et al. (2008), in which they determine the optimal competitive ratio for deterministic integral algorithms using a modification of the greedy algorithm. The follow-up work of Badanidiyuru and Kleinberg (2009) shows that the optimal competitive ratio can be achieved by a randomized algorithm that combines an elegant correlated randomization step with the greedy algorithm. The recent work of Ekbatani et al. (2022) gives an optimal competitive primal-dual fractional algorithm for this problem, which, together with a lossless randomized rounding for the single-item buyback setting, yields another optimal competitive randomized algorithm under adversarial arrivals. Beyond the single-item setting, the buyback setting has also been studied for general matroids (Babaioff et al., 2008; Badanidiyuru and Kleinberg, 2009), online matchings (Ekbatani et al., 2022), and general matroid intersections (Badanidiyuru, 2011). The buyback setting is also a special case of an unconstrained online allocation with a combinatorial valuation. Prophet inequalities with submodular combinatorial valuations are studied in Rubinstein and Singla (2017), in which they devise an O(1)-competitive algorithm (weaker than our bound) for this general problem. Another related problem is prophet inequalities with overbooking (Ezra et al., 2018; Assaf and Samuel-Cahn, 2000), where the decision maker is allowed to accept more random variables than their capacity. We diverge from this setting by considering a unit capacity (single item) and a linear cost for cancellations. Finally, the role of "recourse" (or preemption) in resource allocation has been studied in other settings. Examples include, but are not limited to, general online algorithms with rejection power (Azar et al., 2003), online submodular maximization with preemption (Buchbinder et al., 2014), scheduling (Canetti and Irani, 1995), online packing with removal cost (Han et al., 2014), online matching with augmentations (Chaudhuri et al., 2009), and designing online BIC ad mechanisms with cancellations (Constantin et al., 2009). We diverge from all of this work by considering the different setting of the classic prophet inequality with buyback.
Online contention resolution and LP duality. Our generalized flow problem is the dual of a specific factor-revealing LP for the worst-case competitive ratio of the optimal online policy against the prophet benchmark. As mentioned earlier, this flow problem is intimately connected to a generalization of the online contention resolution scheme (OCRS) problem in which (costly) recourse is allowed. The OCRS problem is rooted in the seminal work of Feige (2006) and Chekuri et al. (2011) (in the offline setting) and Feldman et al. (2016) on performing contention resolution for various combinatorial optimization problems, and has since been studied extensively in the CS and OR literature. For example, the work of Alaei (2014) obtains an (almost) optimal OCRS for uniform matroids, which has been further optimized or simplified (Jiang et al., 2022; Dinev and Weinberg, 2023) and extended to reusable resources (Feng et al., 2022). In a different direction, the beautiful work of Lee and Singla (2018) establishes a duality between prophet inequalities with respect to the ex-ante relaxation and OCRS, which, at a high level, is also connected to our LP approach and other LP approaches in the literature on obtaining prophet inequalities or competitive ratios against fluid approximations in Bayesian online allocation, e.g., Adelman (2004); Feige et al. (2015), adding correlations to prophet inequalities (Immorlica et al., 2020), prophet inequalities for matchings (Ezra et al., 2022a; Pollner et al., 2022), and the prophet secretary setting (Esfandiari et al., 2017; Adamczyk and Włodarczyk, 2018).
Optimum online policy. A major part of the literature on prophet inequalities focuses on understanding simple algorithms, e.g., a single threshold, and developing competitive ratios against the prophet benchmark for these algorithms. However, some recent work tries to understand the optimal online policy (mostly from a computational perspective). Some examples are the work of Anari et al. (2019), which establishes a PTAS in the laminar matroid environment, and the work of Segev and Singla (2021), which shows how to obtain a PTAS for the optimum online policy in a wide range of problems including prophet inequalities and stochastic probing. On the computational complexity side, finding the optimal online policy in a combinatorial setting such as matchings is known to be PSPACE-hard (Papadimitriou et al., 2021). Moving beyond computational aspects, there is a line of work that aims to understand the advantages of knowing the arrival order, e.g., Niazadeh et al. (2018) and the more recent work of Ezra et al. (2022b). In our setting, fortunately, the state space of the DP is not exponentially sized, and hence the optimal policy can be computed exactly using polynomial-time backward induction.

Organization
In Section 2, we study the Bellman recurrence and prove various properties important for the rest of the analysis. In Section 3, we formulate and solve the linear program that lets us obtain our main result: a competitive ratio of (1+f)/(1+2f) when f ≥ 1. In Section 4, we construct an example with Θ(log(1/f)) two-point random variables which proves that we cannot achieve a competitive ratio better than 1 − Ω(f log(1/f)). Section 5 studies the simple threshold-greedy algorithm and shows that it can match this upper bound near f = 0. Finally, in Section 6, we summarize our results and briefly discuss some open problems. The appendices include proofs omitted from the body of the paper (Appendix A), and also discuss some supplementary results: the case where X_max is bounded (Appendix B), and the case of n = 2 random variables (Appendix C).

Dynamic programming formulation of the optimal policy
In this section, in order to develop a more general understanding of the optimal online policy, we first focus on formulating this policy as a simple dynamic program, given the knowledge of the sequence of distributions of arriving random variables. We provide some basic properties of the value function of this DP, which turn out to be helpful in later parts of our analysis.
To obtain the DP formulation, we can derive a recurrence, also known as the optimality equation (Bellman, 1954), that describes the optimal expected reward at each point in the process. Suppose that the instance consists of a sequence of random variables X_1, X_2, …, X_n with known distributions. For t = 0, …, n − 1, we define Φ_t(x) = E[optimal reward that an algorithm can obtain, given that before observing X_{t+1}, it holds value x].
In this notation, we ignore any buyback cost that the algorithm paid prior to observing X_{t+1}. We define Φ_n(x) = x, which is consistent with the above, since there is no variable X_{n+1} to observe and the algorithm simply keeps value x.
The key property is that the functions Φ_t satisfy the following recurrence.

Lemma 1. For every t = 1, …, n and every x ≥ 0, Φ_{t−1}(x) = E[max{Φ_t(x), Φ_t(X_t) − f x}].
Proof. Upon observing X_t, we have two options: either we keep the value x that we held from before, and in the future obtain expected reward Φ_t(x), or we pay the buyback cost f x and accept the value X_t, in which case we obtain expected reward Φ_t(X_t) − f x. Given the value of X_t, the optimal algorithm chooses the better of the two options, and in expectation obtains E[max{Φ_t(x), Φ_t(X_t) − f x}]. Since we know Φ_n(x) = x, we can use backward induction to compute the function Φ_t for every t = 0, …, n − 1, to determine the optimal algorithm's behavior. Note that whether the algorithm accepts or rejects a given value of X_t is determined exactly by which of the two arguments of max{Φ_t(x), Φ_t(X_t) − f x} is larger. To define the optimal online policy uniquely, let us assume that it buys back x and accepts X_t only if Φ_t(X_t) − f x > Φ_t(x) (though this tie-breaking choice is not essential).
We have the following properties.
Lemma 2. The function Φ_t, for each t = 0, ..., n, satisfies the following: (i) Φ_t(x) is convex in x; (ii) Φ_t(x) + f x is non-decreasing in x; (iii) Φ_t(x) − x is non-increasing in x.

Proof. We proceed by backward induction on t. The statements are clearly true for Φ_n(x) = x.
Given that the properties are true for Φ_t, we consider Φ_{t−1}(x) = E[max{Φ_t(x), Φ_t(X_t) − f x}]. This is a convex combination (over the values of X_t) of functions obtained by taking the maximum of two convex functions (using the induction hypothesis), and hence it is also convex.
For the second property, note that Φ_{t−1}(x) + f x = E[max{Φ_t(x) + f x, Φ_t(X_t)}]. Again Φ_t(x) + f x is non-decreasing by induction, and Φ_t(X_t) is a constant for a fixed X_t; hence max{Φ_t(x) + f x, Φ_t(X_t)} is non-decreasing, and so is a convex combination of such functions.
Similarly, Φ_{t−1}(x) − x = E[max{Φ_t(x) − x, Φ_t(X_t) − (1 + f)x}]. The first argument of the maximum is non-increasing by induction and the second is non-increasing in x, so the maximum is non-increasing, and so is a convex combination of such functions.
Particularly important for the subsequent analysis will be the following.

Lemma 3. If at time t, while holding a value x, the optimal policy buys back x and swaps for a variable X_t = v, then it will also buy back x and swap for any value X_t = v′ with v′ > v. (Note that this is not completely obvious, since holding a higher value is not necessarily more profitable; Φ_t is not always monotone. Nevertheless the statement above is true.)

Proof. If the optimal policy swaps x for X_t = v, this means that Φ_t(v) − f x > Φ_t(x). This implies that Φ_t(v) > Φ_t(x), and by convexity Φ_t(y) must be increasing for y ≥ v. Hence, for any v′ > v, we have Φ_t(v′) − f x ≥ Φ_t(v) − f x > Φ_t(x), and the policy swaps for v′ as well.

Finally, we prove one more important lemma.

Proof. From the recurrence, Φ_{t−1}(x) = E[max{Φ_t(x), Φ_t(X_t) − f x}]. For every fixed value of X_t, the relevant expression is a non-decreasing function by Lemma 2, and taking a maximum with a constant keeps it non-decreasing. By taking expectations, we obtain the statement of the lemma.
Given the DP formulation, our ultimate goal is to compare the performance of the online optimum, that is, Φ_0(0), with that of the offline optimum, that is, E[max_i X_i]. In the next section, we switch to a linear programming formulation to analyze the worst-case performance of this DP policy with respect to the prophet benchmark.

Competitive ratio analysis via linear programming
In general, one can set up a nonlinear, exponential-size mathematical program to characterize the worst-case performance of the optimal online policy against the prophet benchmark, in the same spirit as characterizing the worst-case competitive ratio as the equilibrium of a zero-sum game played between an algorithm that picks an online policy and an adversary that picks the sequence of distributions. In this section, instead of taking this naive approach, we show how to develop a parametric linear program, where the parameters encode a lower-dimensional "sufficient statistic" of the worst-case instance. We then show how this LP characterizes the worst-case ratio of the optimal online policy to the offline optimum under the worst-case choices of these parameters.
To develop this parametric LP, first we need to describe a reduction that converts any instance to a restricted form, such that the worst-case competitive ratio does not change when restricting to these instances.

Reduction to a monotonic sequence of 2-point distributions
We claim that without loss of generality, we can consider a sequence of scaled Bernoulli random variables X_i = v_i · Be(q_i) with non-decreasing values 0 ≤ v_1 ≤ ... ≤ v_n. Moreover, the probability of each random variable being the last nonzero one can be made arbitrarily small. More precisely, we prove the following.
Theorem 5. For any ε > 0, if a competitive factor α(f) can be achieved for any instance of the buyback problem with random variables X_i = v_i · Be(q_i), where Be(q_i) is a Bernoulli 0/1 random variable with expectation q_i, 0 ≤ v_1 ≤ ... ≤ v_n, q_i ∏_{j=i+1}^n (1 − q_j) ≤ ε, and q_1 = 1, then the same competitive factor α(f) can be achieved for any instance with nonnegative random variables of finite expectation.
We proceed in a sequence of simple reductions. The first step is to discretize the random variables. See Appendix A.1 for the proof of the following lemma.

Lemma 6. If a competitive factor α(f) can be achieved for any instance with discrete nonnegative random variables (with finitely many possible values), then the same competitive factor α(f) can be achieved for any instance with nonnegative random variables of finite expectation.
The second step is to reduce the random variables by "splitting" to 2-point distributions.
Lemma 7. If a competitive factor α(f) can be achieved for any instance with scaled Bernoulli random variables (with values in {0, v_i} for some v_i > 0), then the same competitive factor α(f) can be achieved for any instance with discrete nonnegative random variables.

Proof. Assume that D_i is supported on a finite set of values v_i^1 > v_i^2 > ... > v_i^k > 0 (we can assume that 0 is also included, possibly with probability 0). We define k new random variables X_i^1, ..., X_i^k, where X_i^j has support {0, v_i^j}, and the probabilities are chosen so that max_{1≤j≤k} X_i^j has the same distribution as X_i: this can be implemented by setting Pr[X_i^j = v_i^j] = Pr[X_i = v_i^j] / Pr[X_i ≤ v_i^j]. We denote the distribution of X_i^j by D_i^j.
We claim that the new instance with distributions (D_1^1, D_1^2, ..., D_2^1, D_2^2, ..., ..., D_n^1, ...) is equivalent to the original instance (D_1, ..., D_n). Consider an optimal policy P′ for the new instance: we transform it into a policy P for the original instance by observing what decisions P′ makes on each block of variables X_i^1, ..., X_i^k, and accepting X_i if P′ accepts any variable in the block.
Note that if P′ accepts a variable in the block, it will never accept another variable in the same block (since the following values can only be smaller). Therefore, in a block of k variables, P′ has k + 1 options: to accept one of the k variables, or no variable. We compare the future expected rewards of these k + 1 options by comparing Φ_t(x) and Φ_t(v_i^j) − f x, where t is the time at the end of the block. By Lemma 3, if swapping to some value in the block is profitable, then swapping to any larger value in the block is profitable as well. Hence, if there is any profitable value to pick, then the most profitable value to pick is the largest one that appears, and therefore the optimal policy in this case picks the first nonzero value in the block, if any appears.
The largest variable among X_i^1, ..., X_i^k is equal to X_i, and hence the two policies obtain the same profit. Conversely, we can also convert any policy P for the original instance to an equivalent policy P′ for the new instance, by accepting the largest variable in the i-th block if P accepts X_i (although this direction is not needed to prove the lemma).
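The splitting probabilities in the proof of Lemma 7 admit a simple closed form: the copy carrying value v gets probability Pr[X = v] / Pr[X ≤ v]. The sketch below is our own illustration of this computation (the function name is ours); it returns the copies in descending order of value, matching the within-block ordering used above.

```python
def split_to_bernoullis(values, probs):
    """Split a discrete nonnegative r.v. X into independent scaled Bernoullis
    whose maximum has the same law as X.  `values` lists the nonzero support
    points in ascending order, probs[i] = Pr[X = values[i]]; the remaining
    probability mass sits at 0.  Returns (value, probability) pairs in
    descending order of value."""
    out, cdf = [], 1.0  # cdf = Pr[X <= v] as v sweeps down the support
    for v, p in zip(reversed(values), reversed(probs)):
        out.append((v, p / cdf))  # Pr[copy = v] = Pr[X = v] / Pr[X <= v]
        cdf -= p
    return out
```

For instance, a variable taking values 1, 2, 4 with probabilities 0.2, 0.3, 0.4 (and 0 with probability 0.1) splits into copies with success probabilities 2/3, 1/2, 0.4 respectively, and the maximum of the independent copies reproduces the original law.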
The third step is to rearrange the sequence of 2-point distributions, so that the values v i are non-decreasing.We prove that this can only make the instance more difficult.
Lemma 8. If a competitive factor α(f) can be achieved for any instance with scaled Bernoulli random variables (with values in {0, v_i} for some v_i > 0) and 0 ≤ v_1 ≤ ... ≤ v_n, then the same competitive factor α(f) can be achieved for any instance with scaled Bernoulli random variables.
Proof. Consider an instance I_1 with variables X_i = v_i Be(q_i) and X_{i+1} = v_{i+1} Be(q_{i+1}), v_i < v_{i+1}, and the same instance with the ordering of the variables X_i, X_{i+1} swapped, which we call I_2. Suppose we hold value x before observing the variables X_i, X_{i+1}, and consider two cases.

Case 1: The optimal policy P_1 for I_1 does not accept X_i even if it is nonzero. Then the value of this variable is irrelevant, and we can define P_2 to accept X_{i+1} based on whether P_1 accepts X_{i+1}, which yields an equivalent policy.
Case 2: The optimal policy P_1 for I_1 accepts X_i when it is nonzero. In this case, we define P_2 to accept X_{i+1} (which arrives first in I_2) when it is nonzero, and not to swap down to X_i afterwards. This yields a policy that performs at least as well as P_1 in each of the 4 possible pairs of values of the variables X_i, X_{i+1}. The only non-trivial case is when X_i = v_i, X_{i+1} = v_{i+1}, and P_1 accepts X_i and then rejects X_{i+1}. In this case, P_1 obtains a value of Φ_{i+1}(v_i) − f x, which is less than or equal to Φ_{i+1}(v_{i+1}) − f x, the value P_2 obtains. If P_1 accepts X_i and then accepts X_{i+1}, P_2 performs better because it pays less in buyback costs. In the other three cases, when at least one of X_i or X_{i+1} is zero, both policies perform the same. Therefore, the expected reward of P_2 is at least that of P_1.
By this argument, if there is any adjacent pair of variables with v_i > v_{i+1}, we can swap them and obtain an instance where the optimal reward can only go down. By repeating this swapping operation, we can sort the original instance and obtain an instance with v_1 ≤ v_2 ≤ ... ≤ v_n. The argument above shows that the monotonic instance is at least as difficult as the original instance.
The last reduction step is not necessary for the LP formulation, but it will be useful later: we can split the random variables further to obtain a sequence such that the probability of any random variable being the last nonzero one is arbitrarily small, and we can also assume that q_1 = 1.

Lemma 9. For any ε > 0, if a competitive factor α(f) can be achieved for any instance with scaled Bernoulli random variables X_i = v_i Be(q_i), 0 ≤ v_1 ≤ ... ≤ v_n, q_i ∏_{j=i+1}^n (1 − q_j) ≤ ε, and q_1 = 1, then the same competitive factor α(f) can be achieved for any instance with scaled Bernoulli variables X_i = v_i Be(q_i), 0 ≤ v_1 ≤ ... ≤ v_n.

Proof. Consider any instance with scaled Bernoulli variables X_i = v_i Be(q_i), 0 ≤ v_1 ≤ ... ≤ v_n. We can assume that q_1 = 1 and q_i < 1 for i > 1: if q_i = 1 for some i > 1, we can remove the variables preceding X_i, since they are irrelevant; and if there is no variable with q_i = 1, we can add a dummy variable at the beginning, distributed as 0 · Be(1).
We replace each variable X_i, i > 1, with a sequence of variables X_i^(1), ..., X_i^(ℓ), each distributed as v_i Be(ε_i), where ε_i is such that q_i = 1 − (1 − ε_i)^ℓ and ℓ is large enough that ε_i ≤ ε. Note that X_i has the same distribution as max_{1≤j≤ℓ} X_i^(j). Finally, we replace X_1 with a sequence of variables starting with X_1^(1) = v_1 (deterministic), followed by copies distributed as v_1 Be(ε_1), so that each variable in the sequence has probability at most ε of being the last nonzero one. The new instance that we produced is clearly equivalent, since every variable X_i is replaced with a block of variables with the same 2-point support whose maximum is distributed as X_i.

LP formulations
Let us now formulate a parametric linear program that captures the worst-case ratio of the optimal online policy to the prophet benchmark, using the recurrence for the optimal online policy (Lemma 1). We assume that the instance is in the form provided by Theorem 5: a sequence of 2-point distributions, X_i = v_i Be(q_i), with 0 ≤ v_1 ≤ v_2 ≤ ... ≤ v_n. Lemma 1 then takes the following form:

Φ_{t−1}(x) = q_t · max{Φ_t(x), Φ_t(v_t) − f x} + (1 − q_t) · Φ_t(x).   (1)

We will formulate an LP with the following variables: Φ_{i,t}, representing the value of Φ_t(v_i), and v_i, representing the value v_i itself. We also use Φ_{0,t} to represent Φ_t(0) (and it is convenient to assume that v_0 = 0). Note that Φ_{i,t} = Φ_t(v_i) obviously depends on the v_i's, but in the LP formulation these will be independent variables; the connection between them will be provided by the constraints. The LP constraints are chosen to be valid for the optimal online policy: by (1), for all 0 ≤ i < t ≤ n,

Φ_{i,t−1} ≥ q_t (Φ_{t,t} − f v_i) + (1 − q_t) Φ_{i,t}   and   Φ_{i,t−1} ≥ Φ_{i,t}.

Another constraint is Φ_{i,n} = v_i: at the end of the process, the reward is the value that we are holding. The values should satisfy 0 ≤ v_1 ≤ ... ≤ v_n. And finally, we have a constraint normalizing the offline optimum to be 1: Σ_{i=1}^n v_i q_i ∏_{j=i+1}^n (1 − q_j) = 1. Note the non-linearity in the parameters q_j: this is why we cannot include them as variables in the LP, but leave them as external parameters. We now formulate our first LP.
Lemma 10. Given q_1, ..., q_n ∈ [0, 1], the following LP gives the best possible factor that an online policy can achieve on an instance of the form X_i = v_i Be(q_i):

min Φ_{0,0}
s.t. Φ_{i,t−1} ≥ q_t (Φ_{t,t} − f v_i) + (1 − q_t) Φ_{i,t}   for 0 ≤ i < t ≤ n,
     Φ_{i,t−1} ≥ Φ_{i,t}   for 0 ≤ i < t ≤ n,
     Φ_{i,n} = v_i   for 0 ≤ i ≤ n (with v_0 = 0),
     0 ≤ v_1 ≤ ... ≤ v_n,
     Σ_{i=1}^n v_i q_i ∏_{j=i+1}^n (1 − q_j) = 1.

Proof. For an optimal online policy on the instance X_i = v_i Be(q_i), setting Φ_{i,t} = Φ_t(v_i) defines a feasible solution by Lemma 1. Therefore, the LP optimum is at most the factor achieved by an optimal online policy.
Conversely, for any feasible solution, consider the instance X_i = v_i Be(q_i). By (1) and backward induction, the optimal expected reward when holding value v_i at time t is at most Φ_{i,t}. The expected overall reward of the policy is at most Φ_{0,0}. This shows that the worst-case factor achieved by an optimal online policy is at most the LP optimum.
We now reformulate the LP a little bit to obtain a simpler form.
Lemma 11. Given q_1, ..., q_n ∈ [0, 1], the following LP gives the best possible factor that an online policy can achieve on any instance of the form X_i = v_i Be(q_i):

min Σ_{t=1}^n ∆_{0,t}
s.t. ∆_{0,t} ≥ q_t (v_t + Σ_{j=t+1}^n ∆_{t,j} − Σ_{j=t+1}^n ∆_{0,j})   for 1 ≤ t ≤ n,
     ∆_{s,t} ≥ q_t (v_t + Σ_{j=t+1}^n ∆_{t,j} − (1 + f) v_s − Σ_{j=t+1}^n ∆_{s,j})   for 1 ≤ s < t ≤ n,
     Σ_{t=1}^n v_t q_t ∏_{j=t+1}^n (1 − q_j) = 1,
     ∆_{s,t} ≥ 0, v_t ≥ 0.

Proof. This LP is obtained from the LP of Lemma 10 by the substitution ∆_{i,t} = Φ_{i,t−1} − Φ_{i,t} for 0 ≤ i < t ≤ n, using the constraints Φ_{i,n} = v_i and Φ_{0,n} = 0. Moreover, we drop the constraint v_{i+1} ≥ v_i. We claim that this does not change the value of the LP: for an instance with arbitrary values v_1, ..., v_n ≥ 0, Lemma 8 proves that the worst-case performance of the optimal online policy is attained when the values are sorted in ascending order, and the expression Σ_{i=1}^n v_i q_i ∏_{j=i+1}^n (1 − q_j), being equal to the expectation of the last nonzero X_i that appears in the given order, only increases when the values are arranged in ascending order. Now we appeal to LP duality to obtain another LP, which will be much more useful for us.
Lemma 12. Given q_1, ..., q_n ∈ [0, 1], the following LP gives the best possible factor that an online policy can achieve on any instance of the form X_i = v_i Be(q_i):

max Θ
s.t. y_{0,t} + Σ_{j=1}^{t−1} q_j y_{0,j} ≤ 1   for 1 ≤ t ≤ n,
     y_{s,t} + Σ_{j=s+1}^{t−1} q_j y_{s,j} ≤ q_s Σ_{i=0}^{s−1} y_{i,s}   for 1 ≤ s < t ≤ n,
     Θ q_t ∏_{j=t+1}^n (1 − q_j) ≤ q_t Σ_{s=0}^{t−1} y_{s,t} − (1 + f) Σ_{j=t+1}^n q_j y_{t,j}   for 1 ≤ t ≤ n,
     y_{s,t} ≥ 0.

Proof. Starting from the LP in Lemma 11, we introduce a dual variable y_{s,t} for the constraint corresponding to ∆_{s,t}, and a variable Θ for the normalization constraint Σ_t v_t q_t ∏_{j>t}(1 − q_j) = 1. We write down dual constraints for each 0 ≤ s < t ≤ n, corresponding to ∆_{s,t}, and for each 1 ≤ t ≤ n, corresponding to v_t. Note that the constraint corresponding to ∆_{0,t} looks somewhat different from the others, because ∆_{0,t} appears in the objective and hence the RHS is nonzero.
Finally, we rewrite this LP in a slightly cleaner form, using the substitution x_{s,t} = q_t y_{s,t} and defining

q̂_t = q_t ∏_{j=t+1}^n (1 − q_j).   (2)

Note that q̂_t is exactly the probability that X_t = X_max (in the monotonic instance). The LP can then be written as follows.
max Θ
s.t. x_{0,t} ≤ q_t (1 − Σ_{i=1}^{t−1} x_{0,i})   for 1 ≤ t ≤ n,
     x_{s,t} ≤ q_t (Σ_{i=0}^{s−1} x_{i,s} − Σ_{j=s+1}^{t−1} x_{s,j})   for 1 ≤ s < t ≤ n,
     Σ_{s=0}^{t−1} x_{s,t} − (1 + f) Σ_{j=t+1}^n x_{t,j} ≥ Θ q̂_t   for 1 ≤ t ≤ n,
     x_{s,t} ≥ 0.

This LP has a natural interpretation: x_{0,t} represents the probability that X_t is the first variable that we accept, and x_{s,t} for 1 ≤ s < t represents the probability that at some point we swap X_s for X_t. The first constraint expresses the fact that by the time we get to X_t, there is probability 1 − Σ_{i=1}^{t−1} x_{0,i} that we have not accepted any variable yet; and this is independent of the event that X_t = v_t, which has probability q_t. Therefore, q_t (1 − Σ_{i=1}^{t−1} x_{0,i}) is an upper bound on the probability that we accept X_t as our first pick. Similarly, Σ_{i=0}^{s−1} x_{i,s} − Σ_{j=s+1}^{t−1} x_{s,j} is the probability that we accepted X_s and have not discarded it yet by the time we get to X_t. Therefore, q_t (Σ_{i=0}^{s−1} x_{i,s} − Σ_{j=s+1}^{t−1} x_{s,j}) is an upper bound on the probability that we swap X_s for X_t. We call these constraints online implementability constraints.
The third constraint expresses the fact that the total probability of accepting variable X_t, minus (1 + f) times the probability of later discarding it (accounting for both the lost value and the buyback cost), should be at least Θ times the probability that X_t = X_max.
If we satisfy this constraint, then we recover at least a Θ-fraction of the contribution of X_t to the offline optimum, and hence this is a sufficient condition for achieving a factor of at least Θ. It is not obvious that this condition is also necessary, but Lemma 12 shows that it is: if a factor of Θ can be achieved, then it is also possible to satisfy this condition.
This LP can be viewed as a generalized flow problem, with x_{s,t} representing the flow from node s to node t, each node extracting a certain demand proportional to Θ, and a certain fraction of the flow along each edge being lost due to the buyback cost. In the following, our goal is to prove that this LP has a solution for f ≥ 1 and Θ = (1+f)/(1+2f), and thereby obtain a prophet inequality for the buyback problem.
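Checking a candidate solution against the three constraint families is mechanical; the following checker is our own sketch (function name, dictionary representation, and tolerance are illustrative), using the (1+f) accounting of lost value plus buyback cost for discarded variables. The test instance is the 2-variable hard instance for f = 1, where the factor Θ = 2/3 = (1+f)/(1+2f) is achievable but Θ = 0.7 is not.

```python
def lp_feasible(x, q, qhat, f, theta, tol=1e-9):
    """Check a candidate solution of the flow LP.
    x: dict {(s, t): value} with 0 <= s < t <= n (variables are 1-indexed);
    q[t-1] = Pr[X_t nonzero]; qhat[t-1] = Pr[X_t = X_max]."""
    n = len(q)
    get = lambda s, t: x.get((s, t), 0.0)
    for t in range(1, n + 1):
        # online implementability for the first pick
        if get(0, t) > q[t - 1] * (1 - sum(get(0, i) for i in range(1, t))) + tol:
            return False
        # online implementability for swaps: X_s must still be held at time t
        for s in range(1, t):
            held = (sum(get(i, s) for i in range(s))
                    - sum(get(s, j) for j in range(s + 1, t)))
            if get(s, t) > q[t - 1] * held + tol:
                return False
        # coverage: acceptance minus (1+f) * discard probability (lost value
        # plus buyback cost) must recover a theta-fraction of qhat_t
        accept = sum(get(s, t) for s in range(t))
        discard = sum(get(t, j) for j in range(t + 1, n + 1))
        if accept - (1 + f) * discard < theta * qhat[t - 1] - tol:
            return False
    return True
```

On the 2-variable instance X_1 = 1 (deterministic), X_2 = 1 + f with probability 1/(1+f), and f = 1, the solution x_{0,1} = 1, x_{1,2} = 1/3 satisfies every constraint with Θ = 2/3 tightly.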

Solving the LP: intuition
In this section we describe the intuition that leads us to a feasible solution of the LP.
• If we want to match the hardness factor (1+f)/(1+2f) given by the worst-case 2-variable example, our solution certainly has to solve this example optimally. The probabilities in this instance are q̂_1 = f/(1+f) and q̂_2 = 1/(1+f), and we can find an optimal solution of the corresponding dual LP, which gives an algorithm to solve 2-variable instances in this form. However, this still does not tell us how to solve a general instance.
• If we want to solve every instance in a manner consistent with this example, we hazard a guess that the solution should have a 2-block structure, where the two blocks A and B correspond to variables of total probability mass ℓ_1 = Σ_{s∈A} q̂_s = f/(1+f) and ℓ_2 = Σ_{t∈B} q̂_t = 1/(1+f). (As a special case, the 2 blocks could be the 2 variables in the hardness instance.) Crucially, the flow at the level of these blocks should be consistent with the optimal solution for the hard instance on 2 variables. Furthermore, there should be no flow inside each block, since this would cause us to incur an additional buyback cost which we cannot afford. Therefore, buyback edges should go only from A to B, and not within each block. This leads to constraints on the aggregate flows between the initial node and the two blocks, mirroring the optimal dual solution of the 2-variable instance.
• Note that the probabilities q̂_t in a given instance do not necessarily allow us to form blocks of exactly the desired sizes ℓ_1 and ℓ_2. However, we can always split variables into identical independent copies with smaller probabilities, in a way similar to Lemma 7, in order to obtain the desired block sizes, so this is not an issue.
• Once we postulate that the solution should have this block structure, it remains to find the specific flow from the initial node 0 (x_{0,s} for s ∈ A ∪ B), and the flow between the two blocks (x_{s,t} for s ∈ A and t ∈ B). The interpretation of x_{0,s} is that it describes the probability of X_s being selected as our first pick. The interpretation of x_{s,t} is that it describes the probability of swapping from X_s in block A to X_t in block B. Since at the beginning of the process we can select variables without the online implementability constraint imposing a significant restriction, x_{0,s} for s in the first block is not very restricted. However, we need to be careful not to make x_{0,s} too large, because that would hamper our ability to select variables in the second block.
• No buyback happens within the first block, and hence x_{0,s} is the only source of acceptance probability in the first block. This means that we should have x_{0,s} ≥ Θ q̂_s in the first block. On the other hand, a variable in the second block will never be discarded, and it can be selected either as our first pick or via buyback. Therefore, we have x_{0,t} ≤ Θ q̂_t in the second block. It is natural to assume that x_{0,s} should be decreasing in s. We also hypothesize that in the second block, we should accept any nonzero variable if we have not accepted anything so far; i.e., the online implementability constraint should be tight: x_{0,t} = q_t (1 − Σ_{s=1}^{t−1} x_{0,s}). This, together with the block flow x_{0,B}, determines x_{0,t} for t ∈ B. For s ∈ A, we choose x_{0,s} to depend linearly on s in a way that produces the desired block flow x_{0,A}.
• Finding the interblock flow x_{s,t} is the trickiest part of our solution. We have several constraints to satisfy here, and the more restrictive constraints appear towards the end of the second block. There, we have the maximum deficit Θ q̂_t − x_{0,t}, while the online implementability constraint simultaneously takes its most restrictive form. Therefore, we should try to send as much flow as possible to the end of the second block, from every point s ∈ A. Given these constraints, we send a flow x_{s,t} = σ(s) q̂_s q̂_t to a certain interval τ(s) ≤ t ≤ n, where σ(s) is determined by the online implementability constraint and τ(s) is chosen so that Σ_{τ(s)≤t≤n} x_{s,t} = (x_{0,s} − Θ q̂_s)/(1+f) is exactly the surplus that node s ∈ A is allowed to send. It remains to verify that the choices described here form a feasible solution of the dual LP.

Solving the LP: technical details
Here we show rigorously how to describe the LP solution. First, note from our earlier discussion that it is safe to assume that q̂_i ≤ ε for any fixed ε > 0. The first step in solving the LP is to reduce the problem to a continuous version. Intuitively, we can imagine that we have a continuum of random variables, each of which attains a nonzero value with infinitesimal probability. To begin the reduction, let us write x_{0,s} = H(s) q̂_s and x_{s,t} = G(s, t) q̂_s q̂_t for some positive functions H and G, so that the LP constraints can be expressed in terms of H and G. The next lemma demonstrates that it suffices to analyze the natural continuous version of this LP, where variables are indexed by cumulative probability mass t ∈ [0, 1] (recall that Σ_t q̂_t = 1, since q_1 = 1).

Lemma 13. Suppose there exist non-negative continuous functions h and g, with domains [0, 1] and [0, 1] × [0, 1] respectively, and h(x) > 0, such that for all 0 ≤ s < t ≤ 1:
(i) t h(t) ≤ 1 − ∫_0^t h(x) dx;
(ii) t g(s, t) ≤ h(s) + ∫_0^s g(x, s) dx − ∫_s^t g(s, x) dx;
(iii) h(t) + ∫_0^t g(x, t) dx − (1 + f) ∫_t^1 g(t, x) dx ≥ Θ.
Then, as ε → 0, the LP has a feasible solution of value approaching Θ on any instance in the form of Theorem 5 with all q̂_i ≤ ε.

The proof of this lemma uses a standard Riemann sum argument and the fact that we can assume all the q̂_i are less than ε. We postpone the full proof to Appendix A.2. Now we proceed to an explicit description of such functions h and g for f ≥ 1. Our choices are motivated by the intuition described in Section 3.3. In the following, we denote by Θ = (1+f)/(1+2f) the desired approximation factor. Let us define

h(s) = Θ(2+f)/f − 2Θ(1+f)s/f²   for 0 ≤ s ≤ f/(1+f),
h(t) = f² / ((1+f)(1+2f) t²)   for f/(1+f) < t ≤ 1.

Furthermore, define A = [0, f/(1+f)] and B = (f/(1+f), 1], where we think of A as the set of vertices which initially receive a surplus, and B as the set of vertices which are initially at a deficit. The function h is chosen in order to satisfy the online implementability constraint, t h(t) ≤ 1 − ∫_0^t h(x) dx, in fact tightly for t ≥ f/(1+f), and with some slack for t < f/(1+f). First we verify the tightness at t_0 = f/(1+f): we have ∫_0^{t_0} h(x) dx = (h(0) + h(t_0)) t_0 / 2 = Θ and t_0 h(t_0) = Θ f/(1+f) = f/(1+2f) = 1 − Θ, as desired. For t > f/(1+f), h(t) is defined exactly so that the constraint is satisfied tightly: t h(t) = f²/((1+f)(1+2f)t) = 1 − ∫_0^t h(x) dx. For s ≤ f/(1+f), we can verify the constraint by observing that s h(s) + ∫_0^s h(x) dx is a concave quadratic in s, maximized at s = f(2+f)/(3(1+f)) ≥ t_0; hence it is increasing on [0, t_0], where it attains the value 1 at s = t_0. Here, the inequality s = f(2+f)/(3(1+f)) ≥ t_0 follows from the fact that (2+f)/3 ≥ 1 as long as f ≥ 1.
Note that we also chose h in a way that ensures that the aggregate flows at the level of blocks are correct:

∫_A (h(s) − Θ) ds = Θ − Θ f/(1+f) = Θ/(1+f),

and we can also calculate

∫_B (Θ − h(t)) dt = Θ/(1+f) − f/((1+f)(1+2f)) = 1/((1+f)(1+2f)).

Together, these statements immediately imply (1/(1+f)) ∫_A (h(s) − Θ) ds = ∫_B (Θ − h(t)) dt, so we can use the surplus from A to make up for the deficit in B, with the flow matching the aggregate flow discussed in Section 3.3.
To implement this plan, let g(s, t) be 0 unless s ∈ A and t ∈ B. In this case, the third constraint in our integral formulation of the problem reduces to the following pair of requirements: ∫_B g(s, t) dt = (h(s) − Θ)/(1+f) for every s ∈ A (each node in A sends out exactly its surplus), and ∫_A g(s, t) ds ≥ Θ − h(t) for every t ∈ B (each node in B covers its deficit). Furthermore, we can relax the second constraint in the integral formulation of the problem and try to find a g which satisfies a third requirement: g(s, t) ≤ (f h(s) + Θ)/(1+f). We can think about the existence of such a function g as the question of whether there exists a flow of value 1/((1+f)(1+2f)) from A to B, with edge capacities:

• c_{0,s} = surplus of node s after the initial flow = (h(s) − Θ)/(1+f).
• c_{s,t} = (f h(s) + Θ)/(1+f).

Here, we demonstrate the existence of such a g directly, letting

g(s, t) = (h(s) + Θ)/(1+f)   if s ∈ A and τ(s) ≤ t ≤ 1, and g(s, t) = 0 otherwise,

where τ(s) = 2Θ/(h(s) + Θ). Note first of all that ∫_{τ(s)}^1 g(s, t) dt = ((h(s) + Θ)/(1+f))(1 − τ(s)) = (h(s) − Θ)/(1+f), so we have verified our first requirement. Furthermore, observing that τ(s) = 2Θ/(h(s)+Θ) increases from f/(1+f) to 1 as s increases from 0 to f/(1+f), it follows that for every t ∈ B there exists r such that t = τ(r), and a node s ∈ A sends flow to t exactly when s ≤ r. Plugging in our expression for h(s), we calculate that r = f((1+f)t − f)/((1+f)t). Note on the other hand that we also have ∫_0^r g(s, t) ds = Θ − h(t). This verifies the second requirement on g. Finally, note that if f ≥ 1, clearly g(s, t) = (h(s)+Θ)/(1+f) ≤ (f h(s) + Θ)/(1+f), so the third requirement is satisfied as well. We have thus found, for f ≥ 1, functions h and g which satisfy the continuous form of our LP with Θ = (1+f)/(1+2f), so we conclude that we can obtain a competitive ratio of (1+f)/(1+2f) for the prophet inequality with buyback if f ≥ 1.

Remark 1. As mentioned earlier in the introduction, there is a simple example with only 2 variables such that no online policy can obtain better than a (1+f)/(1+2f) fraction of the prophet benchmark. So the competitive ratio bound obtained in this section for f ∈ [1, +∞) is tight.

Competitive ratio upper-bounds for small buyback factors
In the previous section, we showed that the competitive ratio of (1+f)/(1+2f) can be achieved by the optimal online policy when f ∈ [1, +∞). Also, we already saw an example with 2 random variables showing that a factor better than (1+f)/(1+2f) cannot be achieved (Remark 1). Therefore, it is tempting to conjecture that this bound is the best possible for all values of the buyback parameter f. However, as we show next, even with n = 3 variables, we can construct hard instances where any algorithm obtains a factor that is strictly worse when 0 < f < 1.

A counterexample with n = 3 variables
Building on the intuition for the worst-case instance with two variables, we consider an instance with three variables, parameterized by a value x > 1 + f, and later pick x to minimize the competitive ratio. The expected value of the prophet benchmark is easy to calculate, and to find the expected reward of the optimal online algorithm we can solve the recurrence described in Lemma 1 to compute Φ_0(0); this yields an upper bound on the competitive ratio of any online algorithm. The instance only makes sense when the minimizing choice of x satisfies x > 1 + f, which happens precisely when f < 1.

Lemma 14. For all 0 < f < 1, the resulting upper bound is strictly smaller than (1+f)/(1+2f).

The above instance suggests that as f becomes smaller, the adversary may require a larger number of random variables to push the optimal online policy to its worst-case competitive ratio. We conjecture that this instance is indeed the worst case (with n = 3 variables) for the competitive ratio of the optimal online policy against the prophet benchmark, and leave this without proof. With the help of a computer-aided search, we also tried to find the worst-case instance for n = 4, and we observed that there exists an instance where, for any choice of 0 < f < 1/3, the competitive ratio is strictly worse than the ratio obtained with n = 3 variables.
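To see concretely that three variables can beat the 2-variable bound, one can evaluate the DP on an explicit instance. The instance below is our own (built by an indifference argument, and not necessarily the minimizing instance discussed above); for f = 1/2 its ratio is 156/211 ≈ 0.7393, strictly below (1+f)/(1+2f) = 3/4.

```python
def ratio(values, probs, f):
    """Optimal online reward (backward induction) divided by E[max] for a
    two-point instance X_{t+1} = values[t] * Be(probs[t])."""
    n = len(values)

    def phi(t, x):
        if t == n:
            return x
        v, q = values[t], probs[t]
        keep, take = phi(t + 1, x), phi(t + 1, v) - f * x
        return q * max(keep, take) + (1 - q) * keep

    opt, none_later = 0.0, 1.0
    for t in range(n - 1, -1, -1):  # values ascending: X_t is the max iff it
        opt += values[t] * probs[t] * none_later  # is the last nonzero one
        none_later *= 1 - probs[t]
    return phi(0, 0.0) / opt

f = 0.5
r = ratio([1.0, 1.8, 2.7], [1.0, 5.0 / 9, 0.25], f)  # ~0.7393 < 0.75
```

The instance is designed so that the optimal policy is indifferent at every decision point: holding 1.8 it is indifferent to 2.7 (since 2.7 = (1+f)·1.8), and the probabilities make the keep/swap options for the first two variables exactly tie.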
Motivated by this observation, in the rest of this section we turn our attention to the asymptotic regime when f → 0, and aim to understand the behavior of α(f ) in this small buyback regime.

Construction of a multistage counterexample for small buyback factor
Considering the recurrence describing the optimal online policy, it becomes easier to analyze examples with multiple variables. We already saw an example with 2 random variables showing that a factor better than (1+f)/(1+2f) cannot be achieved, and an example with 3 random variables showing that only a strictly lower ratio can be achieved for f ∈ (0, 1). Here we are interested in the behavior of the competitive ratio for f → 0. We present a counterexample which proves the following.
Theorem 15. Given a buyback factor 0 < f < 1/16, there is no algorithm for the buyback problem which achieves a competitive ratio better than 1 − (f/2) log_2(1/(16f)).
This demonstrates an interesting phenomenon. Recall that for f = 0, we can achieve a competitive ratio of 1, simply by performing buyback whenever possible. For small f > 0, we might aim to achieve a competitive ratio of 1 − c f for some fixed constant c > 0. However, the counterexample shows that this is impossible, and the optimal competitive ratio α(f), as a function of the buyback factor f, satisfies α′(0) = −∞. As we prove in Theorem 20, the behavior of the optimal competitive ratio for small f > 0 is indeed 1 − Θ(f log(1/f)).

The construction
We will construct an instance of the following form: X_i = a_i with probability p_i, or 0 with probability 1 − p_i, and a_1 < a_2 < a_3 < ... < a_n. What remains to be chosen are the values a_i and the probabilities p_i. Note that in this case the recurrence of Lemma 1 takes the following form:

Φ_{t−1}(x) = p_t · max{Φ_t(x), Φ_t(a_t) − f x} + (1 − p_t) · Φ_t(x).

Here we used the fact that Φ_t(x) ≥ Φ_t(0) − f x, which follows from the second property in Lemma 2. We choose a_n = 1 and work our way backwards to construct the values a_{n−1}, a_{n−2}, .... Given a_t, we define a_{t−1} as the point where Φ_t(a_t) − f a_{t−1} = Φ_t(a_{t−1}). We note that this is exactly the value for which the optimal algorithm would be indifferent between accepting X_t = a_t and rejecting it, given that it is currently holding a_{t−1}. We also set p_i = 1/2 for all i. We have no reason to believe that this is the worst-case choice; however, it simplifies the computations and gives the result that we want (up to constant factors). By backward induction, we claim the following facts.
Lemma 16. For k = 0, 1, 2, ..., the values a_{n−k} and the functions Φ_{n−k} satisfy the properties below. First, by definition we have Φ_n(x) = x on the interval [0, a_n] (and in fact everywhere), which is the only fact we claim for Φ_n.
Assume now that a_{n−k} and Φ_{n−k} are given, for k ≥ 0. We define a_{n−k−1} as the point at which the indifference condition Φ_{n−k}(a_{n−k}) − f a_{n−k−1} = Φ_{n−k}(a_{n−k−1}) holds, using the inductive hypothesis describing Φ_{n−k} on the relevant interval [0, a_{n−k}]. Recall that a_{n−k−1} is the breakpoint where the two expressions in max{Φ_{n−k}(x), Φ_{n−k}(a_{n−k}) − f x} are equal; this means that for x ∈ [0, a_{n−k−1}], the second term defines the maximum. Combining this with the inductive hypothesis, and using the relationship between a_{n−k} and a_{n−k−1} derived above, we obtain the claimed form of Φ_{n−k−1} on [0, a_{n−k−1}], which completes the inductive step. Now we can complete the construction and prove Theorem 15.
Proof of Theorem 15. We set n = ⌊log_2(1/f)⌋ and analyze OPT = E[max_i X_i] and the performance ALG = Φ_0(0) of the optimal algorithm on the instance constructed above. Comparing OPT and ALG term by term, and using the bounds on the values a_{n−k} from Lemma 16, we obtain ALG ≤ (1 − (f/2) log_2(1/(16f))) OPT, considering that OPT < 1. In the next section, we show how a simple algorithm achieves the asymptotically optimal competitive ratio of 1 − Θ(f log(1/f)), establishing that the bound in Theorem 15 is (almost) tight.
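The backward construction can also be reproduced numerically. The sketch below is ours: it finds each a_{t−1} by bisection from the indifference condition Φ_t(a_t) − f·a = Φ_t(a), which has a unique root because Φ_t(a) + f·a is increasing in a (Lemma 2), and then compares Φ_0(0) to E[max_i X_i] with p_i = 1/2.

```python
def build_instance(n, f, iters=60):
    """Construct a_1 < ... < a_n backwards from a_n = 1 (all p_i = 1/2)
    and return (values, competitive ratio of the optimal online policy)."""
    a = [0.0] * (n + 1)  # a[t] = value of X_t; a[0] unused
    a[n] = 1.0

    def phi(t, x):
        # expected net reward holding x just before observing X_{t+1}
        if t == n:
            return x
        keep = phi(t + 1, x)
        swap = phi(t + 1, a[t + 1]) - f * x
        return 0.5 * max(keep, swap) + 0.5 * keep

    for t in range(n, 1, -1):
        # a[t-1] solves phi(t, a[t]) - f*a = phi(t, a) by bisection
        lo, hi = 0.0, a[t]
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if phi(t, a[t]) - f * mid > phi(t, mid):
                lo = mid
            else:
                hi = mid
        a[t - 1] = 0.5 * (lo + hi)

    alg = phi(0, 0.0)
    # Pr[X_i is the last nonzero variable] = (1/2)^(n-i+1)
    opt = sum(a[i] * 0.5 ** (n - i + 1) for i in range(1, n + 1))
    return a[1:], alg / opt
```

For n = 2 this recovers a_1 = 1/(1+f) and a competitive ratio of (3+f)/(3+2f) (a closed form we derived by hand for this sketch), and the ratio strictly decreases when a third variable is added.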

The threshold-greedy algorithm
In this section, we present a simple algorithm which achieves a competitive ratio of 1 − Θ(f log(1/f)) for small f > 0, matching the example presented in Section 4.2 up to constant factors. This algorithm also gives a constant improvement over 1/2 for any fixed buyback factor f > 0. The algorithm is a simple extension of the standard threshold algorithm: after setting an initial threshold and accepting some variable above the threshold, we buy back and replace the currently held variable if and only if this brings positive profit.
The algorithm.
• Set an initial threshold T (to be determined).
• If a variable X i arrives and X i ≥ T , then select X i .
• Given a currently selected value X i , if a variable X j arrives and X j > (1 + f )X i , then discard X i and select X j .
An advantage of this algorithm is that it is order-oblivious, and as we will see, the threshold T depends only on the distribution of X_max.
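The policy itself is a few lines when run on a realized sequence; the following sketch (our own helper, with the threshold left as an input) makes the buyback accounting explicit.

```python
def threshold_greedy(sequence, T, f):
    """Run the threshold-greedy policy on one realized sequence of values:
    take the first value >= T, then swap to a new value v whenever
    v > (1 + f) * held, paying the buyback cost f * held."""
    held, cost = 0.0, 0.0
    for v in sequence:
        if held == 0.0:
            if v >= T:
                held = v  # first pick: threshold rule
        elif v > (1 + f) * held:
            cost += f * held  # buy back the current value
            held = v
    return held - cost  # net profit of this run
```

For example, with T = 1 and f = 0.5, the sequence (0.5, 1.0, 2.5) yields a first pick of 1.0 and one profitable swap to 2.5 at buyback cost 0.5, for a net profit of 2.0.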

General analysis approach
Let us first develop some general formulae to analyze this algorithm.
Lemma 17. Suppose that the algorithm selects at least k variables, and the k-th selected variable is X^(k). Then the profit of the algorithm is at least X^(k)/(1+f)^{k−1}.

Proof. By induction: if the first selected variable is X^(1), then the profit at that point is clearly X^(1). Observe also that the profit never goes down after future buybacks, so the profit at the end is at least X^(1).
Assuming that the k-th selected variable is X^(k) and the variable selected just before it was X^(k−1), by induction our profit after selecting X^(k−1) was p_{k−1} ≥ X^(k−1)/(1+f)^{k−2}. After the last buyback, since we select X^(k) only if X^(k) > (1+f) X^(k−1), and we pay f · X^(k−1) for discarding X^(k−1), our profit is p_k = p_{k−1} + X^(k) − (1+f) X^(k−1) ≥ X^(k−1)/(1+f)^{k−2} + X^(k) − (1+f) X^(k−1) ≥ X^(k)/(1+f)^{k−1}, where the last inequality uses X^(k−1) ≤ X^(k)/(1+f).

Next, consider the classical analysis of the prophet inequality, which uses the fact that a threshold algorithm achieves expected value at least T · Pr[X_max ≥ T]. We claim the following extension of this bound.
Lemma 18. The expected profit of our algorithm is at least

E[1(X_max ≥ T) · max{T, X_max (1+f)^{−S′}}],

where S′ = Σ_j 1(X′_j ≥ T) and X′_j is an independent copy of X_j.
Proof. First, let us note that if the maximum is X_i and the algorithm makes θ_i picks before X_i, then Lemma 4 implies that, in the case that the maximum is above T, the algorithm obtains at least max{T, X_i (1+f)^{−θ_i}}.
It follows that the quantity max{T, X_i (1+f)^{−θ_i}} is a decreasing function of (X_j : j ≠ i), and 1(X_i = X_max) is also a decreasing function of (X_j : j ≠ i). So we can apply the FKG inequality on the product space of (X_j : j ≠ i), for every fixed value of X_i, and replace the variables (X_j : j ≠ i) in the first expression by independent copies (X′_j : j ≠ i). Using the notation S′ = Σ_j 1(X′_j ≥ T), we obtain the claimed bound. We will also use the following useful comparison between sums of independent Bernoulli variables and a Poisson variable.
Lemma 19. If S′ = Σ_{j=1}^n 1(X′_j ≥ T), for independent random variables X′_1, …, X′_n, and Pr[∀j: X′_j < T] = e^(−λ), then S′ is stochastically dominated by a Poisson random variable with mean λ. In particular, for any ℓ ≥ 1, Pr[S′ ≥ ℓ] ≤ Pr[Y ≥ ℓ], where Y is a Poisson random variable with mean λ.
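Lemma 19 can be checked exactly on small instances: the sketch below computes the distribution of a Bernoulli sum by dynamic programming and compares its tail to the Poisson tail at every threshold. The success probabilities are arbitrary illustrative values.

```python
import math

# Exact check of Lemma 19: a sum of independent Bernoullis with
# prod(1 - p_j) = exp(-lam) is stochastically dominated by Poisson(lam).
ps = [0.1, 0.3, 0.25, 0.05]                      # illustrative probabilities
lam = -sum(math.log(1 - p) for p in ps)          # so that e^{-lam} = prod(1-p_j)

# Distribution of S' by dynamic programming over the Bernoullis.
dist = [1.0]
for p in ps:
    new = [0.0] * (len(dist) + 1)
    for k, q in enumerate(dist):
        new[k] += q * (1 - p)
        new[k + 1] += q * p
    dist = new

def poisson_tail(lam, l):
    """Pr[Poisson(lam) >= l], computed from the pmf."""
    return 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(l))

for l in range(len(dist) + 1):
    bern_tail = sum(dist[l:])
    assert bern_tail <= poisson_tail(lam, l) + 1e-12
print("S' is dominated by Poisson(lam) at every threshold")
```

The domination holds because each Bernoulli(p_j) is dominated by a Poisson with mean −ln(1 − p_j) (they place the same mass at 0), and independent Poissons add.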

Analysis for small buyback factors
Consider now the case of small f (e.g., f ∈ (0, 1/2)). We can use Lemmas 18 and 19 to get a good bound on the algorithm's performance in this regime. We note that this bound beats the optimal randomized algorithm for the buyback problem with no information, for all f (Badanidiyuru and Kleinberg, 2009).

Theorem 20. There is an algorithm which achieves a competitive ratio of at least
max_{x∈(0,1)} (1 − x)·x^(f/(1+f)) / (1 − x + x^((1+2f)/(1+f))) ≥ 1 − O(f·log(1/f)).

We remark that the last expression in the theorem follows from plugging in x = f. The bound presented here is worse than the classical prophet inequality when f → ∞ (in which case our bound tends to 1/3), as we are aiming primarily to match the asymptotic upper bound we obtained of 1 − Θ(f·log(1/f)).

Proof. Choose T so that Pr[X_max < T] = e^(−λ) = x. From Lemmas 18 and 19, we get:
E[profit] ≥ T·Pr[X_max ≥ T] + E[(X_max/(1 + f)^Z − T)^+],
if Z is a Poisson random variable with mean λ (with e^(−λ) = Pr[X_max < T]). If 0 ≤ A ≤ 1 is any random variable (possibly correlated with X_1, …, X_n, but independent of Z), then
E[(X_max/(1 + f)^Z − T)^+] ≥ E[A·(X_max/(1 + f)^Z − T)] = e^(−λf/(1+f))·E[A·X_max] − T·E[A],
where E[(1 + f)^(−Z)] = e^(−λf/(1+f)) follows from an elementary computation for the Poisson random variable Z. Choosing A = 1(X_max ≥ T), the terms T·Pr[X_max ≥ T] and T·E[A] cancel, and since E[X_max·1(X_max < T)] ≤ T·e^(−λ), we obtain
E[profit] ≥ e^(−λf/(1+f))·(E[X_max] − T·x).
On the other hand, the first term of Lemma 18 alone guarantees E[profit] ≥ T·(1 − x). Hence if T ≥ c·E[X_max], we are guaranteed a competitive ratio of c(1 − x), while if T ≤ c·E[X_max], the bound E[X_max] − T·x ≥ (1 − cx)·E[X_max] guarantees a competitive ratio of (1 − cx)·x^(f/(1+f)). Letting x = e^(−λ), and solving for c by equating c(1 − x) = (1 − cx)·x^(f/(1+f)), we get
c = x^(f/(1+f)) / (1 − x + x^((1+2f)/(1+f))),
which means we are guaranteed a competitive ratio of
c(1 − x) = (1 − x)·x^(f/(1+f)) / (1 − x + x^((1+2f)/(1+f))).
We remark that with A = 1(X_max ≥ T)·1(Z ≤ 1), we could have obtained another, different lower bound on the competitive ratio, better for medium values of f. Another reasonable option is A = c_1·1(X_max ≥ T)·c_2^Z, which can recover approximately the correct result both near f = 0 and f = ∞.
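The resulting guarantee can be explored numerically. The closed form used below is our reading of the bound obtained after solving for c, and should be treated as an assumption of this sketch; the script maximizes it over x on a fine grid for several values of f.

```python
import math

# Numerical exploration of the small-f guarantee: maximize over x the
# balanced bound (1-x) * x^(f/(1+f)) / (1 - x + x^((1+2f)/(1+f))).
# (This closed form is our reading of the balancing step and is an
# assumption of this sketch, not a quoted formula.)
def ratio(f, x):
    a = f / (1 + f)
    return (1 - x) * x**a / (1 - x + x**(1 + a))

def best(f, grid=100000):
    return max(ratio(f, k / grid) for k in range(1, grid))

for f in (0.01, 0.1, 1.0, 100.0):
    print(f"f = {f:6}: bound ~ {best(f):.4f}")
```

For small f the maximum is attained near x ≈ f and approaches 1, while for large f it settles near 1/3, consistent with the remarks above.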

Analysis for large buyback factors
In the following, we aim to obtain a constant improvement over 1/2 for any fixed buyback factor, using the simple threshold-greedy algorithm. We consider the analysis in three different cases.

Let T be the median of X_max (so that Pr[X_max ≥ T] = 1/2), and define α and β by
E[(T − X_max)^+] = α·T  and  E[(X_max − T)^+] = β·E[X_max].
If β is relatively small, then we prove that the basic algorithm without any buybacks already achieves an improvement over 1/2.
Lemma 21. The algorithm with threshold T or T′ = (1 − 3α)T, and no buybacks, achieves a competitive ratio of at least (4 − β)/7.

Proof. We can assume that β < 1/2, otherwise the statement of the lemma follows from the classical prophet inequality. Given the definitions of α, β, we can write
E[X_max] = (1 − α)·T + β·E[X_max],  i.e.,  T = ((1 − β)/(1 − α))·E[X_max].
We consider first α ≥ (1 − 2β)/(8 − 9β) and the algorithm with threshold T. In this case, the classical bound already provides an improvement over 1/2:
E[profit] ≥ (1/2)·T + (1/2)·E[(X_max − T)^+] = ((1 − β)/(2(1 − α)) + β/2)·E[X_max].
For β < 1, this is an increasing function of α, and hence it is minimized for this case at α = (1 − 2β)/(8 − 9β):
E[profit] ≥ ((8 − 9β)/14 + β/2)·E[X_max] = ((4 − β)/7)·E[X_max].
The second case is that α ≤ (1 − 2β)/(8 − 9β) (intuitively, X_max is concentrated around T), in which case we choose T′ = (1 − 3α)T as the threshold. In this case, with q = Pr[X_max < T′], the classical bound again gives
E[profit] ≥ (1 − q)·T′ + q·E[(X_max − T′)^+] ≥ q·E[X_max] + (1 − 2q)·T′,
and T′/E[X_max] = (1 − 3α)(1 − β)/(1 − α), which is a decreasing function of α. In this case, it's minimized at α = (1 − 2β)/(8 − 9β), which gives, after some simplification, that T′/E[X_max] ≥ (1/7)(5 − 3β). We also assume β < 1/2, which implies T′ ≥ E[X_max]/2. Hence, T′ is larger than E[X_max − T′], and the expression q·E[X_max] + (1 − 2q)·T′ is decreasing in q. Now consider the variable (T − X_max)^+. Its expectation is αT, and hence by Markov's inequality, Pr[T − X_max > 3αT] ≤ 1/3; that is, q ≤ 1/3. We conclude,
E[profit] ≥ (1/3)·E[X_max] + (1/3)·T′ = (1/3 + (1 − 3α)(1 − β)/(3(1 − α)))·E[X_max].
Again, this is a decreasing function of α, minimized at α = (1 − 2β)/(8 − 9β), and we obtain
E[profit] ≥ (1/3 + (5 − 3β)/21)·E[X_max] = ((4 − β)/7)·E[X_max].

Case 2: some value above the median but not enough for buyback

Here we deal with a case where E[(X_max − T)^+] is rather large, but E[(X_max − (1 + f)T)^+] is not; i.e., we still do not have enough incentive to perform any buybacks. In this case, we gain due to certain slack in the classical proof of the prophet inequality.
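As a sanity check on the algebra of the two cases above, the script below verifies numerically that both case bounds coincide at the balance point α = (1 − 2β)/(8 − 9β). The two bound formulas are our reading of the case analysis and should be treated as assumptions of this check.

```python
# Check that, at the balance point alpha = (1-2*beta)/(8-9*beta), both
# case bounds evaluate to (4 - beta)/7.  The two formulas below are our
# reading of the case analysis, not quoted expressions.
def case1(alpha, beta):
    # First case: threshold T, bound (1-beta)/(2(1-alpha)) + beta/2.
    return (1 - beta) / (2 * (1 - alpha)) + beta / 2

def case2(alpha, beta):
    # Second case: threshold T' = (1-3*alpha)*T, bound
    # 1/3 + (1-3*alpha)(1-beta)/(3(1-alpha)).
    return 1/3 + (1 - 3*alpha) * (1 - beta) / (3 * (1 - alpha))

for i in range(1, 50):
    beta = i / 100                           # beta ranges over (0, 1/2)
    alpha = (1 - 2*beta) / (8 - 9*beta)      # the boundary between the cases
    target = (4 - beta) / 7
    assert abs(case1(alpha, beta) - target) < 1e-12
    assert abs(case2(alpha, beta) - target) < 1e-12
print("both case bounds equal (4 - beta)/7 at the balance point")
```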
Then the algorithm with T as a threshold (and no buyback) achieves a competitive ratio strictly better than 1/2.

Proof. First we observe two bounds on E[X_max].
Finally, we use the fact that E[(X_max − T)^+] ≤ Σ_{i=1}^n E[(X_i − T)^+] (since all the terms in the sum are positive, and X_max must be one of them). Also, from above, E[X_max] ≥ T/(2(1 − β)). To summarize, the gain over the factor 1/2 is strictly positive in this case.

Case 3: high values for buyback
Finally, we consider the case where E[(X_max − (1 + f)T)^+] = γ·E[X_max] is significant. This is actually the only case where we leverage the possibility of buyback, and our analysis only takes advantage of the first buyback event. Obviously, our bounds are not the best possible. However, our intuition is that for large f, multiple buybacks are not very useful. We prove a lemma quantifying the gain from the first buyback, which we combine with the previous two cases below.

Combining the cases
Corollary 23.1. There is an algorithm for the prophet inequality with buyback factor f which achieves a competitive ratio of 1/2 + c(f), for some constant c(f) > 0 depending only on f.

A Omitted Proofs
A.1 Proof of Lemma 6

Suppose that there exists an instance with distributions D_1, …, D_n, such that the best achievable competitive ratio is strictly less than α(f); say less than α(f) − 3δ where δ > 0. We define discrete distributions D′_1, …, D′_n as follows: Given X_i ∼ D_i and X_max = max_{1≤i≤n} X_i, let M = E[X_max] and M′ = inf{v : E[(X_max − v)^+] ≤ δM}. Since lim_{v→∞} E[(X_max − v)^+] = 0, the value M′ is well defined. We also define V = {δ(1 + δ)^k·M : 0 ≤ k ≤ ⌈log_{1+δ}(M′/δM)⌉} and X′_i = min{v ∈ V : v ≥ min{X_i, M′}}; in other words, X′_i is obtained by rounding X_i up to the nearest value in V (after capping it at M′, if X_i > M′). It is obvious that X′_i is a discrete random variable. Let us denote the distribution of X′_i by D′_i.
We claim that X′_max = max_{1≤i≤n} X′_i does not differ significantly from X_max. Capping each variable X_i at M′ decreases E[X_max] by at most E[(X_max − M′)^+] ≤ δM. Hence, E[X′_max] ≥ (1 − δ)·E[X_max]. On the other hand, rounding up to a value of the form δ(1 + δ)^k·M increases each value by at most a factor of 1 + δ plus an additive δM, so E[X′_max] ≤ (1 + δ)·E[X_max] + δM.

Assume now that there is a policy to achieve a competitive ratio of α(f) for any discretized instance. Then we can apply this policy to any instance, with values discretized to V as above. Assume that we accept value X′_τ and pay buyback cost c′ in the discretized instance, so that E[X′_τ − c′] ≥ α(f)·E[X′_max]. In the original instance, we accept value X_τ and pay buyback cost c. Since values were rounded up by at most a factor of 1 + δ and an additive error of δM, we have X′_τ ≤ (1 + δ)·X_τ + δM. The buyback cost in the discretized instance satisfies c′ ≥ c, because the original values are higher, except for values above M′; but a variable of value above M′ will never be bought back in the discretized instance. Hence, the reward in the original instance is
E[X_τ − c] ≥ (E[X′_τ − c′] − δM)/(1 + δ) ≥ (α(f)·(1 − δ)/(1 + δ) − δ)·E[X_max] ≥ (α(f) − 3δ)·E[X_max].
This contradicts the assumption that this factor cannot be achieved for the original instance.
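The discretization used in this proof is mechanical; the snippet below builds the grid V for illustrative δ, M, M′ and checks the property used above: rounding up costs at most a factor 1 + δ plus an additive δM. All numeric values are arbitrary illustrative choices.

```python
import math

# Sketch of the discretization in the proof of Lemma 6: round each value
# up to the grid V = { delta*(1+delta)^k * M }, after capping at M'.
# delta, M, M' and the sample values are illustrative, not from the paper.
delta, M, M_prime = 0.1, 1.0, 4.0
K = math.ceil(math.log(M_prime / (delta * M), 1 + delta))
V = [delta * (1 + delta) ** k * M for k in range(K + 1)]   # top point >= M'

def discretize(x):
    capped = min(x, M_prime)
    return min(v for v in V if v >= capped)

for x in (0.05, 0.37, 1.0, 3.9, 10.0):
    xp = discretize(x)
    # Rounding up costs at most a factor (1+delta) plus an additive delta*M.
    assert xp <= (1 + delta) * x + delta * M
print(f"grid size = {len(V)}, max grid value = {V[-1]:.3f}")
```

The grid has ⌈log_{1+δ}(M′/δM)⌉ + 1 points, i.e., O(log(M′/δM)/δ), which is what makes the discretized instance finite.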
whenever |x − y| ≤ ε. Such an ε exists by the continuity (and hence the uniform continuity) of h and g. Finally, we will assume for convenience that ε is small enough that Mε ≤ δ. Now, let us introduce the notation r_j = q_j/q̄_j, where q̄_j = Σ_{i=1}^j q_i, and let us try setting H(t) = h(r_t) and G(s, t) = g(r_s, r_t). Then the first family of constraints is satisfied up to an additive error of 3δ. Similarly,
G(s, t)·q_t/q̄_t = r_t·g(r_s, r_t) ≤ h(r_s) + G(s, j)·q̄_j + 3δ.
Assume without loss of generality that E[X_max] = 1. Setting T = (1 + f)/(1 + 2f), we obtain a competitive ratio of (1 + f)/(1 + 2f). (This follows from the convexity of f·T(1 − T)/((1 + f)T − 1) at T = (1 + f)/(1 + 2f).)

The setting with n = 2 random variables

In this section we prove the optimal online algorithm can get ((1 + f)/(1 + 2f))·E[X_max] when there are 2 random variables.
The first inequality comes from the fact that the maximum is always at least any convex combination, and the second inequality holds pointwise for any realization of X_2.
So none of the constraints we would like are exactly satisfied, but our choices do satisfy the constraints approximately.