Regulation of Algorithmic Collusion

Consider sellers in a competitive market that use algorithms to adapt their prices to data that they collect. In such a context, it is plausible that the algorithms could arrive at prices that are higher than the competitive prices, and this may benefit sellers at the expense of consumers (i.e., the buyers in the market). This paper gives a definition of plausible algorithmic non-collusion for pricing algorithms. The definition allows a regulator to empirically audit algorithms by applying a statistical test to the data that they collect. Algorithms that are good, i.e., approximately optimize prices to market conditions, can be augmented to collect the data sufficient to pass the audit. Algorithms that have colluded, e.g., on supra-competitive prices, cannot pass the audit. The definition allows sellers to possess useful side information that may be correlated with supply and demand and could affect the prices used by good algorithms. The paper provides an analysis of the statistical complexity of such an audit, i.e., how much data is sufficient for the test of non-collusion to be accurate.


INTRODUCTION
Algorithms are increasingly being used to price goods and services in competitive markets. Several recent papers have shown that, in certain settings, some configurations of certain pricing algorithms can find and maintain supra-competitive prices when in competition with each other [Asker et al. 2023; Assad et al. 2020; Banchio and Skrzypacz 2022; Calvano et al. 2020]. As a result, (a) regulators may be concerned about how the risk of algorithmic collusion can be mitigated and about the consistency of this regulation with legal standards for collusion, (b) individual sellers may be interested in algorithms that provably do not collude, and (c) third-party platforms, like AirBnB and eBay, may want to recommend to their sellers only pricing algorithms that will not risk incriminating the platforms themselves for price fixing [Harrington 2022]. Several papers have proposed ways to change competition and antitrust law in response to the potential risks brought by algorithmic collusion [Beneke and Mackenrodt 2019; Gal 2023; Harrington 2018]. This paper complements these proposals with a method for regulating algorithmic collusion from data.
For individual adoption of pricing algorithms, our test for algorithmic collusion parallels the role that overt communication plays in the modern legal theory of (non-algorithmic) collusion. Under current, if controversial, understandings of American antitrust law, an express agreement (in the form of overt communication) is a prerequisite to establishing liability under the Sherman Act.¹ Courts cannot read the minds of the sellers to understand their pricing strategies and therefore prefer to rely on evidence of overt illegal coordination [Harrington 2018]. Similarly, in the setting of algorithmic collusion, we might not know some of the fundamentals that guide a seller's pricing algorithm (e.g., the seller's costs or information about the demand), but we can determine whether an outcome is competitive for some fundamentals. We refer to such outcomes as plausible non-collusion. What is left out is outcomes that are non-competitive for any fundamentals, i.e., regardless of what is in the minds of the sellers. We argue subsequently that there is no loss in using our test for algorithmic collusion to forbid algorithms that obtain such non-competitive outcomes.
For third-party platforms of online marketplaces or algorithm vendors who are recommending or selling pricing algorithms to sellers in a market, the coordination by a third party on an algorithm that is known to obtain supra-competitive prices is illegal by current standards [Harrington 2022]. Our test for algorithmic collusion identifies such problematic algorithms.
The possibility of algorithmic collusion creates a greater risk of supra-competitive prices. Thus, regulators and lawmakers may desire methods for establishing algorithmic collusion beyond the legal standards of non-algorithmic collusion. Harrington [2018] noted that algorithms afford introspection that non-algorithmic human agents do not. He discussed regulating algorithmic collusion by prohibiting pricing algorithms with certain properties and proposed a few "inside the head" approaches to check whether an algorithm has any of these properties. One approach is for the regulator to check the source code of the algorithms. This approach has several drawbacks. On one hand, it seems to require costly and detailed scrutiny by experts [Kroll et al. 2017] and has the potential to leak the intellectual property of the algorithm developers [Ruckelshaus v. Monsanto Co. 1984]. On the other hand, the source code of popular black-box algorithms, such as those based on deep neural networks, gives little information about their behavior.
Another approach Harrington [2018] discussed is to conduct dynamic testing on the pricing algorithms, i.e., running the algorithms on simulated inputs to observe their behavior. Although dynamic testing is generally considered an effective approach for detecting software bugs, there are still challenges in applying it to understanding the behavior of pricing algorithms. To make better pricing decisions in vibrant market environments, the inputs to pricing algorithms are usually high-dimensional and dynamic. It is infeasible to exhaust all, or even a small portion of, the possible inputs the algorithms could take. Further, the inputs the algorithms receive from the environments in which they are deployed could be very different from the simulated ones the regulators could anticipate. Even worse, the not-so-recent Volkswagen emissions scandal and the study of adversarial attacks on machine learning [Biggio et al. 2013; Carlini and Wagner 2017] showed that the inputs on which collusion happens could be adversarially hidden to evade scrutiny. As the early computer science pioneer Edsger W. Dijkstra noted regarding testing for bug-finding, "The first moral of the story is that program testing can be used very effectively to show the presence of bugs but never to show their absence" [Dijkstra 1970]. Dynamic testing can be largely uninformative about the behavior of pricing algorithms on inputs that are not tested during simulation but show up during the algorithms' actual deployment. Similar points about dynamic testing have also been made by Desai and Kroll [2017].
This paper takes a different approach. It identifies an empirical condition that can be checked from data logged by the algorithm while deployed, to prove statistically that the algorithm is not colluding under reasonable assumptions. It provides a way to augment any "good" algorithm to collect this data without significantly harming its performance. "Bad" algorithms that collude on prices that are not plausibly competitive cannot be augmented to collect such data and pass the audit. Our framework enables algorithms to prove that they are plausibly non-collusive, and it opens the opportunity for new legal standards for enforcing non-collusion, namely, requiring the algorithms used to continually pass such a test.
The paper develops an empirical definition of plausible non-collusion that has the following two groups of properties:

• Economic properties:
  - (unilateral) Non-collusion is a unilateral property that an algorithm can satisfy independently of what other algorithms are doing.
  - (information compatible) It allows the sellers to use side information that may be correlated.
  - (optimal) Optimizing is not collusion.

• Legal properties:
  - (plausibly correct) Algorithms that collude on supra-competitive prices inconsistent with plausible preferences and beliefs of sellers cannot satisfy it.
  - (minimum burden of compliance) There are known good algorithms, i.e., ones that do not use suboptimal prices, that satisfy the definition. Any new good algorithm can be augmented to collect the necessary data to satisfy the definition with little performance loss.
The justification of the economic properties is as follows. It is critical for a definition of non-collusion to be unilateral: a seller should always be able to adopt a pricing strategy that is non-collusive, regardless of what other sellers do. It may surprise the reader that our definition of non-collusion allows correlation of the sellers' behavior. This correlation of behavior is required to handle side information that could be correlated. Consider the following example: the demand for hotels in the business district is higher during the week, and the day of week is known to all sellers. Setting different prices in response to known differences in demand is not collusion. Last but not least, if the seller is optimizing given the information they obtained, then this act of optimization is not collusion. We argue that if algorithms satisfy the three economic properties, then they are not colluding. On the other hand, if algorithms do not satisfy these three properties, then something undesirable is happening that regulators may wish to rule out.
The legal properties our definition satisfies make it appropriate for regulators to require it of pricing algorithms. Our definition of non-collusion rules out the algorithms that collude on supra-competitive prices identifiable without "seeing through the minds" of the sellers deploying them. Note that this definition does leave two ways sellers could be supra-competitive but plausibly competitive: by acting as though their costs are higher than their actual costs, or by ignoring information that they may have about the market that would result in lower prices. Since the regulator cannot see into the minds of the sellers, the legal standards suggest that such plausible non-collusion is not illegal.² This parallels the modern legal theory of regulating non-algorithmic collusion via explicit agreements. Our definition also places a minimum burden on sellers deploying pricing algorithms that satisfy the economic properties. Algorithms that satisfy the properties and collect the relevant data are known, and new algorithms satisfying the properties can be augmented to collect the relevant data with minor effects on their performance. Therefore, it is feasible for the regulator to make it a requirement for all pricing algorithms without putting excessive burdens on firms adopting them.
In summary, our main contributions are as follows: (1) a definition of non-collusion; (2) a framework for empirically auditing pricing algorithms for whether they satisfy plausible non-collusion; and (3) an instantiation of our framework and an analysis of its statistical complexity. Using our framework, algorithms can collect data to prove their plausible non-collusion, and regulators can audit algorithms without checking source code or limiting algorithms to a pre-approved set. The main technical analysis of our definition of non-collusion contributes a quantification of the sample complexity (i.e., how much data is necessary) for an algorithm to prove with high confidence that it is plausibly not colluding.

Related literature
At present, collusion is regulated in the US legal system by three core federal antitrust laws: the Sherman Act (1890), the Federal Trade Commission Act (1914), and the Clayton Act (1914). The standard legal definition of collusion leaves open the issue of whether an express agreement through overt communication is needed for the behavior to be deemed illegal [Chassang and Ortner 2023]. Earlier rulings, such as v. United States [1939] and v. United States [1946], found firms engaging in illegal collusion without any explicit agreement via communication. However, more recent judicial decisions, such as Brooke Group Ltd. v. Brown & Williamson Tobacco Corp. [1993], have evolved to require the presence of such agreements. Tacit collusion by itself is not a violation of the Sherman Act [In re Text Messaging Antitrust Litigation 2015]. The raison d'être for requiring an express agreement is that it gives an explicit condition that courts can establish. Courts have declined to impose antitrust liability for tacit collusion alone, partly because it is difficult to distinguish tacit collusion from independent decision-making that simply takes into account the actions of rivals in oligopolistic markets [Yao and DeSanti 1993]. As Judge Breyer put it, "[it] is not because such [parallel] pricing is desirable (it is not), but because it is close to impossible to devise a judicially enforceable remedy for 'interdependent pricing'. How does one order a firm to set its prices without regard to the likely reactions of its competitors?" [Clamp-All Corp. v. Cast Iron Soil Pipe Institute 1988] The courts use an analytical framework that permits an inference of conspiracy where there is circumstantial evidence of tacit collusion "plus" something else that tends to "exclude the possibility that the alleged conspirators acted independently" [Kovacic et al. 2011; Yao and DeSanti 1993]. Kovacic and Shapiro [2000] provide a detailed review of the evolution of thinking about competition as reflected in major antitrust decisions and research in industrial organization.

² There may be opportunities to be further restrictive for pricing algorithms recommended by platforms like AirBnB and eBay that must be configured by individual sellers with information such as their costs. When the costs are reported and can be logged by the algorithm, it could be required that the prices are competitive for the reported costs.
Economists study collusion mostly through the lens of oligopoly theory. Non-cooperative game theory is the currently accepted economic model for analyzing oligopolistic interactions [Yao and DeSanti 1993]. Despite the vast literature, there is no unified theory of oligopolistic rivalry, though the mainstream models share common assumptions and approaches. That is, economists agree on what elements a "good" model should contain [Werden 2004; Yao and DeSanti 1993]. Earlier works on oligopoly theory include Stigler [1964] and Friedman [1971]. Werden [2004] provides a good review of basic terms and concepts in game theory as well as modern oligopoly theory. Our definition takes a similar approach, but relaxes competitive behavior to be unilateral and to allow correlated side information.
There has also been much recent work on pricing algorithms and whether/how they could lead to potentially collusive outcomes. Empirical work, such as Assad et al. [2020], studied the effects of pricing algorithms in the German retail gasoline market, finding that prices increased substantially after both firms in a duopoly switched from manual to algorithmic pricing [Gal 2023]. In a well-cited simulation study, Calvano et al. [2020] showed that a commonly used reinforcement learning algorithm learned to initiate and sustain a supra-competitive equilibrium when only instructed to maximize its own profits in a simultaneous, repeated price competition. Klein [2021] observes a similar reward-punishment pattern to Calvano et al. [2020]. At the same time, there is also research that provides evidence for the opposite argument. For example, Abada and Lambin [2023] showed that seemingly collusive outcomes could originate in imperfect exploration rather than excessive algorithmic sophistication. den Boer et al. [2022] examined the Q-learning algorithm used in Calvano et al. [2020] in detail and concluded that "simulations presented by Calvano et al. (2020a) do not give sufficient evidence for the claim that these types of Q-learning algorithms systematically learn collusive strategies." Banchio and Mantegazza [2023] developed a theory explaining the collusive behavior of learning algorithms by their statistical linkage.
Between the economic and legal literatures, there seems to be a gap in how they view collusion. While the law examines whether competitors have taken possibly avoidable actions from which an anti-competitive agreement may be inferred, economic theory is more concerned with what final coordinated outcomes may be produced by certain conduct [Yao and DeSanti 1993]. This may explain why some legal scholars tend to use the term "collusion" more narrowly to refer to illegal cartelization only (and not legal oligopolistic coordination) [Gal 2023]. The term "algorithmic collusion" lends itself to different interpretations, and we use it throughout the paper to refer to "algorithmic tacit collusion" as opposed to "algorithmic explicit collusion", where algorithms implement an existing collusive strategy potentially defined or agreed upon by humans [Gautier et al. 2020].
Last but not least, this paper builds on an extensive literature on algorithms for dynamic learning of prices. Early papers by Bar-Yossef et al. [2002], Blum and Hartline [2005], Blum et al. [2003], and Kleinberg and Leighton [2003] show that the dynamic learning of prices fits into the framework of multi-armed bandit learning, enabling a large portfolio of well-studied algorithms to be successfully applied. Multi-armed bandit learning can be applied in repeated interactions with partial feedback.³ There is a canonical reduction from multi-armed bandit learning to online learning (with full feedback, e.g., where the learner also learns the payoffs of counterfactual prices) that employs propensity scoring, i.e., constructing unbiased estimators of counterfactual payoffs. Blum and Mansour [2007] reduce best-in-hindsight learning (a.k.a. external regret) to calibrated learning (a.k.a. internal or swap regret). Nekipelov et al. [2015] consider inferring the values of low-regret ad buyers from bidding data assuming a full feedback model. Our analysis is based on their definition of the rationalizable set of values and regrets for a buyer, naturally applied to the dual problem of a seller with a cost, and generalized from the full feedback setting to the partial feedback setting.
Concurrently with our work, Chassang and Ortner [2023] informally discussed the idea of enforcing a property (known as "no regret") on pricing algorithms, based on the observation of Chassang et al. [2022]. Chassang et al. [2022] proposed a test of competitive behavior in procurement auctions and applied it to real-world data. They define competitive behavior as perfect public Bayesian equilibrium [Athey and Bagwell 2008] in Markov perfect strategies [Maskin and Tirole 2001]. They derive necessary conditions for the beliefs of firms participating in procurement auctions to be consistent with competitive behavior, which are testable with data containing the bidding history of all firms. In their model, the costs of the firms (sellers) can differ across rounds, while we consider the setting where the firm has a fixed cost. On the other hand, our approach is unilateral: we do not make any assumption on the beliefs of the firm of concern and do not require full information about the other participants in the market.

Dynamic Imperfect Price Competition
We consider a setting of dynamic imperfect price competition with $K$ discrete price levels in which $n$ sellers repeatedly compete for selling one unit of good or service (hereafter referred to as "good") over $T$ rounds. Seller $i$ has a fixed cost $c_i$ to produce a unit of the good. In each round $t$:

• Seller $i$ posts a price $p_i^t \in \mathcal{P}$, where $\mathcal{P}$ is the set of discretized price levels with $|\mathcal{P}| = K$.
• The market condition for seller $i$ is captured by a demand function⁴ $D_i^t(p_i^t, p_{-i}^t) \in [0, 1]$, where $p_{-i}^t$ is the prices of the other sellers. In other words, $D_i^t$ is jointly determined by the prices posted by all sellers. Assuming normal goods, fixing $p_{-i}^t$, the demand $D_i^t$ is weakly decreasing in seller $i$'s own price.

An illustrative example with two sellers is shown in Figure 1. Sellers 1 and 2 have costs $c_1, c_2 \in [0, 1]$, respectively, and the price levels are $\mathcal{P} \subseteq [0, 1]$. At each round $t$, a buyer shows up with valuations $v_1^t$ and $v_2^t$ for the goods provided by the two sellers, respectively. After seeing the prices $p_1^t$ and $p_2^t$ posted by the two sellers, the buyer chooses to buy from the seller $i$ that maximizes his utility $v_i^t - p_i^t$ if $v_i^t - p_i^t \ge 0$, breaking ties in favor of Seller 1. He buys nothing if $v_i^t - p_i^t < 0$ for both $i = 1$ and $2$. Suppose each buyer draws his valuations $(v_1^t, v_2^t)$ from the distribution $F^t$; the demand for Seller 1's good is then $D_1^t(p_1, p_2) = \Pr_{(v_1, v_2) \sim F^t}[v_1 - p_1 \ge \max(v_2 - p_2, 0)]$. A special case of this example is when the buyer's valuations of the goods of the two sellers are i.i.d. uniformly distributed over $[0, 1]^2$ for every round, i.e., $F^t = U[0, 1] \times U[0, 1]$. If we further assume that each seller sets one fixed price to post for all rounds, with costs $c_1 = 0.1$ and $c_2 = 0.2$, the equilibrium prices are $p_1^* \approx 0.50$ and $p_2^* \approx 0.55$. However, if the two sellers collude by setting supra-competitive prices $p_1 \approx 0.60$ and $p_2 \approx 0.66$, they will get a higher total average revenue.
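The equilibrium prices in the uniform-valuation example can be checked numerically. The following sketch is our own illustration (function names are ours, and the closed-form demand expressions are derived from the uniform model above by integrating over the region where the buyer prefers each seller); it runs best-response dynamics on a discretized price grid:

```python
import numpy as np

# Demand for seller 1 at fixed prices (p1, p2) when valuations are i.i.d.
# U[0,1] and ties go to seller 1; derived by integrating over the region
# where v1 - p1 >= max(v2 - p2, 0).
def demand1(p1, p2):
    if p2 <= p1:
        return (1 - p1) ** 2 / 2 + p2 * (1 - p1)
    return (1 - p2 ** 2) / 2 + p2 - p1

def demand2(p2, p1):
    # Symmetric formula with the roles of the two prices swapped.
    if p1 <= p2:
        return (1 - p2) ** 2 / 2 + p1 * (1 - p2)
    return (1 - p1 ** 2) / 2 + p1 - p2

def best_response(cost, other_price, my_demand, grid):
    # Grid search for the profit-maximizing price given the rival's price.
    profits = [(p - cost) * my_demand(p, other_price) for p in grid]
    return grid[int(np.argmax(profits))]

c1, c2 = 0.1, 0.2
grid = np.linspace(0, 1, 1001)      # discretized price levels P
p1, p2 = 0.3, 0.3                   # arbitrary starting prices
for _ in range(50):                 # iterate best responses to a fixed point
    p1 = best_response(c1, p2, demand1, grid)
    p2 = best_response(c2, p1, demand2, grid)
print(round(p1, 2), round(p2, 2))   # approaches the equilibrium, about (0.50, 0.55)
```

The iteration converges quickly here because each seller's best response moves only mildly with the rival's price.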

Learning Problem of Sellers
The dynamic pricing problem of each seller is essentially an online learning problem. At round $t$, seller $i$'s pricing algorithm chooses a price distribution $x_i^t \in \Delta(\mathcal{P})$ from the set of all distributions over the prices, based on her information about the history and the market. She then draws a price $p_i^t$ from the distribution $x_i^t$ and posts it. Following the model of online learning with bandit feedback, we make a minimal assumption about the information a seller possesses: after posting $p_i^t$ at round $t$, seller $i$ observes the demand for her good $D_i^t(p_i^t, p_{-i}^t)$, and hence her payoff $(p_i^t - c_i) \cdot D_i^t(p_i^t, p_{-i}^t)$, as she knows her cost $c_i$.
To measure the performance of a seller's online learning algorithm, we employ the concept of hindsight calibrated regret, defined as the benefit of the best-in-hindsight remapping of the prices chosen by the algorithm. We instantiate this definition in the setting of price competition.

Definition 2.1. Given a sequence of posted prices $\{p_i^t\}_{t=1}^T$ and demand functions $\{D_i^t(\cdot)\}_{t=1}^T$, for seller $i$ with cost $c_i$, the hindsight (realized) regret against a fixed price remapping $f: \mathcal{P} \to \mathcal{P}$ is
$$\mathrm{Reg}_i(f, c_i) = \frac{1}{T} \sum_{t=1}^{T} \big[u_{c_i}(f(p_i^t), D_i^t) - u_{c_i}(p_i^t, D_i^t)\big], \quad \text{where } u_{c_i}(p, D) = (p - c_i)\, D(p).$$
The hindsight calibrated (realized) regret is defined as the maximum hindsight regret over all remappings, $\max_f \mathrm{Reg}_i(f, c_i)$.
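As a concrete illustration (our own sketch; the helper names and the toy demand curve are ours), the maximization over remappings $f$ decomposes price-by-price, since $f(p)$ can be chosen independently for each posted price $p$:

```python
from collections import defaultdict

def payoff(price, cost, demand):
    # u_c(p, D) = (p - c) * D(p)
    return (price - cost) * demand(price)

def calibrated_regret(prices, demands, cost, price_levels):
    # prices[t]: posted price at round t; demands[t]: round-t demand curve.
    T = len(prices)
    rounds_by_price = defaultdict(list)
    for t, p in enumerate(prices):
        rounds_by_price[p].append(t)
    regret = 0.0
    for p, rounds in rounds_by_price.items():
        # Best single remapped price for the rounds where p was posted.
        base = sum(payoff(p, cost, demands[t]) for t in rounds)
        best = max(sum(payoff(q, cost, demands[t]) for t in rounds)
                   for q in price_levels)
        regret += best - base
    return regret / T

# Toy history: two price levels, stationary demand D(p) = 1 - p, cost 0.
levels = [0.3, 0.5]
demands = [lambda p: 1 - p] * 4
prices = [0.3, 0.3, 0.5, 0.5]
print(calibrated_regret(prices, demands, 0.0, levels))  # about 0.02
```

Here remapping 0.3 to 0.5 gains 0.04 per affected round over half the rounds, giving average calibrated regret of roughly 0.02.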
Since a seller's pricing algorithm chooses a distribution of prices at each round, a performance measure without considering a particular realization of the distributions is the expected regret.
Definition 2.2. Given a sequence of historical price distributions $\{x_i^t\}_{t=1}^T$ and demand functions $\{D_i^t(\cdot)\}_{t=1}^T$, the expected regret of seller $i$ with cost $c_i$ against a fixed price remapping $f: \mathcal{P} \to \mathcal{P}$ is
$$\overline{\mathrm{Reg}}_i(f, c_i) = \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}_{p \sim x_i^t}\big[u_{c_i}(f(p), D_i^t) - u_{c_i}(p, D_i^t)\big].$$
The expected calibrated regret for seller $i$ is defined as $\max_f \overline{\mathrm{Reg}}_i(f, c_i)$.
Note that calibrated regret is also called "swap regret" or "internal regret" in the literature. In addition, there is a common weaker notion of regret, known as the hindsight external regret, defined as the benefit of deviating to a single best-in-hindsight action.

Definition 2.3. Given a sequence of posted prices $\{p_i^t\}_{t=1}^T$ and demand functions $\{D_i^t(\cdot)\}_{t=1}^T$, for seller $i$ with cost $c_i$, the hindsight external regret is
$$\max_{p^* \in \mathcal{P}} \frac{1}{T} \sum_{t=1}^{T} \big[u_{c_i}(p^*, D_i^t) - u_{c_i}(p_i^t, D_i^t)\big].$$
Unlike calibrated regret, the definition of hindsight external regret does not allow beneficial side information. Later in this paper, we argue that hindsight external regret is insufficient for precluding collusion.
Based on the results of Auer et al. [2002], Blum and Mansour [2007] and Stoltz [2005] give algorithms that achieve vanishing expected calibrated regret for an individual seller, regardless of the market condition and other sellers' behavior. Such algorithms are among those generally referred to as "no-regret learning algorithms".
Theorem 2.1 (Blum and Mansour 2007; Stoltz 2005). For a seller $i$ with cost $c_i$, there exists an online algorithm such that its expected calibrated regret vanishes as the number of rounds grows, i.e., $\max_f \overline{\mathrm{Reg}}_i(f, c_i) \to 0$ as $T \to \infty$, regardless of the demand functions $\{D_i^t\}$.

A characteristic of no-regret learning algorithms is that they lead to correlated equilibrium [Foster and Vohra 1997]. Correlated equilibrium [Aumann 1974] is a static equilibrium concept that is often described via a mediator that draws a profile of prices from a joint distribution and privately suggests the corresponding price to each seller. The joint distribution of prices is a correlated equilibrium if no seller has an incentive to deviate from their suggested price.
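For intuition, here is a minimal bandit price learner in the style of Exp3 [Auer et al. 2002]. This is our own simplified sketch (parameter choices and names are ours); it controls only external regret, whereas the calibrated-regret guarantee of Theorem 2.1 additionally composes such learners via the Blum and Mansour [2007] reduction, which we omit:

```python
import math, random

def exp3_prices(price_levels, payoff_of, T, gamma=0.1, seed=0):
    # Exp3-style learner: exponential weights with importance-weighted
    # (propensity-scored) payoff estimates from bandit feedback.
    rng = random.Random(seed)
    K = len(price_levels)
    weights = [1.0] * K
    history = []
    for _ in range(T):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / K for w in weights]
        i = rng.choices(range(K), weights=probs)[0]
        x = payoff_of(price_levels[i])                 # bandit feedback in [0, 1]
        weights[i] *= math.exp(gamma * (x / probs[i]) / K)
        history.append(price_levels[i])
    return history

# Stationary toy market: demand D(p) = 1 - p and cost 0, with payoffs
# normalized to [0, 1]; the best of these levels is 0.5.
levels = [0.1, 0.5, 0.9]
hist = exp3_prices(levels, lambda p: p * (1 - p) / 0.25, 3000)
print(hist.count(0.5) / len(hist))  # concentrates on the best price
```

The exploration rate `gamma` trades off how quickly the learner concentrates against how reliably it estimates counterfactual payoffs.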
Definition 2.4. A joint distribution of prices $\Pi \in \Delta(\mathcal{P}^n)$ is a correlated equilibrium if, for each seller $i$, any realized price $p_i$ from the distribution is a best response conditional on $p_i$:
$$\mathbb{E}_{p_{-i} \sim \Pi \mid p_i}\big[(p_i - c_i)\, D_i(p_i, p_{-i})\big] \ge \mathbb{E}_{p_{-i} \sim \Pi \mid p_i}\big[(p' - c_i)\, D_i(p', p_{-i})\big] \quad \text{for all } p' \in \mathcal{P}.$$

The regulator of pricing algorithms may not know the costs of the sellers. Nekipelov et al. [2015] consider inferring both the costs and the external regrets of learning algorithms (Definition 2.3). They refer to the pairs of costs and external regrets that are consistent with the data as the rationalizable set. They show how to identify these rationalizable sets by assuming the pricing data contains counterfactual outcomes, i.e., what would have happened if a seller had used a different price. We generalize this method to pricing data that does not contain counterfactual outcomes and to the inference of calibrated regret.

Definition 2.5. Given the historical price distributions $\{x_i^t\}_{t=1}^T$ and demand functions $\{D_i^t(\cdot)\}_{t=1}^T$, the rationalizable set for seller $i$ is the set of all pairs $(c_i, \varepsilon)$ such that the expected calibrated regret of the data under cost $c_i$ is at most $\varepsilon$.
Each point $(c_i, \varepsilon_i)$ on the lower boundary of the rationalizable set gives the maximum expected calibrated regret $\varepsilon_i$ of seller $i$ when she has cost $c_i$. The rationalizable set can be efficiently computed via the method provided in Nekipelov et al. [2015] under minimal assumptions.
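The lower boundary can be traced by scanning a grid of candidate costs and computing the calibrated regret that each cost implies for the observed data. The sketch below is our own illustration (helper names and the toy data are hypothetical), using realized prices in place of price distributions for simplicity:

```python
from collections import defaultdict

def calibrated_regret_at_cost(prices, demands, cost, levels):
    # Calibrated regret of the observed history under a candidate cost:
    # the best remapping decomposes price-by-price.
    T = len(prices)
    by_price = defaultdict(list)
    for t, p in enumerate(prices):
        by_price[p].append(t)
    total = 0.0
    for p, rounds in by_price.items():
        base = sum((p - cost) * demands[t](p) for t in rounds)
        best = max(sum((q - cost) * demands[t](q) for t in rounds)
                   for q in levels)
        total += best - base
    return total / T

def lower_boundary(prices, demands, levels, cost_grid):
    # One (cost, regret) point per candidate cost; pairs (c, eps) with
    # eps above this curve are rationalizable.
    return [(c, calibrated_regret_at_cost(prices, demands, c, levels))
            for c in cost_grid]

levels = [0.3, 0.5, 0.7]
demands = [lambda p: 1 - p] * 6
prices = [0.5] * 6                      # the seller always posts 0.5
boundary = lower_boundary(prices, demands, levels, [0.0, 0.4, 0.8])
print(boundary)  # cost 0.0 rationalizes price 0.5 with zero regret
```

Low candidate costs rationalize the posted price of 0.5 exactly, while higher candidate costs would have made 0.7 the better price and so imply positive regret.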

Collusive Equilibria in Repeated Games
The setting of dynamic imperfect price competition is a repeated game. On the other hand, the correlated equilibria described previously are an equilibrium concept for a static game (not repeated). Hindsight calibrated no-regret learning algorithms in the dynamic game, as we have seen, converge to this static equilibrium concept. The literature on repeated games, on the other hand, typically considers dynamic equilibrium concepts, where an agent can explicitly condition on the actions of other agents in previous stages, perhaps to punish them for deviating from some prescribed strategy. We view such equilibria as collusive. The so-called "folk theorems" of repeated games describe outcomes that are possible as equilibria of the repeated game. Benoit and Krishna [1985] give a folk theorem for finitely repeated games. Stated in words: under weak conditions, any feasible and individually rational payoff of the static game can be approximated by the average payoff in a subgame-perfect equilibrium of a repeated game with a sufficiently long horizon. A stable collusive outcome in a static pricing game is one where some players are best responding while other players are in a coalition and obtain higher individual payoffs than they would in an equilibrium that could result if they were all best responding. In this outcome, all best-responding players obtain at least their individually rational payoffs, and all colluding players obtain strictly more than their equilibrium payoffs, which are at least their individually rational payoffs. Thus, the folk theorem implies that equilibria in finitely repeated pricing games can approximate any stable collusive outcomes that exist.
Corollary 2.2.In a dynamic imperfect price competition game that is finitely repeated with a sufficiently long time horizon, any stable collusive outcome can be approximated by an equilibrium of the repeated game.
We have focused on hindsight calibrated no-regret learning algorithms, which converge to correlated equilibria in the static pricing game. Another large family of learning algorithms that are natural to use for price competition is no-policy-regret learning algorithms. When a seller is learning how to price, it is natural for competitors to react to the seller's prices with their own pricing strategies. Policy-regret algorithms compare their performance to the performance they could have achieved if they had switched to a fixed policy and the others in the market had responded to this switch. Arora et al. [2018] introduced the notion of a policy equilibrium, which corresponds to outcomes in games played by no-policy-regret learning algorithms. They showed that policy equilibrium is a strictly larger class than correlated equilibrium. As correlated equilibrium corresponds to calibrated best response by each agent in each round, the policy equilibria that are not correlated equilibria are not best responding in each round. We view these outcomes as collusion. Hence, we view no-policy-regret learning algorithms as problematic for algorithmic pricing.

FRAMEWORK FOR REGULATING COLLUSION
This section presents our definition of plausible non-collusion for sellers and an empirical framework for auditing it. In our model, the seller has a potentially private cost, which is static across rounds, and a potentially private signal that correlates with the demand (and possibly with the competition, which might also correlate with the demand). Our framework is based on the following sufficient condition for non-collusion:

Definition 3.1. It is non-collusive for a seller to approximately best respond to their competitive environment.
While it is plausible that a seller who has not approximately best responded is nonetheless not colluding, our framework will not be able to conclude that they have not colluded. The existence of algorithms that can easily satisfy the empirical definition we propose is evidence that it is permissible to hold sellers to such a standard.
We may not know the seller's cost and/or the seller's beliefs about the competitive environment. In fact, these beliefs about the environment may be changing over time (though we assume that the sellers' costs are stationary). We will not require that the regulator know anything about the seller's costs or beliefs. Instead, we will apply the econometric principles of revealed preference and revealed information. If a seller is approximately best responding to their competitive environment, we can infer their cost, and whether they are consistently using information reveals what information they possess. For this reason, our empirical notion of non-collusion is only plausible, i.e., there exists a cost and belief consistent with the data for which the seller has approximately best responded.
It is possible that sellers collude to act as though they have costs or information about the demand that are seemingly plausible to the regulator but different from what they actually have. A regulator uninformed about the true cost and true information possessed by the sellers will not be able to detect such collusion. Our philosophy is that these possibilities already exist in the regulation of collusion absent algorithms, and our focus is on solving the new challenges introduced by algorithms by essentially reducing them to the old challenges of regulating collusion.
Our definition of non-collusion is unilateral by definition: a seller can satisfy it regardless of the actions of other sellers. Specifically, it will not be important to explicitly model the detailed actions of other sellers, only the impact of those actions on the outcome of a seller. A seller's outcome, given the actions of other sellers and buyers, is a function $D: \mathcal{P} \to [0, 1]$ from their price $p \in \mathcal{P}$ to a quantity of goods sold at this price, a.k.a. a demand function. We will assume that the goods are normal goods, i.e., the demand function is monotone: increasing the price results in a (weakly) decreasing allocation.
We first give a static definition of non-collusion that applies to a single round of pricing. We then generalize the definition to repeated pricing and allow for statistical learning.

Definition 3.2. A joint distribution on pairs of price and demand function $\Pi \in \Delta(\mathcal{P} \times (\mathcal{P} \to [0, 1]))$ is in calibrated best response for a seller with cost $c$ if, conditioned on the seller's price $p$, $p$ is a best response:
$$\mathbb{E}_{(p, D) \sim \Pi}\big[(p - c)\, D(p) \,\big|\, p\big] \ge \mathbb{E}_{(p, D) \sim \Pi}\big[(p' - c)\, D(p') \,\big|\, p\big] \quad \text{for all } p' \in \mathcal{P}.$$

Calibrated best response captures what it means to be a good algorithm and allows the algorithm to use side information. Collusion is a potentially tacit agreement between sellers to keep prices higher than those in each seller's best interest, given the prices of the other sellers. On the other hand, best responding to the market, and in particular to what other sellers are doing, is not collusion. Calibration allows side information: if the side information is useful, it manifests in distinct prices. The definition conditions the best response on the prices. In other words, calibration requires an internal consistency with respect to information that is revealed to be possessed in the variation of prices. It is easy to observe that calibrated best response is the unilateral version of correlated equilibrium: if all sellers' prices satisfy calibrated best response, then the joint distribution of prices is a correlated equilibrium.
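The condition of Definition 3.2 can be checked directly on an empirical distribution of (price, demand) pairs. The sketch below is our own (the function names and the demand curves standing in for "weekday" and "weekend" side information are hypothetical):

```python
from collections import defaultdict

def is_calibrated_best_response(pairs, cost, levels, tol=1e-9):
    # For each posted price p, compare the conditional average payoff of p
    # against every deviation p', conditioning on rounds where p was posted.
    groups = defaultdict(list)
    for p, D in pairs:
        groups[p].append(D)
    for p, ds in groups.items():
        own = sum((p - cost) * D(p) for D in ds)
        for q in levels:
            if sum((q - cost) * D(q) for D in ds) > own + tol:
                return False
    return True

levels = [0.3, 0.5, 0.7]
low = lambda p: 1 - p                # hypothetical weekday demand
high = lambda p: min(1.0, 1.5 - p)   # hypothetical weekend demand
# Distinct prices responding to side information pass the check...
print(is_calibrated_best_response([(0.5, low), (0.7, high)], 0.0, levels))  # True
# ...while a supra-competitive price against the low demand fails it.
print(is_calibrated_best_response([(0.7, low)], 0.0, levels))               # False
```

The first history prices high only when demand is high, which calibration permits; the second prices high unconditionally, which no cost of zero can rationalize.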
While it might seem that allowing correlation is allowing collusion, we argue that, in fact, no reasonable definition of collusion can forbid correlation of prices. Specifically, non-collusion is inherently about best responding to market conditions. When consumer demand changes, the best-response prices change. Consumer demand is something that all sellers should be measuring, and it is correlated across sellers that are in price competition. Therefore, correlation must be allowed. Calibration is a minimal allowance of correlation; in particular, it is agnostic to the various potential sources of correlation and does not require that they be explicitly modeled.

Definition 3.3. A joint distribution on pairs of price and demand Π is non-collusive for a seller with cost c if Π is a calibrated best response for cost c.
In repeated environments, where sellers are learning which prices are good, their prices might not be best responses. However, as learning proceeds, the distance from best response should diminish. This property is captured by the following definition for dynamic settings.

Definition 3.4. An infinite sequence of pairs of price and demand {(p_t, D_t)}_t satisfies calibrated vanishing regret for a seller with cost c if the maximum average per-round benefit of deviation over the set of price remaps f : P → P, up to a given round, approaches zero as the number of rounds goes to infinity:

    lim sup_{T→∞} max_{f : P → P} (1/T) Σ_{t=1}^{T} [u(f(p_t), D_t) − u(p_t, D_t)] ≤ 0,

where the payoff of price p on demand D is u(p, D) = (p − c) · D(p).

Calibrated vanishing regret and calibrated best response are related in that:
• if we draw a sequence of prices from a joint distribution that is a calibrated best response for the seller (and payoffs are bounded), then this sequence of prices satisfies calibrated vanishing regret for the seller; and
• in the limit with the number of rounds, the uniform distribution on price–demand pairs (a.k.a. the empirical distribution) given by a sequence that satisfies calibrated vanishing regret for the seller approaches a calibrated best response for her.
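For a finite prefix of the sequence, the maximum over price remaps decomposes: a remap f may treat each price separately, so the best remap picks, for each posted price p, the single best replacement price. A minimal sketch of this computation (our own illustrative code and names, not from the paper):

```python
# Average calibrated regret of a finite prefix, per Definition 3.4.
# The max over remaps f: P -> P decomposes across posted prices.
def calibrated_regret(history, prices, cost):
    """history: list of (posted_price, demand) pairs; demand: price -> [0, 1]."""
    T = len(history)
    payoff = lambda p, demand: (p - cost) * demand(p)
    regret = 0.0
    for p in prices:
        # all rounds in which price p was posted
        rounds = [demand for (pt, demand) in history if pt == p]
        if not rounds:
            continue
        base = sum(payoff(p, d) for d in rounds)
        # best single replacement price for these rounds (q = p gives base)
        best = max(sum(payoff(q, d) for d in rounds) for q in prices)
        regret += best - base
    return regret / T
```

With demand D(p) = 1 − p, zero cost, and two rounds at the suboptimal price 0.25 on the grid {0.25, 0.5}, the average calibrated regret is (0.5 − 0.375)/2 = 0.0625.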
These two properties give a unilateral version of an equivalence observed by Foster and Vohra [1997]: if the conditions hold for all sellers, then the empirical distribution of the price sequence approaches a correlated equilibrium.
Note that requiring calibration is important in our definition of non-collusion. The weaker notion of vanishing external regret (Definition 2.3) does not require calibration, and it fails to rule out certain collusive behaviors when the sellers have private information about the demand.
We demonstrate the problem with external regret using the numerical example discussed in Section 2.1: two sellers have costs c₁ = 0.1 and c₂ = 0.2, respectively, and the buyers' valuations are i.i.d. uniform over [0, 1]. However, Seller 1 now possesses private information: she can tell whether an incoming buyer has a low valuation for both sellers, i.e., v₁ ≤ 0.5 and v₂ ≤ 0.5, and can thus post a different price for these buyers. Seller 1 can take advantage of this private information to collude with Seller 2 while still having non-positive external regret: Seller 2 posts a fixed price p₂ = 0.66; Seller 1 posts p₁ = 0.3 when she knows the buyer has a low valuation, and the same price p₁ = 0.66 as Seller 2 otherwise. Recall that external regret compares Seller 1's revenue against the best single price she could have set. Compared to a single price that always undercuts Seller 2, she is better off extracting more revenue from low-valuation buyers and not competing against Seller 2 on high-valuation buyers. On the other hand, in this example Seller 1 does have positive calibrated regret: she can obtain a higher revenue by posting p′₁ = 0.60 whenever the current collusive strategy posts p₁ = 0.66. In other words, the calibrated best-response condition fails to hold: conditional on her posting p₁ = 0.66, the price 0.66 is not a best response.
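The deviation claim can be checked numerically. The Monte Carlo sketch below is our own illustration and assumes a standard buyer model that the example suggests but does not spell out here: the buyer purchases from the seller offering the larger nonnegative surplus v_i − p_i. Conditional on the rounds where Seller 1 posts 0.66 (the high-valuation buyers), deviating to 0.60 earns strictly more:

```python
# Monte Carlo check of the calibrated deviation in the Section 2.1 example
# (assumed buyer model: buy from the seller with the larger surplus v_i - p_i,
# provided that surplus is nonnegative).
import random

def seller1_profit(p1, v1, v2, p2=0.66, c1=0.1):
    s1, s2 = v1 - p1, v2 - p2
    return (p1 - c1) if (s1 >= 0 and s1 > s2) else 0.0

random.seed(0)
stay = dev = n_high = 0.0
for _ in range(200_000):
    v1, v2 = random.random(), random.random()
    if v1 <= 0.5 and v2 <= 0.5:
        continue                              # low buyers get p1 = 0.3; not at issue here
    n_high += 1
    stay += seller1_profit(0.66, v1, v2)      # collusive price matching Seller 2
    dev += seller1_profit(0.60, v1, v2)       # calibrated deviation to 0.60
print(stay / n_high, dev / n_high)
```

On the high-valuation buyers, the conditional expected profit of the deviation (about 0.228) exceeds that of the collusive price (about 0.211), confirming the positive calibrated regret.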
Our methods will not require the regulator to know the exact cost of a seller; it will be sufficient to know that the seller's cost lies in a bounded range [c̲, c̄]. The regulator will attribute to the seller the minimum regret achievable by any cost in this range.

Definition 3.5. An infinite sequence of price–demand pairs of a seller is plausibly non-collusive for cost range [c̲, c̄] if it satisfies calibrated vanishing regret for some cost c ∈ [c̲, c̄].

There is a long literature that develops good learning algorithms for pricing with unknown demand, specifically algorithms satisfying vanishing calibrated regret. Algorithms that do not satisfy vanishing calibrated regret are making mistakes in optimization that are apparent from the data: given the information the algorithms have, which is revealed in the prices, they are not optimizing well enough for calibrated regret to vanish. We view this failure of optimization as a mistake, and algorithms that make this mistake as not good.
Definition 3.6. An environment is a process that generates the sequence of demands based on an algorithm's past decisions, i.e., a sequence of functions mapping a history of prices, or of distributions over prices, to a distribution over demand functions.
A stochastic environment is an environment where the demand functions of each round are independent and identically distributed. An adversarial environment is an environment where the demand functions of each round are generated adversarially against the algorithm running in it.

Definition 3.7. An algorithm is good for a cost c in an environment if it satisfies vanishing calibrated regret for cost c.
Calibrated vanishing regret cannot be directly observed in the data of a learning algorithm because (a) in practice, only data from a finite horizon can be observed, and (b) the outcomes of counterfactual prices are not generally known.
With data observed from a finite horizon of length T, the methodology of property testing [Goldreich 2010] can be used to check whether the expected calibrated regret of a seller at T is below a threshold, which approximates calibrated vanishing regret.
Definition 3.8. The expected calibrated regret at time T for a seller with cost c against price remap f is

    R_T(c, f) = (1/T) Σ_{t=1}^{T} E[u_c(f(p_t), D_t) − u_c(p_t, D_t)],   where u_c(p, D) = (p − c) · D(p).

Definition 3.9. The seller's plausible calibrated regret at time T is R*_T = min_{c ∈ [c̲, c̄]} max_f R_T(c, f).

Note that the seller's plausible calibrated regret R*_T is never larger than her true calibrated regret R_T = max_f R_T(c₀, f) when her true cost c₀ ∈ [c̲, c̄]. While in round t the seller posts price p_t and obtains some utility from it, testing whether her regret is low requires the counterfactual outcomes of other prices that could have been posted, which we can only estimate from the data.
We measure the statistical complexity of a low plausible calibrated regret test by the number of rounds T that is sufficient to distinguish, with high confidence, the two scenarios:
• the seller's true calibrated regret is below a given threshold (for sufficiently auditable algorithms);
• the seller's plausible calibrated regret is far above the given threshold.
This gives a two-sided bound while allowing a failure to identify the low regret of algorithms that do not collect enough data to accurately make such a determination.

Definition 3.10 (Sample complexity with auditability requirement). A low plausible calibrated regret test has sample complexity T₀ with auditability requirement Γ, confidence 1 − δ, and target regret level τ if T₀ is the minimum T such that
• if the seller's true calibrated regret satisfies R_T ≤ τ and the transcript satisfies the auditability requirement Γ, she passes the test with probability at least 1 − δ;
• if the seller's plausible calibrated regret satisfies R*_T ≥ 2τ, she fails the test with probability at least 1 − δ.
Since the counterfactual demand of other prices that could have been posted cannot be observed, an algorithm needs to keep additional data in the transcript to demonstrate that it has low regret.
Algorithms might not be designed to record such information. Our goal for auditing collusion is to allow any good algorithm to be used. Thus, we look for tests for which any algorithm can be retrofitted to collect the data so that, if its regret is low, it passes the test.
Definition 3.11. A low plausible calibrated regret test with auditability requirement Γ is (ε₀, γ₀)-audit compatible if the following holds: any algorithm A with expected calibrated regret at most R_T for any time horizon at least T can be augmented to an algorithm that produces a transcript satisfying Γ and has expected calibrated regret at most R_T + ε₀ over a time horizon with expected length (1 + γ₀) · T.
To instantiate the above framework for auditing non-collusion, we must do the following:
• define a low plausible calibrated regret test;
• prove that the test has good sample complexity;
• define a black-box transcription algorithm for converting any good learning algorithm into one that additionally produces an auditable transcript; and
• prove that the test is audit compatible with a small loss and a small amount of additional time (by analyzing the transcription algorithm).
The next section completes these steps.

EMPIRICAL PROPENSITY SCORE TEST
In this section, we give one instantiation of our framework for auditing the collusion of a seller in dynamic imperfect price competition, based on the propensity score estimator, a standard method in the multi-armed bandit algorithms that have been developed for pricing. Since we focus on one particular seller, as in the previous section, we drop the seller's subscript from the notation for ease of reading and use D_t(·) to denote the demand determined by the environment at round t. We also use regret to refer to expected regret for simplicity, as we are not concerned with realized regret.
Definition 4.1. The propensity score transcript includes the following sequences produced by the seller's algorithm:
• the price distributions {π_t}_{t=1}^{T} from which prices are drawn,
• the actual prices posted {p_t}_{t=1}^{T}, and
• the observed demands {D_t(p_t)}_{t=1}^{T}, i.e., the outcomes the seller experienced from posting price p_t at round t.
It is assumed that the price p_t is actually drawn from the distribution π_t. It is not hard for the seller to commit to doing so, and to convince the regulator of this commitment with modern cryptography.
With the transcript described above, we define an estimated calibrated regret using the propensity score estimator for the unobserved probabilities of sale at counterfactual prices.
Definition 4.2. Given a propensity score transcript, the propensity score estimator for D_t(·) is

    D̂_t(p) = 1{p_t = p} · D_t(p_t) / π_t(p).

The propensity score estimator weights the outcome of each observation inversely proportionally to its rarity. Note that for any fixed p, D̂_t(p) is an unbiased estimator of D_t(p), as E_{p_t ∼ π_t}[D̂_t(p)] = D_t(p). We define the following estimated calibrated regret for a seller with cost c against price remap f : P → P:

    R̂_T(c, f) = (1/T) Σ_{t=1}^{T} [(f(p_t) − c) · D̂_t(f(p_t)) − (p_t − c) · D_t(p_t)],

and the estimated calibrated regret is max_f R̂_T(c, f). The estimator estimates the true regret by substituting the propensity score estimator for the demand at counterfactual prices.
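A minimal sketch of the estimator follows. The interface is our own illustrative choice, not from the paper: a transcript row is the triple of the round's price distribution π_t (as a dict), the posted price p_t, and the observed sale quantity D_t(p_t); a remap is a dict from price to price.

```python
# Inverse-propensity estimate of demand at a counterfactual price q, and the
# resulting estimated calibrated regret against a fixed remap (Definition 4.2).
def ips_demand(q, p_t, pi_t, sold_t):
    """Unbiased for D_t(q): E over p_t ~ pi_t of 1{p_t = q} * D_t(p_t) / pi_t(q)."""
    return (sold_t / pi_t[q]) if p_t == q else 0.0

def estimated_calibrated_regret(transcript, cost, remap):
    """transcript: list of (pi_t, p_t, sold_t); remap: dict price -> price."""
    T = len(transcript)
    total = 0.0
    for pi_t, p_t, sold_t in transcript:
        q = remap[p_t]
        # counterfactual payoff via IPS minus realized payoff
        total += (q - cost) * ips_demand(q, p_t, pi_t, sold_t) \
                 - (p_t - cost) * sold_t
    return total / T
```

Note that on any single round the estimate is noisy (even the identity remap can show nonzero estimated regret, since the IPS term rescales by 1/π_t(p_t)); unbiasedness holds in expectation over the price draw.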
We define the minimum exploration probability to quantify the exploration demonstrated by a transcript.
Definition 4.3. The minimum exploration probability of a transcript is μ = min_t min_{p ∈ P} π_t(p).

To infer the cost of the seller, the regulator can compute the estimated rationalizable set à la Nekipelov et al. [2015], taking

    ĉ = argmin_{c ∈ [c̲, c̄]} max_f R̂_T(c, f)

as the estimated plausible cost of the seller: ĉ is the cost under which the seller has the lowest estimated calibrated regret according to the data. "Having cost ĉ" is the plausible explanation of the observed data that is most favorable to the seller in terms of estimated calibrated regret.
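This inference can be sketched as a grid search over the cost range. The code below is our own illustration with hypothetical names: a transcript row is the triple of the round's price distribution (as a dict), the posted price, and the observed sale quantity, and the max over remaps again decomposes per posted price.

```python
# Estimated plausible cost and plausible calibrated regret by grid search
# over candidate costs, using the inverse-propensity demand estimates.
def estimated_plausible_regret(transcript, prices, cost_grid):
    """Returns (c_hat, min over the grid of max over remaps of estimated regret)."""
    def max_remap_regret(cost):
        # the best remap picks, per posted price p, the best target price q
        total = 0.0
        for p in prices:
            rounds = [(pi, pt, s) for (pi, pt, s) in transcript if pt == p]
            if not rounds:
                continue
            base = sum((p - cost) * s for (_, _, s) in rounds)
            best = max(sum((q - cost) * ((s / pi[q]) if pt == q else 0.0)
                           for (pi, pt, s) in rounds) for q in prices)
            total += best - base
        return total / len(transcript)
    c_hat = min(cost_grid, key=max_remap_regret)
    return c_hat, max_remap_regret(c_hat)
```

The returned c_hat is the cost in the grid that best rationalizes the observed prices, in the sense of minimizing the estimated calibrated regret.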
To test whether a seller's plausible calibrated regret satisfies R*_T ≤ τ for target regret level τ, the regulator conducts the following test on a transcript as defined in Definition 4.1:
• pass if the upper confidence bound UCB_T = R̂*_T + ε_T, i.e., the plausible calibrated regret estimator plus an upper margin of error, is at most 2τ; and
• fail otherwise.
The accuracy of the propensity score regret estimator depends on how often the seller's algorithm explores: the estimate is accurate only when the algorithm explores often enough that sufficient information is revealed. The upper margin of error ε_T is added to the estimated calibrated regret to account for the estimation error given the exploration of the seller's algorithm. This ensures that when the transcript fails to demonstrate that the algorithm producing it explored enough, it is hard for the seller to pass the test: a seller with high plausible calibrated regret cannot pass the test by obtaining a low estimated plausible calibrated regret when the estimator is in fact unreliable. As discussed above, to be able to pass the empirical propensity score test, the seller's algorithm needs to explore often enough that the transcript satisfies the minimum exploration requirement. Transcripts produced by an algorithm that does not explore often enough are not auditable using the empirical propensity score test, even if the algorithm is in fact non-collusive.
As long as an algorithm is sufficiently robust in an environment, it can be modified to produce auditable transcripts in the same environment by mixing in a small probability of posting a uniformly random price. Running the modification for a few more rounds has roughly the same performance as the original algorithm. An algorithm is robust in its operating environment if its regret is approximately preserved when the algorithm skips some rounds, as long as these rounds are drawn randomly, independently of the algorithm and the environment.
Definition 4.5. An algorithm A is blackout robust in an environment if the following holds: if running A in the environment for any time horizon at least T has regret at most R_T, then for any time horizon T′ ≥ T, the regret of running A on an independently selected subset of the T′ rounds of length at least T is no greater than R_T.
Theorem 4.2. Let K = |P| be the number of price levels and assume that the maximum possible price is normalized to 1. Given any algorithm A and minimum exploration requirement μ, consider the algorithm Â which, at each round t:
• with probability μ, outputs p_t drawn uniformly from P;
• with probability 1 − μ, calls A with the inputs and outputs its output p_t.
Then:
• the distributions π_t produced by algorithm Â have minimum exploration probability at least μ/K;
• if A is blackout robust in the environment and has regret at most R_T for any time horizon at least T, the regret of running Â in the same environment until A has been called at least T times is no greater than R_T + μ, and the expected number of rounds this takes is T/(1 − μ).
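The wrapper Â can be sketched as follows. The interface is our own assumption for illustration (the base algorithm is taken to expose next_price() and update(price, sold)); exploration rounds are hidden from the base algorithm, which is exactly where blackout robustness is used.

```python
# Exploration-mixing wrapper of Theorem 4.2: with probability mu post a
# uniformly random price, otherwise delegate to the base algorithm A.
# Every price then has probability at least mu/|P| in every round.
import random

class ExplorationMix:
    def __init__(self, base, prices, mu, rng=None):
        self.base, self.prices, self.mu = base, prices, mu
        self.rng = rng or random.Random(0)
        self.base_calls = 0          # rounds in which A was actually called
        self._delegated = False

    def next_price(self):
        self._delegated = self.rng.random() >= self.mu
        if not self._delegated:
            return self.rng.choice(self.prices)   # uniform exploration round
        self.base_calls += 1
        return self.base.next_price()

    def update(self, price, sold):
        # exploration rounds are "blacked out" from the base algorithm;
        # blackout robustness ensures its regret guarantee still applies
        if self._delegated:
            self.base.update(price, sold)
```

Running the wrapper until base_calls reaches T gives the base algorithm its usual T rounds of feedback, at an expected cost of T/(1 − μ) wrapper rounds.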
Corollary 4.3 (Audit compatibility). Let K = |P| be the number of price levels and assume that the maximum possible price is normalized to 1. The empirical propensity score test with minimum exploration requirement μ is (ε₀, γ₀)-audit compatible for any blackout robust algorithm and environment, where ε₀ = μ and γ₀ = μ/(1 − μ).
Good algorithms in various environments are automatically blackout robust, and thus can be modified to pass the empirical propensity score test.
Lemma 4.4.Any no-regret algorithm in a stochastic environment is blackout robust in the environment.
Lemma 4.5.Any no-regret algorithm in an adversarial environment is blackout robust in the environment.
As discussed in Section 3, calibrated vanishing regret cannot be directly observed in the data of a learning algorithm over a finite horizon. The empirical propensity score test introduced in this section checks an approximation of calibrated vanishing regret using data observed from a finite horizon of length T. We conclude this section by showing that the limiting behavior of the test as T goes to infinity is consistent with checking whether the plausible calibrated regret vanishes.
The empirical propensity score test checks whether the upper confidence bound of the estimated plausible calibrated regret is below the target regret level, where the upper confidence bound is the sum of the plausible calibrated regret estimator R̂*_T = max_f R̂_T(ĉ, f) and the error margin ε_T. To show the limiting behavior of the test, we establish the asymptotic consistency of the plausible calibrated regret estimator with the error margin. When the algorithm explores reasonably often: if the true calibrated regret vanishes as T goes to infinity, then so does the estimated plausible calibrated regret; on the other hand, when the regulator sets an increasing sequence of confidence levels that goes to one as T goes to infinity slowly enough, if the plausible calibrated regret does not vanish, then the upper confidence bound does not vanish.
Lemma 4.7 (Lower-bound consistency with error margin). Suppose the regulator chooses a vanishing sequence of confidence parameters {δ_T}_T; then, if the plausible calibrated regret does not vanish, neither does the upper confidence bound.

From the above two lemmas, we obtain the following theorem showing that the plausible calibrated regret estimator R̂*_T together with the error margin ε_T has the desired limiting behavior as T goes to infinity: algorithms that satisfy calibrated vanishing regret and explore reasonably often have a vanishing upper confidence bound, and can thus pass tests whose target regret level vanishes as T goes to infinity, while algorithms with non-vanishing plausible calibrated regret cannot.
Theorem 4.8. Suppose the regulator chooses a vanishing sequence {δ_T}_T satisfying δ_T = Θ(T^{−2}). Then, as T → ∞:
• the upper confidence bound UCB_T = R̂*_T + ε_T vanishes almost surely when the seller's true regret R_T vanishes;
• if the seller's plausible calibrated regret R*_T does not vanish, then the upper confidence bound UCB_T = R̂*_T + ε_T does not vanish almost surely.

CONCLUSION
In this work, we propose a definition of algorithmic non-collusion for pricing algorithms and a framework for empirically auditing non-collusion with statistical tests on the data. Based on our framework, we give an instantiation using propensity score estimators and analyze its statistical complexity.
The propensity score estimator for the plausible maximum regret used in the empirical propensity score test of Section 4 makes a few assumptions on the seller's algorithm: the algorithm is required either to place some probability on every action in every round or to be robustly good in its operating environment. The accuracy and efficiency of the test degrade as these probabilities shrink. This raises the question of whether there are estimators without such restrictions.
One natural direction for future work is to find low plausible regret tests with lower statistical complexity or looser auditability requirements for transcripts.
Another interesting question is whether auditing non-collusion can be formulated as a continuing process, without the regulator fixing a time horizon T in advance.

A PROOFS
A.1 Theorem 4.1

Fact (Azuma's Inequality). Given a sequence of random variables {X_t}_t and a filtration {F_t}_t such that E[X_t | F_{t−1}] = 0, if there exist {b_t}_t such that |X_t| ≤ b_t for every t, then for any ε ≥ 0 and T:

    Pr[max_{T′ ≤ T} Σ_{t=1}^{T′} X_t ≥ ε] ≤ exp(−ε² / (2 Σ_{t=1}^{T} b_t²)).

We apply Azuma's Inequality with the per-round deviations {Δ_{t,p,p′}(c)}_t in the role of the {X_t}_t above. Let F_t be the information available to the seller's algorithm up to round t. We first show that E[Δ_{t,p,p′}(c) | F_{t−1}] = 0, which follows from the unbiasedness of the propensity score estimator: for all p ∈ P, by the definition of D̂_t(p), E[D̂_t(p) | F_{t−1}] = D_t(p). We then bound |Δ_{t,p,p′}(c)| by a case analysis on whether the realized price p_t equals p′, equals p, or equals neither; in each case the bound uses that, for any p ∈ P, 0 ≤ D_t(p) ≤ 1, 0 ≤ π_t(p) ≤ 1, and −c ≤ p − c ≤ p̄. Combining the cases with the fact that D_t(p′) ≤ 1 yields the bound on Σ_t b_t² needed for Azuma's Inequality. The first inequality in the resulting chain comes from the simple fact that at least one element of a sum must be no less than the average. Note that since ĉ is a random variable, we cannot simply treat it as a fixed c and obtain a probability bound by the exact argument used for the upper tail of max_f R̂_T(c, f) − max_f R_T(c, f); instead, we consider the corresponding event uniformly across all fixed c.
With similar arguments to those used for the upper tail, we obtain the corresponding bound on the lower tail, and we conclude the stated concentration bound. □
Main result of Theorem 4.1. Finally, we restate our theorem and give its proof: for a given confidence 1 − δ, target regret level τ, and minimum exploration probability μ, when T is sufficiently large, the stated guarantee holds with probability 1 − δ.
Proof. The result follows from the concentration bounds above; we elaborate the skipped algebraic steps below. □

Skipped algebraic steps. We elaborate the algebraic steps skipped in the proof immediately above. We first plug (34) into (44) and (67). Solving for T while fixing the other parameters, i.e., for any given τ, from (44) and (67) respectively, we obtain two bounds on T, whose maximum is taken as the sample complexity in (85). We obtain the desired results by plugging this back into (44) and into ε_T.

Given a no-regret algorithm A in an environment, with (average) regret upper bound R_T = o(1) for running for any time horizon at least T in the environment: when the environment is stochastic, i.e., D_t(·) ∼ F for all t, any independently selected subset of T′ rounds of length at least T still has demands drawn i.i.d. from F, and therefore the regret is at most R_T = o(1). When the environment is adversarial, the algorithm's regret bound holds when it is run on any set of rounds of length at least T.
The second bullet point follows directly from Lemma 4.7.
with minimum exploration requirement μ, confidence 1 − δ, and target regret level τ, where K = |P| is the number of price levels and p̄ = max P is the maximum possible price.

Lemma A.1. Let c₀ be the seller's true cost, ĉ = argmin_c max_f R̂_T(c, f) be the estimated plausible cost, c* = argmin_c max_f R_T(c, f) be the plausible cost, K = |P| be the number of price levels, and μ_t = min_{p ∈ P} π_t(p) be the minimum among the probabilities of posting each price level at round t by the seller. We have a high-probability bound on the estimated calibrated regret at cost c₀, uniformly over remaps.

Proof. Observe that for any fixed c, since remapping p to p′ does not affect the payoff at prices p′′ ≠ p, the regret against a remap decomposes across posted prices into the per-price, per-target terms R̂_{t,p,p′}(c) and R_{t,p,p′}(c). We first show that the deviation Δ_{t,p,p′}(c) = R̂_{t,p,p′}(c) − R_{t,p,p′}(c) is small for each t, p, p′ with high probability, using Azuma's Inequality.