Vertical Allocation-based Fair Exposure Amortizing in Ranking

Result ranking affects both consumer satisfaction and the amount of exposure each item receives in ranking services. Myopically maximizing consumer satisfaction by ranking items only according to relevance leads to an unfair distribution of exposure across items, followed by unfair opportunities and economic gains for item producers/providers. Such unfairness forces providers to leave the system and discourages new providers from joining. Eventually, fewer purchase options are left for consumers, and the utilities of both consumers and providers are harmed. Thus, maintaining a balance between ranking relevance and fairness is crucial for both parties. In this paper, we focus on exposure fairness in ranking services. We demonstrate that existing methods for amortized fairness optimization can be suboptimal in terms of the fairness-relevance tradeoff because they fail to utilize prior knowledge of the consumers. We further propose a novel algorithm named Vertical Allocation-based Fair Exposure Amortizing in Ranking, or VerFair, to reach a better balance between exposure fairness and ranking performance. Extensive experiments on three real-world datasets show that VerFair significantly outperforms state-of-the-art fair ranking algorithms in fairness-performance trade-offs at both the individual level and the group level.


INTRODUCTION
Ranking techniques have been extensively studied and applied across online marketplaces (e.g., e-commerce websites such as Amazon) and social media (e.g., suggested people to follow on Twitter/TikTok). Traditionally, the focus of ranking models is to maximize consumer-side satisfaction, or relevance. The ranklists presented to consumers are usually constructed by sorting the candidate items according to the estimated relevance of each consumer-item pair. However, recent studies [10,47] have revealed that this consumer-centered strategy allocates most of the exposure to a few top-ranked popular items and their providers (e.g., products/sellers on e-commerce websites and content/content creators on media platforms), which is often referred to as the Winners-Take-All phenomenon. Since exposure directly influences opinion (e.g., the ideological orientation of presented news articles) or economic gain (e.g., revenue from item sales or streaming), the unbalanced distribution of exposure will eventually drive less popular items out of the platform while discouraging new items from coming in, leaving few options for consumers. At the end of the day, the utility of the platform, consumers, and providers will all be harmed. Therefore, how to allocate exposure to items fairly, or in other words, how to guarantee provider-side utility, is crucial in ranking.
Recently, several provider-side fairness definitions [1,2,10,31,35,47,66] have been proposed by the community. One of the most well-recognized principles is Amortized Fairness [10,47], which hypothesizes that fairness is reached if the items' exposure distribution matches their relevance distribution. Specifically, amortized fairness in ranking is defined at both the individual level and the group level. Individual-level amortized fairness considers each result candidate separately: if an item is twice as relevant as another item, it should get twice the exposure as well. Similarly, group-level amortized fairness considers result candidates in groups (e.g., items can be grouped by their brands): if the accumulated relevance of one group is twice that of another group, the group should get twice the exposure as well.
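The proportionality at the heart of amortized fairness can be sketched in a few lines. This is a minimal NumPy illustration with our own function names, not code from the paper:

```python
import numpy as np

# Sketch of the amortized-fairness target (function names are ours): each
# item's fair exposure is its share of total relevance times total exposure.
def exposure_targets(relevance, total_exposure):
    relevance = np.asarray(relevance, dtype=float)
    return total_exposure * relevance / relevance.sum()

def group_exposure_targets(relevance, groups, total_exposure):
    # Group level: accumulate relevance within each group first.
    relevance = np.asarray(relevance, dtype=float)
    group_rel = {}
    for r, g in zip(relevance, groups):
        group_rel[g] = group_rel.get(g, 0.0) + r
    total_rel = relevance.sum()
    return {g: total_exposure * r / total_rel for g, r in group_rel.items()}

# An item twice as relevant should get twice the exposure:
targets = exposure_targets([2.0, 1.0, 1.0], total_exposure=8.0)   # [4, 2, 2]
by_group = group_exposure_targets([2.0, 1.0, 1.0], ["a", "a", "b"], 8.0)
```

A group containing the twice-as-relevant item accordingly receives a proportionally larger group target.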
The key research focus of amortized fairness is how to reach a good balance between ranking relevance and fairness. While purely ranking according to relevance can harm ranking systems in the long term, purely pursuing fairness may also sacrifice consumers' utility [40], hurt consumers' experience, and eventually drive consumers away from the platform. To balance ranking relevance and fairness, existing methods [10,37] dynamically mitigate unfairness in an online manner, assuming that no prior knowledge of the consumers is available. However, this assumption can be suboptimal in terms of the fairness-relevance tradeoff [55] in scenarios where the system does possess some consumer information and can pre-compute result rankings before serving time, such as email advertisement and e-commerce recommendation.
In this work, we focus on the problem of the fairness-relevance balance in ranking and propose a novel algorithm named Vertical Allocation-based Fair Exposure Amortizing in Ranking (VerFair). Compared to existing amortized exposure algorithms, VerFair achieves a better balance between ranking performance and fairness constraints. VerFair is a post-processing method that does not depend on any specific relevance estimation model and, therefore, can be seamlessly integrated into existing ranking applications. While most existing fair ranking methods use a horizontal allocation paradigm (details in §4.2) to allocate items to consumers, we propose a novel vertical allocation paradigm that puts more relevant items at the top ranks while still maintaining the fairness of result rankings. Based on the vertical allocation paradigm, we further introduce a mechanism to guarantee a minimum relevance-induced exposure for each item, given a predefined tolerance of unfairness. Through extensive experiments, we demonstrate that the proposed method significantly outperforms state-of-the-art fair ranking methods in terms of top ranks' relevance while the minimum relevance-induced exposures of items are still guaranteed. To summarize, our contributions are three-fold: ➊ We propose a novel post-processing amortized fairness method, VerFair, that provably achieves fairness at both the group level and the individual level for ranking.
➋ We additionally introduce a novel mechanism to guarantee a minimum relevance-induced exposure for all items/item groups.
➌ Extensive experiments demonstrate that VerFair can reach a significantly better balance between fairness and top ranks' relevance compared to existing exposure-amortized algorithms.

RELATED WORK
Fairness. With the development of ML techniques, researchers have become increasingly interested in the fairness issues they raise and the corresponding social impacts [20,25,30,32,51,65]. Fairness in ranking [11,19,21,38,45,52,54,57,63] has attracted much attention, as ranking plays an important role in modern Internet services, including e-commerce websites and social media platforms. Given that ranking is a two-sided market, with consumers on one side and item providers on the other, we need to consider consumers' satisfaction as well as a fair environment for providers [1,3,12,16,18,23,28,33,46,49]. While existing definitions of fairness in ranking vary a lot (refer to [3,22,42,44] for a comprehensive survey), in this work we focus on amortized exposure fairness. Existing works on amortized fairness [13,40,59,64] mainly focus on allocating exposure for each item whose relevance has been estimated. Patro et al. [40] propose an allocation method named FairRec to achieve fairness, which guarantees an equal frequency for all items in ranklists. However, such frequency-based fairness [41] ignores the fact that there exists a large skew in the distribution of exposure across different ranks, such as position bias [17,24,29].
Amortized Fairness. In this paper, we focus on how to achieve amortized fairness [10,42,47] in a post-processing manner. In Table 1, we compare several amortized fairness methods that we include as baselines. Specifically, consider a ranking task with m consumers, n items, and a ranklist of length k for each consumer. Biega et al. [10] proposed to carry out m rounds of integer linear programming (ILP) with n^2 decision variables in each round to amortize exposure. Since the number of decision variables is a bottleneck for ILP solvers, Biega et al. [10] also proposed a downsampling step that reduces the size of the candidate set in each round, leaving O(k^2) decision variables per round. Instead of trying to amortize exposure dynamically, Singh and Joachims [47] adopt linear programming (LP) with n^2 decision variables to give a static probabilistic ranking, which is mostly infeasible given a large number of items. Besides, LP methods assume one single relevance distribution for items, while the ILP method can work with multiple relevance distributions. Thus, the ILP method is more suitable for ranking systems, given that the distribution of personal relevance varies from person to person. Besides linear programming, Morik et al. [37] propose a more efficient fair ranking algorithm, FairCo, which first determines each item's unfairness and then boosts the ranking scores of under-exposed items with a proportional controller. Unlike the post-processing methods mentioned above, methods such as PG-Rank [48], MMF [59], PLRank [39], MCFair [63], and FARA [62] opt to achieve amortized fairness within the learning-to-rank procedure. Wu et al. [56] provide a theoretical analysis of the relationship between ranking relevance and fairness. One additional note is that, in this paper, the fairness we consider is different from learning representations to achieve a fair model, where the relevance rating should be independent of some sensitive attribute [2,9,66].

BACKGROUND AND PRIOR KNOWLEDGE
In this section, we introduce related definitions. A summary of the notations used in this paper is shown in Table 2.
• Exposure and Fairness. To optimize ranking fairness, two concepts are of importance: relevance and exposure. Personal relevance r(u, d) indicates the preference of consumer u toward an item d. Aside from personal relevance, average relevance R(d) is also widely used in ranking fairness [37]. The expected average relevance R(d), or global relevance, indicates the global preference of all consumers toward item d. It is defined by marginalizing personal relevance:

R(d) = Σ_{u∈U} P(u) · r(u, d),   (1)

where U is the set of all consumers and P(u) is consumer u's probability in U. In this paper, we refer to average relevance as relevance unless otherwise explicitly specified.
In existing works [10,37,47], exposure is defined as the examination probability, or in other words, how likely an item is to be viewed in a ranked list. Previous studies mostly model item exposure following the Position Bias Assumption [15,29], where the examination probability depends on the rank position in a ranklist. As an item can sit at different ranks in different consumers' ranklists, we compute the accumulated exposure an item d gets by:

E(d) = Σ_{ρ∈B} p_{rnk(d|ρ)},   (2)

where B denotes all ranklists, rnk(d|ρ) is the rank of item d in ranklist ρ, and p_{rnk(d|ρ)} is the examination probability of item d in ranklist ρ.
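The accumulation described above can be sketched as follows; the function name and the toy probabilities are our own illustration:

```python
import numpy as np

# Sketch of the accumulated-exposure computation: an item's exposure sums the
# examination probability of every slot it occupies across all ranklists.
# (Items and ranks are 0-indexed here; exam_prob[j] is the rank-j probability.)
def accumulated_exposure(ranklists, exam_prob, n_items):
    exposure = np.zeros(n_items)
    for ranklist in ranklists:
        for rank, item in enumerate(ranklist):
            exposure[item] += exam_prob[rank]
    return exposure

exam_prob = [1.0, 0.63]                   # toy position-bias probabilities
ranklists = [[0, 1], [1, 0], [2, 0]]      # three consumers, lists of length 2
expo = accumulated_exposure(ranklists, exam_prob, n_items=3)
# item 0 sits at ranks 1, 2, 2 -> exposure 1.0 + 0.63 + 0.63 = 2.26
```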
Aside from the above definitions of relevance and exposure for an individual item d, we can also define group-level relevance and exposure over items of the same brand or from the same producer. At the group level, we accumulate relevance R(g) and exposure E(g) over the items within group g, respectively:

R(g) = Σ_{d∈g} R(d),  E(g) = Σ_{d∈g} E(d).   (3)

• Two-sided Utility Measurement. Ranking is a two-sided market, with consumers on one side and providers on the other. Consumers care more about ranking relevance, while providers care more about fairness [1,3,55].
(1) Ranking Relevance: We use NDCG [27], a widely adopted ranking metric, to measure ranking relevance from the consumer side. Specifically, we use NDCG@k, where k is the cutoff position of the ranklist. Note that NDCG@k is bounded within [0, 1].
(2) Amortized Fairness: Fairness measures ranking quality from the provider side. For amortized fairness, most studies evaluate fairness by measuring the distance between the empirical distributions of exposure E and relevance R. For example, Biega et al. [10] use the L1 distance. We instead adopt the Jensen-Shannon divergence, since it is bounded to the same range as NDCG, i.e., [0, 1], which is better for model comparison and result visualization. At both the individual level and the group level, we define fairness as:

Fairness = 1 − JSD(E ∥ R),   (4)

where JSD denotes the Jensen-Shannon divergence between the exposure distribution E and the relevance distribution R among all items/item groups. Fairness is within [0, 1]; higher divergence means more unfair ranklists for the providers.
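A base-2 Jensen-Shannon divergence, the quantity the fairness metric is built on, can be computed directly; this sketch uses our own helper name, and the smoothing constant `eps` is our implementation detail:

```python
import numpy as np

# Base-2 Jensen-Shannon divergence, bounded in [0, 1]; inputs are normalized
# to probability distributions first. Whether "fairness" is reported as the
# divergence itself or as 1 minus it is a presentation choice.
def jsd(p, q, eps=1e-12):
    p = np.asarray(p, dtype=float); q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log2((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

assert jsd([1, 2, 3], [1, 2, 3]) < 1e-6          # identical: perfectly fair
assert abs(jsd([1, 0], [0, 1]) - 1.0) < 1e-6     # disjoint: maximally unfair
```

Using base 2 is what gives the [0, 1] bound the text relies on.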

OUR METHOD
In this section, we introduce VerFair, an algorithm for amortized fairness. We start by extending the discussion of amortized fairness with the Exposure Quota (§4.1), followed by two motivating examples to help readers understand the concept of vertical allocation and how VerFair guarantees the items' exposure quota (§4.2) while reaching better ranking relevance at top ranks. We then illustrate the details of VerFair (§4.3) and provide a theoretical proof of VerFair's exposure quota guarantee (§4.4).

Exposure Quota
Here, we consider a ranking task where we need to select and rank k unique items for each consumer; there are in total m consumers and n candidate items. For this ranking task, the total exposure E_total is fixed:

E_total = Σ_{u=1}^{m} Σ_{j=1}^{k} p_{u,j},   (5)

where p_{u,j} is the examination probability of consumer u toward rank j. The exposure of all items/item groups should sum to the total exposure:

Σ_{d∈D} E(d) = Σ_{g∈G} E(g) = E_total.   (6)

Then the key question is how to distribute this total exposure to each item/group fairly. To tackle this, we define Quota(d|β), the fair share of exposure for an item d, as:

Quota(d|β) = β · E_total · R(d) / Σ_{d′∈D} R(d′),   (7)

where D is the set of candidate items and β indicates the fraction of total exposure to be allocated for fairness. Here, we require that the exposure item d actually gets, i.e., E(d), be greater than or equal to its fair share of exposure:

E(d) ≥ Quota(d|β),   (8)

where Quota(d|β), the fair share of exposure, can be viewed as a relevance-induced minimum exposure. Similarly, at the group level, the fair share of exposure for group g is:

Quota(g|β) = β · E_total · R(g) / Σ_{g′∈G} R(g′),   (9)

and a minimum fair exposure for group g is guaranteed if:

E(g) ≥ Quota(g|β).   (10)

When β = 1.0, i.e., all exposure is used for fair exposure allocation, considering Equations (5) to (7), we have the equality constraints:

E(d)/E_total = R(d) / Σ_{d′∈D} R(d′),  E(g)/E_total = R(g) / Σ_{g′∈G} R(g′).   (11)

The inequality constraints defined in Eq. (8) and Eq. (10) degenerate to the above equality constraints when β = 1.0. The equality constraints are the exact constraints of amortized fairness defined in [10,47], where an item's exposure should be proportional to its relevance. When 0 ≤ β ≤ 1.0, β not only decides Quota(d|β) (i.e., the minimum fair exposure) but also decides the degree of fairness we try to guarantee. An additional note is that our goal is not to find a β that maximizes ranking relevance or fairness; instead, our goal is to maximize ranking relevance given the same degree of fairness. We achieve this by requiring the exposure of each item to satisfy the constraints in Eq. (8) and Eq. (10). We will illustrate more in the rest of this section.

Table 3 (caption): We assume constant exposure (= 1) for all ranks, so the total exposure is 6. We guarantee all (β = 1.0) of the total exposure to be fair. All items are of the same relevance, and each of them should get an exposure of 2 for fairness.
Table 3b (caption): (Step) indicates time steps. Vertical allocation can help to get a higher NDCG at rank 1 (top ranks).
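The quota computation can be sketched as follows. We assume, as in the examples, that every consumer shares one examination-probability vector; function names are ours:

```python
import numpy as np

# Sketch of Quota(d | beta): a fraction beta of the total exposure, split in
# proportion to relevance. Assumes every consumer shares one examination-
# probability vector exam_prob over the k ranks.
def total_exposure(exam_prob, n_consumers):
    return n_consumers * float(np.sum(exam_prob))

def quota(relevance, exam_prob, n_consumers, beta):
    relevance = np.asarray(relevance, dtype=float)
    e_total = total_exposure(exam_prob, n_consumers)
    return beta * e_total * relevance / relevance.sum()

# Table 3's setting: 3 consumers, k = 2, constant exposure 1 per rank,
# beta = 1.0, three equally relevant items -> each item's quota is 2.
q = quota([1.0, 1.0, 1.0], exam_prob=[1.0, 1.0], n_consumers=3, beta=1.0)
```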

Motivating Examples.
4.2.1 Vertical allocation and horizontal allocation. We give a motivating example in Table 3 before introducing our ranking method.
In this example, there are 3 consumers and 3 items. Table 3a shows the consumer-item pair relevance. Our ranking task is to construct ranklists of length 2 for each consumer given the relevance matrix. We follow the amortized fairness principle to construct the ranklists, where items of similar relevance should get similar exposure. Since items A, B, and C have the same average relevance, they should get the same exposure. For simplicity, we assume that the exposure of each rank position is the same and equals 1 for each consumer. Thus the total exposure is 6, and each item should fairly be shown twice. As shown in Table 3b, there are two ways to allocate items to consumers: vertical allocation and horizontal allocation. In vertical allocation, we fill one rank for all consumers and then move to the next rank; in horizontal allocation, we fill all ranks for one consumer and then move to the next consumer. In this example, we cannot choose item B at step 5 in vertical allocation because it has been used up in previous steps; a similar situation applies to step 5 in horizontal allocation.
In general, vertical allocation achieves higher NDCG at top ranks, as shown in Table 3b. Compared to horizontal allocation, vertical allocation has fewer conflicts in allocating relevant items to top ranks and thus higher NDCG at top ranks (see §4.4.1 for theoretical analysis). In contrast, we expect horizontal allocation to have a higher long-list NDCG, since more relevant items remain available for lower ranks. Previous methods [10,37,47,48] mostly adopt horizontal allocation, while we choose vertical allocation due to its superior performance at top ranks. Vertical allocation assumes that consumers' information is already known. Such an assumption is reasonable for certain ranking tasks, including email advertisement (ranklists are constructed for all consumers at once) and offline recommendation [40,41].

Table 4 (caption): We assume constant exposure (= 1) and guarantee half (β = 0.5) of the total exposure to be fair, where each item should get at least an exposure of 1. We carry out the allocation phase (steps 1-3), then the appending phase (steps 4-6), and finally the re-sorting phase; arrows indicate moving an item to a higher rank (forward) or to a lower rank (backward) in the re-sorting phase. Direction indicates the re-sorting direction of fairly allocated items (steps 1-3). The two ranklist variants start from the origin and from the anchor, respectively.
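The top-rank advantage of vertical allocation can be seen in a toy simulation. This is our own construction, not the paper's Table 3: every consumer prefers the same item, each item has a fair-share budget of two slots, and a slot-by-slot greedy fill is run in both orders (with the budget relaxed when no candidate remains, mirroring the step-5 conflicts the example describes):

```python
import numpy as np

# Toy illustration (ours) of why vertical allocation wins at top ranks: with a
# per-item exposure budget, filling rank 1 for every consumer first lets the
# globally preferred item claim more top slots before its budget runs out.
n_consumers, k, n_items = 3, 2, 3
rel = np.array([[3., 2., 1.]] * n_consumers)   # every consumer prefers item 0

def greedy_fill(order):
    budget = {d: 2 for d in range(n_items)}    # fair share: 2 slots per item
    lists = [[None] * k for _ in range(n_consumers)]
    for u, j in order:
        cands = [d for d in range(n_items)
                 if budget[d] > 0 and d not in lists[u]]
        if not cands:                          # budget exhausted everywhere:
            cands = [d for d in range(n_items) # relax it (the step-5 conflict)
                     if d not in lists[u]]
        best = max(cands, key=lambda d: rel[u, d])
        lists[u][j] = best
        budget[best] -= 1
    return lists

# Vertical: sweep ranks outermost; horizontal: sweep consumers outermost.
vertical = [(u, j) for j in range(k) for u in range(n_consumers)]
horizontal = [(u, j) for u in range(n_consumers) for j in range(k)]

v_lists, h_lists = greedy_fill(vertical), greedy_fill(horizontal)
v_top = sum(rel[u, v_lists[u][0]] for u in range(n_consumers))  # rank-1 relevance
h_top = sum(rel[u, h_lists[u][0]] for u in range(n_consumers))
```

In this toy setting the vertical order accumulates strictly more rank-1 relevance than the horizontal order.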

4.2.2 How to guarantee a minimal exposure?
We show a sample usage of the vanilla ranking strategy on the left side of Table 4b. In the allocation phase (steps 1-3), the vertical allocation starts from (1, 1) and moves to (2, 1), (3, 1), after which each item's minimum exposure (quota) is met. In the appending phase (steps 4-6), the allocation algorithm fills the rest of the ranklists. In the re-sorting phase, for consumer 2, item C is moved to the 2nd rank because item A has higher relevance for consumer 2. In practice, as exposure drops from higher ranks to lower ranks, this may reduce the exposure of item C, breaking the minimum exposure guarantee. We argue that the broken minimum exposure guarantee in the vanilla ranking strategy stems from the fact that items satisfying the minimum exposure requirements in the allocation phase can only be moved to the lower part of the ranklists in the re-sorting phase, reducing their exposure. Then what if we could make the items that satisfy the minimum exposure requirements move only to the higher part of the ranklists in the re-sorting phase?
On the right side of Table 4b, the allocation phase (steps 1-3) starts from (1, 2) and moves to (2, 2), (3, 2). In the appending phase (steps 4-6), the rest of the ranklists are similarly filled. In the re-sorting phase, for consumer 1, item A is moved to rank 1 because of its higher relevance, and a similar re-sort is performed for consumer 3 and item B. In this case, the items that satisfy the minimum exposure requirements can only be moved to the higher part of the ranklists, and the minimum exposure guarantee stays intact. The latter ranking strategy differs from the vanilla one in that its allocation phase starts from the middle of the ranklist, while the vanilla strategy starts from the top.
Formally, we introduce the definition of the Anchor Point: instead of starting from the first consumer and the first rank, the vertical allocation starts from the ĉ-th consumer and the r̂-th rank, i.e., the anchor point (ĉ, r̂). As we can observe from the example above, using the anchor point guarantees the minimum exposure requirements in the vertical allocation algorithm. We cover the detailed algorithm to locate the anchor point in §4.3.1.

VerFair: Algorithm for Amortized Fairness
In this section, we formally propose a fair ranking algorithm that performs vertical allocation starting from the anchor point. The algorithm can reach both individual fairness and group fairness, denoted as VerFair(Ind) and VerFair(Group), respectively. Since the individual-level method (i.e., VerFair(Ind)) can be viewed as a special case of the group-level method (i.e., VerFair(Group)) that treats each individual item as a unique group, we provide the VerFair(Group) algorithm in Algo. 1. We first introduce how to determine the anchor point, then walk through the three phases of VerFair(Group), namely the Allocation Phase, the Appending Phase, and the Re-sorting Phase.

Determination of Anchor Point.
As discussed in the motivating example in §4.2.2, the anchor point helps guarantee a minimal exposure. In this section, we provide the detailed algorithm to find the anchor point in Algo. 2. To search for the anchor point, we start from the last consumer's last rank, i.e., (m, k), and move vertically backwards towards the first consumer's first rank, i.e., (1, 1). The search path sequentially includes (m, k), (m−1, k), ..., (1, k), (m, k−1), (m−1, k−1), ..., (1, k−1), (m, k−2), .... The search procedure stops when the accumulated exposure quota is met. Formally, the search stops at (ĉ, r̂) when

Σ_{u=1}^{m} Σ_{j=r̂+1}^{k} p_{u,j} + Σ_{u=ĉ+1}^{m} p_{u,r̂} < β · E_total ≤ Σ_{u=1}^{m} Σ_{j=r̂+1}^{k} p_{u,j} + Σ_{u=ĉ}^{m} p_{u,r̂},   (13)

where β denotes the fraction of fair exposure in the total exposure, m is the number of consumers, k is the length of each ranklist, and p_{u,j} is the exposure (examination probability) of consumer u toward rank j. The first sum in Eq. (13) denotes the total exposure from rank r̂+1 to rank k across all consumers, while the sum over consumers ĉ+1 to m is the total exposure at rank r̂ over the consumers already visited. As the search proceeds, the accumulated exposure before reaching (ĉ, r̂) is less than the exposure quota, and once (ĉ, r̂) is included it becomes greater than or equal to the exposure quota.

4.3.2 The Allocation Phase. F(u) stores the items that have not yet been selected for consumer u. The final available set, Candidate_Set, is the intersection between the set of items whose quota is not yet used up and F(u). Starting from the anchor point (ĉ, r̂), i.e., the ĉ-th consumer and the r̂-th rank, we select the most relevant items in Candidate_Set to fill the ranklists. If Candidate_Set is already empty because the quota has been used up, the Quota is no longer under consideration. We guarantee the minimum exposure constraint of Eq. (8) in this phase and provide a theoretical proof in §4.4.2.
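The backward search can be sketched as follows (our own minimal version, 0-indexed, assuming one shared examination-probability vector across consumers):

```python
# Sketch of the backward anchor-point search: walk slots (consumer, rank) from
# the last consumer's last rank toward (1, 1), rank by rank, accumulating
# examination probability until the fair-exposure quota beta * E_total is
# covered by the slots visited so far.
def find_anchor(exam_prob, n_consumers, beta):
    k = len(exam_prob)
    e_total = n_consumers * sum(exam_prob)
    acc = 0.0
    for j in reversed(range(k)):                 # ranks k-1 .. 0 (0-indexed)
        for u in reversed(range(n_consumers)):   # consumers n-1 .. 0
            acc += exam_prob[j]
            if acc >= beta * e_total:
                return (u, j)                    # 0-indexed anchor (ĉ, r̂)
    return (0, 0)

# Table 4's setting: 3 consumers, k = 2, constant exposure 1, beta = 0.5.
# Total exposure is 6, quota 3; the search stops at consumer 1, rank 2,
# i.e. 0-indexed (0, 1), matching the anchor (1, 2) used in the example.
assert find_anchor([1.0, 1.0], 3, 0.5) == (0, 1)
assert find_anchor([1.0, 1.0], 3, 1.0) == (0, 0)  # beta = 1: anchor at origin
```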

4.3.3 The Appending Phase. As discussed in the motivating example of §4.2.2, much empty space remains after the allocation phase. In the appending phase, for each consumer u, we fill the empty spaces in her ranklist ρ_u with the most relevant items among those not yet in her current ranklist, i.e., from the feasible set. Note that this selection is no longer constrained by the fair exposure requirement; it is purely based on relevance.

4.3.4 The Re-sorting Phase. After the appending phase, each consumer's ranklist ρ_u is full. We need to re-sort each consumer's ranklist according to personal relevance, since it is not yet sorted by relevance (example in §4.2.2). After re-sorting, items selected in the allocation phase will only be moved to higher ranks of her ranklist, as shown in §4.2.2, and we assume the exposure does not drop when an item is moved from a lower rank to a higher rank. Thus the minimal exposure guarantee stays intact after the re-sorting phase. Although VerFair is an offline method, it can also be extended to the online ranking setting; for example, we can limit the set of consumers to only the active consumers at a certain timestamp. We leave this to future work.

For the analysis below, we write DCG@k = Σ_{d∈D} R(d) · E@k(d), with E@k(d) = Σ_{u} Σ_{j=1}^{k} 1[ρ_u[j] = d] · w_j, where ρ_u[j] indicates the j-th item in ranklist ρ_u, 1 is an indicator function, p_j is the j-th rank's examination probability, and w_j is the weight put on rank j. We follow Singh and Joachims [47] to set w_j = p_j.
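The exposure-weighted DCG used in the analysis can be sketched directly; the helper names are ours:

```python
import numpy as np

# Sketch of the exposure-weighted DCG: DCG@k sums, over items, global
# relevance times accumulated top-k exposure, with rank weight w_j set to the
# examination probability p_j as in Singh and Joachims [47].
def exposure_at_k(ranklists, p, n_items, k):
    e = np.zeros(n_items)
    for ranklist in ranklists:
        for j, d in enumerate(ranklist[:k]):
            e[d] += p[j]                         # w_j = p_j
    return e

def dcg_at_k(ranklists, global_rel, p, k):
    e = exposure_at_k(ranklists, p, len(global_rel), k)
    return float(np.dot(global_rel, e))

# Two consumers both rank item 0 first: E = [2.0, 1.0], DCG = 2*2 + 1*1 = 5.
score = dcg_at_k([[0, 1], [0, 1]], global_rel=[2.0, 1.0], p=[1.0, 0.5], k=2)
```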

Theoretical Analysis
4.4.1 Analysis of Top Ranks' Relevance. To maximize DCG@k, we should let items of greater R(d) get more exposure, i.e., greater E@k(d). When items' exposure is fixed (e.g., β = 1.0 in Eq. 11), and if we consider (N)DCG at higher ranks more important, it is important to follow a greedy selection strategy that lets items of greater R(d) fulfill their exposure quota at the highest ranks possible. In line 10 and line 11 of Algorithm 1, VerFair follows exactly this greedy selection strategy, prioritizing the allocation of top ranks' exposure first, i.e., it finishes allocating all consumers' j-th rank before the (j+1)-th rank. Since the greedy selection strategy maximizes the exposure-weighted relevance of top ranks, VerFair can reach better (N)DCG at top ranks.
4.4.2 Proof for Minimum Exposure Guarantee. In this section, we discuss the exposure allocation error bound between Quota (the minimum exposure) and the actually allocated exposure Ẽ in Scenario 2, where an empty Candidate_Set never occurs. In other words, line 16 in Algorithm 1 never triggers. Considering line 12 and line 20 in Algorithm 1, Quota(g) − Ẽ(g) ≥ 0, ∀g ∈ G. And we know that Σ_{g∈G} Quota(g) = Σ_{g∈G} Ẽ(g), i.e., the actually allocated total exposure equals the sum of the quotas. The only possibility is that Quota(g) = Ẽ(g), ∀g ∈ G. In other words, all groups get exactly the exposure prescribed by the Quota.
In the above analysis, we have shown the accuracy of the minimum exposure guarantee in the allocation phase. Items allocated in the allocation phase sit at lower ranks, and they can only be moved to higher ranks in the appending phase and the re-sorting phase (see examples in Tab. 4). Being put at higher ranks only makes the minimum exposure Quota better guaranteed, so the accuracy established for the allocation phase still holds after the appending phase and the re-sorting phase.

EXPERIMENTAL SETUP AND RESULTS
In this section, we introduce our experimental settings. Implementations will be available online.

Experimental Setup
We walk through the detailed experimental setup in this section. The datasets are summarized in Table 5. For the individual fairness setting, we use the Google Local Ratings dataset [26] and the Yahoo! R3 dataset. To use them in our experiment, we need to fill in the missing consumer-item pair relevance in the two datasets. Patro et al. [40] already used Matrix Factorization to fill in the Google Local Ratings dataset, and the result can be directly downloaded.
For the Yahoo! R3 dataset, following [40], we randomly sampled 1% of the data to learn a relevance prediction model and predicted all the missing consumer-item pair relevance scores. Specifically, we use the SVD algorithm from the Surprise library with a learning rate of 5e-3, an L2 regularization coefficient of 2e-2, and 100-d latent factors. The relevance scores are derived after 20 training iterations. Based on the estimated consumer-item pair relevance, we construct and evaluate ranklists for consumers.
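The relevance-completion step can be sketched with plain NumPy standing in for the Surprise SVD used above. The learning rate (5e-3), L2 coefficient (2e-2), and 20 epochs mirror the text, while the latent dimension is shrunk from 100 and the toy ratings are our own:

```python
import numpy as np

# Minimal SGD matrix-factorization sketch of relevance completion. This is an
# illustration of the technique, not the Surprise library's implementation.
rng = np.random.default_rng(0)

def mf_complete(ratings, n_users, n_items, dim=8, lr=5e-3, reg=2e-2, epochs=20):
    P = 0.1 * rng.standard_normal((n_users, dim))   # user latent factors
    Q = 0.1 * rng.standard_normal((n_items, dim))   # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P @ Q.T    # dense predicted consumer-item relevance matrix

observed = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0)]  # (user, item, rating)
pred = mf_complete(observed, n_users=2, n_items=2)  # also fills missing (1, 1)
```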
For the group fairness setting, we adopt the Movielens-Groups dataset preprocessed from the MovieLens (20M) dataset by [37]. The missing consumer-item pair relevance is already filled in and made public. It contains 10,000 users and 100 movies from 5 companies/providers. Following the group fairness setting in [37], movies are grouped by their producer companies, giving 5 groups/providers in total. This partition matches our definition of provider-side fairness: fairly allocating exposure to providers.
We should note that movies are not grouped by sensitive attributes (e.g., gender, religion, or ethnicity), since fairness with respect to sensitive attributes is not our main focus in this work. Instead of protecting candidates/candidate groups with sensitive attributes, we consider the amortized fairness principle [10,47], where candidates of similar relevance should get similar exposure. Extending our work to fairness concerning sensitive attributes is straightforward, and we leave it for future work.

Task Definition.
Given the item set D, the consumer set U, the presented ranklist length k, and the consumer-item pair relevance r(u, d), ∀d ∈ D, ∀u ∈ U, our task is to generate |U| ranklists of length k. In other words, we generate 15,400, 11,172, and 10,000 ranklists for the three datasets in Table 5, respectively. In our experiments, k is set to 10 by default. The goal of ranklist construction is to achieve better ranking relevance given the same fairness. Based on the consumer-item pair relevance, we use NDCG [27] to evaluate ranking relevance. To evaluate fairness, we use the fairness definition in Equation (4). Note that in this work, we focus only on the post-processing setting where personal relevance is already estimated. As for how to obtain the relevance estimation, many existing algorithms are available [4,7,8,34,36,50,58,60,61].
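The NDCG evaluation can be sketched as follows. We use the standard log2 discount with linear gain (the paper may use the exponential-gain variant; the normalization idea is the same), and the function name is ours:

```python
import numpy as np

# NDCG@k sketch: DCG with the 1/log2(j+1) discount, normalized by the DCG of
# the relevance-sorted ideal list, so the score is bounded within [0, 1].
def ndcg_at_k(ranked_rel, k):
    ranked_rel = np.asarray(ranked_rel, dtype=float)[:k]
    disc = 1.0 / np.log2(np.arange(2, len(ranked_rel) + 2))  # j = 1 .. k
    dcg = float(np.sum(ranked_rel * disc))
    ideal = np.sort(np.asarray(ranked_rel, dtype=float))[::-1]
    idcg = float(np.sum(ideal * disc))
    return dcg / idcg if idcg > 0 else 0.0

assert ndcg_at_k([3, 2, 1], k=3) == 1.0    # already sorted by relevance
assert ndcg_at_k([1, 2, 3], k=3) < 1.0     # reversed order scores lower
```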

Position Bias.
Following the experimental setting in [37], we use the position-based model (PBM [14]) to model consumers' examination behavior. In PBM, the probability that a consumer examines an item depends only on its position. We adopt the discount function of NDCG as the consumer's examination probability: for the j-th rank in the simulation, the examination probability is p_j = (1 / log2(j + 1))^t, where t indicates the severity of position bias.
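Assuming the NDCG-discount form with a severity exponent described above, the examination probabilities can be generated as:

```python
import numpy as np

# PBM examination probabilities in the assumed discount form:
#     p_j = (1 / log2(j + 1)) ** t
# where t controls how sharply exposure concentrates on top ranks.
def exam_prob(k, t):
    ranks = np.arange(1, k + 1)
    return (1.0 / np.log2(ranks + 1)) ** t

p_flat, p_steep = exam_prob(10, t=0), exam_prob(10, t=2)
assert np.allclose(p_flat, 1.0)                 # t = 0: all ranks equal
assert p_steep[0] == 1.0 and p_steep[-1] < 0.1  # t = 2: exposure skews to top
```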
The greater t is, the more exposure consumers put on top ranks. In our experiments, we adopt the same setup as [10,37]: we assume the examination probabilities are already known and that all consumers follow the same position bias. As for how to estimate examination probabilities, many mature methods [5,6,43,53] have been proposed, which is beyond the scope of this paper.

Baseline Methods.
We summarize the methods compared in this paper as follows:
• Top-k: Select the top-k items according to personal relevance.
• Random-k: Randomly select k items.
• PR-k: Construct ranklists purely for exposure fairness, ignoring personal relevance.
• ILP-Aver and ILP-Pers: The ILP-based amortizing method of Biega et al. [10], using average relevance and personalized relevance, respectively.
• FairCo: The proportional-controller method of Morik et al. [37].
• FairRec: The frequency-based allocation method of Patro et al. [40].
Among the above methods, only FairRec and our method VerFair construct ranklists in a vertical way, while all other methods follow a horizontal setup. All methods above, except Top-k, Random-k, and PR-k, have a tradeoff parameter to balance the weight between ranking relevance and fairness. For example, when the tradeoff parameter is set to 0, its minimum value, VerFair(Ind), FairCo, ILP-Pers, VerFair(Group), and FairRec degenerate to the Top-k method, where they only care about relevance and ignore fairness. Top-k, Random-k, and PR-k have no tradeoff parameters and cannot adjust the weight between fairness and ranking relevance.

How does VerFair perform compared to baselines?

Figure 1 shows the tradeoff curves between NDCG and fairness for different methods as we sweep the tradeoff parameters. As there are no tradeoff parameters for Top-k, PR-k, and Random-k, their performances appear as single points in the graphs. Among the methods, Top-k is the best for NDCG, while PR-k is the best for fairness. All amortized fairness methods, i.e., VerFair, ILP-Aver, ILP-Pers, FairCo, and PR-k, can reach fairness metrics near 1.0 when the tradeoff parameter reaches its maximum (i.e., the bottom-right area), which proves their effectiveness in reaching fairness. Random-k randomly selects k items and thus cannot reach fairness.
As shown in Figures 1a-1c and 1i-1k, our methods VerFair(Ind) and VerFair(Group) significantly outperform ILP-Pers, ILP-Aver, and FairCo in terms of the balance between NDCG@3 and fairness under various degrees of position bias severity, i.e., t = 0, 1, 2. Given the same degree of fairness, VerFair reaches higher NDCG@3 than ILP-Pers, ILP-Aver, and FairCo; given the same NDCG@3, VerFair gets fairer ranklists. ILP-Pers performs better than ILP-Aver because ILP-Pers can perform personalized ranking. In addition, we did not observe a clear tradeoff between relevance and fairness for FairRec when t is greater than 0. This meets our expectation, since FairRec is not an amortized fairness method.
Figures 1d, 1h, and 1l show the tradeoff between NDCG@10 and fairness when ranklists are evaluated at k = 10. As we can see from the figures, our methods VerFair(Ind) and VerFair(Group) show similar or slightly inferior results on long prefixes (@10) compared to FairCo. Given the same degree of fairness, VerFair focuses more on top ranks and puts more relevant items there; we believe this slight compromise is unavoidable. Since VerFair tends to put more relevant items at top ranks, some relatively irrelevant items must be put at lower ranks to keep the same degree of fairness. Thus, the advantages of our method are more significant at top ranks than at low ranks. An additional note is that, unlike NDCG, fairness does not need cutoff evaluation [10,47] (Eq. 4): fairness evaluation concerns exposure, and it is not reasonable to ignore lower ranks' exposure even if it is small.
Since fair methods at the individual level automatically reach group fairness, the individual fairness methods VerFair(Ind), ILP-Pers, and ILP-Aver also reach fairness under group fairness settings, as shown in Figures 1i, 1j, 1k, and 1l. However, as individual fairness imposes stronger constraints than group fairness, all individual fairness methods show a dramatic drop in NDCG on the Movielens-Groups dataset.
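The tradeoff curves above pair a ranking-quality metric with a fairness metric. As a minimal sketch (not the paper's exact Eq. 4), one can compute NDCG@k alongside a fairness score that compares the normalized exposure and relevance distributions; `exposure_fairness` is a hypothetical name for such a score:

```python
import math

def ndcg_at_k(ranked_rels, k):
    """NDCG@k for one ranklist, given graded relevance in ranked order."""
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(ranked_rels[:k]))
    ideal = sorted(ranked_rels, reverse=True)
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def exposure_fairness(exposure, relevance):
    """Hypothetical fairness score: 1 minus half the L1 gap between the
    normalized exposure and relevance distributions over items; 1.0 means
    exposure exactly tracks relevance."""
    e_tot, r_tot = sum(exposure), sum(relevance)
    gap = sum(abs(e / e_tot - r / r_tot) for e, r in zip(exposure, relevance))
    return 1.0 - 0.5 * gap

# Toy check: exposure exactly proportional to relevance is maximally fair,
# and an already ideally ordered list has NDCG@k = 1.
print(exposure_fairness([2.0, 1.0, 1.0], [0.5, 0.25, 0.25]))  # → 1.0
print(ndcg_at_k([3, 2, 1], k=3))  # → 1.0
```

Unlike NDCG, the fairness score above takes the full exposure vector with no cutoff, mirroring the evaluation choice described in the text.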

Can VerFair reach fairness while maintaining good ranking quality? Due to limited space, we only provide analysis on the Yahoo R3! dataset. Figure 2a shows the density distributions of relevance and exposure under different methods. The red line with triangles stands for the relevance distribution of items. The goal of amortized fairness is that the exposure distribution should match the relevance distribution; in other words, a perfect amortized fairness ranking algorithm should produce an exposure distribution that exactly matches the Relevance Dist. line. The exposure distributions of the fair methods PR-k and VerFair highly overlap with Relevance Dist. In contrast, the exposure distribution of the unfair method Top-k is dramatically different from Relevance Dist., showing a huge sacrifice of amortized fairness. We now look at consumers' satisfaction, i.e., the NDCG distribution in Figure 2b. The unfair method Top-k recommends the top k items according to personal relevance, so it always reaches the skyline NDCG of 1; the highest NDCG is 1 because we focus on post-processing and assume (personal) relevance is already known in this paper. PR-k shows a huge drop in NDCG since it centers only on the fairness of item rankings. In contrast, our method VerFair achieves significantly better NDCG than PR-k while achieving similar amortized fairness.

Can VerFair guarantee minimum exposure? As shown in Figure 3a, VerFair guarantees the minimum exposure quota under different settings of the tradeoff parameter, since the resulting exposure distribution curves of VerFair lie above the Quota curves. The two curves overlap when the tradeoff parameter equals 1.0 because all exposure is then used to calculate the minimum exposure quota (Eq. 7). Moreover, with VerFair, items can still gain more exposure than their quota if they are better personalized to their target users. To demonstrate this, we select items with average relevance within [2.7, 2.8] on Yahoo R3!; we choose this interval because it contains the largest number of items, and since the interval is narrow, we can assume the selected items share the same average relevance. Items with a greater standard deviation (Std) of relevance are typically more personalized. As shown in Figure 3b, items usually do not get extra exposure when amortized fairness is strictly maintained, i.e., when the tradeoff parameter equals 1.0. When the tradeoff parameter is below 1, we see a clear positive correlation between Std and exposure. This correlation means our method promotes items that are personalized for specific users rather than for all users.
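The exposure bookkeeping behind these distributions can be sketched as follows. This is a minimal illustration, assuming a logarithmic position-bias model of the common `1 / log2(rank + 1)` form raised to a severity exponent; the paper's exact exposure definitions (Eq. 2-3) may differ, and `rank_exposure`/`accumulate_exposure` are hypothetical names:

```python
import math

def rank_exposure(rank, severity=1.0):
    # Assumed position-bias model (an approximation, not the paper's Eq. 2):
    # exposure at 1-indexed rank r decays as 1 / log2(r + 1)**severity.
    return 1.0 / (math.log2(rank + 1) ** severity)

def accumulate_exposure(ranklists, n_items, severity=1.0):
    """Total exposure each item receives over a batch of ranklists."""
    exposure = [0.0] * n_items
    for ranklist in ranklists:
        for r, item in enumerate(ranklist, start=1):
            exposure[item] += rank_exposure(r, severity)
    return exposure

# Always showing the same order piles exposure onto item 0, while a
# rotating schedule spreads it (essentially) evenly across items.
static = [[0, 1, 2]] * 3
rotating = [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
print(accumulate_exposure(static, 3))
print(accumulate_exposure(rotating, 3))
```

An amortizing method drives the accumulated exposure vector toward the relevance distribution, rather than toward the skewed static pattern above.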

How does VerFair perform compared to baselines in terms of computational efficiency? To show the efficiency of VerFair, we measure the average time (in seconds) to generate 1k ranklists, reported in Table 6. As shown in the table, since ILP is NP-complete [9], both ILP-Aver and ILP-Pers are time-consuming and unlikely to satisfy the requirements of large-scale ranking services in practice. Among the other methods, FairRec, VerFair, and FairCo all have theoretical time complexity O(m × n × k), where m is the number of users, n is the number of items, and k is the ranklist length. Among them, FairRec is not an amortized fairness method. FairCo is originally designed for group fairness and is efficient in group settings, but its efficiency drops when applied at the individual level. Empirically, VerFair has better computational efficiency than all the baselines. More comparisons of these methods can be found in the related work.
All experiments are conducted on an Intel(R) Xeon(R) CPU E5-2640 (2.4 GHz) with 252 GB of memory.
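To make the O(m × n × k) discussion concrete, here is a deliberately simple greedy allocator, not the paper's Algorithm 1: for each of m users it fills k slots, and each slot scans up to n items, preferring items still under an exposure quota. All names (`greedy_fair_allocation`, `quota`) are illustrative:

```python
def greedy_fair_allocation(relevance, k, quota):
    """Greedy sketch (NOT the paper's exact Algorithm 1): three nested
    loops over m users, k slots, and n items give the O(m * n * k) time
    discussed above."""
    m, n = len(relevance), len(relevance[0])
    exposure = [0] * n  # here: how many times each item has been shown
    ranklists = []
    for u in range(m):
        chosen = []
        for _ in range(k):
            # Prefer items still under quota; fall back to any remaining item.
            pool = [i for i in range(n)
                    if i not in chosen and exposure[i] < quota]
            if not pool:
                pool = [i for i in range(n) if i not in chosen]
            best = max(pool, key=lambda i: relevance[u][i])
            chosen.append(best)
            exposure[best] += 1
        ranklists.append(chosen)
    return ranklists

rel = [[0.9, 0.8, 0.1], [0.9, 0.2, 0.8], [0.7, 0.6, 0.5]]
print(greedy_fair_allocation(rel, k=2, quota=2))  # → [[0, 1], [0, 2], [1, 2]]
```

Note how user 3 is served item 1 rather than the more relevant item 0, because item 0 has already exhausted its quota; this is the kind of relevance-for-fairness trade that all O(m × n × k) methods in Table 6 make in some form.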

CONCLUSION AND FUTURE WORK
We propose VerFair with the aim of reaching a better balance between fairness and ranking relevance. With a novel vertical allocation strategy, VerFair effectively amortizes exposure and achieves amortized fairness at both the individual level and the group level.
In the future, we will extend the current work to further explore the dynamic interactions among consumers, items, and platforms.
Ranking performance comparison between vertical and horizontal allocation, where we rank k = 2 items for all three consumers according to the relevance matrix in Table

5.1.1 Dataset and Preprocessing. The statistics of the three preprocessed datasets are shown in Table 5.

Figure 1: Tradeoff between fairness (x-axis) and NDCG (y-axis) for ranklists of length k = 10. Our methods VerFair(Ind) and VerFair(Group) are shown as solid lines, while the others are dotted lines. Result curves run from top left to bottom right as the tradeoff parameter increases, i.e., caring more about fairness and sacrificing NDCG. Curves lying toward the top right show a better tradeoff. Top-k, PR-k, and Random-k have no tradeoff parameters, so they appear as points. Note that the x-axis is not the tradeoff parameter itself.

Figure 2: Experimental results on the Yahoo R3! dataset. (a) and (b) show the density distributions of relevance and exposure (exposure defined in Eq. 2 and Eq. 3), where the position-bias severity is 1 and the ranklist length k = 10. Exposure distributions from the fair methods PR-k and VerFair (tradeoff parameter 1.0) match the relevance distribution (i.e., the PR-k and VerFair curves overlap the Relevance Dist. curve), while the result from the unfair method Top-k does not.

Figure 3: (a) Exposure distribution along average relevance when the tradeoff parameter is 0.3, 0.7, and 1.0, respectively. Quota indicates the relevance-induced minimum exposure (see Eq. 7); VerFair's exposure curves lie above the Quota curves, showing that the minimum exposure quota is guaranteed. (b) Exposure distribution along the standard deviation for items of similar average relevance (within the range [2.7, 2.8]). Items with greater deviation are more tailored to customers; flat curves indicate discouraged personalization. Ranklist length k = 10; tradeoff parameter 1.0.
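A numeric sketch of the quota check above: since Eq. 7 is not reproduced in this excerpt, the quota form below — a relevance-proportional share of total exposure scaled by the tradeoff parameter — is an assumption, and `min_exposure_quota`/`beta` are hypothetical names:

```python
def min_exposure_quota(avg_relevance, total_exposure, beta):
    # Assumed quota form (NOT the paper's exact Eq. 7): each item is
    # guaranteed a beta-scaled share of total exposure proportional
    # to its average relevance.
    rel_sum = sum(avg_relevance)
    return [beta * r / rel_sum * total_exposure for r in avg_relevance]

# With beta = 1.0 the quotas spend the whole exposure budget, matching
# the observation that the VerFair and Quota curves overlap there.
print(min_exposure_quota([2.0, 1.0, 1.0], total_exposure=8.0, beta=1.0))
# → [4.0, 2.0, 2.0]
print(min_exposure_quota([2.0, 1.0, 1.0], total_exposure=8.0, beta=0.5))
# → [2.0, 1.0, 1.0]
```

With beta below 1.0, part of the exposure budget is left unconstrained, which is the slack that lets well-personalized items earn more than their quota, as Figure 3b illustrates.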

Table 1: Comparison of different amortized fairness methods. Attributes include whether they achieve amortized fairness, whether they work with personal relevance, whether they require down-sampling, and their computational complexity. FairRec is not an amortized fairness method, but we still include it for completeness.

Table 2: A summary of notations: the item set D and an item d ∈ D; the consumer set U and a consumer u ∈ U; the group set G and a group g ∈ G.

Table 3: Ranking performance comparison between vertical allocation and horizontal allocation.

Table 4: Comparison of the minimal exposure guarantee between starting from the origin and starting from the anchor, where we rank k = 2 items according to the relevance matrix defined in Table 4a. * indicates the anchor point.

Table 5: Detailed statistics of the datasets we use. Individual fairness datasets are used for evaluating methods for individual fairness; they do not contain any groups.

|Ẽ(g) − Quota(g)|. In the allocation phase, there exist two possible scenarios in lines 12-16 of Algorithm 1. Scenario 1: there exists a (u*, k*) pair where S ∩ C(u*, k*) = ∅. If this happens, (u*, k) with k > k* will also have S ∩ C(u*, k) = ∅, since the sizes of S and C(u*, k) monotonically decrease at lower ranks of the same user. As C(u*, k*) is the set of unselected items for the current user, there are at least |D| − k items in C(u*, k*), i.e., |C(u*, k*)| ≥ |D| − k. If S ∩ C(u*, k*) = ∅, those |D| − k items are not in S. In other words, there are at least |D| − k items whose corresponding group satisfies Quota(g) − Ẽ(g) < ε(u*, k), i.e., the error is less than ε(u*, k). As we assume that k ≪ |D| and ε(u*, k) < 1, we still claim that VerFair can guarantee the exposure required by Quota(g).