Learning Fair Ranking Policies via Differentiable Optimization of Ordered Weighted Averages

Learning to Rank (LTR) is one of the most widely used machine learning applications. It is a key component in platforms with profound societal impacts, including job search, healthcare information retrieval, and social media content feeds. Conventional LTR models have been shown to produce biased results, stimulating a discourse on how to address the disparities introduced by ranking systems that solely prioritize user relevance. However, while several models of fair learning to rank have been proposed, they suffer from deficiencies either in accuracy or efficiency, limiting their applicability to real-world ranking platforms. This paper shows how efficiently solvable fair ranking models, based on the optimization of Ordered Weighted Average (OWA) functions, can be integrated into the training loop of an LTR model to achieve favorable balances between fairness, user utility, and runtime efficiency. In particular, this paper is the first to show how to backpropagate through constrained optimizations of OWA objectives, enabling their use in integrated prediction and decision models.


Introduction
Ranking models have become a pervasive aspect of everyday life. They are at the center of how people find information online, serving as the main mechanisms by which we interact with products, content, and other people. In these systems, the items to be ranked are videos, job candidates, research papers, and almost anything else. As models based on machine learning, they are primarily trained to provide maximum utility to users, by serving the results deemed most relevant to their search queries. In the modern economy of information, the position of an item in the ranking has a strong influence on its exposure, selection, and, ultimately, its economic success.
Because of this influence, increasing attention has been placed on the disparate impacts of ranking systems on underrepresented groups. In these data-driven systems, the relevance of an item is measured by implicit feedback from users, such as clicks and dwell times. As such, the disparate impacts of rankings can go well beyond their immediate effects. Disproportionate exposure in ranking results leads to higher selection rates, in turn boosting relevance scores based on implicit feedback [Yadav et al., 2019, Sun et al., 2020]. This can create self-reinforcing feedback loops, leading to winner-take-all dynamics. The ability to control these disparate impacts is essential in order to avoid reinforcement of systemic biases, ensure the health and stability of online markets, and implement anti-discrimination measures [Edelman et al., 2017, Singh and Joachims, 2019].
When search results are ranked purely based on relevance, disparate exposure between groups may be greatly increased in order to achieve marginal gains in relevance. For example, in a job search system it is possible for male candidates to receive overwhelmingly more exposure even when female candidates have been rated only marginally lower in relevance. It has indeed been found in [Elbassuoni et al., 2019] that in a job candidate ranking system, small differences in relevance can lead to large differences in exposure for candidates from a minority group. Thus, fairness-aware ranking models often suffer little to no degradation to user utility when compared to their conventional counterparts [Zehlike and Castillo, 2020].
Figure 1: The differentiable optimization module proposed in SOFaiR. Its forward pass is calculated by an efficient Frank-Wolfe method, and its backward pass computes the SPO+ subgradient of the OWA problem's regret due to prediction error.
On the other hand, these fair learning to rank models are more difficult to design, since they require the outputs of a machine learning model to obey potentially complex constraints while simultaneously achieving high relevance. For this reason, conventional learning to rank methods are often ill-suited to incorporate fairness. For example, the popular listwise learning to rank method, through modifications to its loss function, is only capable of modeling fairness of exposure in the top ranking position [Zehlike and Castillo, 2020]. In an alternative paradigm, first investigated by Kotary et al. [2022], a fair ranking linear programming model is integrated with LTR in an end-to-end training process. By incorporating fair ranking optimization into the model's training loop, rather than in post-processing, the utility of the downstream fair ranking policies can be maximized as a loss function. This makes it possible to provide fairness guarantees at the level of each predicted ranking policy, and precise control over the fairness-utility trade-off. However, the method comes with a significant computational cost, as it requires solving a large optimization problem for each sample in each training iteration, challenging its application to real-world ranking systems. A further limitation of fair LTR systems, including that of [Kotary et al., 2022], is their inability to effectively deal with multi-group fairness criteria (i.e., going beyond binary group treatment), which are overwhelmingly common in real-world applications.
Contributions. To address these limitations, this paper makes the following novel contributions: (1) It shows how to adopt an alternative approach based on Ordered Weighted Averages (OWA) to design efficient policy optimization modules for the fair learning to rank setting. (2) For the first time, it shows how to backpropagate gradients through the highly discontinuous optimization of OWA functions, enabling its use in end-to-end learning. (3) The resulting end-to-end optimization and learning scheme, called Smart OWA Optimization for Fair Learning to Rank (SOFaiR), is compared with contemporary fair LTR methods, demonstrating not only substantial advantages in fairness over previous fair LTR methods, but also advantages in efficiency and modeling flexibility over the end-to-end fair LTR scheme of [Kotary et al., 2022]. A schematic illustration of the proposed scheme is depicted in Figure 1.
These contributions are significant: they demonstrate that, by incorporating modern fair ranking optimization techniques, the integration of post-processing optimization models in end-to-end LTR training can be a viable and scalable paradigm for achieving highly accurate learning to rank systems that also provide strong fairness properties.

Preliminaries
Throughout the paper, vectors and matrices are denoted in bold font. The inner product of two vectors a and b is written a^T b, while the outer product is a b^T. For a matrix M, the vector vec(M) is formed by concatenation of its rows. A hatted vector â is the prediction of a machine learning model, and a starred vector a⋆ is the optimal solution to some optimization problem. The list of integers {1, ..., n} is written [n]. When a ∈ R^n and σ is a permutation of [n], a_σ is the corresponding permuted vector. The vectors of all ones and zeros are denoted 1 and 0, respectively. Commonly used symbols throughout the paper are organized in Table 1 for reference.

Problem Setting and Goals
Given a user query, the goal is to predict a ranking over n items, in order of most to least relevant with respect to the query. Relevance of each item to be ranked, with respect to a search query q, is generally measured by a vector of relevance scores y_q ∈ R^n, often modeled on the basis of empirical observations such as historical click rates [Xu et al., 2010].

Table 1: Common symbols adopted throughout the paper.

    N                        Size of the training dataset
    n                        Number of items to be ranked
    m                        Number of protected groups
    x_q = (x_q^i)_{i=1}^n    List of feature embeddings for items to rank, given query q
    a_q = (a_q^i)_{i=1}^n    Protected groups associated with items x_q^i
    y_q = (y_q^i)_{i=1}^n    Relevance scores for each of the n items given query q
    G                        The set of all protected group indicators
    M_θ                      End-to-end trainable fair ranking model with weights θ
    σ                        A permutation of the list [n]
    P                        A permutation matrix corresponding to some σ
    P_n                      The set of all permutations of [n]
    τ                        The sorting operator
    Π                        A ranking policy, or its representative bistochastic matrix
    u(Π, y)                  Expected utility of policy Π under relevance scores y
    B                        The Birkhoff polytope, the convex set of all ranking policies
    E(i, σ)                  Exposure of item i in ranking σ

This setting considers a ground-truth dataset (x_q, a_q, y_q)_{q=1}^N, where x_q ∈ X is a list of feature vectors (x_q^i)_{i=1}^n, one for each of the n items to be ranked in response to query q; a_q = (a_q^i)_{i=1}^n is a vector indicating the (protected) group g ∈ G to which each item belongs; and y_q = (y_q^i)_{i=1}^n ∈ Y is a vector of relevance scores, one for each item with respect to query q. For example, in an image web-search context as depicted in Figure 1, a query denotes the search keywords, e.g., "CEO"; the vectors x_q^i in x_q are feature embeddings for the images relative to q, each associated with a gender (attribute a_q^i); and the associated relevance scores y_q^i describe the relevance of item i to query q. Rankings can be viewed as permutations which rearrange the order of a predefined item list. Intermediate between the user input and the final ranking is often a ranking policy, which produces discrete rankings (randomly or deterministically).
Learning to Rank. In learning to rank (LTR), an ML model M_θ is typically adopted to estimate relevance scores ŷ_q of items given their features x_q relative to a user query q (see Figure 1). From this, a ranking policy Π is constructed. Its expected utility is u(Π, y_q) = E_{σ∼Π}[Δ(σ, y_q)], where Π is viewed as a distribution from which rankings σ are sampled randomly, and the utility Δ is a measure of the overall relevance of a given ranking σ with respect to given relevance scores y_q. Although the framework is applicable to any linear utility metric Δ for rankings, this paper uses the widely adopted Discounted Cumulative Gain (DCG):

    Δ_DCG(σ, y_q) = y_q^T P(σ) b,    (1)

where P(σ) is the permutation matrix corresponding to σ, y_q are the true relevance scores, and b is a position-bias vector which models the probability that each position is viewed by a user, defined with elements b_j = 1/log_2(1+j) for j ∈ [n].
Ranking policy representation. The methods of this paper adopt a particular representation of the ranking policy, as a bistochastic matrix Π ∈ R^{n×n}, where Π_{jk} indicates the probability that item j takes position k in the ranking. The set of feasible ranking policies is expressed as Π ∈ B, where B is the Birkhoff polytope:

    B = { Π ∈ R^{n×n} :  1^T Π = 1^T,  Π 1 = 1,  0 ≤ Π ≤ 1 }.    (3)

Its conditions on a matrix Π require, in the order of (3), that each column of Π sums to one, that each row of Π sums to one, and that each element of Π lies between 0 and 1. Each of these conditions is a linear constraint on the variables Π.
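As a quick illustration (a minimal numpy sketch; the helper name is ours, not from the paper), the three membership conditions of the Birkhoff polytope can be checked directly:

```python
import numpy as np

def is_bistochastic(Pi, tol=1e-9):
    """Check membership in the Birkhoff polytope B:
    columns sum to one, rows sum to one, entries lie in [0, 1]."""
    return (np.allclose(Pi.sum(axis=0), 1.0, atol=tol)      # column sums
            and np.allclose(Pi.sum(axis=1), 1.0, atol=tol)  # row sums
            and bool(np.all(Pi >= -tol) and np.all(Pi <= 1 + tol)))

# The uniform policy (every item equally likely in every position) and any
# permutation matrix both lie in B; the permutation matrices are its vertices.
uniform = np.full((3, 3), 1.0 / 3.0)
perm = np.eye(3)[[2, 0, 1]]
```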
Linearity of the DCG function (1) with respect to P allows it to commute with the expectation, leading to a practical closed form for u as a linear function of Π:

    u(Π, y) = y^T Π b.    (4)

This is an important observation that enables the constrained optimization of utility functions of the policy Π in end-to-end differentiable pipelines, as discussed later in the paper.
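The commutation argument can be checked numerically: for a policy written explicitly as a convex combination of permutation matrices, the enumerated expectation of per-ranking DCG coincides with the closed form. A small numpy sketch (variable names and values are ours, chosen for illustration):

```python
import numpy as np

n = 4
b = 1.0 / np.log2(1.0 + np.arange(1, n + 1))   # position biases b_j = 1/log2(1+j)
y = np.array([3.0, 1.0, 2.0, 0.5])             # relevance scores

# A policy given explicitly as a convex combination of permutation matrices.
perms = [np.eye(n)[[0, 1, 2, 3]], np.eye(n)[[1, 0, 3, 2]], np.eye(n)[[3, 2, 1, 0]]]
rho = np.array([0.5, 0.3, 0.2])
Pi = sum(r * P for r, P in zip(rho, perms))

# Expected DCG by enumerating the mixture ...
expected_dcg = sum(r * (y @ P @ b) for r, P in zip(rho, perms))
# ... equals the closed form u(Pi, y) = y^T Pi b, by linearity of DCG in P.
closed_form = y @ Pi @ b
```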

Fairness of Exposure
Item exposure is a metric commonly adopted in ranking systems, where items in higher ranking positions receive more exposure, and it is with respect to this metric that fairness is concerned. This paper aims at learning ranking policies that satisfy group fairness of exposure while maintaining high relevance to user queries. The exposure E(i, σ) of item i within some ranking σ is a function of only its position, with higher positions receiving more exposure than lower ones. Throughout the paper, the common modeling choice E(i, σ) = b_{σ_i} is adopted.
Notions of item exposure in rankings can also be extended to group exposure in ranking policies. The exposure of group g in a ranking σ is measured by the mean exposure in σ of items belonging to g. The exposure of group g in a ranking policy Π is the mean value of its exposure over all rankings sampled from the policy:

    E_g(Π) = E_{σ∼Π} [ (1/|g|) Σ_{i∈g} E(i, σ) ],    (5)

and we let E_G(Π) be the vector of values (5) for each g in G. Derived similarly to (4), linearity of E leads to a closed form for (5) when Π is represented by a bistochastic matrix, where 1_g indicates 1 for items in g and 0 elsewhere [Singh and Joachims, 2018]:

    E_g(Π) = (1/|g|) 1_g^T Π b.    (6)

Imposing fairness in LTR. It is well-known that individual rankings σ, as discrete structures, cannot exactly satisfy most notions of individual or group fairness [Zehlike et al., 2017]. Therefore, a common strategy in fair ranking optimization is to view ranking policies as random distributions over rankings, upon which a feasible notion of fairness can be imposed in expectation [Zehlike et al., 2017, Singh and Joachims, 2018, Do and Usunier, 2022]. For a ranking policy Π and query q, fairness of exposure requires that every group indicated by g ∈ G receives equal exposure on average over rankings produced by the policy. This condition can be expressed by requiring that the average exposure among items of each group is equal to the average exposure among all items:

    E_g(Π) = E_α(Π)   ∀ g ∈ G,    (7)

where α is the group containing all items. Enforcing the condition (7) on each predicted policy Π is the mechanism by which protected groups are ensured equal exposure in SOFaiR. In the image search example, it corresponds to male and female candidates receiving equal exposure on average over rankings sampled from Π. The violation of fairness with respect to group g is measured by the absolute gap in this condition:

    | E_g(Π) − E_α(Π) |.    (8)

Note that group fairness encompasses individual item fairness as a special case, in which each item belongs to a distinct group. While the fairness and utility metrics described above are the ones used throughout the paper, the methodology of the paper is compatible with any alternative metrics u and E which are linear functions of the policy Π. This is because the methodology of Sections 5 and 6 depends only on the linearity of (4) and (6).
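The group-exposure vector (6) and the fairness gaps (8) are straightforward to compute from a bistochastic policy. In the following numpy sketch (helper names are ours), the uniform policy is exactly fair, while a deterministic ranking generally is not:

```python
import numpy as np

n = 4
b = 1.0 / np.log2(1.0 + np.arange(1, n + 1))
groups = [np.array([0, 1]), np.array([2, 3])]  # two protected groups

def group_exposures(Pi):
    """E_g(Pi) = (1/|g|) 1_g^T Pi b for each group g, as in Eq. (6)."""
    item_exposure = Pi @ b                     # expected exposure of each item
    return np.array([item_exposure[g].mean() for g in groups])

def violations(Pi):
    """Absolute gaps |E_g(Pi) - E_alpha(Pi)| of Eq. (8), alpha = all items."""
    return np.abs(group_exposures(Pi) - (Pi @ b).mean())

uniform = np.full((n, n), 1.0 / n)   # every item equally exposed: gaps vanish
ranked = np.eye(n)                   # deterministic ranking: group 0 on top
```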

Limitations of Fair LTR Methods
Current fair LTR models present a combination of the following limitations: (A) fairness is encouraged only in aggregate (over the top position, or on average over queries), so that individual predicted policies need not satisfy it; (B) an inability to handle fairness over more than two protected groups; and (C) a prohibitive computational cost at training and inference time. Regardless of how the policy is represented, fair learning to rank methods typically train a model M_θ to find parameters θ⋆ that maximize its empirical utility, along with possibly a weighted penalty term F which promotes fairness:

    θ⋆ = argmax_θ  Σ_q [ u(Π_θ(x_q), y_q) − λ F(Π_θ(x_q)) ],    (9)

where Π_θ(x_q) denotes the policy induced by M_θ on query q. For example, the fair LTR method of Zehlike and Castillo [2020] (called DELTR) is based on listwise learning to rank [Cao et al., 2007], and thus uses the model M_θ to predict activation scores for each individual item, over which a softmax layer defines the probabilities of each item taking the top ranking position. Thus, Zehlike and Castillo [2020] can only use F to encourage group fairness of exposure in the top position, leading to poor overall satisfaction of the fairness condition (7) (limitation A), as illustrated in Figure 2. To impose fairness over all ranking positions, [Singh and Joachims, 2019] (FULTR) also uses a softmax over the activations of M_θ to define probabilities, which are sampled without replacement to generate rankings using a policy gradient method. However, this penalty-based method still does not ensure fairness in each predicted policy, as illustrated in Figure 2, since the penalty is imposed only on average over all predicted policies (limitation A). By a similar reasoning, these methods do not translate naturally to the case of multigroup fairness, where m > 2 (limitation B): because the penalty F must scalarize the collection of all group fairness violations (8) (by taking their overall sum), it is possible to reduce F while increasing the exposure of a single outlier group [Kotary et al., 2022].
Later work [Kotary et al., 2022] shows how to overcome limitation A by integrating the fair ranking optimization model of Singh and Joachims [2018] with the prediction of relevance scores ŷ_q = M_θ(x_q). Modeling predicted policies Π as solutions to an optimization problem under fairness constraints allows for their representation as bistochastic matrices which satisfy the fairness notion (7) exactly. However, this method suffers limitation C, as it requires solving a linear programming problem at each iteration of training and at inference, whose number of variables in Π ∈ R^{n×n} scales quadratically as O(n^2), becoming prohibitively large as the item list grows. Additionally, at inference time, the policy must be sampled to produce rankings; this requires a Birkhoff-von Neumann (BvN) decomposition of the matrix Π into a convex combination of permutation matrices, which is also expensive when n is large [Singh and Joachims, 2018]. Finally, in the case of multiple groups (m > 2), the fairness constraints can become infeasible, making this formulation unwieldy (limitation B). An extended review of related work is provided in Section 8.
Figure 2 shows the query-level fairness violations of each method discussed in this section, where the fairness parameters are in each case increased maximally without substantially compromising utility. In addition to higher average violations, the penalty-based methods [Zehlike and Castillo, 2020, Singh and Joachims, 2019] also lead to a prevalence of outliers. These three existing fair LTR methods are used as baselines for comparison in Section 7. The SOFaiR framework proposed next most resembles [Kotary et al., 2022], as it combines learning of relevance scores end-to-end with constrained optimization. At the same time, it aims to improve over [Kotary et al., 2022] by addressing the three main limitations stated above. By integrating an alternative optimization component with its predictive model, SOFaiR can achieve faster runtime and avoid the BvN decomposition at inference time, while naturally accommodating fairness over an arbitrary number of groups.
Smart OWA Optimization for Fair Learning to Rank (SOFaiR)

This section provides an overview of the proposed SOFaiR framework for learning fair ranking policies that overcomes limitations A, B, and C. Sections 5 and 6 will then detail the core solution approaches required to incorporate its proposed fair ranking optimization module into efficient, end-to-end trainable fair LTR models. As illustrated in Figure 1, SOFaiR's core concept is to integrate the learning of relevance scores with a module which optimizes fair ranking policies in-the-loop. By doing so, it achieves a favorable balance of fairness and utility relative to other in-processing methods. The key difference in its approach relative to [Kotary et al., 2022] is in the design of its optimization model, which leverages Ordered Weighted Average (OWA) objectives (reviewed next) to enforce fairness of exposure. By avoiding the imposition of fairness of exposure (see Equation (7)) as a set of hard constraints on the optimization, as in [Kotary et al., 2022, Singh and Joachims, 2018], it maintains the simple feasible region Π ∈ B, over which efficient Frank-Wolfe-based solution methods can be employed to optimize its OWA objective function, as described in Section 5. In turn, the particular form of the OWA optimization model in-the-loop necessitates a novel technique for its backpropagation, detailed in Section 6. The OWA aggregation and its fairness properties in optimization problems are introduced next, followed by its role in the SOFaiR learning framework.

Ordered Weighted Averaging Operator
The Ordered Weighted Average (OWA) operator [Yager, 1993] has found applications in various decision-making fields [Yager and Kacprzyk, 2012] as a means of fairly aggregating multiple objective criteria. Let x ∈ R^m be a vector of m distinct criteria, and τ : R^m → R^m be the sorting map for which τ(x) ∈ R^m holds the elements of x in increasing order. Then, for any w satisfying {w ∈ R^m : Σ_i w_i = 1, w ≥ 0}, the OWA aggregation with weights w is defined as a linear functional on τ(x):

    OWA_w(x) = w^T τ(x),    (10)

which is piecewise-linear in x and, for the decreasing weights considered below, concave [Ogryczak and Śliwiński, 2003]. The so-called Generalized Gini Functions, or Fair OWA, are those for which the OWA weights w_1 > w_2 > ... > w_m are decreasing. Fair OWA functions possess the following three key properties for fairness in optimizing multiple criteria [Ogryczak and Śliwiński, 2003].
(1) Impartiality means that all criteria are treated equally, in the sense that OWA_w(x) = OWA_w(x_σ) for any σ ∈ P_m.
(2) Equitability is the property that a marginal transfer from a criterion with higher value to one with lower value results in an increase in the aggregated OWA value. That is, when x_i > x_j + ϵ and letting x_ϵ = x except at positions i and j, where (x_ϵ)_i = x_i − ϵ and (x_ϵ)_j = x_j + ϵ, it holds that OWA_w(x_ϵ) > OWA_w(x).
(3) Monotonicity means that OWA_w(x) is an increasing function of each element of x. The monotonicity property implies that solutions which optimize (10) are Pareto-efficient solutions of the underlying multiobjective problem, so that no single criterion can be raised without reducing another [Ogryczak and Śliwiński, 2003]. Taken together, it is known that maximization of aggregation functions satisfying these three properties produces so-called equitably efficient solutions, which possess the main intuitive properties needed for a solution to be deemed "fair"; see [Kostreva and Ogryczak, 1999] for a formal definition. As shown next, the SOFaiR framework ensures group fairness by leveraging a fair OWA aggregation of group exposures, OWA_w(E_G(Π)), in the objective function of its integrated fair ranking optimization module.
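These three properties are easy to observe numerically. A minimal numpy sketch (the values are ours, chosen for illustration):

```python
import numpy as np

def owa(w, x):
    """OWA_w(x) = w^T tau(x), with tau sorting x increasingly (Eq. (10))."""
    return w @ np.sort(x)

w = np.array([0.5, 0.3, 0.2])   # strictly decreasing weights: a fair OWA
x = np.array([4.0, 1.0, 2.5])

# Impartiality: permuting the criteria leaves the aggregate unchanged.
impartial = np.isclose(owa(w, x), owa(w, x[[2, 0, 1]]))

# Equitability: transferring eps from a better-off criterion (x_0) to a
# worse-off one (x_1) increases the aggregated value.
eps = 0.5
equitable = owa(w, x + np.array([-eps, eps, 0.0])) > owa(w, x)

# Monotonicity: raising any single criterion raises the aggregate.
monotone = owa(w, x + np.array([0.1, 0.0, 0.0])) > owa(w, x)
```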

End-to-End Learning in SOFaiR
As illustrated in Figure 1, the SOFaiR framework uses a prediction model M_θ with learnable weights θ, which produces relevance scores ŷ_q from a list of item features x_q. Its key component is an optimization module which maps the prediction ŷ_q to an associated ranking policy Π⋆(ŷ_q). The following optimization problem defines Π⋆(ŷ_q) as the ranking policy which optimizes a trade-off between the fair OWA aggregation of group exposures and the expected DCG (as per Equation (4)) under relevance scores ŷ_q. In SOFaiR it defines, for any chosen weight 0 ≤ λ ≤ 1, a mapping which can be viewed akin to a neural network layer, representing the last layer of M_θ:

    Π⋆(ŷ_q) = argmax_{Π∈B}  (1−λ) · u(Π, ŷ_q) + λ · OWA_w(E_G(Π)),    (11)

wherein the Birkhoff polytope B is the set of all bistochastic matrices, as defined in Equation (3). Let the objective function of (11) be named f(Π, ŷ_q). It is a convex combination of two terms measuring user utility and fairness, whose trade-off is controlled by the single coefficient λ. The former term measures expected user utility u(Π, ŷ_q) = ŷ_q^T Π b, while the latter measures the OWA aggregation of the group exposures. It is intuitive to see that when λ = 1, the optimization (11) returns a ranking policy that minimizes disparities in group exposure, without regard for relevance. When λ = 0, it returns a deterministic policy which ranks the items in order of the estimated scores ŷ_q. Intermediate values 0 < λ < 1 result in policies which trade off the effects of each term, balancing utility and fairness to various degrees. As λ increases, the disparity between the exposures of protected groups must decrease; this leads to a practical mechanism for achieving a desired level of fairness with minimal compromise to utility.
Since Equation (11) defines a direct mapping from ŷ_q to Π⋆(ŷ_q), the problem of learning fair ranking policies reduces to a problem of learning relevance scores. This corresponds to estimating the objective function f via its missing coefficients y_q. The SOFaiR training method defines a loss function between predicted and ground-truth relevance scores, as the loss of optimality in Π⋆(ŷ_q) with respect to the objective f under the ground-truth y_q, caused by prediction error in ŷ_q. That is, the training objective is to minimize the regret in f induced by ŷ_q, defined as:

    regret(ŷ_q, y_q) = f(Π⋆(y_q), y_q) − f(Π⋆(ŷ_q), y_q).    (12)

The composition Π⋆ ∘ M_θ defines an integrated prediction and optimization model which maps item features to fair ranking policies. Training the integrated model by stochastic gradient descent follows these steps in a single iteration:

1. For a sample query q with item features x_q, the predictive model M_θ produces estimated relevance scores ŷ_q.
2. The predicted scores ŷ_q are used to populate the unknown parameters of the optimization problem (11). A solution algorithm is employed to find Π⋆(ŷ_q), the optimal fair ranking policy relative to ŷ_q.
3. The regret loss (12) is backpropagated through the calculations of steps (1) and (2), in order to update the model weights θ by a gradient descent step.
The following sections detail the main solution schemes for implementing steps (2) and (3). Section 5 shows how recently proposed fair ranking optimization techniques from [Do and Usunier, 2022] can be adapted to the setting of this paper, in which fair ranking policies must be learned from empirical data. From this choice of optimization design arises a novel challenge in the backpropagation step (3), since no known work has shown how to backpropagate the regret of a highly discontinuous OWA optimization program. Section 6 shows how to efficiently backpropagate the regret due to problem (11) for end-to-end learning. Then, Section 7 evaluates the SOFaiR framework against several other methods for learning fair ranking policies, on a set of benchmark tasks from the web search domain.

Forward Pass Optimization
The main motivation for the formulation (11) of SOFaiR's fair ranking optimization layer is to render the optimization problem efficiently solvable. Its main exploitable attribute is its feasible region B, over which a linear objective function can be quickly optimized by simply sorting a vector in R^n, with time complexity O(n log n) [Cormen et al., 2022]. This suggests an efficient solution by Frank-Wolfe methods, which solve a constrained optimization problem via a sequence of subproblems optimizing a linear approximation of the true objective function [Beck, 2017]. This efficient solution pattern is made possible by the absence of additional group fairness constraints on the policy variable Π.
Frank-Wolfe methods solve a convex constrained optimization problem argmax_{x∈S} f(x) by computing the iterations

    x^(k+1) = (1 − α^(k)) · x^(k) + α^(k) · argmax_{s∈S} ⟨∇f(x^(k)), s⟩.    (13)

Convergence to an optimal solution is guaranteed when f is differentiable and α^(k) = 2/(k+2) [Beck, 2017]. However, the main obstruction to solving (11) by the method (13) is that f in our case includes a non-differentiable OWA function. A path forward is shown in [Lan, 2013], which proves that convergence can be guaranteed by optimizing a smooth surrogate function f^(k) in place of the nondifferentiable f at each step of (13), in such a way that the f^(k) converge to the true f as k → ∞.
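To make the iteration concrete, the following numpy sketch (a toy example of ours, not from the paper) runs Frank-Wolfe on a smooth concave objective over the probability simplex, whose linear subproblem is solved by picking the best vertex:

```python
import numpy as np

def frank_wolfe_simplex(grad_f, x0, T=2000):
    """Iteration (13): step toward the vertex maximizing the linearized
    objective, with the standard step size alpha_k = 2/(k+2)."""
    x = x0.astype(float).copy()
    for k in range(T):
        g = grad_f(x)
        s = np.zeros_like(x)
        s[np.argmax(g)] = 1.0               # LMO: best vertex of the simplex
        x += (2.0 / (k + 2.0)) * (s - x)
    return x

# Maximize f(x) = -||x - c||^2, whose optimum over the simplex is x = c.
c = np.array([0.2, 0.5, 0.3])
x_star = frank_wolfe_simplex(lambda x: -2.0 * (x - c), np.array([1.0, 0.0, 0.0]))
```

The iterates stay feasible by construction, since each update is a convex combination of the current point and a vertex.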
It is proposed in [Do and Usunier, 2022] to solve a two-sided fair ranking optimization with OWA objective terms by the method of [Lan, 2013], where f^(k) is chosen to be a Moreau envelope h_{β_k} of f, a (1/β_k)-smooth approximation of f defined as [Beck, 2017]:

    h_β(x) = min_y  f(y) + (1/(2β)) ‖x − y‖^2.    (14)

When f = OWA_w, let its Moreau envelope be denoted OWA_w^β; it is shown in [Do and Usunier, 2022] that its gradient can be computed as a projection onto the permutahedron induced by the modified OWA weights w̃ = −(w_m, ..., w_1). By definition, the permutahedron C(w̃) = conv({w̃_σ : σ ∈ P_m}) induced by a vector w̃ is the convex hull of all its permutations. In turn, it is shown in [Blondel et al., 2020] that the permutahedral projection

    ∇OWA_w^β(x) = proj_{C(w̃)}(x/β)    (15)

can be computed in O(m log m) time, as the solution to an isotonic regression problem using the Pool Adjacent Violators algorithm. To find the overall gradient of OWA_w^β with respect to the optimization variables Π, a convenient form can be derived from the chain rule. When the item exposures E(Π) = Π b are aggregated by OWA,

    ∇_Π OWA_w^β(E(Π)) = μ b^T,

where μ = proj_{C(w̃)}(E(Π)/β) and E(Π) is the vector of all item exposures [Do and Usunier, 2022]. For the case where the group exposures E_G(Π) are aggregated by OWA, first note that by Equation (6), E_G(Π) = A Π b, where A is the matrix formed by stacking together all (normalized) group indicator vectors. Then, by the chain rule,

    ∇_Π OWA_w^β(E_G(Π)) = A^T μ̃ b^T,    (16)

where μ̃ = proj_{C(w̃)}(E_G(Π)/β). It remains to compute the gradient of the user relevance term u(Π, ŷ_q) = ŷ_q^T Π b in problem (11). As a linear function of the matrix variable Π, its gradient is ∇_Π u(Π, ŷ_q) = ŷ_q b^T, which is evident from the equivalent vectorized form. Combining this with (16), the total gradient of the objective function of (11) with smoothed OWA term is ∇_Π f = ((1−λ) · ŷ_q + λ · A^T μ̃) b^T. To implement the Frank-Wolfe iteration (13), the induced linearized subproblem

    argmax_{Π∈B}  ⟨ ((1−λ) · ŷ_q + λ · A^T μ̃) b^T,  Π ⟩    (17)

should have an efficient solution. To this end, the form of the gradient as an outer product of some vector with the position biases b can be exploited. Note that, as the expected DCG under relevance scores y, the function y^T Π b is maximized by the permutation matrix P ∈ P_n which sorts the relevance scores y decreasingly. Therefore, problem (17) can be solved in O(n log n), simply by finding P ∈ P_n as the argsort of the vector ((1−λ) · ŷ_q + λ · (A^T μ̃)) in decreasing order. A more formal proof, cited in [Do et al., 2021], makes use of [Hardy et al., 1952].

[Algorithm 1 (listing omitted): smoothed Frank-Wolfe for problem (11). Output: ranking policy Π^(T) ∈ R^{n×n}; step 1 initializes Π^(0) as the P ∈ P_n which sorts ŷ_q in decreasing order.]
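The argsort solution of the linearized subproblem can be verified by brute force over all n! permutation matrices (the vertices of B), since a linear objective over B is maximized at a vertex. A numpy sketch (helper names are ours):

```python
import itertools
import numpy as np

def lmo_birkhoff(c, b):
    """argmax_{Pi in B} c^T Pi b: since b is decreasing, place items in
    decreasing order of c via a single argsort (O(n log n))."""
    n = len(c)
    order = np.argsort(-c)           # order[k] = item ranked in position k
    P = np.zeros((n, n))
    P[order, np.arange(n)] = 1.0
    return P

n = 4
b = 1.0 / np.log2(1.0 + np.arange(1, n + 1))   # decreasing position biases
c = np.array([0.3, 1.2, -0.4, 0.8])            # e.g. (1-l)*y_hat + l*A^T mu

P_star = lmo_birkhoff(c, b)
# Brute force over all permutations sigma (item i placed in position sigma[i]).
brute = max(c @ b[np.array(sig)] for sig in itertools.permutations(range(n)))
```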
The overall method is presented in Algorithm 1. Decay of the smoothing parameter as β_t = β_0/√t satisfies the conditions for convergence stated in [Lan, 2013] when β_0 is sufficiently large. Sparse matrix additions each require O(n) operations, so that Algorithm 1 maintains O(n log n) complexity per iteration. An important advantage of Algorithm 1 over the fair ranking LP employed in SPOFR [Kotary et al., 2022] is that the solution iterates P^(k) automatically provide a decomposition of the policy matrix Π = Σ_k ρ_k P^(k) as a convex combination of rankings, by which it can readily be sampled as a discrete probability distribution. In contrast, the LP module used in SPOFR [Kotary et al., 2022] provides as its solution only a matrix Π ∈ B, which must be decomposed using the Birkhoff-von Neumann decomposition, adding substantially to its total runtime.
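This sampling advantage is simple to realize: given the vertices P^(k) and convex weights ρ_k accumulated by the Frank-Wolfe iterates, a discrete ranking is drawn with a single categorical draw. A sketch with hypothetical weights (not the output of an actual run of Algorithm 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Hypothetical Frank-Wolfe vertices and convex weights rho_k.
perms = [np.eye(n)[[0, 1, 2]], np.eye(n)[[2, 1, 0]], np.eye(n)[[1, 0, 2]]]
rho = np.array([0.6, 0.3, 0.1])
Pi = sum(r * P for r, P in zip(rho, perms))    # the policy matrix itself

def sample_ranking(perms, rho, rng):
    """Draw one permutation matrix with probability rho_k -- no
    Birkhoff-von Neumann decomposition is needed."""
    return perms[rng.choice(len(perms), p=rho)]

# Empirically, sampled rankings average back to the policy matrix Pi.
avg = sum(sample_ranking(perms, rho, rng) for _ in range(20000)) / 20000
```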

Backpropagation
The formulation of the optimization module (11) allows for efficient solution via Algorithm 1, but gives rise to a novel challenge in backpropagating the regret loss function through Π⋆(ŷ_q). By including an OWA aggregation of group exposures, its objective function is nonlinear and nondifferentiable. This section shows how to train the integrated prediction and OWA optimization model Π⋆ ∘ M_θ to minimize the regret loss (12), despite this challenge. As a starting point, we recognize the existing literature on "Predict-Then-Optimize" frameworks [Kotary et al., 2021, Mandi et al., 2023] for minimizing the regret due to prediction error in the objective coefficients c of a linear program:

    x⋆(c) = argmax_x  c^T x   subject to  A x ≤ b.    (18)

Several methods have been proposed [Elmachtoub and Grigas, 2021, Pogančić et al., 2020, Wilder et al., 2019, Berthet et al., 2020] and well-established in the literature [Mandi et al., 2023] for end-to-end training of combined prediction and optimization models employing (18). Due to its OWA objective term, the fair ranking module (11) does not satisfy the LP form (18) for which the aforementioned methods are tailored. The implementation of SOFaiR described here uses the "Smart Predict-Then-Optimize" (SPO) approach [Elmachtoub and Grigas, 2021], since its simple backpropagation rule requires only a solution to (18) from a blackbox solution oracle. This allows its adaptation to the OWA optimization setting by constructing (but not solving) an equivalent but intractable linear programming form of (11), as shown next.
End-to-End learning with SPO+ Loss. Viewed as a loss function, the regret (12) in solutions to problem (18) is nondifferentiable and discontinuous with respect to the predicted coefficients ĉ, since solutions x⋆(c) must occur at one of finitely many vertices of {x : A x ≤ b}. The SPO+ loss function proposed in [Elmachtoub and Grigas, 2021] is by construction a Fisher-consistent, subdifferentiable upper bound on regret. In particular, it is shown in [Elmachtoub and Grigas, 2021] that (stated here for the maximization form of (18))

    ℓ_SPO+(ĉ, c) = max_{x : Ax ≤ b} { (2ĉ − c)^T x } − 2ĉ^T x⋆(c) + c^T x⋆(c)    (19)

possesses these properties, and that a subgradient at ĉ is

    2 ( x⋆(2ĉ − c) − x⋆(c) ) ∈ ∂_ĉ ℓ_SPO+(ĉ, c).    (20)

Minimizing the surrogate loss (19) by gradient descent using (20) is key to minimizing the solution regret in problem (18) due to error in a predictive model which predicts the parameter c.
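The rule is easy to exercise on a toy instance of (18): maximizing c^T x over the probability simplex, where the blackbox oracle just returns the best vertex. In this sketch (our toy example, maximization convention), repeated subgradient steps drive the predicted solution to agree with the ground-truth one:

```python
import numpy as np

def x_star(c):
    """Blackbox LP oracle: argmax of c^T x over the probability simplex."""
    x = np.zeros_like(c)
    x[np.argmax(c)] = 1.0
    return x

def spo_plus_subgrad(c_hat, c):
    """SPO+ subgradient (20): a difference of two oracle calls,
    2 * (x*(2*c_hat - c) - x*(c))."""
    return 2.0 * (x_star(2.0 * c_hat - c) - x_star(c))

c_true = np.array([1.0, 3.0, 2.0])
c_hat = np.array([4.0, 0.5, 1.0])        # bad prediction: wrong argmax

for _ in range(10):                       # minimize the SPO+ loss by descent
    c_hat = c_hat - 0.5 * spo_plus_subgrad(c_hat, c_true)
```

Note that only oracle solutions, never gradients of the solver itself, are required.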
SPO+ loss in SOFaiR. We now show how the SPO training framework described above for the problem type (18) can be used to efficiently learn optimal fair policies in conjunction with problem (11). The main idea is to derive an SPO+ subgradient for regret in (11) through an equivalent linear program of the form (18), but without ever solving it as such. This is made possible by the fact that the subgradient (20) can be expressed as a difference of two optimal solutions, which can be furnished by any optimization oracle that solves the mapping (11), including Algorithm 1.
First note, as shown in [Ogryczak and Śliwiński, 2003], that the OWA function (10) can be expressed as

$$\mathrm{OWA}_w(r) = \min_{\sigma \in \mathcal{P}} \; w_\sigma^\top r, \tag{21}$$

where $\mathcal{P}$ contains all possible permutations of $[n]$ when $w \in \mathbb{R}^n$, and equivalently as a linear programming problem which views the minimum inner product above as the greatest lower bound among all possible inner products with the permuted OWA weights:

$$\mathrm{OWA}_w(r) = \max_{z} \; z \quad \text{s.t.} \quad z \le w_\sigma^\top r \;\; \forall \sigma \in \mathcal{P}. \tag{22}$$

This allows SOFaiR's OWA optimization model (11) to be recast in a linear programming form (23) using auxiliary optimization variables r and z. As noted in [Ogryczak and Śliwiński, 2003], this alternative LP form of OWA optimization is mostly of theoretical significance, since the set of constraints (23b) grows factorially in the size of r, with one constraint per possible permutation. This makes (23) impractical for computing a solution to the original OWA problem (11), which we instead solve by Algorithm 1. On the other hand, we show that problem (23) is practical for deriving a backpropagation rule through the OWA problem (11).
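The permutation characterization (21) admits a simple closed form: the minimizing permutation pairs the largest weights with the smallest entries of r, so the OWA value can be computed by sorting rather than enumerating the factorially many permutations. A small sketch, checked against brute-force enumeration:

```python
from itertools import permutations
import numpy as np

# OWA_w(r) = min over permutations sigma of <w_sigma, r>, as in (21).
# The minimum pairs the largest weights with the smallest entries of r, so
# it equals the dot product of w sorted descending with r sorted ascending.
def owa(w, r):
    return float(np.sort(w)[::-1] @ np.sort(r))

# sanity check against brute-force enumeration over all n! permutations
w = np.array([0.6, 0.3, 0.1])
r = np.array([2.0, 5.0, 1.0])
brute = min(float(np.array(p) @ r) for p in permutations(w))
print(owa(w, r), brute)  # both equal 0.6*1 + 0.3*2 + 0.1*5
```

With decreasing weights, this places the greatest emphasis on the worst-off entries of r, which is what makes OWA a natural fairness aggregator for group exposures.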
Since the unknown parameters ŷ_q appear only in its linear objective function, this parametric LP (23) fits the form (18) required for training with SPO+ subgradients. To derive the subgradient explicitly, we identify $\hat{y}_q^\top \Pi b$ as the linear term of the objective. Then, in terms of the augmented variables (Π, r, z), the objective function (23a) is

$$\langle \hat{y}_q b^\top, \Pi \rangle + 0^\top r + \lambda z \;=\; \hat{\gamma} \cdot (\Pi, r, z), \quad \text{where} \;\; \hat{\gamma} = (\hat{y}_q b^\top, \mathbf{0}, \lambda). \tag{24}$$

Now the SPO+ loss subgradient can be readily expressed with respect to the augmented scores γ̂ defined as above, using (20):

$$2\left((\Pi, r, z)^\star(2\hat{\gamma} - \gamma) - (\Pi, r, z)^\star(\gamma)\right) \;\in\; \partial_{\hat{\gamma}}\, \ell_{SPO+}(\hat{\gamma}, \gamma), \tag{25}$$

where γ is the augmented score based on the ground-truth y_q. Finally, backpropagation from γ̂ to the base prediction ŷ_q = M_θ(x_q) is performed by automatic differentiation, and likewise from ŷ_q to the model weights θ.
Both terms in (25) can be produced by using Algorithm 1 to solve (11) for Π⋆. Then, the remaining variables r⋆ and z⋆ are easily completed as the group exposures r = E_G(Π⋆) and their associated OWA value z, respectively. Importantly, the rightmost term of (25) is independent of any prediction; therefore it is precomputed in advance of training. Thus, backpropagation using (25) consists of computing the difference between two solutions, one of which comes from the forward pass while the other is precomputed before training. The complexity of this backward pass consists of O(n²) subtractions, growing only linearly in the number of entries of the matrix variable Π ∈ R^{n×n}. The differentiable fair ranking optimization module of SOFaiR, with its forward and backward passes, is summarized in Figure 3.
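A sketch of how the auxiliary variables (r, z) might be completed from a solved policy Π⋆, as described above. The logarithmic position-bias vector and the 0/1 group-mask format are illustrative assumptions; the paper's exact exposure model E_G may differ.

```python
import numpy as np

# Sketch: complete the auxiliary LP variables (r, z) from a solved ranking
# policy Pi (n x n doubly stochastic). The logarithmic position bias and
# the 0/1 group mask format are assumptions for illustration.
def complete_solution(Pi, group_mask, w):
    n = Pi.shape[0]
    v = 1.0 / np.log2(np.arange(n) + 2)       # exposure of each rank position
    item_exposure = Pi @ v                     # expected exposure of each item
    sizes = group_mask.sum(axis=1)
    r = (group_mask @ item_exposure) / sizes   # mean exposure per group
    z = float(np.sort(w)[::-1] @ np.sort(r))   # OWA value of group exposures
    return r, z

# toy check: identity policy ranks item i at position i
Pi = np.eye(2)
groups = np.eye(2)                             # two singleton groups
r, z = complete_solution(Pi, groups, np.array([0.7, 0.3]))
print(r, z)
```

Because these completions are cheap array operations, the backward pass reduces to the element-wise difference of two such augmented solutions, consistent with the O(n²) cost stated above.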

Experiments
Next we evaluate SOFaiR against two prior in-processing methods [Singh and Joachims, 2019, Zehlike et al., 2017] and the end-to-end framework of [Kotary et al., 2021], denoted FULTR, DELTR, and SPOFR, respectively. We assess performance on two datasets: • Microsoft Learn to Rank (MSLR) is a standard LTR benchmark with queries from Bing and manually judged relevance labels. It includes 30,000 queries, each with an average of 125 assessed documents and 136 ranking features. Binary protected groups are defined using the 50th percentile of the QualityScore attribute; for multi-group cases, group labels are defined using evenly spaced quantiles.
• Yahoo! Learning to Rank Challenge (Yahoo LETOR) contains 19,944 queries and 473,134 documents with 519 ranking features. Binary protected groups are defined using feature id 9, following [Jia and Wang, 2021], with the 50th percentile as the threshold.
For MSLR, we randomly sample 10,000 queries for training and 1,000 queries each for validation and testing. We create datasets with varying list sizes (20, 40, 60, 80, and 100 documents) for MSLR and (20 and 40 documents) for Yahoo LETOR.
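The percentile-based group assignment described above can be sketched as follows; the function name and the generalization to evenly spaced quantiles for the multi-group case are our own illustrative choices.

```python
import numpy as np

# Hypothetical sketch of the percentile-based group assignment described
# above: items are bucketed by evenly spaced quantiles of a feature.
def percentile_groups(feature_values, n_groups=2):
    # interior quantile cut points, e.g. just the median when n_groups = 2
    qs = np.quantile(feature_values, np.linspace(0, 1, n_groups + 1)[1:-1])
    # group index = number of quantile cut points <= the feature value
    return np.searchsorted(qs, feature_values, side="right")

quality = np.array([0.1, 0.9, 0.4, 0.8])   # toy "QualityScore" values
print(percentile_groups(quality))          # binary split at the median
```

With `n_groups=2` this reproduces the 50th-percentile split; larger values yield the evenly spaced quantile groups used in the multi-group experiments.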

Models and hyperparameters.
A neural network (NN) with three hidden layers is trained using the Adam optimizer with a learning rate of 0.1 and a batch size of 256. The size of each successive hidden layer is halved, and the output is a scalar item score. Results for each hyperparameter setting are averaged over five random seeds.
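A sketch of such a scoring network in PyTorch, matching the description above; the width of the first hidden layer is an assumption, since only "three hidden layers, each halved" is specified.

```python
import torch
import torch.nn as nn

# Sketch of the scoring network described above; the first hidden width
# (256) is an assumption, as it is not specified here.
def make_scorer(in_features, first_hidden=256):
    h1, h2, h3 = first_hidden, first_hidden // 2, first_hidden // 4
    return nn.Sequential(
        nn.Linear(in_features, h1), nn.ReLU(),
        nn.Linear(h1, h2), nn.ReLU(),
        nn.Linear(h2, h3), nn.ReLU(),
        nn.Linear(h3, 1),                # scalar relevance score per item
    )

model = make_scorer(in_features=136)     # e.g., MSLR's 136 features
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
batch = torch.randn(256, 136)            # one batch of item feature vectors
scores = model(batch).squeeze(-1)
print(scores.shape)  # torch.Size([256])
```

In the full pipeline, these predicted scores ŷ_q would be passed to the OWA optimization module rather than trained directly against relevance labels.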
Fairness parameters, considered as hyperparameters, are treated differently. LTR systems aim to offer a trade-off between utility and group fairness, since increased fairness comes at the cost of decreased utility. In DELTR, FULTR, and SOFaiR, this trade-off is indirectly controlled through the fairness weight, denoted λ in (9) and (11); larger values of λ indicate a stronger preference for fairness. In SPOFR, the allowed violation (8) of group fairness is specified directly. Ranking utility and fairness violation are assessed using average DCG (Equation (1)) and fairness violation (Equation (8)), respectively. The metrics are computed as averages over the entire test dataset.
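For reference, DCG with the standard logarithmic position discount can be computed as below; whether Equation (1) uses raw relevance or an exponential gain is not specified in this section, so raw relevance is assumed.

```python
import numpy as np

# DCG with the standard logarithmic position discount; Equation (1) is
# assumed here to use raw relevance as the gain.
def dcg(relevance_in_ranked_order):
    rel = np.asarray(relevance_in_ranked_order, dtype=float)
    discounts = 1.0 / np.log2(np.arange(len(rel)) + 2)  # positions 1..n
    return float(rel @ discounts)

# ranking the same items by decreasing relevance can only increase DCG
print(dcg([3, 2, 0, 1]) <= dcg([3, 2, 1, 0]))  # True
```

Averaging this quantity over all test queries gives the utility metric reported in the experiments.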

Running Time Analysis
Our analysis begins with a runtime comparison between SOFaiR and the other LTR frameworks, to show how it overcomes limitation C, described in Section 3. Figure 4 shows the average training and inference time per query for each method, focusing on the binary-group MSLR dataset across various list sizes. First notice the drastic runtime reduction of SOFaiR compared to SPOFR, both during training and inference. While SPOFR's training time increases exponentially with the ranking list size, SOFaiR's runtime increases only moderately, reaching over one order of magnitude speedup over SPOFR for large list sizes. Notably, the number of iterations of Algorithm 1 required for sufficient accuracy in computing SPO+ subgradients during training is found to be smaller than that required for solving (11) at inference; thus the reported results use 100 iterations in training and 500 at inference. Importantly, the reported runtimes underestimate the efficiency gained by SOFaiR, since its PyTorch [Paszke et al., 2017] implementation in Python is compared against the highly optimized code of the Google OR-Tools solver [Perron, 2011]. DELTR and FULTR, as penalty-based methods, are more competitive in runtime. However, this comes at a cost to the achieved fairness level (limitation A), as shown in the next section.

Fairness and Utility Tradeoffs Analysis
Next, we compare the utility and fairness of the various LTR frameworks analyzed. This section focuses on the two-group case, as none of the methods compared against was able to cope with the multi-group case in our experiments (see the next section). Figure 5 presents the trade-off between utility and fairness across the test sets for both the Yahoo LETOR and MSLR datasets, encompassing their lowest and highest list sizes. For each method, the intensity of color represents the magnitude of its fairness parameter: a progression from lighter to darker colors indicates an increase in the importance placed on fairness. Consequently, darker colors are expected to correspond with more restrictive models, characterized by lower DCG scores (y-axis) but also fewer fairness violations (x-axis). Each point in the figure represents the largest DCG score obtained from a fairness hyperparameter search, as detailed in Appendix B. Note that points which are higher on the y-axis and lower on the x-axis represent superior results.
Firstly, notice that most points associated with DELTR and FULTR are clustered in a small region with both high DCG and high (log-scaled) fairness violations. While these methods reach an order of magnitude reduction in fairness violation on some datasets, the effect is inconsistent, especially as the item list size increases (limitation A). In contrast, the end-to-end methods (SPOFR and the proposed SOFaiR) reach much lower fairness violations, underlining the effectiveness of their optimization modules in enforcing the fairness constraint.
Both DELTR and FULTR reach competitive utilities, but they consistently display relatively high fairness violations, underscoring their limitations in providing fair ranking solutions. SOFaiR shows competitive fairness and utility performance compared to SPOFR, with a marked advantage in utility on some datasets. SPOFR ensures fairness but at the expense of efficiency, whereas SOFaiR reaches similar fairness levels at a fraction of the required runtime. Additional results on datasets of various list sizes are included in Appendix A.2.

Multi-Group Fairness Analysis
Finally, this section analyzes the fairness-utility trade-off in multi-group scenarios using the SOFaiR framework. The SPOFR method returns infeasible solutions for most chosen fairness levels when multiple groups are introduced, preventing its evaluation on these datasets; this is naturally avoided in SOFaiR, since the unconstrained optimization of the OWA aggregation simply increases fairness to the extent feasible. While FULTR provides no code to evaluate multi-group fairness, its penalty function is in principle ill-equipped to handle multiple groups, as it must scalarize all group fairness violations into a single loss function, as mentioned in Section 3 (limitation B). Each data point represents a single model's performance, with fairness parameters λ adjusted between 0 and 1. Models prioritizing fairness show reduced fairness violations and lower utilities, indicated by darker colored points, compared to those with a lower emphasis on fairness, represented by lighter colored points. A distinct trend is observed: as fairness parameters are relaxed, utility increases, though multi-group fairness comes at a higher cost to utility.

Related Work
This paper is concerned with learning ranking models under fairness constraints, a problem for which a variety of methods have been developed in the web search domain. Unlike most previous works in this domain, its main solution framework follows the Predict-Then-Optimize (PtO) paradigm, in which optimization models are trained end-to-end with neural networks. This section surveys relevant works in the application area before giving a brief overview of related work on the PtO methodology.
Learning Fair Ranking Policies. Recent advancements in learning fair ranking policies fall into three main categories, based on where fairness criteria are incorporated into the training pipeline. In the pre-processing category, methods focus on mitigating bias in training data by transforming input data to make representations similar Lahoti et al. [2019a,b].
For instance, Lahoti et al. [2019a] aims for fairness by ensuring that item pairs which are indistinguishable on non-sensitive attributes remain nearly indistinguishable in their fair representation. On the other hand, post-processing methods refine rankings or scores after training to meet fairness criteria Zehlike et al. [2022a]. Examples include Zehlike et al. [2020], which utilizes optimal transport to post-process the model's output scores into a fair score distribution, and works like Zehlike et al. [2017, 2022b], Geyik et al. [2019], which enforce a minimum proportion of protected members in the top-k ranking using statistical testing and rule-based selection. While Singh and Joachims [2018] addresses a variety of fairness criteria, its scalability is limited to small item lists due to the complexity of its optimization model. In contrast, methods like Zehlike et al. [2022b] work with specific fairness definitions and can scale to large item lists, as demonstrated on LinkedIn search. Finally, Do and Usunier [2022] proposes an efficient optimization method for solving a two-sided fair ranking problem.
Since post-processing methods model the fair ranking policy separately from the learning of relevance scores, they generally guarantee fairness satisfaction, but at the cost of accuracy in terms of user relevance.
In-processing methods aim for improved accuracy-fairness trade-offs by integrating fairness criteria into the LTR training loop Zehlike et al. [2022a]. Typically, these methods augment the training loss function with penalties for fairness violations, allowing the models to strike a balance between accuracy and fairness; however, it is common for the fairness criteria to be imperfectly satisfied. In this context, Zehlike and Castillo [2020] and Singh and Joachims [2019] focus on equal group exposures, ensuring visibility of items at lower ranks. Additionally, Beutel et al. [2019] employs pairwise accuracy to assess whether candidates at higher ranks have higher relevance scores, while Bower et al. [2021], Kamishima et al. [2018] guarantee similar ranking policies for queries or items that differ only in sensitive attributes.
End-to-End Prediction and Optimization. A recent literature has developed around constrained optimization models that are trained end-to-end with machine learning models Kotary et al. [2021]. In the Predict-Then-Optimize setting, a machine learning model predicts the unknown coefficients of an optimization problem; backpropagation through the optimal solution of the resulting problem then allows its objective value, under ground-truth coefficients, to be trained end-to-end as a loss function. The primary challenge is backpropagation through the optimization model, for which a variety of techniques have been proposed. When the optimization program is differentiable, this can be done by direct differentiation Agrawal et al. [2019a,b], Amos and Kolter [2017], Kotary et al. [2023].
Otherwise, various smoothing, randomization, and approximation techniques are employed Pogančić et al. [2020], Berthet et al. [2020], Elmachtoub and Grigas [2021]. The main advantage of this framework is in achieving higher downstream objective values, compared to learning the unknown coefficients directly by regression to their ground-truth values.
Most similar to the present paper, Kotary et al. [2022] proposes a fair learning to rank method based on Predict-Then-Optimize with the fair ranking optimization model of Singh and Joachims [2018], whose unknown coefficients are the relevance scores, and whose loss function is user relevance measured by discounted cumulative gain. The main advantage of Kotary et al. [2022] is that it inherits the fairness guarantees of Singh and Joachims [2018] at the level of each query, along with precise control over fairness-utility tradeoffs by setting the allowed fairness violation. By bringing the post-processing method of Singh and Joachims [2018] into the training loop, it also shows considerable gains in user relevance over the alternative in-processing methods described above. However, this comes at the considerable computational cost of solving a large optimization problem for each sample at each iteration of training.

Conclusions
This paper presented SOFaiR, a method that employs an Ordered Weighted Average (OWA) optimization model to integrate fairness considerations into ranking processes. Its integration of constrained optimization in an end-to-end differentiable machine learning pipeline is motivated by a core limitation of penalty-based fair LTR schemes: their inability to reliably enforce fairness constraints on the predicted policies. A key contribution of this paper is to enable backpropagation through optimization of nondifferentiable OWA functions, which, in turn, has made it possible to incorporate precise group fairness measures directly into the training process of learning to rank. The paper showed that SOFaiR has three distinctive advantages over previous solutions: (1) it produces rankings with high utility while ensuring that fairness closely aligns with specified requirements; (2) it exhibits substantial efficiency improvements over other fair LTR schemes based on end-to-end optimization, delivering up to an order of magnitude speedup in both training and testing; and (3) it extends naturally to fairness criteria beyond binary group treatment. These attributes may help pave the way to scalable LTR systems that are more applicable to real-world fairness requirements, and underscore the integration of constrained optimization and machine learning as a promising direction for future research in fair learning to rank.

Ethical Statement
This paper was developed on commonly used, open benchmark datasets for learning to rank, and no sensitive data was used in the production of its experiments. As is common in research on fair ranking systems, protected groups were defined on the basis of attributes contained in these datasets, in order to best evaluate the performance of the algorithms. The authors' intended contribution is purely methodological, aimed at enhancing the performance of ranking systems with respect to well-established utility and fairness criteria. When considering possible unintended adverse impacts of the work, it is important to note that the paper's methodology is generic and could be used contrary to its stated goals. The mechanism used to enforce fairness of group exposures in rankings can also be used to enforce arbitrary proportional exposures among arbitrarily defined groups, and could therefore be used in a discriminatory manner. An inherent limitation of the work, with respect to potential fairness impact, is its lack of generalization to two-sided fairness, in which fairness with respect to user utility is enforced in addition to the exposure of protected groups. This stems from the fact that the machine learning methodology inherently treats each user query sample independently; thus, this fairness goal is usually pursued only by post-processing methods without ML integration.

A Additional Experimental Results
A.1 Multi-Group Fairness Analysis

B Hyperparameters
Hyperparameters were selected as the best-performing on average among those listed in Table 2. The final hyperparameters for each model are also stated in Table 2, and the Adam optimizer is used in the production of each result. Asterisks (*) indicate that there is no single final value, as all values of that parameter are of interest in the analysis of the fairness-utility tradeoff reported in the experimental section.
For the OWA optimization layers, w is set as $w_j = \frac{n-1+j}{n}$, with T = 100 during training and T = 500 during testing.
(A) inability to ensure fairness in each of its generated policies, (B) inability or ineffectiveness to handle multiple protected groups, and (C) inefficiency at training and inference time.This section reviews current fair LTR methods in light of these limiting factors.

Figure 3 :
Figure 3: The differentiable optimization module employed in SOFaiR. Its forward pass solves problem (11) by an efficient Frank-Wolfe method. Its backward pass calculates the SPO+ subgradient relative to an equivalent, but intractably large, LP form.

Figure 5 :
Figure 5: Benchmarking performance in terms of the fairness-utility trade-off on Yahoo-20 (top left), Yahoo-40 (top right), MSLR-20 (bottom left), and MSLR-100 (bottom right). Figure 6 compares the average test DCG against the average fairness violation across various numbers of groups (ranging from 3 to 7) in the MSLR dataset, for list sizes of 40 and 100. Additional results for other list sizes in the MSLR dataset are available in Appendix A.1.

Figure 7
Figure 7 illustrates the trade-off between utility and fairness for the other list sizes of 60 and 80. Each data point corresponds to the performance of a single model, with fairness parameters λ varied between 0 and 1. Models prioritizing fairness, represented by darker colored points, exhibit reduced fairness violations and lower utilities compared to those with a lower emphasis on fairness, depicted by lighter colored points. A consistent trend emerges across all datasets: as fairness parameters are relaxed, utility increases for all metrics and datasets. Notably, saturation points in all subplots indicate that beyond a certain point, increasing the fairness weight only reduces utility without further reducing fairness violations.

Figure 8 :
Figure 8: Fairness-Utility tradeoff for MSLR datasets from all benchmark methods.
Algorithm 1: Frank-Wolfe with Moreau Envelope Smoothing to solve (11). Input: predicted relevance scores ŷ ∈ R^n, group mask A, max iteration T, smoothing sequence (β_k).

Figure 6: Fairness-utility tradeoff due to SOFaiR with multiple groups on MSLR-40 (left) and MSLR-100 (right) list sizes. As fairness parameters are relaxed, utility increases for all metrics and datasets. It is also evident that multi-group fairness comes at a higher cost to utility. Predictably, saturation occurs in each curve, indicating that beyond a certain point, increasing the fairness weight does not further decrease fairness violations but merely reduces utility.