Unbiased Learning to Rank: On Recent Advances and Practical Applications

Since its inception, the field of unbiased learning to rank (ULTR) has remained very active and has seen several impactful advancements in recent years. This tutorial provides both an introduction to the core concepts of the field and an overview of recent advancements in its foundations, along with several applications of its methods. The tutorial is divided into four parts: first, we give an overview of the different forms of bias that ULTR methods can address; second, we present a comprehensive discussion of the latest estimation techniques in the field; third, we survey published results of ULTR in real-world applications; fourth, we discuss the connection between ULTR and fairness in ranking. We end by briefly reflecting on the future of ULTR research and its applications. This tutorial is intended to benefit both researchers and industry practitioners interested in developing new ULTR solutions or utilizing them in real-world applications.

Traditionally, learning to rank (LTR) methods have been trained on manual relevance judgments. However, obtaining relevance judgments is costly and often not aligned with actual user preferences [5,36]. In contrast, click data is cheaper to collect and is generally better aligned with user intents [17]. However, click data is usually a heavily biased signal of user preference [1,7,19], which the field of unbiased learning to rank (ULTR) aims to mitigate [24].
Previous tutorials focused on introducing the fundamentals of the field to researchers and practitioners in the information retrieval and recommender system communities [2,21,31]. While very relevant at the time, the field of ULTR has matured significantly, and fundamental advancements have been made since then. At the time of the last tutorials, the primarily studied interaction bias was position bias [7,19,48]. Since then, the community has addressed new interaction biases, including trust bias [1,45], item selection bias [28], contextual bias [50,52,58,59], and cascading position bias [20,44]. For correcting biases, the method most commonly used was inverse propensity scoring (IPS). However, it is now known that IPS is not effective in correcting for all forms of interaction bias [25,45]. Hence, several new and fundamental estimation techniques have been developed to overcome the limitations of IPS, for instance, affine corrections [30,45] and doubly-robust estimation [26], which can both be seen as extensions of IPS for ULTR. Furthermore, estimation methods that are fundamentally different from IPS have also been proposed, such as two-tower models [10,52,59] and causal-inference-based methods [32,43,57]. While many ULTR methods focus on mitigating bias in historic datasets, the area of online learning to rank mitigates biases while directly interacting with users [27,39,54]. A recent line of work addresses both online and offline settings with methods that can be applied to either setting and thereby aims to unify the ULTR field [29,30].
Recently, LTR has also seen significant growth on the application side [1,4,13,50,59], including fair LTR [41,42,51]. The focus of the previous tutorials was on the fundamentals of ULTR, with limited emphasis on practical applications. While LTR traditionally focused on relevance ranking, it is now commonly acknowledged that optimizing for relevance alone can result in unfairness [3,40,53]. In this regard, we believe the objectives of fair LTR align with ULTR's mission: to provide fair and unbiased rankings to users.
To scale to large-scale applications, fair LTR work relies on unbiased LTR [37,46,51], and we hope that our tutorial will encourage further exploration in this area.
Given these significant advancements in ULTR and the increased application of its methodology, we believe it is the right time to provide an overview of the state of the art of the field. In doing so, we aim to benefit both academic researchers and industry practitioners who are interested in developing new ULTR solutions or utilizing them in their applications.

OBJECTIVES
The tutorial is based on the following two main objectives:
• To motivate and introduce the fundamental concepts of ULTR to academics or practitioners who are new to the topic.
• To provide a comprehensive overview of the important recent developments in the foundations and applications of ULTR that are useful to both newcomers and experts in the field.
Furthermore, we aim for the following additional objectives:
• Provide the most up-to-date explanation of the mathematical foundations of the ULTR field, covering the different forms of bias that can and cannot be corrected for and the latest estimation techniques. Our tutorial should provide a strong foundation for researchers in the ULTR field for their future work.
• Present an in-depth survey of real-world applications of ULTR so that practitioners can have a realistic expectation of the potential impact of applying ULTR.
• Motivate and stimulate cross-disciplinary research by enabling researchers from other areas to understand how ULTR could be useful to them. In particular, we will highlight the connection with the topic of fairness in ranking.

RELEVANCE TO THE INFORMATION RETRIEVAL COMMUNITY
The area of ULTR has grown significantly in the last couple of years, with several fundamental contributions and diverse IR applications. The earliest tutorial in the area was by Joachims and Swaminathan [18], who introduced counterfactual learning in the context of search and recommendation. Recent tutorials on counterfactual evaluation and learning have focused mostly on bandit feedback data [34,35]. In the context of recommender systems, Chen et al. [6] introduce biases and debiasing strategies.
For LTR specifically, there have been tutorials introducing offline ULTR [2,21,31] and online LTR [9]. However, to the best of our knowledge, no existing tutorial covers the important advancements that have been made in the ULTR field in the last three years, nor their recent applications, including fair LTR [41,51].

FORMAT AND DETAILED SCHEDULE
The tutorial will run for three hours, excluding breaks, with the following schedule:

Preliminaries (20 minutes)
The first session focuses on the preliminaries; we discuss the basics of supervised LTR and some of the earliest works in position bias and counterfactual LTR.
• Learning to rank basics (5 minutes): Discuss the basics of supervised LTR by introducing pointwise, pairwise, and listwise LTR methods and the concept of learning from user interactions.
• Position bias (5 minutes): Discuss the position bias that arises when applying traditional LTR methods to user clicks [7].
• Counterfactual LTR (10 minutes): Introduce the basics of counterfactual LTR for learning from user feedback data with position bias [19].
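To make the counterfactual idea concrete, the following toy simulation (all values illustrative, not from any cited work) generates clicks under the examination hypothesis and shows how reweighting each click by the inverse of its examination propensity recovers the true relevance, while the naive click rate does not:

```python
import numpy as np

# Examination hypothesis: P(click) = P(examined at rank k) * P(relevant).
# Reweighting clicks by 1 / propensity removes position bias in expectation.
def ips_relevance_estimate(clicks, ranks, propensity):
    """Per-impression IPS-corrected relevance signal: click / propensity."""
    return clicks / propensity[ranks]

rng = np.random.default_rng(0)
true_relevance = 0.8                                    # made-up ground truth
propensity = np.array([1.0, 0.5, 0.25, 0.125, 0.0625])  # examination prob. per rank

n = 200_000
ranks = rng.integers(0, 5, size=n)            # item shown at a uniformly random rank
examined = rng.random(n) < propensity[ranks]  # position bias: low ranks seen less
clicks = examined & (rng.random(n) < true_relevance)

naive = clicks.mean()  # biased: confounds relevance with examination
ips = ips_relevance_estimate(clicks.astype(float), ranks, propensity).mean()
print(f"naive: {naive:.3f}, IPS: {ips:.3f}, truth: {true_relevance}")
```

The IPS estimate converges to the true relevance, whereas the naive estimate is pulled down by unexamined impressions; this trade-off (unbiasedness at the cost of higher variance at low propensities) recurs throughout the tutorial.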
Biases (40 minutes)
In this session, we cover recent advances in the types of interaction bias that can be tackled with ULTR methods beyond position bias.
• Trust bias (10 minutes): We discuss trust bias, where users are likely to click on items in top positions regardless of item relevance because they trust the search engine [1,45].
• Item selection bias (10 minutes): We discuss item selection bias, where users can only interact with a fixed set of items, and items outside the top-k positions have zero chance of exposure [28].
• Contextual bias (10 minutes): We discuss contextual bias, where an item's click probability is affected by its surrounding items on the display page [50,52,58,59].
• Cascading position bias (10 minutes): Under the cascade model [7], the position bias of an item depends not only on the rank the item is displayed at (as many works in ULTR assume [7,19,48]), but also on the relevance of the items the user has inspected before [20,44]. Thus, cascading position bias is often a more realistic model of user behavior, e.g., when users tend to stop searching after finding the first relevant result [7].
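The cascade effect can be illustrated with a short simulation (relevance values are made up): the user scans results top-down and stops at the first item they find relevant, so whether a rank is examined depends on the relevance of the items above it, not on the rank alone.

```python
import numpy as np

def cascade_clicks(relevance, rng):
    """Simulate one session under the cascade model; at most one click."""
    clicks = np.zeros(len(relevance), dtype=bool)
    for k, rel in enumerate(relevance):
        if rng.random() < rel:  # user finds this item relevant and clicks
            clicks[k] = True
            break               # cascade: user stops after the first click
    return clicks

rng = np.random.default_rng(1)
relevance = np.array([0.9, 0.5, 0.5])  # a highly relevant item at the top
ctr = np.mean([cascade_clicks(relevance, rng) for _ in range(50_000)], axis=0)
print(ctr)
```

Although the items at ranks 2 and 3 are equally relevant, their click-through rates differ, because the relevant top item absorbs most sessions; a rank-only propensity model cannot capture this.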
Novel Estimation Methods (70 minutes)
The most prevalent estimation technique for correcting bias in user interaction data is IPS, introduced in the seminal works by Wang et al. [47] and Joachims et al. [19]. However, recently there have been fundamental contributions in the ULTR field with respect to novel estimation techniques. In this session, we will discuss these recent estimation techniques in detail, as per the following schedule:
• Affine correction method (14 minutes): We discuss the affine correction method introduced by Vardasbi et al. [45] as an extension to IPS, since IPS is ineffective for trust-bias correction.
• Doubly robust estimation (14 minutes): Despite its popularity in the offline bandit learning literature [8,14,16,33,49], doubly robust estimation for position-bias correction in LTR was only proposed recently [26]. We discuss this fundamental contribution to the area, which overcomes some of the theoretical and practical disadvantages of IPS [26].
• Online & counterfactual methods (14 minutes): Online learning methods are an alternative class of methods to counter biases in LTR [27,39,54], where user preferences are learned in an online/interactive fashion, as opposed to purely from offline data. Recently, Oosterhuis and de Rijke [30] argued that the field of ULTR can benefit from online learning to rank via a novel online intervention-aware counterfactual estimator. Online learning has also been used to collect additional data from the logging policy to minimize the variance of the counterfactual estimate of a new ranking policy [29].
• Safe counterfactual policy learning (14 minutes): ULTR relies on exposure-based IPS, which can provide unbiased and consistent estimates but often suffers from high variance. Especially when little click data is available, this variance can cause ULTR to learn sub-optimal ranking models, which can subsequently bring significant risks of a negative user experience. Recently, Gupta et al. [12] introduced a risk-aware ULTR method with a novel exposure-based concept of risk regularization and strong theoretical guarantees for safe deployment. The method averts the risk of learning a new policy that is worse than the current production system.
• Two-tower models (14 minutes): The focus of ULTR has primarily been on identifying and developing novel bias correction methods, with limited focus on model design. Recently, with the introduction of two-tower networks, this trend has been slowly changing [10,52,55,59]. We discuss the two-tower family of correction methods, their capacity to utilize bias-related signals beyond position (e.g., device type), and their ability to correct click data containing a mixture of different user behaviors [52].
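The intuition behind the affine correction can be sketched in a few lines. Under trust bias, the click probability at rank k is affine in relevance, P(click) = alpha_k * relevance + beta_k, where beta_k captures clicks on non-relevant items; all parameter values below are illustrative. Dividing by alpha_k alone (IPS-style) leaves the additive beta_k term in place, while subtracting beta_k first removes it:

```python
import numpy as np

# Affine click model of trust bias: P(click | rank k) = alpha_k * r + beta_k.
# alpha_k decays with rank; beta_k models trust-driven clicks on non-relevant items.
alpha = np.array([0.9, 0.6, 0.3])
beta = np.array([0.2, 0.1, 0.05])
true_relevance = 0.7  # made-up ground truth

rng = np.random.default_rng(2)
n = 300_000
ranks = rng.integers(0, 3, size=n)
p_click = alpha[ranks] * true_relevance + beta[ranks]
clicks = (rng.random(n) < p_click).astype(float)

ips = np.mean(clicks / alpha[ranks])                     # biased: beta term remains
affine = np.mean((clicks - beta[ranks]) / alpha[ranks])  # unbiased affine correction
print(f"IPS: {ips:.3f}, affine: {affine:.3f}, truth: {true_relevance}")
```

The affine estimate converges to the true relevance while plain IPS overestimates it, which is the core observation motivating the affine and, in turn, doubly robust extensions of IPS.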
Survey Applications (20 minutes)
A significant number of works apply ULTR methods in real-world settings. We discuss the different settings explored by existing work, the practical changes they require, and the reported performance impact. Topics include:
• Grid layouts: ULTR methods for tackling bias in 2D grid layouts that are common in industry [50,59].
• Multi-feedback: Integrating multiple types of user feedback beyond clicks, e.g., views, add-to-cart, or purchase actions [11].
From Unbiased to Fair LTR (20 minutes)
Traditionally, fair LTR has relied on manual relevance judgments for learning fair ranking policies [3,40]. Similar to the arguments in favor of click-based learning for relevance ranking, fair LTR needs to adopt click data for widespread application. In this part of the tutorial, we discuss applications of ULTR to fair policy learning [23,51].

Conclusion and Future Work (10 minutes)
We close by summarizing the main points discussed in the tutorial. In addition, we discuss some important limitations of the existing overarching counterfactual approach in the ULTR field [25] and some promising avenues for future research.

Extensions to Previous Versions
Compared to our previous offering, we have extended our proposal to include the latest research on ULTR from this year's conferences, including SIGIR, KDD, and RecSys [12,15,22,38,56]. We also extended individual sections to allow more time for Q&A in between, to encourage engagement with the audience. Furthermore, we have removed a section related to off-policy bandit learning to maintain a coherent theme of ULTR and to facilitate a longer discussion of fairness in LTR. Lastly, we are aware of the "Practical Bandits: An Industry Perspective" tutorial proposal and are actively collaborating with its authors to ensure our two presentations complement each other. Many techniques in unbiased learning to rank (ULTR) originated in the bandit literature and were adjusted to the intricacies of the ranking setting. Thus, the two tutorials should provide interested attendees with a comprehensive and complementary curriculum.