DOI: 10.1145/3219819.3220028
research-article

Offline Evaluation of Ranking Policies with Click Models

Published: 19 July 2018

Abstract

Many web systems rank and present a list of items to users, from recommender systems to search and advertising. An important problem in practice is to evaluate new ranking policies offline and optimize them before they are deployed. We address this problem by proposing evaluation algorithms for estimating the expected number of clicks on ranked lists from historical logged data. The existing algorithms are not guaranteed to be statistically efficient in our problem because the number of recommended lists can grow exponentially with their length. To overcome this challenge, we use models of user interaction with the list of items, the so-called click models, to construct estimators that learn statistically efficiently. We analyze our estimators and prove that they are more efficient than the estimators that do not use the structure of the click model, under the assumption that the click model holds. We evaluate our estimators in a series of experiments on a real-world dataset and show that they consistently outperform prior estimators.
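
The abstract contrasts two approaches: reweighting whole ranked lists, and exploiting the structure of a click model. A small simulation can illustrate that contrast. The Python sketch below is not the paper's estimators. It is a minimal illustration that rests on several assumptions. Clicks follow a position-based click model, in which an item at position k is clicked with probability examination[k] × attraction[item]. Policies are explicit probability tables over ranked lists. All names and numbers are hypothetical: `list_ips`, `item_position_ips`, `attraction`, `examination`, the uniform logging policy, and the two-list target policy. `list_ips` applies one importance weight per logged list, and there are n!/(n−k)! possible lists of length k. `item_position_ips` applies one weight per (item, position) pair, which is unbiased only while the click-model assumption holds.

```python
import itertools
import math
import random
from collections import defaultdict

# Illustrative sketch only: assumes a position-based click model and explicit,
# hypothetical policy tables mapping a ranked list (tuple of distinct item ids)
# to its probability. These are not the estimators from the paper.


def num_lists(n_items, k):
    """Number of ordered lists of k distinct items out of n_items: n!/(n-k)!."""
    return math.perm(n_items, k)


def marginals(policy):
    """Probability that the policy places each item at each position."""
    m = defaultdict(float)
    for ranked, prob in policy.items():
        for pos, item in enumerate(ranked):
            m[(item, pos)] += prob
    return m


def list_ips(logs, target_policy, logging_policy):
    """List-level importance sampling: one weight per whole logged list."""
    total = 0.0
    for ranked, clicks in logs:
        weight = target_policy.get(ranked, 0.0) / logging_policy[ranked]
        total += weight * sum(clicks)
    return total / len(logs)


def item_position_ips(logs, target_policy, logging_policy):
    """Importance sampling with one weight per (item, position) pair.

    Unbiased only if an item's click probability depends on nothing but the
    item and its position (true under the assumed position-based model) and
    the logging policy covers every pair the target policy can produce.
    """
    p_target, p_logging = marginals(target_policy), marginals(logging_policy)
    total = 0.0
    for ranked, clicks in logs:
        for pos, (item, click) in enumerate(zip(ranked, clicks)):
            total += p_target.get((item, pos), 0.0) / p_logging[(item, pos)] * click
    return total / len(logs)


def simulate_pbm(policy, attraction, examination, n, rng):
    """Log n interactions; an item at position pos is clicked with probability
    examination[pos] * attraction[item] (the assumed position-based model)."""
    lists, probs = zip(*policy.items())
    logs = []
    for ranked in rng.choices(lists, weights=probs, k=n):
        clicks = [int(rng.random() < examination[pos] * attraction[item])
                  for pos, item in enumerate(ranked)]
        logs.append((ranked, clicks))
    return logs


def expected_clicks(policy, attraction, examination):
    """Ground-truth expected number of clicks under the same click model."""
    return sum(prob * sum(examination[pos] * attraction[item]
                          for pos, item in enumerate(ranked))
               for ranked, prob in policy.items())


if __name__ == "__main__":
    rng = random.Random(0)
    all_lists = list(itertools.permutations(range(5), 2))  # 20 possible lists
    logging_policy = {r: 1 / len(all_lists) for r in all_lists}  # uniform logging
    target_policy = {(0, 1): 0.5, (1, 0): 0.5}  # hypothetical new policy
    attraction = {0: 0.6, 1: 0.5, 2: 0.3, 3: 0.2, 4: 0.1}
    examination = [1.0, 0.5]
    logs = simulate_pbm(logging_policy, attraction, examination, 20000, rng)
    print("true value:", expected_clicks(target_policy, attraction, examination))
    print("list IPS:", list_ips(logs, target_policy, logging_policy))
    print("item-position IPS:", item_position_ips(logs, target_policy, logging_policy))
```

This toy setup has 20 possible lists. The list-level weight is therefore 0.5 / (1/20) = 10 for the two target lists and 0 otherwise. Each (item, position) pair used by the target policy is logged with probability 0.2 and gets weight 2.5. Both estimates approach the true value of 0.825 expected clicks, and the smaller pair-level weights are what tend to make the second estimate less variable. If clicks also depended on the other items in the list, as in a cascade model, the pair-level estimator would in general be biased.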

Supplementary Material

MP4 File (li_ranking_policies.mp4)


Information

Published In

ACM Other conferences
KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2018
2925 pages
ISBN:9781450355520
DOI:10.1145/3219819
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. click models
  2. importance sampling
  3. offline evaluation
  4. ranking

Qualifiers

  • Research-article

Conference

KDD '18

Acceptance Rates

KDD '18 paper acceptance rate: 107 of 983 submissions (11%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 43
  • Downloads (last 6 weeks): 3
Reflects downloads up to 24 Sep 2024.

Cited By

  • (2024) Towards Simulation-Based Evaluation of Recommender Systems with Carousel Interfaces. ACM Transactions on Recommender Systems 2:1 (1-25). DOI: 10.1145/3643709. Online publication date: 30-Jan-2024
  • (2024) Causal Inference in Recommender Systems: A Survey and Future Directions. ACM Transactions on Information Systems 42:4 (1-32). DOI: 10.1145/3639048. Online publication date: 2-Jan-2024
  • (2024) Counterfactual Ranking Evaluation with Flexible Click Models. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (1200-1210). DOI: 10.1145/3626772.3657810. Online publication date: 10-Jul-2024
  • (2024) Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction. Proceedings of the ACM Web Conference 2024 (3150-3161). DOI: 10.1145/3589334.3645343. Online publication date: 13-May-2024
  • (2023) Off-policy evaluation for large action spaces via conjunct effect modeling. Proceedings of the 40th International Conference on Machine Learning (29734-29759). DOI: 10.5555/3618408.3619642. Online publication date: 23-Jul-2023
  • (2023) Multi-task off-policy learning from bandit feedback. Proceedings of the 40th International Conference on Machine Learning (13157-13173). DOI: 10.5555/3618408.3618942. Online publication date: 23-Jul-2023
  • (2023) Towards a Causal Decision-Making Framework for Recommender Systems. ACM Transactions on Recommender Systems 2:2 (1-34). DOI: 10.1145/3629169. Online publication date: 26-Oct-2023
  • (2023) Towards Sequential Counterfactual Learning to Rank. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (122-128). DOI: 10.1145/3624918.3625325. Online publication date: 26-Nov-2023
  • (2023) Validating Synthetic Usage Data in Living Lab Environments. Journal of Data and Information Quality. DOI: 10.1145/3623640. Online publication date: 24-Sep-2023
  • (2023) Off-Policy Evaluation of Ranking Policies under Diverse User Behavior. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (1154-1163). DOI: 10.1145/3580305.3599447. Online publication date: 6-Aug-2023
