Practical Counterfactual Policy Learning for Top-K Recommendations

Published: 14 August 2022

Abstract

For building recommender systems, a critical task is to learn a policy from collected feedback (e.g., ratings, clicks) that decides which items to recommend to users. However, selection bias in the collected feedback has been shown to bias learning and thus yield a sub-optimal policy. To deal with this issue, counterfactual learning has received much attention, and existing approaches can be categorized as either value learning or policy learning. This work studies policy learning approaches for top-K recommendations with a large item space and points out several difficulties related to importance weight explosion, observation insufficiency, and training efficiency. A practical framework for policy learning is then proposed to overcome these difficulties. Our experiments confirm the effectiveness and efficiency of the proposed framework.
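
To make the phrase "importance weight explosion" concrete, the following is a minimal sketch and not the paper's derivation or its proposed framework. It assumes, purely for illustration, that both the logging policy and the target policy fill the K slots of a slate independently, so the slate-level importance weight is the product of K per-slot probability ratios. The function names `slate_weight` and `per_item_weights` and all numbers are hypothetical.

```python
import math


def slate_weight(target_probs, logging_probs):
    """Slate-level importance weight under the ASSUMED factorized policies:
    the product of per-slot ratios target_prob / logging_prob."""
    return math.prod(t / l for t, l in zip(target_probs, logging_probs))


def per_item_weights(target_probs, logging_probs):
    """Per-item importance weights: one ratio per slot, with no product."""
    return [t / l for t, l in zip(target_probs, logging_probs)]


if __name__ == "__main__":
    num_items = 1000                 # hypothetical item-space size
    logging_prob = 1.0 / num_items   # hypothetical uniform logging policy
    target_prob = 0.05               # hypothetical target-policy probability

    for k in (1, 2, 5, 10):
        target = [target_prob] * k
        logged = [logging_prob] * k
        # The slate weight grows geometrically in K, while each
        # per-item weight stays at the same single-slot ratio.
        print(
            f"K={k:2d}  slate weight={slate_weight(target, logged):.3e}  "
            f"largest per-item weight={max(per_item_weights(target, logged)):.1f}"
        )
```

Under these assumptions the slate-level weight is the single-slot ratio raised to the power K, which is one way to see why weighting per item rather than per slate keeps the weights in a manageable range.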

Supplemental Material

MP4 File
In this pre-recorded presentation video, we briefly introduce the concept of top-K recommender systems and counterfactual policy learning. Then, we analyze the cause and side effects of weight explosion, which is a fundamental issue of counterfactual policy learning in top-K scenarios. Based on our analysis, we propose a solution to this issue called the regularized per-item inverse propensity score weighting (RIIPS) method. Finally, we report experimental results that confirm our points and the efficiency of our solution.

    Published In

    KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2022
    5033 pages
    ISBN:9781450393850
    DOI:10.1145/3534678

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. counterfactual learning
    2. policy learning
    3. recommender systems
    4. selection bias

    Qualifiers

    • Research-article

    Funding Sources

    • MOST of Taiwan

    Conference

    KDD '22

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
