DOI: 10.1145/3292500.3330852
Focused Context Balancing for Robust Offline Policy Evaluation

Published: 25 July 2019

Abstract

Precisely evaluating the effect of new policies (e.g., ad-placement models, recommendation functions, ranking functions) is one of the most important problems in improving interactive systems. Conventional policy evaluation methods rely on online A/B tests, which are usually extremely expensive and may have undesirable impacts. Recently, Inverse Propensity Score (IPS) estimators have been proposed as an alternative that evaluates the effect of a new policy using offline logged data collected under a different policy in the past. They aim to remove the distribution shift induced by the past policy. However, they ignore the distribution shift that would be induced by the new policy, which results in imprecise evaluation. Moreover, their performance relies on accurate estimation of the propensity score, which cannot be guaranteed or validated in practice. In this paper, we propose a non-parametric method, the Focused Context Balancing (FCB) algorithm, which learns sample weights for context balancing so that the distribution shifts induced by the past policy and by the new policy are each eliminated. To validate the effectiveness of the FCB algorithm, we conduct extensive experiments on both synthetic and real-world datasets. The experimental results demonstrate that FCB outperforms existing estimators, achieving more precise and robust offline policy evaluation.
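
For readers unfamiliar with the IPS baseline that the abstract contrasts against, the following is a minimal sketch of the standard IPS estimator and its self-normalized variant. It is not the FCB algorithm, whose details the abstract does not give. The function names and the toy numbers are hypothetical, and the sketch assumes the logging propensities are known exactly, which the abstract notes cannot be guaranteed in practice.

```python
from typing import List, Sequence


def importance_weights(logging_probs: Sequence[float],
                       target_probs: Sequence[float]) -> List[float]:
    """Ratio of new-policy to past-policy probability for each logged action."""
    if len(logging_probs) != len(target_probs):
        raise ValueError("probability sequences must have equal length")
    if any(p <= 0.0 for p in logging_probs):
        raise ValueError("logging propensities must be strictly positive")
    return [q / p for p, q in zip(logging_probs, target_probs)]


def ips_estimate(rewards: Sequence[float],
                 logging_probs: Sequence[float],
                 target_probs: Sequence[float]) -> float:
    """Plain IPS: average of importance-weighted logged rewards."""
    weights = importance_weights(logging_probs, target_probs)
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


def snips_estimate(rewards: Sequence[float],
                   logging_probs: Sequence[float],
                   target_probs: Sequence[float]) -> float:
    """Self-normalized IPS: divide by the sum of weights instead of n."""
    weights = importance_weights(logging_probs, target_probs)
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


# Toy logged data (hypothetical): observed reward, probability the past
# policy assigned to the logged action, and probability the new policy
# would assign to that same action.
rewards = [1.0, 0.0, 1.0, 1.0]
logging_probs = [0.5, 0.5, 0.25, 0.25]
target_probs = [0.5, 0.25, 0.5, 0.5]

ips_value = ips_estimate(rewards, logging_probs, target_probs)
snips_value = snips_estimate(rewards, logging_probs, target_probs)
```

Both estimators depend entirely on the ratio of propensities, so an error in the estimated logging propensity feeds directly into the weights. This is the sensitivity that motivates learning the sample weights by balancing contexts instead.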

    Published In

    KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
    July 2019
    3305 pages
    ISBN:9781450362016
    DOI:10.1145/3292500
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. context balancing
    2. distribution shift
    3. policy evaluation

    Qualifiers

    • Research-article

    Conference

    KDD '19
    Acceptance Rates

    KDD '19 Paper Acceptance Rate 110 of 1,200 submissions, 9%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2024) Learning Individual Treatment Effects under Heterogeneous Interference in Networks. ACM Transactions on Knowledge Discovery from Data 18:8 (1-21). DOI: 10.1145/3673761. Online publication date: 16-Aug-2024
    • (2024) Perovskite-based optoelectronic systems for neuromorphic computing. Nano Energy 120 (109169). DOI: 10.1016/j.nanoen.2023.109169. Online publication date: Mar-2024
    • (2023) Debiased Recommendation with User Feature Balancing. ACM Transactions on Information Systems 41:4 (1-25). DOI: 10.1145/3580594. Online publication date: 15-Feb-2023
    • (2023) Specify Robust Causal Representation from Mixed Observations. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2978-2987). DOI: 10.1145/3580305.3599512. Online publication date: 6-Aug-2023
    • (2023) Rectifying Unfairness in Recommendation Feedback Loop. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (28-37). DOI: 10.1145/3539618.3591754. Online publication date: 19-Jul-2023
    • (2021) Out-of-distribution Generalization and Its Applications for Multimedia. Proceedings of the 29th ACM International Conference on Multimedia (5681-5682). DOI: 10.1145/3474085.3478876. Online publication date: 17-Oct-2021
    • (2021) Top-N Recommendation with Counterfactual User Preference Simulation. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (2342-2351). DOI: 10.1145/3459637.3482305. Online publication date: 26-Oct-2021
    • (2021) A Survey on Causal Inference. ACM Transactions on Knowledge Discovery from Data 15:5 (1-46). DOI: 10.1145/3444944. Online publication date: 10-May-2021
    • (2021) Interventional Video Grounding with Dual Contrastive Learning. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2764-2774). DOI: 10.1109/CVPR46437.2021.00279. Online publication date: Jun-2021
