ABSTRACT
Predicting ad click-through rates (CTR) is a massive-scale learning problem that is central to the multi-billion dollar online advertising industry. We present a selection of case studies and topics drawn from recent experiments in the setting of a deployed CTR prediction system. These include improvements in the context of traditional supervised learning based on an FTRL-Proximal online learning algorithm (which has excellent sparsity and convergence properties) and the use of per-coordinate learning rates.
We also explore some of the challenges that arise in a real-world system that may appear at first to be outside the domain of traditional machine learning research. These include useful tricks for memory savings, methods for assessing and visualizing performance, practical methods for providing confidence estimates for predicted probabilities, calibration methods, and methods for automated management of features. Finally, we also detail several directions that did not turn out to be beneficial for us, despite promising results elsewhere in the literature. The goal of this paper is to highlight the close relationship between theoretical advances and practical engineering in this industrial setting, and to show the depth of challenges that appear when applying traditional machine learning methods in a complex dynamic system.
References
- D. Agarwal, B.-C. Chen, and P. Elango. Spatio-temporal models for estimating click-through rate. In Proceedings of the 18th international conference on World wide web, pages 21--30. ACM, 2009. Google Scholar
Digital Library
- R. Ananthanarayanan, V. Basker, S. Das, A. Gupta, H. Jiang, T. Qiu, A. Reznichenko, D. Ryabkov, M. Singh, and S. Venkataraman. Photon: Fault-tolerant and scalable joining of continuous data streams. In SIGMOD Conference, 2013. To appear. Google Scholar
Digital Library
- R. Bekkerman, M. Bilenko, and J. Langford. Scaling up machine learning: Parallel and distributed approaches. 2011. Google Scholar
Digital Library
- B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7), July 1970. Google Scholar
Digital Library
- A. Blum, A. Kalai, and J. Langford. Beating the hold-out: Bounds for k-fold and progressive cross-validation. In COLT, 1999. Google Scholar
Digital Library
- O. Chapelle. Click modeling for display advertising. In AdML: 2012 ICML Workshop on Online Advertising, 2012.Google Scholar
- C. Cortes, M. Mohri, M. Riley, and A. Rostamizadeh. Sample selection bias correction theory. In ALT, 2008. Google Scholar
Digital Library
- J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Y. Ng. Large scale distributed deep networks. In NIPS, 2012.Google Scholar
Digital Library
- T. G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine learning, 40(2):139--157, 2000. Google Scholar
Digital Library
- J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. In COLT, 2010.Google Scholar
- J. Duchi and Y. Singer. Efficient learning using forward-backward splitting. In Advances in Neural Information Processing Systems 22, pages 495--503. 2009.Google Scholar
- L. Fan, P. Cao, J. Almeida, and A. Broder. Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Transactions on Networking, 8(3), jun 2000. Google Scholar
Digital Library
- T. Fawcett. An introduction to roc analysis. Pattern recognition letters, 27(8):861--874, 2006. Google Scholar
Digital Library
- D. Golovin, D. Sculley, H. B. McMahan, and M. Young. Large-scale learning with a small-scale footprint. In ICML, 2013. To appear.Google Scholar
- T. Graepel, J. Q. Candela, T. Borchert, and R. Herbrich. Web-scale Bayesian click-through rate prediction for sponsored search advertising in microsofts bing search engine. In Proc. 27th Internat. Conf. on Machine Learning, 2010.Google Scholar
Digital Library
- D. Hillard, S. Schroedl, E. Manavoglu, H. Raghavan, and C. Leggetter. Improving ad relevance in sponsored search. In Proceedings of the third ACM international conference on Web search and data mining, WSDM '10, pages 361--370, 2010. Google Scholar
Digital Library
- G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580, 2012.Google Scholar
- D. W. Hosmer and S. Lemeshow. Applied logistic regression. Wiley-Interscience Publication, 2000.Google Scholar
Cross Ref
- H. A. Koepke and M. Bilenko. Fast prediction of new feature utility. In ICML, 2012.Google Scholar
- J. Langford, L. Li, and T. Zhang. Sparse online learning via truncated gradient. JMLR, 10, 2009. Google Scholar
Digital Library
- S.-M. Li, M. Mahdian, and R. P. McAfee. Value of learning in sponsored search auctions. In WINE, 2010. Google Scholar
Digital Library
- W. Li, X. Wang, R. Zhang, Y. Cui, J. Mao, and R. Jin. Exploitation and exploration in a performance based contextual advertising system. In KDD, 2010. Google Scholar
Digital Library
- R. Luss, S. Rosset, and M. Shahar. Efficient regularized isotonic regression with application to gene--gene interaction search. Ann. Appl. Stat., 6(1), 2012.Google Scholar
- H. B. McMahan. Follow-the-regularized-leader and mirror descent: Equivalence theorems and L1 regularization. In AISTATS, 2011.Google Scholar
- H. B. McMahan and O. Muralidharan. On calibrated predictions for auction selection mechanisms. CoRR, abs/1211.3955, 2012.Google Scholar
- H. B. McMahan and M. Streeter. Adaptive bound optimization for online convex optimization. In COLT, 2010.Google Scholar
- A. Niculescu-Mizil and R. Caruana. Predicting good probabilities with supervised learning. In ICML, ICML '05, 2005. Google Scholar
Digital Library
- M. Richardson, E. Dominowska, and R. Ragno. Predicting clicks: estimating the click-through rate for new ads. In Proceedings of the 16th international conference on World Wide Web, pages 521--530. ACM, 2007. Google Scholar
Digital Library
- M. J. Streeter and H. B. McMahan. Less regret via online conditioning. CoRR, abs/1002.4862, 2010.Google Scholar
- D. Tang, A. Agarwal, D. O'Brien, and M. Meyer. Overlapping experiment infrastructure: more, better, faster experimentation. In KDD, pages 17--26, 2010. Google Scholar
Digital Library
- K. Weinberger, A. Dasgupta, J. Langford, A. Smola, and J. Attenberg. Feature hashing for large scale multitask learning. In ICML, pages 1113--1120. ACM, 2009. Google Scholar
Digital Library
- L. Xiao. Dual averaging method for regularized stochastic learning and online optimization. In NIPS, 2009.Google Scholar
- Z. A. Zhu, W. Chen, T. Minka, C. Zhu, and Z. Chen. A novel click model and its applications to online advertising. In Proceedings of the third ACM international conference on Web search and data mining, pages 321--330. ACM, 2010. Google Scholar
Digital Library
- M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In ICML, 2003.Google Scholar
Digital Library
Index Terms
Ad click prediction




Comments