ABSTRACT
Learning to rank has been intensively studied and has shown great value in many fields, such as web search, question answering, and recommender systems. This paper focuses on listwise document ranking, where all documents associated with the same query in the training data are used as the input. We propose a novel ranking method, referred to as WassRank, under which the problem of listwise document ranking boils down to learning the optimal ranking function that achieves the minimum Wasserstein distance. Specifically, given the query-level predictions and the ground-truth labels, we first map them into two probability vectors. Analogous to the optimal transport problem, we view each probability vector as a pile of relevance mass, with peaks indicating higher relevance. The listwise ranking loss is formulated as the minimum cost (the Wasserstein distance) of transporting (or reshaping) the pile of predicted relevance mass so that it matches the pile of ground-truth relevance mass. The smaller the Wasserstein distance, the closer the prediction is to the ground truth. To better capture the inherent relevance-based order information among documents with different relevance labels, and to lower the variance of predictions for documents with the same relevance label, a ranking-specific cost matrix is imposed. To validate the effectiveness of WassRank, we conduct a series of experiments on two benchmark collections. The experimental results demonstrate that, compared with four non-trivial listwise ranking methods (i.e., LambdaRank, ListNet, ListMLE, and ApxNDCG), WassRank achieves substantially improved performance in terms of nDCG and ERR across different rank positions. In particular, the maximum improvements of WassRank over LambdaRank, ListNet, ListMLE, and ApxNDCG in terms of nDCG@1 are 15%, 5%, 7%, and 5%, respectively.
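The abstract describes the loss only at a high level. As a rough illustration, the Python sketch below maps the scores and labels of one query to probability vectors with a softmax and approximates the Wasserstein distance between them with entropy-regularized Sinkhorn iterations in the style of Cuturi (2013). The softmax mapping, the hypothetical `ranking_cost_matrix` (a unit `base` cost for moving mass between two documents plus their label gap), and the values of `eps` and `n_iters` are all assumptions made for illustration; they are not the cost matrix or settings defined in the paper.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Map a list of scores to a probability vector (numerically stable)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def ranking_cost_matrix(labels: Sequence[float], base: float = 1.0) -> Matrix:
    """Hypothetical ranking-aware cost (an assumption, not the paper's):
    leaving mass on the same document is free; moving it to another
    document costs `base` plus the gap between the two relevance labels."""
    n = len(labels)
    return [
        [0.0 if i == j else base + abs(labels[i] - labels[j]) for j in range(n)]
        for i in range(n)
    ]


def sinkhorn_plan(a: Sequence[float], b: Sequence[float], cost: Matrix,
                  eps: float = 0.1, n_iters: int = 1000) -> Matrix:
    """Entropy-regularized transport plan between histograms a and b,
    computed by alternating Sinkhorn row and column scalings."""
    n = len(a)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(n)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(n_iters):
        # Rescale rows so the plan's row sums match a.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Rescale columns so the plan's column sums match b.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(n)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(n)] for i in range(n)]


def wasserstein_rank_loss(scores: Sequence[float], labels: Sequence[float],
                          eps: float = 0.1) -> float:
    """Transport cost <P, C> of reshaping predicted relevance mass into
    ground-truth relevance mass (Sinkhorn approximation)."""
    pred = softmax(scores)    # predicted relevance mass
    truth = softmax(labels)   # ground-truth relevance mass (softmax is an assumption)
    cost = ranking_cost_matrix(labels)
    plan = sinkhorn_plan(pred, truth, cost, eps=eps)
    n = len(labels)
    return sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    labels = [3.0, 2.0, 1.0, 0.0]
    print(wasserstein_rank_loss([3.0, 2.0, 1.0, 0.0], labels))  # well-ordered scores
    print(wasserstein_rank_loss([0.0, 1.0, 2.0, 3.0], labels))  # reversed scores
```

Under these assumptions, well-ordered scores give a loss close to zero, while reversed scores force most of the predicted mass to travel across the list and pay a much larger cost. Training a ranking function with such a loss would require writing the same computation in an automatic-differentiation framework so that gradients flow back to the scores.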
Hideo Joho