research-article

Pre-trained Language Model-based Retrieval and Ranking for Web Search

Published: 20 December 2022

Abstract

Pre-trained language representation models (PLMs) such as BERT and Enhanced Representation through kNowledge IntEgration (ERNIE) have been integral to recent improvements on various downstream tasks, including information retrieval. However, directly utilizing these models for large-scale web search is nontrivial due to the following challenges: (1) the prohibitively expensive computation of massive neural PLMs, especially over the long texts of web documents, blocks their deployment in a web search system that demands extremely low latency; (2) the discrepancy between existing task-agnostic pre-training objectives and ad hoc retrieval scenarios, which demand comprehensive relevance modeling, is another main barrier to improving online retrieval and ranking effectiveness; and (3) creating a significant impact on real-world applications calls for practical solutions that seamlessly interweave the resultant PLM and other components into a cooperative system serving web-scale data. Accordingly, in this work we contribute a series of successfully applied techniques that tackle these issues when deploying the state-of-the-art Chinese pre-trained language model, ERNIE, in an online search engine system. We first present novel practices for expressive PLM-based semantic retrieval with a flexible poly-interaction scheme, and for cost-efficiently contextualizing and ranking web documents with a cheap yet powerful Pyramid-ERNIE architecture. We then devise innovative pre-training and fine-tuning paradigms that explicitly incentivize query-document relevance modeling in PLM-based retrieval and ranking with large-scale noisy and biased post-click behavioral data. We also introduce a series of effective strategies to seamlessly interweave the designed PLM-based models with other conventional components into a cooperative system.
Extensive offline and online experimental results show that our proposed techniques are crucial to achieving more effective search performance. We also provide a thorough analysis of our methodology and experimental results.
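For intuition, the poly-interaction scheme mentioned in the abstract belongs to the poly-encoder family: the query side is compressed into several "poly codes," the document side into a single pooled vector, and relevance is a dot product computed after attending over the codes. The sketch below is an illustrative assumption of that general scheme (the function names, dimensions, and pooling choice are ours), not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def poly_interaction_score(query_codes, doc_vec):
    """Relevance of one document under m query-side 'poly codes'.

    query_codes: (m, d) array -- m contextualized query representations
    doc_vec:     (d,) array   -- single pooled document embedding
    """
    # Use the document vector to attend over the m query codes...
    attn = softmax(query_codes @ doc_vec)   # (m,)
    pooled_query = attn @ query_codes       # (d,)
    # ...then score with a cheap dot product, as in dense retrieval.
    return float(pooled_query @ doc_vec)
```

Because the document side collapses to a single vector, document embeddings can be pre-computed offline and served from an approximate nearest-neighbor index; the extra query-side codes buy back some interaction expressiveness at negligible online cost.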



• Published in

  ACM Transactions on the Web, Volume 17, Issue 1 (February 2023), 189 pages
  ISSN: 1559-1131 | EISSN: 1559-114X | DOI: 10.1145/3575872

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 27 May 2022
• Revised: 13 August 2022
• Accepted: 13 September 2022
• Online AM: 20 October 2022
• Published: 20 December 2022
