ABSTRACT
Search result diversification (SRD) is a key technique for coping with ambiguous or underspecified queries, and it has attracted considerable attention. This paper focuses on implicit SRD, where the subtopics underlying a query are unknown beforehand. We formulate implicit SRD as the problem of selecting and ranking k exemplar documents, and we solve it with integer linear programming (ILP). Unlike the common practice of relying on approximate methods, this formulation lets us obtain the optimal solution of the objective function. Extensive experiments on four benchmark collections reveal that: (1) Factors such as the initial run, the number of input documents, the query type, and the way document similarity is computed significantly affect the performance of diversification models, so careful examination of these factors is highly recommended when developing implicit SRD methods. (2) The proposed method achieves substantially better performance than state-of-the-art unsupervised methods for implicit SRD.
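To make the idea of selecting k exemplar documents with an exact rather than approximate method concrete, here is a minimal sketch. It is not the paper's formulation: the objective (a weighted sum of exemplar relevance and of each remaining document's similarity to its closest exemplar), the weight `lam`, the function name `select_exemplars`, and the toy data are all assumptions made for illustration. The sketch also replaces an ILP solver with exhaustive enumeration over all k-subsets, which reaches the same optimum for this objective but is only feasible for very small inputs.

```python
from itertools import combinations


def select_exemplars(relevance, similarity, k, lam=0.5):
    """Return (best_score, exemplars) by scoring every k-subset exactly.

    Hypothetical objective, assumed for illustration only:
      lam * (sum of exemplar relevance)
      + (1 - lam) * (sum over non-exemplars of similarity to the best exemplar)
    """
    n = len(relevance)
    best_score, best_set = float("-inf"), None
    for subset in combinations(range(n), k):
        chosen = set(subset)
        rel = sum(relevance[j] for j in subset)
        # Each non-exemplar is represented by its most similar exemplar.
        cover = sum(
            max(similarity[i][j] for j in subset)
            for i in range(n)
            if i not in chosen
        )
        score = lam * rel + (1 - lam) * cover
        if score > best_score:
            best_score, best_set = score, subset
    # Rank the chosen exemplars by relevance, highest first.
    ranked = sorted(best_set, key=lambda j: -relevance[j])
    return best_score, ranked


# Toy data (assumed): documents 0 and 1 are near-duplicates, as are 2 and 3.
relevance = [0.9, 0.8, 0.5, 0.4]
similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]

score, exemplars = select_exemplars(relevance, similarity, k=2)
print(exemplars)  # [0, 2]
```

On this toy input the two most relevant documents (0 and 1) are not both chosen, because they are near-duplicates. The optimum takes one document from each group, which is the behavior a diversification objective is meant to produce.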
A Concise Integer Linear Programming Formulation for Implicit Search Result Diversification


Hideo Joho

