DOI: 10.1145/3018661.3018710
WSDM Conference Proceedings, research article

A Concise Integer Linear Programming Formulation for Implicit Search Result Diversification

ABSTRACT

Search result diversification (SRD) is a key technique for coping with ambiguous or underspecified queries and has attracted considerable attention. This paper focuses on implicit SRD, where the possible subtopics underlying a query are unknown beforehand. We formulate implicit SRD as the process of selecting and ranking k exemplar documents and solve it with integer linear programming (ILP). Unlike the common practice of relying on approximate methods, this formulation enables us to obtain the optimal solution of the objective function. Extensive experiments on four benchmark collections reveal that: (1) Factors such as the initial run, the number of input documents, the query type, and the way document similarity is computed significantly affect the performance of diversification models, so these factors should be examined carefully when developing implicit SRD methods. (2) The proposed method substantially outperforms state-of-the-art unsupervised methods for implicit SRD.
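
The abstract does not spell out the formulation, so the following is only a rough illustration of what exact exemplar selection can look like, not the paper's model. It assumes a facility-location style objective: the chosen exemplars earn their relevance scores, and every other document contributes its similarity to the closest exemplar. The optimum is found by exhaustive enumeration on a toy instance; an ILP solver would search the same feasible set far more efficiently. The relevance scores, the similarity matrix, the trade-off weight `lam`, and both function names are hypothetical.

```python
from itertools import combinations

def objective(exemplars, rel, sim, lam):
    """Score an exemplar set (assumed objective, not taken from the paper):
    relevance of the exemplars plus, for every other document, its
    similarity to the best-matching exemplar."""
    relevance = sum(rel[j] for j in exemplars)
    coverage = sum(
        max(sim[i][j] for j in exemplars)
        for i in range(len(rel)) if i not in exemplars
    )
    return lam * relevance + (1 - lam) * coverage

def select_exemplars(rel, sim, k, lam=0.5):
    """Try every size-k subset, keep the one with the highest objective,
    and rank its members by descending relevance."""
    best = max(combinations(range(len(rel)), k),
               key=lambda s: objective(s, rel, sim, lam))
    return sorted(best, key=lambda j: -rel[j])

# Toy data: documents 0 and 1 are near-duplicates, as are documents 2 and 3.
rel = [0.9, 0.8, 0.5, 0.4]
sim = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]

print(select_exemplars(rel, sim, k=2))  # prints [0, 2]
```

Ranking by relevance alone would return documents 0 and 1, which are near-duplicates. Under the assumed objective the exact search instead picks one exemplar from each group, which is the behaviour diversification aims for. Enumeration is only feasible for tiny inputs, since the number of subsets grows combinatorially with the number of documents.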

