ABSTRACT
Result diversification has many important applications in databases, operations research, information retrieval, and finance. In this paper, we study and extend a particular version of result diversification, known as max-sum diversification. More specifically, we consider the setting where we are given a set of elements in a metric space and a set valuation function f defined on every subset. For any given subset S, the overall objective is a linear combination of f(S) and the sum of the distances induced by S. The goal is to find a subset S satisfying some constraints that maximizes the overall objective.
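As a minimal sketch of the objective described above, the following Python assumes the linear combination takes the form f(S) + λ · (sum of d(u, v) over unordered pairs in S). The names `objective`, `lam`, `weights`, and `position` are hypothetical and chosen only for illustration; the abstract does not fix a specific form or notation.

```python
from itertools import combinations


def objective(S, f, d, lam):
    """Return f(S) + lam * (sum of pairwise distances among elements of S).

    Assumption: the "linear combination" weights the distance term by a
    single trade-off parameter lam >= 0.
    """
    members = sorted(S)
    dispersion = sum(d(u, v) for u, v in combinations(members, 2))
    return f(frozenset(members)) + lam * dispersion


# Hypothetical instance: three points on a line with a modular valuation.
weights = {"a": 3.0, "b": 1.0, "c": 2.0}
position = {"a": 0.0, "b": 1.0, "c": 4.0}


def f(S):
    # Modular valuation: the value of a set is the sum of its element weights.
    return sum(weights[x] for x in S)


def d(u, v):
    # Distance on the line, which satisfies the metric axioms.
    return abs(position[u] - position[v])


value_ac = objective({"a", "c"}, f, d, lam=0.5)
value_all = objective({"a", "b", "c"}, f, d, lam=0.5)
```

Here `value_ac` combines a valuation of 5.0 with a single distance of 4.0 scaled by 0.5. A larger `lam` shifts the preference toward spread-out sets.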
This problem was first studied by Gollapudi and Sharma in [17] for modular set functions and for sets satisfying a cardinality constraint (uniform matroids). In their paper, they give a 2-approximation algorithm by reducing to an earlier result in [20]. The first part of this paper considers an extension of the modular case to the monotone submodular case, for which the algorithm in [17] no longer applies. Interestingly, we are able to maintain the same 2-approximation using a natural but different greedy algorithm. We then further extend the problem by considering any matroid constraint and show that a natural single-swap local search algorithm provides a 2-approximation in this more general setting. This extends the Nemhauser, Wolsey and Fisher approximation result [20] for the problem of submodular function maximization subject to a matroid constraint (without the distance function component).
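The abstract does not spell out the greedy rule, so the following is only an illustrative sketch under stated assumptions. It picks p elements one at a time, each time adding the element that maximizes half its marginal valuation gain plus λ times its total distance to the elements already chosen. The halving of the valuation term, the function name `greedy_max_sum`, and the coverage-style valuation are all assumptions made for this example, and the sketch does not demonstrate the approximation guarantee.

```python
def greedy_max_sum(elements, f, d, lam, p):
    """Greedily build a set of size p (cardinality constraint).

    Assumed selection rule: maximize
        0.5 * (f(S + u) - f(S)) + lam * sum(d(u, v) for v in S).
    Ties are broken by sorted order of the elements.
    """
    chosen = []
    remaining = set(elements)
    while len(chosen) < p and remaining:
        base = f(frozenset(chosen))

        def gain(u):
            marginal_value = f(frozenset(chosen + [u])) - base
            marginal_distance = sum(d(u, v) for v in chosen)
            return 0.5 * marginal_value + lam * marginal_distance

        best = max(sorted(remaining), key=gain)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical instance: a coverage function, which is monotone submodular.
topics = {"a": {1, 2}, "b": {2, 3}, "c": {4}}
position = {"a": 0.0, "b": 1.0, "c": 4.0}


def coverage(S):
    # Number of distinct topics covered by the chosen elements.
    covered = set()
    for x in S:
        covered |= topics[x]
    return len(covered)


def line_distance(u, v):
    return abs(position[u] - position[v])


picked = greedy_max_sum(["a", "b", "c"], coverage, line_distance, lam=0.5, p=2)
```

In this instance the first pick is `"a"` (tied with `"b"` on coverage and taken first in sorted order), and the second pick is `"c"`, whose distance to `"a"` outweighs the equal coverage gain offered by `"b"`.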
The second part of the paper focuses on dynamic updates for the modular case. Suppose we have a good initial approximate solution and then there is a single weight perturbation, either on the valuation of an element or on the distance between two elements. Given that users expect some stability in the results they see, we ask how easy it is to maintain a good approximation without significantly changing the initial set. We measure this by the number of updates, where each update is a swap of a single element in the current solution with a single element outside the current solution. We show that we can maintain an approximation ratio of 3 with just a single update if the perturbation is not too large.
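The update step described above, swapping one element inside the solution for one outside it, can be sketched as follows for the modular case. This is an illustration of the mechanics only: the exact swap rule used in the paper is not given in the abstract, so the "take the best improving swap" policy, the names `best_single_swap`, `weight`, and `dist`, and the example data are assumptions. The code does not verify the stated approximation ratio.

```python
from itertools import combinations


def objective(S, weight, dist, lam):
    """Modular valuation plus lam times the sum of pairwise distances."""
    value = sum(weight[u] for u in S)
    dispersion = sum(dist[frozenset((u, v))] for u, v in combinations(sorted(S), 2))
    return value + lam * dispersion


def best_single_swap(S, elements, weight, dist, lam):
    """Return the set reached by the best strictly improving single swap.

    If no swap improves the objective, a copy of S is returned unchanged.
    """
    best_set, best_val = set(S), objective(S, weight, dist, lam)
    for out in sorted(S):
        for inn in sorted(set(elements) - set(S)):
            candidate = (set(S) - {out}) | {inn}
            val = objective(candidate, weight, dist, lam)
            if val > best_val:
                best_set, best_val = candidate, val
    return best_set


# Hypothetical instance with a metric on three elements.
elements = ["a", "b", "c"]
weight = {"a": 3.0, "b": 1.0, "c": 2.0}
dist = {frozenset("ab"): 1.0, frozenset("bc"): 3.0, frozenset("ac"): 4.0}
current = {"a", "c"}

# Before any perturbation, no single swap improves the current solution.
unchanged = best_single_swap(current, elements, weight, dist, lam=1.0)

# Perturb the valuation of one element, then apply a single update.
perturbed_weight = dict(weight, b=10.0)
updated = best_single_swap(current, elements, perturbed_weight, dist, lam=1.0)
```

After the weight of `"b"` is raised, one swap replaces `"a"` with `"b"`, so the displayed set changes by a single element.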
- R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In WSDM, pages 5--14, 2009.
- N. Bansal, K. Jain, A. Kazeykina, and J. Naor. Approximation algorithms for diversified search ranking. In ICALP (2), pages 273--284, 2010.
- C. Brandt, T. Joachims, Y. Yue, and J. Bank. Dynamic ranked retrieval. In WSDM, pages 247--256, 2011.
- R. A. Brualdi. Comments on bases in dependence structures. Bulletin of the Australian Mathematical Society, 1(02):161--167, 1969.
- G. Călinescu, C. Chekuri, M. Pál, and J. Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput., 40(6):1740--1766, 2011.
- J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '98, pages 335--336, New York, NY, USA, 1998. ACM.
- B. Chandra and M. M. Halldórsson. Facility dispersion and remote subgraphs. In Proceedings of the 5th Scandinavian Workshop on Algorithm Theory, pages 53--65, London, UK, 1996. Springer-Verlag.
- B. Chandra and M. M. Halldórsson. Approximation algorithms for dispersion problems. J. Algorithms, 38(2):438--465, 2001.
- H. Chen and D. R. Karger. Less is more: probabilistic models for retrieving fewer relevant documents. In SIGIR, pages 429--436, 2006.
- E. Demidova, P. Fankhauser, X. Zhou, and W. Nejdl. DivQ: diversification for keyword search over structured databases. In Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pages 331--338. ACM, 2010.
- Z. Dou, S. Hu, K. Chen, R. Song, and J.-R. Wen. Multi-dimensional search result diversification. In WSDM, pages 475--484, 2011.
- M. Drosou and E. Pitoura. Diversity over continuous data. IEEE Data Eng. Bull., 32(4):49--56, 2009.
- M. Drosou and E. Pitoura. Search result diversification. SIGMOD Record, 39(1):41--47, 2010.
- J. Edmonds. Matroids and the greedy algorithm. Mathematical Programming, 1:127--136, 1971.
- E. Erkut. The discrete p-dispersion problem. European Journal of Operational Research, 46(1):48--60, May 1990.
- E. Erkut and S. Neuman. Analytical models for locating undesirable facilities. European Journal of Operational Research, 40(3):275--291, June 1989.
- S. Gollapudi and A. Sharma. An axiomatic approach for result diversification. In World Wide Web Conference Series, pages 381--390, 2009.
- M. M. Halldórsson, K. Iwano, N. Katoh, and T. Tokuyama. Finding subsets maximizing minimum structures. In Symposium on Discrete Algorithms, pages 150--159, 1995.
- P. Hansen and I. D. Moon. Dispersion facilities on a network. Presentation at the TIMS/ORSA Joint National Meeting, Washington, D.C., 1988.
- R. Hassin, S. Rubinstein, and A. Tamir. Approximation algorithms for maximum dispersion. Oper. Res. Lett., 21(3):133--137, 1997.
- D. Kempe, J. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '03, pages 137--146, 2003.
- S. Khanna, R. Motwani, M. Sudan, and U. V. Vazirani. On syntactic versus computational views of approximability. Electronic Colloquium on Computational Complexity (ECCC), 2(23), 1995.
- M. J. Kuby. Programming models for facility dispersion: The p-dispersion and maxisum dispersion problems. Geographical Analysis, 19(4):315--329, 1987.
- H. Lin and J. Bilmes. Multi-document summarization via budgeted maximization of submodular functions. In HLT-NAACL, pages 912--920, 2010.
- H. Lin and J. Bilmes. A class of submodular functions for document summarization. In North American chapter of the Association for Computational Linguistics/Human Language Technology Conference (NAACL/HLT-2011), Portland, OR, June 2011. (long paper).
- H. Lin, J. Bilmes, and S. Xie. Graph-based submodular selection for extractive summarization. In Proc. IEEE Automatic Speech Recognition and Understanding (ASRU), Merano, Italy, December 2009.
- Z. Liu, P. Sun, and Y. Chen. Structured search result differentiation. PVLDB, 2(1):313--324, 2009.
- E. Minack, W. Siberski, and W. Nejdl. Incremental diversification for very large sets: a streaming-based approach. In SIGIR, pages 585--594, 2011.
- G. Nemhauser, L. Wolsey, and M. Fisher. An analysis of the approximations for maximizing submodular set functions. Mathematical Programming, 1978.
- F. Radlinski, R. Kleinberg, and T. Joachims. Learning diverse rankings with multi-armed bandits. In ICML, pages 784--791, 2008.
- R. Rado. A note on independence functions. Proceedings of the London Mathematical Society, 7:300--320, 1957.
- D. Rafiei, K. Bharat, and A. Shukla. Diversifying web search results. In WWW, pages 781--790, 2010.
- S. S. Ravi, D. J. Rosenkrantz, and G. K. Tayi. Heuristic and special case algorithms for dispersion problems. Operations Research, 42(2):299--310, March-April 1994.
- R. L. T. Santos, C. Macdonald, and I. Ounis. Intent-aware search result diversification. In SIGIR, pages 595--604, 2011.
- A. Schrijver. Combinatorial Optimization: Polyhedra and Efficiency. Springer, 2003.
- A. Slivkins, F. Radlinski, and S. Gollapudi. Learning optimally diverse rankings over large document collections. In ICML, pages 983--990, 2010.
- M. R. Vieira, H. L. Razente, M. C. N. Barioni, M. Hadjieleftheriou, D. Srivastava, C. T. Jr., and V. J. Tsotras. DivDB: A system for diversifying query results. PVLDB, 4(12):1395--1398, 2011.
- M. R. Vieira, H. L. Razente, M. C. N. Barioni, M. Hadjieleftheriou, D. Srivastava, C. T. Jr., and V. J. Tsotras. On query result diversification. In ICDE, pages 1163--1174, 2011.
- D. W. Wang and Y.-S. Kuo. A study on two geometric location problems. Inf. Process. Lett., 28:281--286, August 1988.
- C. Yu, L. Lakshmanan, and S. Amer-Yahia. It takes variety to make a world: diversification in recommender systems. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT '09, pages 368--378, 2009.
- Y. Yue and T. Joachims. Predicting diverse subsets using structural SVMs. In ICML, pages 1224--1231, 2008.
- C. Zhai, W. W. Cohen, and J. D. Lafferty. Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In SIGIR, pages 10--17, 2003.
- F. Zhao, X. Zhang, A. K. H. Tung, and G. Chen. BROAD: Diversified keyword search in databases. PVLDB, 4(12):1355--1358, 2011.
- X. Zhu, A. B. Goldberg, J. V. Gael, and D. Andrzejewski. Improving diversity in ranking using absorbing random walks. In HLT-NAACL, pages 97--104, 2007.
Index Terms
Max-Sum diversification, monotone submodular functions and dynamic updates