ABSTRACT
The phenomenal growth in the volume of easily accessible information via various web-based services has made it essential for service providers to provide users with personalized representative summaries of such information. Further, online commercial services including social networking and micro-blogging websites, e-commerce portals, leisure and entertainment websites, etc. recommend interesting content to users that is simultaneously diverse on many different axes such as topic, geographic specificity, etc. The key algorithmic question in all these applications is the generation of a succinct, representative, and relevant summary from a large stream of data coming from a variety of sources. In this paper, we formally model this optimization problem, identify its key structural characteristics, and use these observations to design an extremely scalable and efficient algorithm. We analyze the algorithm using theoretical techniques to show that it always produces a nearly optimal solution. In addition, we perform large-scale experiments on both real-world and synthetically generated datasets, which confirm that our algorithm performs even better than its analytical guarantees in practice, and also outperforms other candidate algorithms for the problem by a wide margin.