skip to main content
10.1145/1135777.1135823acmconferencesArticle/Chapter ViewAbstractPublication PageswwwConference Proceedingsconference-collections
Article

To randomize or not to randomize: space optimal summaries for hyperlink analysis

Authors Info & Claims
Published:23 May 2006Publication History

ABSTRACT

Personalized PageRank expresses link-based page quality around user selected pages. The only previous personalized PageRank algorithm that can serve on-line queries for an unrestricted choice of pages on large graphs is our Monte Carlo algorithm [WAW 2004]. In this paper we achieve unrestricted personalization by combining rounding and randomized sketching techniques in the dynamic programming algorithm of Jeh and Widom [WWW 2003]. We evaluate the precision of approximation experimentally on large scale real-world data and find significant improvement over previous results. As a key theoretical contribution we show that our algorithms use an optimal amount of space by also improving earlier asymptotic worst-case lower bounds. Our lower bounds and algorithms apply to the SimRank as well; of independent interest is the reduction of the SimRank computation to personalized PageRank.

References

  1. Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Sampling algorithms: Lower bounds and applications. Proc of 33rd STOC, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422--426, 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Boldi and S. Vigna. The webgraph framework I: Compression techniques. Proc of 13th WWW, pp. 595--602, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Z. Broder and M. Mitzenmacher. Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485--509, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  5. M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. Proc of 29th ICALP, pp. 693--703, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Y.-Y. Chen, Q. Gan, and T. Suel. Local methods for estimating PageRank values. Proc of 12th CIKM, pp. 381--389, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. G. Cormode and S. Muthukrishnan. An improved data stream summary: The Count-Min sketch and its applications. Journal of Algorithms, 55(1):58--75, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. G. Cormode and S. Muthukrishnan. Summarizing and mining skewed data streams. Proc of 5th SIAM Intl. Conf. on Data Mining, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  9. R. Fagin, R. Kumar, M. Mahdian, D. Sivakumar, and E. Vee. Comparing and aggregating rankings with ties. Proc of 23rd PODS, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. D. Fogaras. Where to start browsing the web? Proc of 3rd I2CS, Springer LNCS vol. 2877, pp. 65--79, 2003.Google ScholarGoogle Scholar
  11. D. Fogaras and B. Rácz. Towards scaling fully personalized PageRank. Proc of 3rd WAW, pp. 105--117, 2004. Full version to appear in Internet Mathematics.Google ScholarGoogle ScholarCross RefCross Ref
  12. D. Fogaras and B. Rácz. Scaling link-based similarity search. Proc of 14th WWW, pp. 641--650, 2005. Full version available at www.ilab.sztaki.hu/websearch/Publications/. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. T. H. Haveliwala. Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering, 15(4):784--796, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. In External Memory Algorithms, DIMACS Book Series vol. 50., pp. 107--118. American Mathematical Society, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Hirai, S. Raghavan, H. Garcia-Molina, and A. Paepcke. WebBase: A repository of web pages. Proc of 9th WWW, pp. 277--293, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. Jeh and J. Widom. SimRank: A measure of structural-context similarity. Pro of 8th SIGKDD, pp. 538--543, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. G. Jeh and J. Widom. Scaling personalized web search. Proc of 12th WWW, pp. 271--279, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Kamvar, T. H. Haveliwala, C. Manning, and G. Golub. Exploiting the block structure of the web for computing PageRank. Technical Report 2003-17, Stanford University, 2003.Google ScholarGoogle Scholar
  19. M. G. Kendall. Rank Correlation Methods. Hafner Publishing Co., New York, 1955.Google ScholarGoogle Scholar
  20. J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. F. McSherry. A uniform approach to accelerated PageRank computation. Proc of 14th WWW, pp. 575--582, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Comp. Sci., 1(2), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford University, 1998.Google ScholarGoogle Scholar
  25. C. R. Palmer, P. B. Gibbons, and C. Faloutsos. ANF: A fast and scalable tool for data mining in massive graphs. Proc of 8th SIGKDD, pp. 81--90, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. P. K. C. Singitham, M. S. Mahabhashyam, and P. Raghavan. Efficiency-quality tradeoffs for vector score aggregation. Proc of 30th VLDB, pp. 624--635, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. J. S. Vitter. External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys, 33(2):209--271, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. To randomize or not to randomize: space optimal summaries for hyperlink analysis

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader
            About Cookies On This Site

            We use cookies to ensure that we give you the best experience on our website.

            Learn more

            Got it!