ABSTRACT
Personalized PageRank expresses link-based page quality around user-selected pages. The only previous personalized PageRank algorithm that can serve on-line queries for an unrestricted choice of pages on large graphs is our Monte Carlo algorithm [WAW 2004]. In this paper we achieve unrestricted personalization by combining rounding and randomized sketching techniques in the dynamic programming algorithm of Jeh and Widom [WWW 2003]. We evaluate the precision of the approximation experimentally on large-scale real-world data and find significant improvement over previous results. As a key theoretical contribution, we show that our algorithms use an optimal amount of space, and in doing so we improve earlier asymptotic worst-case lower bounds. Our lower bounds and algorithms apply to SimRank as well; of independent interest is the reduction of SimRank computation to personalized PageRank.
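For orientation, the quantity being approximated can be illustrated with plain power iteration on a toy graph. The sketch below is not the rounding or sketching algorithm of this paper, nor the Monte Carlo method cited above; it only shows what a personalized PageRank vector is. The function name `personalized_pagerank`, the teleport probability `c = 0.15`, the handling of dangling nodes, and the example graph are all assumptions made for illustration.

```python
def personalized_pagerank(out_links, preference, c=0.15, iterations=100):
    """Approximate personalized PageRank by power iteration.

    out_links:  dict mapping each node to a list of its successors
                (every successor must also be a key of out_links).
    preference: dict mapping user-selected nodes to teleport probabilities
                that sum to 1 (assumed input format).
    c:          teleport probability (assumed value, not from the paper).
    """
    nodes = list(out_links)
    rank = {v: preference.get(v, 0.0) for v in nodes}
    for _ in range(iterations):
        # With probability c the walk jumps back to the preferred pages.
        nxt = {v: c * preference.get(v, 0.0) for v in nodes}
        for u in nodes:
            successors = out_links[u]
            if successors:
                # Otherwise it follows a uniformly chosen out-link.
                share = (1.0 - c) * rank[u] / len(successors)
                for v in successors:
                    nxt[v] += share
            else:
                # Dangling node: return its mass to the preferred pages
                # (one common convention, assumed here).
                for v, p in preference.items():
                    nxt[v] += (1.0 - c) * rank[u] * p
        rank = nxt
    return rank


# Hypothetical four-page graph, personalized on page "a".
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
ranks = personalized_pagerank(graph, {"a": 1.0})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Storing one such vector per possible start page is what becomes infeasible on large graphs, which is the space problem the abstract refers to.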
REFERENCES
- Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Sampling algorithms: Lower bounds and applications. Proc of 33rd STOC, 2001.
- B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Commun. ACM, 13(7):422--426, 1970.
- P. Boldi and S. Vigna. The webgraph framework I: Compression techniques. Proc of 13th WWW, pp. 595--602, 2004.
- A. Z. Broder and M. Mitzenmacher. Network applications of Bloom filters: A survey. Internet Mathematics, 1(4):485--509, 2005.
- M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. Proc of 29th ICALP, pp. 693--703, 2002.
- Y.-Y. Chen, Q. Gan, and T. Suel. Local methods for estimating PageRank values. Proc of 12th CIKM, pp. 381--389, 2004.
- G. Cormode and S. Muthukrishnan. An improved data stream summary: The Count-Min sketch and its applications. Journal of Algorithms, 55(1):58--75, 2005.
- G. Cormode and S. Muthukrishnan. Summarizing and mining skewed data streams. Proc of 5th SIAM Intl. Conf. on Data Mining, 2005.
- R. Fagin, R. Kumar, M. Mahdian, D. Sivakumar, and E. Vee. Comparing and aggregating rankings with ties. Proc of 23rd PODS, 2004.
- D. Fogaras. Where to start browsing the web? Proc of 3rd I2CS, Springer LNCS vol. 2877, pp. 65--79, 2003.
- D. Fogaras and B. Rácz. Towards scaling fully personalized PageRank. Proc of 3rd WAW, pp. 105--117, 2004. Full version to appear in Internet Mathematics.
- D. Fogaras and B. Rácz. Scaling link-based similarity search. Proc of 14th WWW, pp. 641--650, 2005. Full version available at www.ilab.sztaki.hu/websearch/Publications/.
- T. H. Haveliwala. Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering, 15(4):784--796, 2003.
- M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. In External Memory Algorithms, DIMACS Book Series vol. 50, pp. 107--118. American Mathematical Society, 1999.
- J. Hirai, S. Raghavan, H. Garcia-Molina, and A. Paepcke. WebBase: A repository of web pages. Proc of 9th WWW, pp. 277--293, 2000.
- G. Jeh and J. Widom. SimRank: A measure of structural-context similarity. Proc of 8th SIGKDD, pp. 538--543, 2002.
- G. Jeh and J. Widom. Scaling personalized web search. Proc of 12th WWW, pp. 271--279, 2003.
- S. Kamvar, T. H. Haveliwala, C. Manning, and G. Golub. Exploiting the block structure of the web for computing PageRank. Technical Report 2003-17, Stanford University, 2003.
- M. G. Kendall. Rank Correlation Methods. Hafner Publishing Co., New York, 1955.
- J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, 1999.
- E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.
- F. McSherry. A uniform approach to accelerated PageRank computation. Proc of 14th WWW, pp. 575--582, 2005.
- S. Muthukrishnan. Data streams: Algorithms and applications. Foundations and Trends in Theoretical Comp. Sci., 1(2), 2005.
- L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford University, 1998.
- C. R. Palmer, P. B. Gibbons, and C. Faloutsos. ANF: A fast and scalable tool for data mining in massive graphs. Proc of 8th SIGKDD, pp. 81--90, 2002.
- P. K. C. Singitham, M. S. Mahabhashyam, and P. Raghavan. Efficiency-quality tradeoffs for vector score aggregation. Proc of 30th VLDB, pp. 624--635, 2004.
- J. S. Vitter. External memory algorithms and data structures: Dealing with massive data. ACM Computing Surveys, 33(2):209--271, 2001.
Index Terms
To randomize or not to randomize: space optimal summaries for hyperlink analysis