ABSTRACT
This paper proposes a new method for computing page importance, referred to as BrowseRank. The conventional approach to computing page importance is to exploit the link graph of the web and to build a model based on that graph. PageRank, for instance, is such an algorithm; it employs a discrete-time Markov process as the model. Unfortunately, the link graph can be an incomplete and inaccurate source of data for determining page importance, because links can be easily added and deleted by web content creators. In this paper, we propose computing page importance by using a 'user browsing graph' created from user behavior data. In this graph, vertices represent pages and directed edges represent transitions between pages in the users' web browsing history. The length of time that users stay on each page is also recorded. The user browsing graph is more reliable than the link graph for inferring page importance. This paper further proposes using a continuous-time Markov process on the user browsing graph as the model and taking the stationary probability distribution of that process as page importance. An efficient algorithm for this computation has also been devised. In this way, we can leverage the implicit votes of hundreds of millions of users on page importance. Experimental results show that BrowseRank outperforms baseline methods such as PageRank and TrustRank on several tasks.
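The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, only one standard way to obtain the stationary distribution of a continuous-time Markov process: compute the stationary distribution of the embedded (jump) chain, then weight each page by its mean staying time and renormalize. The function name `browse_importance`, the session input format, the `damping` reset parameter, and the uniform handling of pages with no outgoing transitions are all assumptions made for illustration.

```python
from collections import defaultdict


def browse_importance(sessions, damping=0.85, iters=100):
    """Hypothetical sketch of dwell-time-weighted page importance.

    sessions: list of browsing sessions, each a list of (page, seconds)
    tuples in visit order. Staying times are assumed to be positive.
    """
    pages = sorted({page for s in sessions for page, _ in s})
    idx = {page: i for i, page in enumerate(pages)}
    n = len(pages)

    # Build the user browsing graph: transition counts and staying times.
    counts = [defaultdict(float) for _ in range(n)]
    stay_total = [0.0] * n
    stay_visits = [0] * n
    for s in sessions:
        for k, (page, secs) in enumerate(s):
            i = idx[page]
            stay_total[i] += secs
            stay_visits[i] += 1
            if k + 1 < len(s):
                counts[i][idx[s[k + 1][0]]] += 1.0

    # Stationary distribution of the embedded chain by power iteration.
    # The uniform reset term (an assumption) keeps the chain ergodic.
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - damping) / n] * n
        for i in range(n):
            out = sum(counts[i].values())
            if out == 0:
                # Page never left within a session: spread mass uniformly.
                share = damping * pi[i] / n
                for j in range(n):
                    nxt[j] += share
            else:
                for j, c in counts[i].items():
                    nxt[j] += damping * pi[i] * c / out
        pi = nxt

    # Continuous-time stationary probability is proportional to the
    # embedded-chain probability times the mean staying time on the page.
    mean_stay = [stay_total[i] / stay_visits[i] for i in range(n)]
    weighted = [pi[i] * mean_stay[i] for i in range(n)]
    total = sum(weighted)
    return {pages[i]: weighted[i] / total for i in range(n)}


if __name__ == "__main__":
    demo = [[("a", 10), ("b", 1)], [("b", 1), ("a", 10)]]
    print(browse_importance(demo))
```

In the demo, both pages are visited equally often, so the difference in their scores comes only from how long users stay on each one. This is the extra signal that a link-based, discrete-time model does not use.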
BrowseRank: letting web users vote for page importance