ABSTRACT
This paper describes a large-scale evaluation of the effectiveness of HITS in comparison with other link-based ranking algorithms, when used in combination with a state-of-the-art text retrieval algorithm (BM25F) that exploits anchor text. We quantified their effectiveness using three common performance measures: mean reciprocal rank, mean average precision, and normalized discounted cumulative gain. The evaluation is based on two large data sets: a breadth-first search crawl of 463 million web pages containing 17.6 billion hyperlinks and referencing 2.9 billion distinct URLs; and a set of 28,043 queries sampled from a query log, each query having on average 2,383 results, about 17 of which were labeled by judges. We found that HITS outperforms PageRank, but is about as effective as web-page in-degree. The same holds true when any of the link-based features is combined with the text retrieval algorithm. Finally, we studied the relationship between query specificity and the effectiveness of selected features, and found that link-based features perform better for general queries, whereas BM25F performs better for specific queries.
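As a rough illustration of the three performance measures named above, the following Python sketch computes them for a single ranked result list. It is not the evaluation code used in the paper. The function names are hypothetical, the exponential gain `2**g - 1` is only one common NDCG variant (the abstract does not say which was used), and average precision here assumes that every relevant document appears somewhere in the ranked list.

```python
import math


def reciprocal_rank(relevant):
    """relevant: booleans in ranked order; returns 1/rank of the first relevant result."""
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            return 1.0 / rank
    return 0.0


def average_precision(relevant):
    """Mean of precision@k over the ranks k that hold a relevant result.

    Assumption: all relevant documents are present in the list, so the
    number of hits seen equals the total number of relevant documents.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def ndcg(gains, k=None):
    """gains: graded relevance labels in ranked order (e.g. 0..4).

    Assumption: gain is 2**g - 1 and the discount is log2(rank + 1).
    """
    def dcg(values):
        return sum((2 ** g - 1) / math.log2(rank + 1)
                   for rank, g in enumerate(values, start=1))

    k = len(gains) if k is None else k
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0


def mean(values):
    """Average a per-query measure over a query set (giving MRR, MAP, mean NDCG)."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0
```

Averaging `reciprocal_rank`, `average_precision`, or `ndcg` over all queries with `mean` gives the query-set-level figures; for example, `mean(reciprocal_rank(r) for r in per_query_relevance)` yields the mean reciprocal rank, where `per_query_relevance` is a hypothetical list of per-query relevance lists.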