DOI: 10.1145/1277741.1277823
Article

HITS on the web: how does it compare?

Published: 23 July 2007

ABSTRACT

This paper describes a large-scale evaluation of the effectiveness of HITS in comparison with other link-based ranking algorithms, when used in combination with a state-of-the-art text retrieval algorithm exploiting anchor text. We quantified their effectiveness using three common performance measures: the mean reciprocal rank, the mean average precision, and the normalized discounted cumulative gain measurements. The evaluation is based on two large data sets: a breadth-first search crawl of 463 million web pages containing 17.6 billion hyperlinks and referencing 2.9 billion distinct URLs; and a set of 28,043 queries sampled from a query log, each query having on average 2,383 results, about 17 of which were labeled by judges. We found that HITS outperforms PageRank, but is about as effective as web-page in-degree. The same holds true when any of the link-based features are combined with the text retrieval algorithm. Finally, we studied the relationship between query specificity and the effectiveness of selected features, and found that link-based features perform better for general queries, whereas BM25F performs better for specific queries.
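The three measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names, the boolean and graded-label inputs, and the particular gain and discount used for NDCG (gain `2**label - 1`, discount `log2(rank + 1)`) are assumptions chosen because they are common formulations. Average precision is normalized here by the number of relevant results present in the ranked list, which assumes every relevant result appears in that list.

```python
import math
from statistics import mean
from typing import Sequence


def reciprocal_rank(relevant: Sequence[bool]) -> float:
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def average_precision(relevant: Sequence[bool]) -> float:
    """Mean of precision@rank taken at each relevant result.

    Assumption: all relevant results occur somewhere in `relevant`.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def ndcg(labels: Sequence[float], k: int) -> float:
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""

    def dcg(ordered: Sequence[float]) -> float:
        return sum(
            (2 ** label - 1) / math.log2(rank + 1)
            for rank, label in enumerate(ordered[:k], start=1)
        )

    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0


def mean_over_queries(per_query_scores: Sequence[float]) -> float:
    """MRR, MAP, and mean NDCG are per-query scores averaged over queries."""
    return mean(per_query_scores) if per_query_scores else 0.0
```

For example, `mean_over_queries([reciprocal_rank(r) for r in rankings])` would give the mean reciprocal rank over a hypothetical list `rankings` of per-query relevance lists, and the same pattern applies to the other two measures.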


Published in

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN: 9781595935977
DOI: 10.1145/1277741

Copyright © 2007 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 792 of 3,983 submissions, 20%
