
Extracting macroscopic information from Web links

Online: 1 November 2001

Abstract

Much has been written about the potential and pitfalls of macroscopic Web-based link analysis, yet there have been no studies that have provided clear statistical evidence that any of the proposed calculations can produce results over large areas of the Web that correlate with phenomena external to the Internet. This article attempts to provide such evidence through an evaluation of Ingwersen's (1998) proposed external Web Impact Factor (WIF) for the original use of the Web: the interlinking of academic research. In particular, it studies the case of the relationship between academic hyperlinks and research activity for universities in Britain, a country chosen for its variety of institutions and the existence of an official government rating exercise for research. After reviewing the numerous reasons why link counts may be unreliable, it demonstrates that four different WIFs do, in fact, correlate with the conventional academic research measures. The WIF delivering the greatest correlation with research rankings was the ratio of Web pages with links pointing at research-based pages to faculty numbers. The scarcity of links to electronic academic papers in the data set suggests that, in contrast to citation analysis, this WIF is measuring the reputations of universities and their scholars, rather than the quality of their publications.
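As a rough illustration of the kind of calculation the abstract describes, the sketch below computes a WIF as the number of external pages linking to a university divided by its faculty count, then rank-correlates the WIFs with a research rating. It is only a sketch under stated assumptions: the university names, link counts, faculty numbers, and ratings are invented for illustration, and the choice of Spearman's rank correlation is an assumption, since the abstract does not say which coefficient the study used.

```python
import math
from typing import List, Sequence


def web_impact_factor(inlink_pages: int, faculty: int) -> float:
    """Pages linking to a university's site divided by its faculty count."""
    if faculty <= 0:
        raise ValueError("faculty must be positive")
    return inlink_pages / faculty


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data, invented for illustration only:
# (name, external pages linking in, faculty count, research rating)
universities = [
    ("A", 1200, 400, 4.0),
    ("B", 300, 300, 3.2),
    ("C", 900, 450, 5.1),
    ("D", 100, 200, 2.5),
]

wifs = [web_impact_factor(links, staff) for _, links, staff, _ in universities]
ratings = [rating for _, _, _, rating in universities]

print("WIFs:", wifs)
print("Spearman rho:", spearman(wifs, ratings))
```

Normalising by faculty numbers rather than by site size keeps a large institution from scoring highly merely because it has more staff to attract links; the rank-based coefficient is used here because it does not assume a linear relationship between link counts and ratings.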

References

  1. Almind, T.C., & Ingwersen, P. (1997). Informetric analyses on the World Wide Web: Methodological approaches to "webometrics." Journal of Documentation, 53(4), 404-426.
  2. Amento, B., Hill, W., Terveen, L., Hix, D., & Ju, P. (1999). An empirical evaluation of user interfaces for topic management of web sites. CHI 99 Conference Proceedings (pp. 552-559). New York: Addison Wesley.
  3. Anderson, A. (1991). No citation analyses please, we're British. Science, 252, 639.
  4. Bar-Ilan, J. (1999). Search engine results over time - A case study on search engine stability. Cybermetrics, 2/3. Available: http://www.cindoc.csic.es/cybermetrics/articles/v2i1p1.html
  5. Bar-Ilan, J. (2000). The Web as an information source on informetrics? A content analysis. Journal of the American Society for Information Science, 51(5), 432-443.
  6. Bar-Ilan, J. (2001). Data collection methods on the Web for informetric purposes - A review and analysis. Scientometrics, 50(1), 7-32.
  7. Berners-Lee, T., & Hendler, J. (2001). Scientific publishing on the "semantic web." Nature, 410, 1023-1024.
  8. Biddle, J. (1996). A citation analysis of the sources and extent of Wesley Mitchell's reputation. History of Political Economy, 28(2), 137-169.
  9. Björneborn, L., & Ingwersen, P. (2001). Perspectives of webometrics. Scientometrics, 50(1), 65-82.
  10. Borgman, C.L. (2000). Digital libraries and the continuum of scholarly communication. Journal of Documentation, 56(4), 412-430.
  11. Brin, S., & Page, L. (1998). The anatomy of a large scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7), 107-117.
  12. Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., & Wiener, J. (2000). Graph structure in the web. Computer Networks, 33(1-6), 309-320.
  13. Case, D.O., & Higgins, G.M. (2000). How can we investigate citation behaviour? A study of reasons for citing literature in communication. Journal of the American Society for Information Science, 51(7), 635-645.
  14. Chakrabarti, S., Dom, B., Gibson, D., Kleinberg, J., Kumar, S.R., Raghavan, P., Rajagopalan, S., & Tomkins, A. (1999). Hypersearching the web. Scientific American, June, 54-60.
  15. Chen, C. (1997). Structuring and visualising the World-Wide Web with generalised similarity analysis. Proceedings of the 8th ACM conference on hypertext (Hypertext '97), April 1997, Southampton, UK. Available: http://www.brunel.ac.uk/~cssrccc2/papers/ht97.pdf
  16. Chen, C., Newman, J., Newman, R., & Rada, R. (1998). How did university departments interweave the Web: A study of connectivity and underlying factors. Interacting with Computers, 10, 353-373.
  17. Cho, J., & Garcia-Molina, H. (2000). The evolution of the Web and implications for an incremental crawler. Proceedings of the 26th VLDB Conference, Cairo, Egypt (pp. 200-209).
  18. Cronin, B. (2001a). Bibliometrics and beyond: Some thoughts on web-based citation analysis. Journal of Information Science, 27(1), 1-7.
  19. Cronin, B. (2001b). Hyperauthorship: A postmodern perversion or evidence of a structural shift in scholarly communication practices? Journal of the American Society for Information Science & Technology, 52(7).
  20. Cronin, B., & McKim, G. (1996). Science and scholarship on the world wide web: A North American perspective. Journal of Documentation, 52(2), 163-171.
  21. Cronin, B., Snyder, H.W., Rosenbaum, H., Martinson, A., & Callahan, E. (1998). Invoked on the web. Journal of the American Society for Information Science, 49(14), 1319-1328.
  22. Davenport, E., & Cronin, B. (2000). The citation network as a prototype for representing trust in virtual environments. In B. Cronin & H.B. Atkins (Eds.), The web of knowledge: A festschrift in honor of Eugene Garfield (pp. 517-534). Medford, NJ: Information Today Inc. ASIS Monograph Series.
  23. Davenport, E., & Cronin, B. (in press). Who dunnit? Metatags and hyperauthorship. Journal of the American Society for Information Science & Technology.
  24. Egghe, L. (2000). New informetric aspects of the Internet: Some reflections - many problems. Journal of Information Science, 26(5), 329-335.
  25. Elkin, J., & Law, D. (1997). The 1996 Research Assessment Exercise: The Library and Information Management Panel. Journal of Librarianship and Information Science, 29(3), 131-141.
  26. Europa. (2001). Fifth framework programme. Available: http://europa.eu.int/comm/research/fp5.html. Accessed 16 February 2001.
  27. Fosmire, M., & Yu, S. (2000). Free scholarly electronic journals: How good are they? Issues in Science and Technology Librarianship, Summer 2000. Available: http://www.library.ucsb.edu/istl/00-summer/refereed.html
  28. Garfield, E. (1979). Citation indexing: Its theory and applications in science, technology and the humanities. New York: Wiley Interscience.
  29. Garfield, E. (1994). The impact factor. Current Contents, June 20. Available: http://www.isinet.com/isi/hot/essays/journalcitationreports/7.html
  30. Gibson, D., Kleinberg, J., & Raghavan, P. (1998). Inferring web communities from link topology. Hypertext 98: Ninth ACM Conference on Hypertext and Hypermedia. New York: ACM.
  31. Gowrishankar, J., Divakar, P., Baylis, M., Gravenor, M., & Kao, R. (1999). Sprucing up one's Impact Factor (two letters to the editor). Nature, 401, 321-322.
  32. Haas, S.W., & Grams, E.S. (2000). Readers, authors and page structure: A discussion of four questions arising from a content analysis of web pages. Journal of the American Society for Information Science, 51(2), 181-192.
  33. Harnad, S., & Carr, L. (2000). Integrating, navigating, and analysing open eprint archives through open citation linking (the OpCit project). Current Science, 79(5), 629-638.
  34. Harter, S.P., & Ford, C.E. (2000). Web-based analyses of e-journal impact: Approaches, problems, and issues. Journal of the American Society for Information Science, 51(13), 1159-1176.
  35. Harter, S.P., & Taemin, K.P. (2000). Impact of prior electronic publication on manuscript consideration policies of scholarly journals. Journal of the American Society for Information Science, 51(10), 940-948.
  36. HEFCE. (1998). An introduction to the work of the Higher Education Funding Council for England. Available: http://www.hefce.ac.uk/Pubs/HEFCE/1998/98_16.htm
  37. Hernández-Borges, A.A., Macías-Cervi, P., Gaspar-Guardado, M.A., Torres-Álvarez de Arcaya, M.L., Ruiz-Rabaza, A., & Jiménez-Sosa, A. (1999). Can examination of WWW usage statistics and other indirect quality indicators distinguish the relative quality of medical web sites? Journal of Medical Internet Research, 1(1). Available: http://www.jmir.org/1999/1/e1/index.htm
  38. Heydon, A., & Najork, M. (1999). Mercator: A scalable, extensible Web crawler. World Wide Web, 2, 219-229.
  39. Holmes, A., & Oppenheim, C. (2001). Use of citation analysis to predict the outcome of the 2001 RAE for Unit of Assessment 61: Library and Information Management. Information Research, 6(2). Available: http://www.shef.ac.uk/~is/publications/infres/6-2/paper103.html
  40. Ingwersen, P. (1998). The calculation of web impact factors. Journal of Documentation, 54(2), 236-243.
  41. Kelly, B. (2000). WebWatch: A survey of links to UK university web sites. Ariadne, 23. Available: http://www.ariadne.ac.uk/issue23/web-watch/
  42. Kim, H.J. (2000). Motivations for hyperlinking in scholarly electronic articles: A qualitative study. Journal of the American Society for Information Science, 51(10), 887-899.
  43. Kleinberg, J. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM.
  44. Kling, R., & McKim, G. (1999). Scholarly communication and the continuum of electronic publishing. Journal of the American Society for Information Science, 50(10), 890-906.
  45. Knudsen, I., Haug, G., & Kirstein, J. (1999). Trends in learning structures in higher education. Available: http://www.rks.dk/trends1.htm. Accessed 7 March 2001.
  46. Lawrence, S., & Giles, C.L. (1999). Accessibility of information on the web. Nature, 400, 107-109.
  47. Leydesdorff, L., & Curran, M. (2000). Mapping university-industry-government relations on the Internet: The construction of indicators for a knowledge-based economy. Cybermetrics, 4. Available: http://www.cindoc.csic.es/cybermetrics/articles/v4i1p2.html
  48. Mayfield University Consultants. (2000). League tables 2000. The Times Higher Education Supplement, April 14, 11-111.
  49. McDonald, S., & Stevenson, R.J. (1998). Navigation in hyperspace: An evaluation of the effects of navigational tools and subject matter expertise on browsing and information retrieval in hypertext. Interacting with Computers, 10(2), 129-142.
  50. Middleton, I., McConnell, M., & Davidson, G. (1999). Presenting a model for the structure and content of a university World Wide Web site. Journal of Information Science, 25(3), 219-227.
  51. Oppenheim, C. (1995). The correlation between citation counts and the 1992 research assessment exercise ratings for British library and information science departments. Journal of Documentation, 51, 18-27.
  52. Oppenheim, C. (1997). The correlation between citation counts and the 1992 research assessment exercise ratings for British research in genetics, anatomy and archaeology. Journal of Documentation, 53, 477-487.
  53. Oppenheim, C. (2000). Do patent citations count? In B. Cronin & H.B. Atkins (Eds.), The web of knowledge: A festschrift in honor of Eugene Garfield (pp. 405-432). Medford, NJ: Information Today Inc. ASIS Monograph Series.
  54. The Noble Publishing Company. (1999). Noble's higher education financial yearbook. Noble.
  55. Rousseau, R. (1997). Sitations: An exploratory study. Cybermetrics, 1. Scotland: Edinburgh. Available: http://www.cindoc.csic.es/cybermetrics/articles/v1i1p1.html
  56. Rousseau, R. (1999). Daily time series of common single word searches in AltaVista and NorthernLight. Cybermetrics, 2/3. Available: http://www.cindoc.csic.es/cybermetrics/articles/v2i1p2.html
  57. Smith, A.G. (1999). A tale of two web spaces: Comparing sites using Web Impact Factors. Journal of Documentation, 55(5), 577-592.
  58. Snyder, H., & Rosenbaum, H. (1999). Can search engines be used for web-link analysis? A critical review. Journal of Documentation, 55(4), 375-384.
  59. Steiger, J.H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2), 245-251.
  60. Thelwall, M. (1999). Will MANs and SuperJANET dominate educational technology in the UK? International Journal of Educational Technology, 1(1). Available: http://www.amstat.org/publications/jse/
  61. Thelwall, M. (2000). Web impact factors and search engine coverage. Journal of Documentation, 56(2), 185-189.
  62. Thelwall, M. (2001a). Results from a Web impact factor crawler. Journal of Documentation, 57(2), 177-191.
  63. Thelwall, M. (2001b). The responsiveness of search engine indexes. Cybermetrics, 5(1). Available: http://www.cindoc.csic.es/cybermetrics/articles/v5i1p1.html
  64. Thelwall, M. (2001c, in press). A Web crawler design for data mining. Journal of Information Science.
  65. Thelwall, M. (2001d). Applying multivariate statistical analysis to university web links. University of Wolverhampton.
  66. Thomas, O., & Willett, P. (2000). Webometric analysis of departments of librarianship and information science. Journal of Information Science, 26(6), 421-428.
  67. Warner, J. (2000a). A critical review of the application of citation studies to the research assessment exercises. Journal of Information Science, 26(6), 453-460. (Includes comment by Oppenheim.)
  68. Warner, J. (2000b). Research assessment and citation analysis. The Scientist, 14(21), 39. Available: http://www.the-scientist.com/yr2000/oct/opin_001030.html
  69. World Wide Web Consortium. (1999). Performance, implementation, and design notes. Available: http://www.w3.org/TR/html4/appendix/notes.html#h-B.4.1. Accessed 27 February 2001.


Published in

Journal of the American Society for Information Science and Technology, Volume 52, Issue 13
November 2001
101 pages

Publisher

John Wiley & Sons, Inc.
United States

Qualifiers

article