Core algorithms in the CLEVER system

Published: 01 May 2006
Abstract

This article describes the CLEVER search system developed at the IBM Almaden Research Center. We present a detailed and unified exposition of the various algorithmic components that make up the system, and then present results from two user studies.
