Abstract
This article describes the CLEVER search system developed at the IBM Almaden Research Center. We give a detailed, unified exposition of the algorithmic components that make up the system, and then report results from two user studies.
Index Terms
Core algorithms in the CLEVER system