research-article

Design trade-offs for search engine caching

Published: 27 October 2008
Abstract

In this article we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year, we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log influence the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.
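The static vs. dynamic distinction mentioned above can be illustrated with a minimal sketch. This is not the algorithm proposed in the article: it assumes a static cache of query results filled once with the most frequent queries of a past log, and compares it with a plain LRU cache as the dynamic policy. The function names, the toy query streams, and the capacity are all hypothetical, and the numbers it produces say nothing about real query logs.

```python
from collections import Counter, OrderedDict


def static_hit_rate(train, test, capacity):
    """Static cache: filled once from a past log, never updated afterwards."""
    # Keep the `capacity` most frequent queries seen in the training log.
    cached = {q for q, _ in Counter(train).most_common(capacity)}
    hits = sum(1 for q in test if q in cached)
    return hits / len(test)


def lru_hit_rate(test, capacity):
    """Dynamic cache: evicts the least recently used query when full."""
    cache = OrderedDict()
    hits = 0
    for q in test:
        if q in cache:
            hits += 1
            cache.move_to_end(q)  # mark as most recently used
        else:
            cache[q] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # drop the least recently used entry
    return hits / len(test)


# Toy data (assumption): popular queries "a" and "b" interleaved with one-off queries.
train = ["a", "a", "a", "b", "b", "c", "d"]
test = ["a", "x", "b", "y", "a", "z", "b", "a"]

static_rate = static_hit_rate(train, test, capacity=2)
lru_rate = lru_hit_rate(test, capacity=2)
print(f"static: {static_rate:.3f}, LRU: {lru_rate:.3f}")
```

On this hand-built stream the one-off queries keep evicting the popular ones from the small LRU cache, while the static cache is unaffected by them. A stream whose popular queries change over time would favor the dynamic policy instead, which is the kind of trade-off the abstract refers to.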

Published in

ACM Transactions on the Web, Volume 2, Issue 4
October 2008
118 pages
ISSN: 1559-1131
EISSN: 1559-114X
DOI: 10.1145/1409220

Copyright © 2008 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

  • Published: 27 October 2008
  • Accepted: 1 August 2008
  • Revised: 1 July 2008
  • Received: 1 December 2007

Qualifiers

  • research-article
  • Research
  • Refereed
