research-article

Second Chance: A Hybrid Approach for Dynamic Result Caching and Prefetching in Search Engines

Published: 01 December 2013

Abstract

Web search engines are known to cache the results of previously issued queries. The stored results typically contain the document summaries and some data that is used to construct the final search result page returned to the user. An alternative strategy is to store in the cache only the result document IDs, which take much less space, allowing results of more queries to be cached. These two strategies lead to an interesting trade-off between the hit rate and the average query response latency. In this work, in order to exploit this trade-off, we propose a hybrid result caching strategy where a dynamic result cache is split into two sections: an HTML cache and a docID cache. Moreover, using a realistic cost model, we evaluate the performance of different result prefetching strategies for the proposed hybrid cache and the baseline HTML-only cache. Finally, we propose a machine learning approach to predict singleton queries, which occur only once in the query stream. We show that when the proposed hybrid result caching strategy is coupled with the singleton query predictor, the hit rate is further improved.
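The split between an HTML cache and a docID cache can be illustrated with a minimal sketch. The abstract does not say how entries move between the two sections, so the policy below is an assumption suggested by the title: both sections use LRU replacement, and an entry evicted from the HTML section is demoted to the docID section instead of being discarded. The class name `HybridResultCache`, its methods, and the use of entry counts (rather than bytes) as capacities are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class HybridResultCache:
    """Illustrative two-section result cache (assumed policy, not the paper's code)."""

    def __init__(self, html_capacity, docid_capacity):
        self.html = OrderedDict()    # query -> (doc_ids, html_page)
        self.docids = OrderedDict()  # query -> doc_ids only (much smaller)
        self.html_capacity = html_capacity    # capacities counted in entries (assumption)
        self.docid_capacity = docid_capacity

    def lookup(self, query):
        """Return ("html", page), ("docid", ids), or ("miss", None)."""
        if query in self.html:
            self.html.move_to_end(query)      # refresh LRU position
            return "html", self.html[query][1]
        if query in self.docids:
            self.docids.move_to_end(query)
            return "docid", self.docids[query]
        return "miss", None

    def insert(self, query, doc_ids, html_page):
        """Cache a full result; demote the LRU HTML entry when over capacity."""
        self.docids.pop(query, None)          # avoid holding the query in both sections
        self.html[query] = (doc_ids, html_page)
        self.html.move_to_end(query)
        if len(self.html) > self.html_capacity:
            old_query, (old_ids, _) = self.html.popitem(last=False)
            self.docids[old_query] = old_ids  # "second chance" as a docID-only entry
            if len(self.docids) > self.docid_capacity:
                self.docids.popitem(last=False)
```

In this sketch a hit in the docID section still leaves the result page to be rebuilt from the stored IDs, which is the latency side of the trade-off described above, while the smaller entries let more queries stay cached, which is the hit-rate side.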



      Published in

        ACM Transactions on the Web, Volume 8, Issue 1
        December 2013
        204 pages
        ISSN: 1559-1131
        EISSN: 1559-114X
        DOI: 10.1145/2560539

        Copyright © 2013 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 December 2013
        • Accepted: 1 October 2013
        • Revised: 1 July 2013
        • Received: 1 August 2012

        Qualifiers

        • research-article
        • Research
        • Refereed
