research-article

Cost-Aware Strategies for Query Result Caching in Web Search Engines

Published: 01 May 2011

Abstract

Search engines and large-scale IR systems need to cache query results for efficiency and scalability. Static and dynamic caching techniques, as well as their combinations, are employed to cache query results effectively. In this study, we propose cost-aware strategies for static and dynamic caching setups. Our research is motivated by two key observations: (i) query processing costs may vary significantly among different queries, and (ii) the processing cost of a query is not proportional to its popularity (i.e., its frequency in the previous logs). The first observation implies that cache misses have different, that is, nonuniform, costs in this context. The second observation implies that typical caching policies, which are based solely on query popularity, cannot always minimize the total cost. Therefore, we propose to explicitly incorporate the query costs into the caching policies. Simulation results using two large Web crawl datasets and a real query log reveal that the proposed approach improves overall system performance in terms of the average query execution time.
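
The two observations can be illustrated with a small sketch. This is not the article's algorithm or data: the query names, the costs, the log, and the scoring rule (frequency multiplied by cost) are all assumptions chosen for illustration, and the cache is evaluated on the same log it was filled from, which a real experiment would avoid. The sketch fills a static cache of fixed capacity in two ways, by frequency alone and by frequency times cost, and compares the total processing cost of the resulting misses.

```python
from collections import Counter


def total_miss_cost(log, cost, cached):
    """Sum the processing cost of every query in the log that is not cached."""
    return sum(cost[q] for q in log if q not in cached)


def pick_static_cache(log, cost, capacity, cost_aware):
    """Choose `capacity` distinct queries from the log to place in a static cache.

    cost_aware=False ranks queries by frequency only.
    cost_aware=True ranks them by frequency * cost (an illustrative
    scoring rule, assumed here; not taken from the article).
    """
    freq = Counter(log)
    if cost_aware:
        def score(q):
            return freq[q] * cost[q]
    else:
        def score(q):
            return freq[q]
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(freq, key=lambda q: (-score(q), q))
    return set(ranked[:capacity])


# Hypothetical per-query processing costs (arbitrary time units).
cost = {"a": 1.0, "b": 1.0, "c": 20.0, "d": 3.0}
# Hypothetical query log: "a" and "b" are popular but cheap, "c" is rare but expensive.
log = ["a"] * 10 + ["b"] * 8 + ["c"] * 3 + ["d"] * 2

by_freq = pick_static_cache(log, cost, capacity=2, cost_aware=False)
by_cost = pick_static_cache(log, cost, capacity=2, cost_aware=True)

freq_cost = total_miss_cost(log, cost, by_freq)  # misses on "c" and "d"
aware_cost = total_miss_cost(log, cost, by_cost)  # misses on "b" and "d"
print(sorted(by_freq), freq_cost)
print(sorted(by_cost), aware_cost)
```

In this toy log the frequency-only cache answers more queries from the cache (18 of 23, against 13 of 23), yet the cost-aware cache leaves a much smaller total miss cost, because the rare query "c" is expensive to recompute.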

      • Published in

        ACM Transactions on the Web  Volume 5, Issue 2
        May 2011
        190 pages
        ISSN: 1559-1131
        EISSN: 1559-114X
        DOI: 10.1145/1961659
        Copyright © 2011 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 May 2011
        • Accepted: 1 August 2010
        • Revised: 1 June 2010
        • Received: 1 November 2009

        Qualifiers

        • research-article
        • Research
        • Refereed
