Abstract
Web search engines are known to cache the results of previously issued queries. The stored results typically contain the document summaries and some data that is used to construct the final search result page returned to the user. An alternative strategy is to store in the cache only the result document IDs, which take much less space, allowing results of more queries to be cached. These two strategies lead to an interesting trade-off between the hit rate and the average query response latency. In this work, in order to exploit this trade-off, we propose a hybrid result caching strategy where a dynamic result cache is split into two sections: an HTML cache and a docID cache. Moreover, using a realistic cost model, we evaluate the performance of different result prefetching strategies for the proposed hybrid cache and the baseline HTML-only cache. Finally, we propose a machine learning approach to predict singleton queries, which occur only once in the query stream. We show that when the proposed hybrid result caching strategy is coupled with the singleton query predictor, the hit rate is further improved.
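The split between an HTML section and a docID section can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several assumptions that the abstract does not state: both sections use LRU replacement, capacities are counted in entries rather than bytes, an entry evicted from the HTML section is demoted to the docID section, and `search` and `render` are hypothetical callbacks standing in for full query processing and for the cheaper step of rebuilding a result page from known document IDs.

```python
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple


class LRU:
    """Tiny LRU map bounded by entry count (a simplification of byte budgets)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value) -> Optional[tuple]:
        """Insert key; return the evicted (key, value) pair, or None."""
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            return self.data.popitem(last=False)  # least recently used
        return None


class HybridResultCache:
    """Result cache split into an HTML section and a docID section."""

    def __init__(
        self,
        html_slots: int,
        docid_slots: int,
        search: Callable[[str], List[int]],       # hypothetical: full query processing
        render: Callable[[str, List[int]], str],  # hypothetical: page from doc IDs
    ) -> None:
        self.html = LRU(html_slots)
        self.docids = LRU(docid_slots)
        self.search = search
        self.render = render

    def lookup(self, query: str) -> Tuple[str, str]:
        """Return (result page, outcome); outcome is html_hit, docid_hit, or miss."""
        entry = self.html.get(query)
        if entry is not None:
            return entry[0], "html_hit"  # page served directly from the cache

        ids = self.docids.get(query)
        if ids is not None:
            outcome = "docid_hit"        # skip ranking, only rebuild the page
        else:
            ids = self.search(query)     # full miss: process the query
            outcome = "miss"

        page = self.render(query, ids)
        self.docids.data.pop(query, None)  # avoid storing the query in both sections
        evicted = self.html.put(query, (page, ids))
        if evicted is not None:
            # Assumed policy: demote the evicted HTML entry, keeping only its IDs.
            old_query, (_, old_ids) = evicted
            self.docids.put(old_query, old_ids)
        return page, outcome
```

Under these assumptions, a docID hit still pays for page construction but avoids query processing, while an HTML hit avoids both. Because docID entries are much smaller, more queries fit in the same space, which is the source of the trade-off between hit rate and response latency.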
Second Chance: A Hybrid Approach for Dynamic Result Caching and Prefetching in Search Engines