Abstract
In practice, a search engine may fail to serve a query for various reasons, such as hardware or network failures, excessive query load, lack of matching documents, or service contract limitations (e.g., query rate limits for third-party users of a search service). In such scenarios, where the backend search system is unable to generate answers to queries, approximate answers can be generated by exploiting the previously computed query results available in the result cache of the search engine. In this work, we propose two alternative strategies to implement this cache-based query processing idea. The first strategy aggregates the cached results of similar queries to create synthetic results for new queries. The second strategy builds an inverted index over the textual information (i.e., query terms and result snippets) present in the result cache and uses this index to answer new queries. Both approaches achieve reasonable result quality compared to processing queries with an inverted index built on the collection.
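To make the two strategies concrete, the following is a minimal sketch and not the method evaluated in the paper. The abstract does not specify the similarity measure, the aggregation rule, or the ranking function, so the choices here are assumptions: Jaccard similarity over query terms, a similarity-weighted Borda-style count for aggregation, and matched-term counting for the cache-based index. The cache contents and all function names (`jaccard`, `aggregate_similar`, `build_index`, `search_index`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical result cache: query string -> ranked list of (doc_id, snippet).
CACHE = {
    "cheap flights paris": [("d1", "cheap flights to paris from london"),
                            ("d2", "paris travel deals and hotels")],
    "flights to paris": [("d1", "cheap flights to paris from london"),
                         ("d3", "compare flights to paris")],
    "python tutorial": [("d4", "learn python with this tutorial")],
}


def jaccard(a, b):
    """Term-set overlap between two queries (assumed similarity measure)."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def aggregate_similar(query, cache, min_sim=0.3, k=10):
    """Strategy 1 (sketch): merge the cached results of similar queries.

    Each sufficiently similar cached query votes for its documents with a
    Borda-style score (higher rank, more points), weighted by similarity.
    """
    scores = defaultdict(float)
    for cached_query, results in cache.items():
        sim = jaccard(query, cached_query)
        if sim < min_sim:
            continue
        n = len(results)
        for rank, (doc_id, _snippet) in enumerate(results):
            scores[doc_id] += sim * (n - rank)
    # Sort by descending score; break ties by doc_id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked][:k]


def build_index(cache):
    """Strategy 2 (sketch): inverted index over query terms and snippets."""
    index = defaultdict(set)
    for cached_query, results in cache.items():
        for doc_id, snippet in results:
            for term in (cached_query + " " + snippet).split():
                index[term].add(doc_id)
    return index


def search_index(query, index, k=10):
    """Rank documents by the number of distinct query terms they match."""
    scores = defaultdict(int)
    for term in set(query.split()):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


if __name__ == "__main__":
    print(aggregate_similar("paris flights", CACHE))
    print(search_index("paris flights", build_index(CACHE)))
```

Both functions answer a query using only the cache, which is the property that matters when the backend is unavailable. A real system would need a far larger cache, a better similarity function, and a proper ranking model to approach the result quality reported for the full index.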
Index Terms
Cache-Based Query Processing for Search Engines