Cache-Based Query Processing for Search Engines

Published: 01 November 2012
Abstract

In practice, a search engine may fail to serve a query for various reasons, such as hardware or network failures, excessive query load, lack of matching documents, or service contract limitations (e.g., query rate limits for third-party users of a search service). In such scenarios, where the backend search system is unable to generate answers to queries, approximate answers can be generated by exploiting the previously computed query results available in the result cache of the search engine. In this work, we propose two alternative strategies to implement this cache-based query processing idea. The first strategy aggregates the cached results of similar queries to create synthetic results for new queries. The second strategy builds an inverted index over the textual information (i.e., query terms and result snippets) present in the result cache and uses this index to answer new queries. Both approaches achieve reasonable result quality compared to processing queries with an inverted index built on the collection.
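
As a rough illustration of the two strategies, the following minimal Python sketch answers a query from a toy result cache alone. It is not the method evaluated in the article: the abstract does not specify the similarity measure, the aggregation function, or the index scoring, so the Jaccard term overlap, the Borda-style rank points, and the matched-term count used here are assumptions, as are the `CACHE` contents and all function names.

```python
from collections import defaultdict

# Hypothetical result cache: query string -> ranked list of (doc_id, snippet).
CACHE = {
    "cheap flights rome": [("d1", "cheap flights to rome from london"),
                           ("d2", "rome airport guide")],
    "rome hotels": [("d3", "best hotels in rome city centre"),
                    ("d2", "rome airport guide")],
    "flights london": [("d1", "cheap flights to rome from london"),
                       ("d4", "london flights timetable")],
}


def jaccard(a, b):
    """Term-overlap similarity between two term sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def aggregate_similar(query, cache, k=3):
    """Strategy 1 (sketch): merge the rankings of similar cached queries."""
    q_terms = set(query.split())
    scores = defaultdict(float)
    for cached_query, results in cache.items():
        sim = jaccard(q_terms, set(cached_query.split()))
        if sim == 0.0:
            continue  # no shared terms, so this cached query is ignored
        n = len(results)
        for rank, (doc_id, _snippet) in enumerate(results):
            # Borda-style points (n - rank), weighted by query similarity.
            scores[doc_id] += sim * (n - rank)
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


def build_cache_index(cache):
    """Strategy 2 (sketch): inverted index over query terms and snippets."""
    index = defaultdict(set)
    for cached_query, results in cache.items():
        for doc_id, snippet in results:
            for term in set(cached_query.split()) | set(snippet.split()):
                index[term].add(doc_id)
    return index


def search_cache_index(query, index, k=3):
    """Rank documents by how many distinct query terms they match."""
    scores = defaultdict(int)
    for term in set(query.split()):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


if __name__ == "__main__":
    print(aggregate_similar("flights rome", CACHE))
    print(search_cache_index("flights rome", build_cache_index(CACHE)))
```

Neither function touches the document collection: the first reuses the rankings of cached queries that share terms with the new query, while the second treats each cached result as a small pseudo-document made of the query terms and snippet text it appeared with. A query that shares no terms with anything in the cache yields an empty list in both cases.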
