Abstract
Search engines and large-scale IR systems need to cache query results for efficiency and scalability. Static and dynamic caching techniques (as well as their combinations) are employed to effectively cache query results. In this study, we propose cost-aware strategies for static and dynamic caching setups. Our research is motivated by two key observations: (i) query processing costs may vary significantly among different queries, and (ii) the processing cost of a query is not proportional to its popularity (i.e., frequency in the previous logs). The first observation implies that cache misses have different, that is, nonuniform, costs in this context. The latter observation implies that typical caching policies, based solely on query popularity, cannot always minimize the total cost. Therefore, we propose to explicitly incorporate the query costs into the caching policies. Simulation results using two large Web crawl datasets and a real query log reveal that the proposed approach improves overall system performance in terms of the average query execution time.
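As an illustration of why nonuniform miss costs matter, the following minimal sketch compares a popularity-only static cache with one that also weighs query cost. The scoring rule (frequency multiplied by cost), the function names, and the toy query log are assumptions made for this example only; the abstract does not specify the exact policies used.

```python
from collections import Counter


def fill_static_cache(train_queries, cost, capacity, cost_aware):
    """Pick `capacity` queries to cache from a training log.

    Assumption: the cost-aware score is frequency * cost; the
    baseline score is frequency alone. Both are illustrative.
    """
    freq = Counter(train_queries)

    def score(q):
        return freq[q] * cost[q] if cost_aware else freq[q]

    # Sort by descending score; break ties by query text for determinism.
    ranked = sorted(freq, key=lambda q: (-score(q), q))
    return set(ranked[:capacity])


def total_miss_cost(test_queries, cache, cost):
    """Sum the processing cost of every query that misses the cache."""
    return sum(cost[q] for q in test_queries if q not in cache)


# Hypothetical toy data: query "c" is less popular but far more expensive.
cost = {"a": 1.0, "b": 1.0, "c": 50.0}
train = ["a"] * 5 + ["b"] * 4 + ["c"] * 3
test = list(train)  # assume the test log follows the same distribution

freq_cache = fill_static_cache(train, cost, capacity=2, cost_aware=False)
cost_cache = fill_static_cache(train, cost, capacity=2, cost_aware=True)

freq_total = total_miss_cost(test, freq_cache, cost)
cost_total = total_miss_cost(test, cost_cache, cost)

print("popularity-only cache:", sorted(freq_cache), "miss cost:", freq_total)
print("cost-aware cache:     ", sorted(cost_cache), "miss cost:", cost_total)
```

In this toy setting the popularity-only cache has the higher hit count (9 of 12 queries versus 8 of 12) yet the higher total miss cost, because every miss falls on the expensive query.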
Cost-Aware Strategies for Query Result Caching in Web Search Engines