Abstract
A commercial web search engine shards its index among many servers, so the response time of a search query is dominated by the slowest server that processes the query. Prior approaches improve responsiveness by reducing the tail latency, or high-percentile response time, of an individual search server: they predict query execution time, and if a query is predicted to be long-running, it runs in parallel; otherwise, it runs sequentially. These approaches, however, are not accurate enough to reduce a high tail latency when responses are aggregated from many servers, because doing so requires each server to reduce a substantially higher-percentile latency (e.g., the 99.99th percentile), which we call extreme tail latency.
To meet the tighter requirements of extreme tail latency, we propose a new design space for the problem that subsumes existing work and opens a new solution space. Existing work makes a prediction using features available at indexing time and focuses on optimizing those features to accelerate tail queries. In contrast, we identify "when to predict?" as another key optimization question. This opens up the new solution of delaying the prediction by a short duration, which allows many short-running queries to complete without parallelization and, at the same time, allows the predictor to collect a set of dynamic features from runtime information. This new question expands the solution space in two meaningful ways. First, the "dynamic" features collected at runtime estimate query execution time with higher accuracy, yielding a significant reduction in tail latency. Second, we can choose to override the prediction when its "predictability" is low. We show that considering predictability accelerates more long-running queries by achieving a higher recall.
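The delayed-dynamic-selective decision described above can be condensed into a small policy function. The sketch below is illustrative only: the delay, threshold, and confidence values are hypothetical, and `decide` stands in for logic that, in a real server, would run inside the query processing loop with dynamic features (e.g., documents scored so far) feeding the predictor.

```python
DELAY_MS = 5.0   # hypothetical short delay before predicting
LONG_MS = 50.0   # hypothetical threshold for a "long-running" query
MIN_CONF = 0.7   # hypothetical predictability cutoff for overriding

def decide(true_ms, predicted_ms, confidence):
    """Delayed-dynamic-selective decision for one query (sketch).

    Queries shorter than the delay finish sequentially and skip
    prediction entirely. Otherwise the (dynamic-feature) prediction
    is trusted only when its predictability is high; a low-confidence
    prediction is overridden to "parallel" so that long-running
    queries are not missed (preserving recall).
    """
    if true_ms <= DELAY_MS:
        return "sequential"  # finished within the delay window
    if confidence < MIN_CONF:
        return "parallel"    # override: predictability too low
    return "parallel" if predicted_ms >= LONG_MS else "sequential"
```

The override branch is what trades a little precision for recall: an unreliable prediction is treated as long-running rather than risking a missed tail query.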
Using this prediction, we accelerate the queries that are predicted to be long-running. Our preliminary work focused on parallelization as the acceleration mechanism. We extend it to consider heterogeneous multicore hardware, which combines processor cores with different microarchitectures, such as energy-efficient little cores and high-performance big cores; accelerating web search on such hardware has remained an open problem.
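For the heterogeneous-hardware scenario, the same prediction can drive core assignment instead of parallelization. A minimal sketch, assuming a big.LITTLE-style processor and hypothetical threshold values:

```python
def assign_core(predicted_ms, confidence,
                long_threshold_ms=50.0, min_conf=0.7):
    """Route one query on a heterogeneous processor (sketch).

    Queries predicted to be long-running, and queries whose
    prediction is unreliable, go to a high-performance big core
    to cut extreme tail latency; the rest run on energy-efficient
    little cores. Thresholds here are hypothetical.
    """
    if confidence < min_conf or predicted_ms >= long_threshold_ms:
        return "big"
    return "little"
```

As with parallelization, the low-predictability override sends uncertain queries to the big core, favoring recall on tail queries over energy savings.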
We evaluate the proposed prediction framework in two scenarios: (1) query parallelization on a multicore processor and (2) query scheduling on a heterogeneous processor. Our extensive evaluation shows that, in both scenarios, the proposed framework reduces extreme tail latency more effectively than a state-of-the-art predictor because of its higher recall, and it improves server throughput by more than 70% because of its improved precision.
Prediction and Predictability for Search Query Acceleration