Abstract
In interactive services such as web search, recommendations, games, and finance, reducing tail latency is crucial to providing a fast response to every user. Using web search as a driving example, we systematically characterize an interactive workload to identify the opportunities and challenges for reducing tail latency. We find that the workload consists mainly of short requests that do not benefit from parallelism, and a few long requests that significantly impact the tail but exhibit high parallelism speedup. This motivates estimating request execution time, using a predictor, to identify long requests and parallelize them. Prediction, however, is not perfect; a long request mispredicted as short is likely to contribute to the server tail latency, setting a ceiling on the achievable tail latency. We propose TPC, an approach that combines prediction information judiciously with dynamic correction for inaccurate prediction. Dynamic correction increases parallelism to accelerate a long request that is mispredicted as short. TPC carefully selects the appropriate target latencies based on system load and parallelism efficiency to reduce tail latency.
We implement TPC and several prior approaches to compare them experimentally on a single search server and on a cluster of 40 search servers. The experimental results show that TPC reduces the 99th- and 99.9th-percentile latency by up to 40% compared with the best prior work. Moreover, we evaluate TPC on a finance server, demonstrating its effectiveness in reducing the tail latency of interactive services beyond web search.
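The prediction-plus-correction scheme described above can be illustrated with a minimal sketch. This is not the paper's implementation: the thread counts, the predicted-time cutoff, and the correction deadline below are all hypothetical placeholders, whereas TPC derives its targets from system load and parallelism efficiency. The sketch only shows the control flow: pick an initial parallelism degree from the predictor's estimate, then boost a request that was scheduled as short but is still running past a correction point.

```python
# Illustrative sketch of prediction-plus-correction parallelism control.
# All constants and names are hypothetical; TPC's actual policy selects
# target latencies from system load and parallelism efficiency.

SHORT_DEGREE = 1          # threads for a request predicted short
LONG_DEGREE = 4           # threads for a request predicted long
PREDICT_CUTOFF_MS = 20.0  # predicted service time above this => "long"
CORRECT_AFTER_MS = 10.0   # elapsed time after which a "short" request is boosted


def initial_degree(predicted_ms: float) -> int:
    """Pick the starting parallelism from the predictor's estimate."""
    return LONG_DEGREE if predicted_ms > PREDICT_CUTOFF_MS else SHORT_DEGREE


def corrected_degree(current_degree: int, elapsed_ms: float) -> int:
    """Dynamic correction: a request scheduled as short that is still
    running past the correction point was likely mispredicted, so raise
    its parallelism to pull it off the latency tail."""
    if current_degree == SHORT_DEGREE and elapsed_ms > CORRECT_AFTER_MS:
        return LONG_DEGREE
    return current_degree
```

A correctly predicted short request runs sequentially to completion; a mispredicted one pays only the small sequential prefix before correction kicks in, which is what bounds the damage a predictor error can do to the tail.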
TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services. In ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems.