Research Article

TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services

Published: 25 March 2016

Abstract

In interactive services such as web search, recommendations, games, and finance, reducing tail latency is crucial to providing fast responses to every user. Using web search as a driving example, we systematically characterize the interactive workload to identify opportunities and challenges for reducing tail latency. We find that the workload consists mainly of short requests that do not benefit from parallelism, and a few long requests that significantly impact the tail but exhibit high parallelism speedup. This motivates estimating request execution time, using a predictor, to identify long requests and parallelize them. Prediction, however, is not perfect; a long request mispredicted as short is likely to contribute to the server tail latency, setting a ceiling on the achievable tail latency. We propose TPC, an approach that judiciously combines prediction information with dynamic correction for inaccurate predictions. Dynamic correction increases parallelism to accelerate a long request that was mispredicted as short. TPC carefully selects appropriate target latencies based on system load and parallelism efficiency to reduce tail latency.
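The prediction-plus-correction policy described above can be illustrated with a minimal sketch. The function names, thresholds, and the notion of a per-request latency prediction here are illustrative assumptions, not the paper's actual implementation; in particular, TPC derives its targets from system load and parallelism efficiency rather than fixed constants:

```python
# Hypothetical sketch of the prediction + correction idea.
# All names and thresholds are illustrative assumptions.

def initial_parallelism(predicted_ms, long_threshold_ms=20, max_degree=4):
    """Prediction step: parallelize only requests predicted to be long,
    since short requests gain little from parallelism."""
    return max_degree if predicted_ms >= long_threshold_ms else 1

def corrected_parallelism(current_degree, elapsed_ms, target_ms=50, max_degree=4):
    """Correction step: if a request running sequentially (i.e., predicted
    short) nears the target latency, boost its parallelism to compensate
    for the misprediction."""
    if current_degree == 1 and elapsed_ms >= 0.5 * target_ms:
        return max_degree
    return current_degree

# A request predicted long is parallelized immediately:
assert initial_parallelism(predicted_ms=80) == 4
# A request predicted short starts sequentially...
degree = initial_parallelism(predicted_ms=5)
assert degree == 1
# ...but is boosted once it has run long enough to threaten the target.
assert corrected_parallelism(degree, elapsed_ms=30) == 4
```

The key design point is that correction bounds the damage of a misprediction: a long request wrongly started sequentially still receives extra parallelism before it can blow past the latency target.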

We implement TPC and several prior approaches and compare them experimentally on a single search server and on a cluster of 40 search servers. The experimental results show that TPC reduces the 99th- and 99.9th-percentile latency by up to 40% compared with the best prior work. Moreover, we evaluate TPC on a finance server, demonstrating its effectiveness at reducing tail latency in interactive services beyond web search.




Published in: ACM SIGPLAN Notices, Volume 51, Issue 4 (ASPLOS '16), April 2016, 774 pages. ISSN 0362-1340, EISSN 1558-1160, DOI 10.1145/2954679. Editor: Andy Gill.

Also in: ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, March 2016, 824 pages. ISBN 9781450340915, DOI 10.1145/2872362. General Chair: Tom Conte; Program Chair: Yuanyuan Zhou.

                Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

