research-article
Open Access

WSMeter: A Performance Evaluation Methodology for Google's Production Warehouse-Scale Computers

Published: 19 March 2018

Abstract

Evaluating the comprehensive performance of a warehouse-scale computer (WSC) has been a long-standing challenge. Traditional load-testing benchmarks become ineffective because they cannot accurately reproduce the behavior of thousands of distinct jobs co-located on a WSC. We therefore evaluate WSCs using actual job behaviors in live production environments. From our experience of developing multiple generations of WSCs, we identify two major challenges of this approach: 1) the lack of a holistic metric that incorporates thousands of jobs and summarizes the performance, and 2) the high costs and risks of conducting an evaluation in a live environment. To address these challenges, we propose WSMeter, a cost-effective methodology to accurately evaluate a WSC's performance using a live production environment. We first define a new metric which accurately represents a WSC's overall performance, taking a wide variety of unevenly distributed jobs into account. We then propose a model to statistically embrace the performance variance inherent in WSCs, to conduct an evaluation with minimal costs and risks. We present three real-world use cases to prove the effectiveness of WSMeter. In the first two cases, WSMeter accurately discerns 7% and 1% performance improvements from WSC upgrades using only 0.9% and 6.6% of the machines in the WSCs, respectively. We emphasize that naive statistical comparisons incur much higher evaluation costs (> 4 times) and sometimes even fail to distinguish subtle differences. The third case shows that a cloud customer hosting two services on our WSC quantifies the performance benefits of software optimization (+9.3%) with minimal overheads (2.3% of the service capacity).

Published in

ACM SIGPLAN Notices, Volume 53, Issue 2
ASPLOS '18
February 2018
809 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3296957

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
March 2018
827 pages
ISBN: 9781450349116
DOI: 10.1145/3173162

Copyright © 2018 Owner/Author

Publisher

Association for Computing Machinery
New York, NY, United States
