Abstract
Reducing the long tail of the query latency distribution in modern warehouse-scale computers is critical for improving the performance and quality of service (QoS) of workloads such as Web Search and Memcached. Traditional turbo boost raises a processor’s voltage and frequency over a coarse-grained sliding window, boosting every query processed during that window. Because such a technique cannot pinpoint tail queries, its tail reduction benefit is limited. In this work, we propose Adrenaline, an approach that leverages finer-granularity (tens of nanoseconds) voltage boosting to rein in tail latency with query-level precision. Two key insights underlie this work. First, emerging finer-granularity voltage/frequency boosting is an enabling mechanism for intelligently allocating the power budget so that only the queries contributing to the tail latency are boosted. Second, per-query characteristics can be used to design indicators that proactively pinpoint these queries and trigger boosting accordingly. Based on these insights, Adrenaline pinpoints and boosts the queries that are likely to lengthen the tail of the latency distribution and that can reap more benefit from the voltage/frequency boost. We demonstrate the effectiveness of our methodology by evaluating it under various workload configurations. Given a fixed boosting power budget, we achieve up to a 2.50× tail latency improvement for Memcached and up to 3.03× for Web Search over coarse-grained dynamic voltage and frequency scaling (DVFS). When optimizing for energy reduction, Adrenaline achieves up to a 1.81× improvement for Memcached and up to 1.99× for Web Search over coarse-grained DVFS. With carefully chosen boost thresholds, Adrenaline further improves the tail latency reduction to 4.82× over coarse-grained DVFS.
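The contrast between window-wide boosting and query-level boosting can be illustrated with a toy model. The sketch below is not Adrenaline's implementation. It assumes, purely for illustration, that the indicator is a query's elapsed execution time crossing a boost threshold, and that boosting speeds up the remaining work by a constant factor. The function names, the microsecond unit, and the workload values are all hypothetical.

```python
def finish_time(work_us, threshold_us, speedup):
    """Latency of one query whose boost starts after threshold_us of execution.

    work_us: service time at the nominal frequency (hypothetical unit: microseconds).
    speedup: assumed ratio of boosted to nominal execution speed (> 1).
    """
    if work_us <= threshold_us:
        return work_us  # short query: completes before the boost triggers
    return threshold_us + (work_us - threshold_us) / speedup


def boosted_time(work_us, threshold_us, speedup):
    """Time the query spends boosted, used here as a crude proxy for boost energy."""
    return max(0.0, finish_time(work_us, threshold_us, speedup) - threshold_us)


def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(values)
    rank = (p * len(ordered) + 99) // 100  # integer ceiling of p * n / 100
    return ordered[rank - 1]


def summarize(works_us, threshold_us, speedup, p=95):
    """Return (p-th percentile latency, total boosted time) for a batch of queries."""
    latencies = [finish_time(w, threshold_us, speedup) for w in works_us]
    boost = sum(boosted_time(w, threshold_us, speedup) for w in works_us)
    return percentile(latencies, p), boost


if __name__ == "__main__":
    # Hypothetical workload: 18 short queries and 2 long ones.
    works = [10.0] * 18 + [100.0, 200.0]
    for label, threshold in [("never boost", float("inf")),
                             ("boost everything", 0.0),
                             ("boost after 20 us", 20.0)]:
        p95, boost = summarize(works, threshold, speedup=2.0)
        print(f"{label}: p95={p95} us, boosted time={boost} us")
```

In this toy workload, boosting only the queries that run past the threshold recovers most of the 95th-percentile reduction while spending roughly half the boosted time of boosting every query. The numbers come from the made-up inputs above and say nothing about the measured results reported in the abstract.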
Reining in Long Tails in Warehouse-Scale Computers with Quick Voltage Boosting Using Adrenaline