
GPGPU Power Estimation with Core and Memory Frequency Scaling

Published: 11 October 2017

Abstract

With the increasing installation of Graphics Processing Units (GPUs) in supercomputers and data centers, their huge electricity cost brings new environmental and economic concerns. Although Dynamic Voltage and Frequency Scaling (DVFS) techniques have been successfully applied to traditional CPUs to conserve energy, the impact of GPU DVFS on application performance and power consumption is not yet fully understood, mainly due to the complicated GPU memory system. This paper proposes a fast prediction model based on Support Vector Regression (SVR) that estimates the average runtime power of a given GPU kernel from a set of profiling parameters under different GPU core and memory frequencies. Our experimental data set includes 931 samples obtained from 19 GPU kernels running on a real GPU platform, with core and memory frequencies ranging between 400 MHz and 1000 MHz. We evaluate the accuracy of the SVR-based prediction model by ten-fold cross-validation and achieve greater accuracy than prior models: a Mean Squared Error (MSE) of 0.797 Watt and a Mean Absolute Percentage Error (MAPE) of 3.08% on average. Combined with an existing performance prediction model, we can find the optimal GPU frequency settings that save an average of 13.2% of energy across these GPU kernels, with no more than a 10% performance penalty compared to the default setting.
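
To make the modeling approach concrete, the following is a minimal, hypothetical Python sketch of the pipeline the abstract describes: it fits an SVR model mapping kernel profiling features plus core/memory frequency settings to average runtime power, and evaluates it with ten-fold cross-validation using MSE and MAPE. It uses scikit-learn for illustration (not necessarily the paper's tooling), and the synthetic data, feature layout, and hyperparameters are assumptions, not the paper's actual dataset or configuration.

import numpy as np
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for the 931-sample dataset: hypothetical profiling
# counters plus the core and memory frequencies (400-1000 MHz).
n_samples, n_counters = 931, 8
counters = rng.random((n_samples, n_counters))
f_core = rng.uniform(400, 1000, n_samples)
f_mem = rng.uniform(400, 1000, n_samples)
X = np.column_stack([counters, f_core, f_mem])

# Fabricated power targets (Watts) that grow with both frequencies.
y = 30 + 0.05 * f_core + 0.03 * f_mem + 5 * counters.sum(axis=1)
y += rng.normal(scale=1.0, size=n_samples)

# RBF-kernel SVR with feature scaling; C and epsilon are placeholders.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))

mse, mape = [], []
for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model.fit(X[train], y[train])
    pred = model.predict(X[test])
    mse.append(np.mean((pred - y[test]) ** 2))
    mape.append(100 * np.mean(np.abs((pred - y[test]) / y[test])))

print(f"ten-fold CV  MSE:  {np.mean(mse):.3f}")
print(f"ten-fold CV  MAPE: {np.mean(mape):.2f}%")

# Paired with a runtime model, the same predictions support the frequency
# selection step from the abstract: over a grid of (f_core, f_mem) settings,
# keep the minimum-energy setting whose predicted runtime stays within 110%
# of the runtime at the default frequencies.
def pick_frequencies(settings, power_w, runtime_s, default_runtime_s):
    """Return the (f_core, f_mem) pair minimizing energy = power * time,
    subject to at most a 10% slowdown versus the default setting."""
    feasible = [i for i, t in enumerate(runtime_s)
                if t <= 1.1 * default_runtime_s]
    best = min(feasible, key=lambda i: power_w[i] * runtime_s[i])
    return settings[best]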



Published in

ACM SIGMETRICS Performance Evaluation Review, Volume 45, Issue 2
September 2017, 131 pages
ISSN: 0163-5999
DOI: 10.1145/3152042

Copyright © 2017 Authors

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

extended-abstract
