Abstract
With the increasing deployment of Graphics Processing Units (GPUs) in supercomputers and data centers, their substantial electricity cost raises new environmental and economic concerns. Although Dynamic Voltage and Frequency Scaling (DVFS) has been applied successfully to conserve energy on traditional CPUs, the impact of GPU DVFS on application performance and power consumption is not yet fully understood, mainly due to the complexity of the GPU memory system. This paper proposes a fast prediction model based on Support Vector Regression (SVR) that estimates the average runtime power of a given GPU kernel from a set of profiling parameters under different GPU core and memory frequencies. Our experimental data set includes 931 samples obtained from 19 GPU kernels running on a real GPU platform, with core and memory frequencies ranging between 400 MHz and 1000 MHz. We evaluate the accuracy of the SVR-based prediction model by ten-fold cross validation and achieve greater accuracy than prior models: a Mean Square Error (MSE) of 0.797 Watt and a Mean Absolute Percentage Error (MAPE) of 3.08% on average. Combined with an existing performance prediction model, we can find optimal GPU frequency settings that save an average of 13.2% energy across those GPU kernels with no more than a 10% performance penalty compared to the default setting.
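The modeling pipeline described above can be sketched as follows. This is a minimal illustration assuming scikit-learn: the synthetic data merely stands in for the real 931-sample data set (19 kernels, frequencies between 400 MHz and 1000 MHz), and the profiling counters, power-model coefficients, and helper names are hypothetical, not the paper's actual features or parameters.

```python
# Sketch: SVR-based GPU power prediction with ten-fold cross validation,
# plus energy-optimal frequency selection under a bounded slowdown.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data: each row = [core freq (MHz), mem freq (MHz),
# four hypothetical normalized profiling counters].
n = 931
freqs = rng.uniform(400.0, 1000.0, size=(n, 2))
counters = rng.random(size=(n, 4))
X = np.hstack([freqs, counters])
# Target: average runtime power (Watt); synthetic relationship for the demo.
y = (0.05 * freqs[:, 0] + 0.03 * freqs[:, 1]
     + 10.0 * counters.sum(axis=1) + rng.normal(0.0, 1.0, size=n))

# Standardize features, then fit an RBF-kernel SVR (hyperparameters illustrative).
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))

# Ten-fold cross validation, reporting MSE and MAPE as in the paper.
mse_folds, mape_folds = [], []
for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model.fit(X[train], y[train])
    pred = model.predict(X[test])
    mse_folds.append(float(np.mean((pred - y[test]) ** 2)))
    mape_folds.append(float(np.mean(np.abs((pred - y[test]) / y[test])) * 100.0))

print(f"MSE:  {np.mean(mse_folds):.3f}")
print(f"MAPE: {np.mean(mape_folds):.2f}%")

# Given a power model and a separate runtime model, an energy-optimal
# (core, mem) frequency setting with a bounded performance penalty can be
# picked by enumerating the candidate settings.
def best_setting(settings, power_of, runtime_of, t_default, max_slowdown=0.10):
    """Return the setting minimizing energy = power * runtime, subject to
    runtime <= (1 + max_slowdown) * t_default."""
    feasible = [s for s in settings
                if runtime_of(s) <= (1.0 + max_slowdown) * t_default]
    return min(feasible, key=lambda s: power_of(s) * runtime_of(s))
```

The enumeration in `best_setting` mirrors the abstract's search for frequency settings that minimize energy while keeping the slowdown within 10% of the default configuration; with the small discrete frequency grid used in practice, exhaustive enumeration is cheap.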
Index Terms
- GPGPU Power Estimation with Core and Memory Frequency Scaling