Abstract
Modern multiprocessor systems-on-chip (MpSoCs) offer tremendous power and performance optimization opportunities by tuning thousands of potential voltage, frequency and core configurations. As the workload phases change at runtime, different configurations may become optimal with respect to power, performance or other metrics. Identifying the optimal configuration at runtime is infeasible due to the large number of workloads and configurations. This paper proposes a novel methodology that can find the Pareto-optimal configurations at runtime as a function of the workload. To achieve this, we perform an extensive offline characterization to find classifiers that map performance counters to optimal configurations. Then, we use these classifiers and performance counters at runtime to choose Pareto-optimal configurations. We evaluate the proposed methodology by maximizing the performance per watt for 18 single- and multi-threaded applications. Our experiments demonstrate an average increase of 93%, 81% and 6% in performance per watt compared to the interactive, ondemand and powersave governors, respectively.
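The two-stage flow described above (offline characterization, then runtime selection from performance counters) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the configuration encoding, the measurement values, the counter features, and the 1-nearest-neighbor lookup, which merely stands in for whatever classifier the offline characterization produces. It is not the implementation evaluated in the paper.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: (big cores, little cores, frequency in MHz).
Config = Tuple[int, int, int]
# Hypothetical measurement per configuration: (power in W, performance).
Measurement = Tuple[float, float]


def pareto_front(measurements: Dict[Config, Measurement]) -> List[Config]:
    """Return configurations not dominated in (lower power, higher performance)."""
    front = []
    for cfg, (power, perf) in measurements.items():
        dominated = any(
            p <= power and f >= perf and (p < power or f > perf)
            for other, (p, f) in measurements.items()
            if other != cfg
        )
        if not dominated:
            front.append(cfg)
    return front


def best_perf_per_watt(measurements: Dict[Config, Measurement]) -> Config:
    """Offline step: pick the Pareto-optimal configuration maximizing performance per watt."""
    return max(
        pareto_front(measurements),
        key=lambda cfg: measurements[cfg][1] / measurements[cfg][0],
    )


def nearest_label(
    counters: Sequence[float],
    training: List[Tuple[Sequence[float], Config]],
) -> Config:
    """Runtime step: 1-nearest-neighbor stand-in for the offline-trained classifier."""
    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(training, key=lambda item: squared_distance(item[0], counters))[1]


# Made-up characterization data for two workload phases.
compute_phase = {
    (4, 0, 2000): (4.0, 16.0),
    (0, 4, 1400): (1.0, 3.0),
    (2, 0, 1000): (2.0, 3.0),
}
memory_phase = {
    (4, 0, 2000): (4.0, 8.0),
    (0, 4, 1400): (1.0, 3.0),
    (1, 0, 1600): (1.5, 4.0),
}

# Made-up counter features per phase, e.g. (IPC, memory accesses per kilo-instruction).
training = [
    ((1.8, 2.0), best_perf_per_watt(compute_phase)),
    ((0.6, 30.0), best_perf_per_watt(memory_phase)),
]

# At runtime, freshly sampled counters are mapped to a configuration.
chosen = nearest_label((0.7, 25.0), training)
```

The expensive search over configurations happens only offline; the runtime step reduces to evaluating a classifier on the sampled counters, which is what keeps the per-phase decision cheap.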
Index Terms
DyPO: Dynamic Pareto-Optimal Configuration Selection for Heterogeneous MpSoCs