Abstract
Advances in silicon process technology have made it possible to include multiple processor cores on a single die. Billion-transistor architectures, usually in the form of networks-on-chip, present a wide range of challenges at the design, microarchitecture, and algorithmic levels, with significant impact on system performance and power consumption. In this article, we propose efficient methods and mechanisms that exploit a heterogeneous network-on-chip (NoC) to achieve a power- and thermal-aware coherent system. To this end, we use different management techniques that employ dynamic frequency scaling circuitry and per-node power and temperature sensors to achieve real-time workload prediction and allocation at the node and system levels using low-cost threads. The developed heterogeneous multicoprocessing infrastructure is used to evaluate diverse policies for power-aware computing, both in terms of their effectiveness and in relation to distributed sensor-conscious management. The proposed reconfigurable architecture supports coprocessor accelerators per node, monitors the program’s power profile on-the-fly, and balances power and thermal behavior at the NoC level. Overall, these techniques form a system exploration methodology, built on a multi-FPGA emulation platform, that incurs minimal complexity overhead.
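To make the idea of sensor-driven, per-node frequency scaling more concrete, the following is a minimal illustrative sketch and not the policy evaluated in the article. Everything in it is an assumption introduced for illustration: the `Node` class, the three frequency levels, the 70 °C limit, the utilization thresholds, and the moving-average predictor standing in for "workload prediction".

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical frequency levels (MHz); the article does not list these values.
FREQ_LEVELS_MHZ = (25, 50, 100)


@dataclass
class Node:
    """One NoC node with an assumed temperature sensor and frequency scaler."""
    name: str
    temp_limit_c: float = 70.0                 # assumed thermal threshold
    level: int = len(FREQ_LEVELS_MHZ) - 1      # start at the highest level
    history: deque = field(default_factory=lambda: deque(maxlen=4))

    def predict_load(self) -> float:
        # Assumed predictor: moving average of recent utilization samples.
        if not self.history:
            return 0.0
        return sum(self.history) / len(self.history)

    def step(self, utilization: float, temperature_c: float) -> int:
        """Record one sensor sample and return the chosen frequency in MHz."""
        self.history.append(utilization)
        predicted = self.predict_load()
        if temperature_c >= self.temp_limit_c:
            # Thermal guard takes priority: step the clock down.
            self.level = max(self.level - 1, 0)
        elif predicted > 0.75:
            # Predicted heavy load: step up if a higher level exists.
            self.level = min(self.level + 1, len(FREQ_LEVELS_MHZ) - 1)
        elif predicted < 0.25:
            # Predicted light load: step down to save power.
            self.level = max(self.level - 1, 0)
        return FREQ_LEVELS_MHZ[self.level]


if __name__ == "__main__":
    node = Node("n0")
    for util, temp in [(0.9, 50.0), (0.9, 80.0), (0.1, 60.0)]:
        print(node.name, util, temp, node.step(util, temp))
```

The thermal check is placed ahead of the load check so that a hot node is throttled even when its predicted load is high; a system-level manager could apply the same kind of rule across nodes when allocating work.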
Index Terms
Dynamic Power and Thermal Management of NoC-Based Heterogeneous MPSoCs