Abstract
As processors, memories, and other components of today's embedded systems are pushed to higher performance in ever more enclosed spaces, processor thermal management is quickly becoming a limiting design factor. While previous proposals have mostly approached this problem from the circuit and architecture angles, software can also play an important role in identifying and eliminating thermal hotspots, since it is the main factor that shapes the order and frequency of accesses to the different hardware components on the chip. This is particularly true for compiler-scheduled Very Long Instruction Word (VLIW) datapaths. In this paper, we focus on a compiler-based approach to balancing the thermal profile across the integer functional units of VLIW architectures. For balanced thermal behavior and peak temperature minimization, we propose techniques based on load balancing across the integer functional units, with or without rotation of functional unit usage. Because leakage power depends exponentially on temperature, and temperature in turn depends on total power (i.e., switching plus leakage), our techniques also consider leakage power optimization by tuning the IPC (instructions issued per cycle). Taking as input code that has already been scheduled for maximum performance, our scheduling strategies modify this performance-oriented schedule for balanced thermal behavior with negligible performance degradation. We evaluate our scheduling strategies with a simulation framework that consists of the Trimaran infrastructure, a power model, and HotSpot. Our experimental results on several benchmark programs show that the peak temperature can be reduced through compiler scheduling.
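To make the idea of load balancing "with or without rotation" concrete, the following Python sketch compares three ways of binding an already-scheduled sequence of integer operations to functional units (FUs). It is a hypothetical toy model, not the authors' scheduler: it only rebinds operations to units within each cycle (so the cycle count, and hence performance, is unchanged), it uses per-unit operation counts as a crude stand-in for power and temperature, and the round-robin form of rotation is an assumption. The function names and the example schedule are invented for illustration.

```python
from typing import List

# Hypothetical model: schedule[c] is the number of integer operations the
# performance-oriented schedule issues in cycle c (never more than num_fus).
Schedule = List[int]


def bind_in_order(schedule: Schedule, num_fus: int) -> List[int]:
    """Baseline binding: each cycle's ops go to FU0, FU1, ... in order,
    so the low-numbered units absorb most of the activity."""
    usage = [0] * num_fus
    for ops in schedule:
        for fu in range(ops):
            usage[fu] += 1
    return usage


def bind_load_balanced(schedule: Schedule, num_fus: int) -> List[int]:
    """Load balancing: each cycle's ops go to the units used least so far
    (ties broken by unit index)."""
    usage = [0] * num_fus
    for ops in schedule:
        least_used = sorted(range(num_fus), key=lambda fu: (usage[fu], fu))
        for fu in least_used[:ops]:
            usage[fu] += 1
    return usage


def bind_rotating(schedule: Schedule, num_fus: int) -> List[int]:
    """Rotation (assumed round-robin): the first unit used advances by one
    every cycle, spreading activity without tracking usage history."""
    usage = [0] * num_fus
    start = 0
    for ops in schedule:
        for k in range(ops):
            usage[(start + k) % num_fus] += 1
        start = (start + 1) % num_fus
    return usage


if __name__ == "__main__":
    sched = [2, 1, 3, 1, 2, 1, 1, 2]  # made-up 8-cycle schedule, 4 integer FUs
    for bind in (bind_in_order, bind_load_balanced, bind_rotating):
        usage = bind(sched, 4)
        print(f"{bind.__name__:20s} usage={usage} peak={max(usage)}")
```

For this made-up schedule, the in-order binding places 8 of the 13 operations on FU0 (usage `[8, 4, 1, 0]`), while load balancing (`[4, 3, 3, 3]`) and rotation (`[4, 4, 2, 3]`) both cap every unit at 4. Rotation is cheaper because it needs no usage history, but it can leave a less even distribution than history-based balancing.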
Compiler-directed thermal management for VLIW functional units