Abstract
The hybrid memory architecture that contains both on-chip cache and scratchpad memory (SPM) has been widely used in embedded systems. In this article, we explore this hybrid memory architecture by jointly optimizing execution time and temperature for embedded systems with loops. Our basic idea is to adaptively adjust the workload distribution between cache and SPM based on the current temperature. For problems in which the workload can be estimated a priori, we present a nonlinear programming formulation that minimizes the total execution time of a loop under SPM size and temperature constraints. For problems in which the workload is not known a priori, we propose a temperature-aware adaptive loop scheduling algorithm, called TALS, that dynamically allocates data to cache and SPM at runtime. The experimental results show that our algorithms effectively optimize both performance and temperature for embedded systems with cache and SPM.
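To make the basic idea concrete, the following toy sketch shifts a loop's memory accesses between SPM and cache depending on the current SPM temperature. It is not the TALS algorithm or the nonlinear programming formulation from the article. The thermal model, the constants, and the names `Memory`, `schedule`, `heat_per_access`, `cooling`, and `threshold` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Toy thermal model of one memory unit (all parameters are assumptions)."""
    name: str
    temp: float             # current temperature in degrees C
    heat_per_access: float  # assumed temperature rise per access
    cooling: float          # assumed fraction of excess heat lost per iteration
    ambient: float = 45.0   # assumed ambient temperature in degrees C

    def step(self, accesses: int) -> None:
        # Heat up in proportion to the accesses, then cool toward ambient.
        self.temp += self.heat_per_access * accesses
        self.temp -= self.cooling * (self.temp - self.ambient)


def schedule(iterations, accesses_per_iter, spm_capacity, threshold):
    """Per loop iteration, decide how many accesses go to SPM and to cache.

    Hypothetical policy: use SPM (up to its capacity) while it is cooler
    than `threshold`; otherwise send the whole iteration to the cache.
    """
    cache = Memory("cache", temp=45.0, heat_per_access=0.02, cooling=0.1)
    spm = Memory("spm", temp=45.0, heat_per_access=0.01, cooling=0.1)
    trace = []
    for _ in range(iterations):
        if spm.temp < threshold:
            to_spm = min(accesses_per_iter, spm_capacity)
        else:
            to_spm = 0
        to_cache = accesses_per_iter - to_spm
        spm.step(to_spm)
        cache.step(to_cache)
        trace.append((to_spm, to_cache, spm.temp, cache.temp))
    return trace


if __name__ == "__main__":
    for to_spm, to_cache, t_spm, t_cache in schedule(5, 100, 64, 80.0):
        print(f"spm={to_spm:3d} cache={to_cache:3d} "
              f"T_spm={t_spm:.2f} T_cache={t_cache:.2f}")
```

The sketch keeps the two constraints named in the abstract visible: the SPM share never exceeds `spm_capacity`, and the split reacts to temperature at runtime rather than being fixed in advance.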
Temperature-Aware Data Allocation for Embedded Systems with Cache and Scratchpad Memory