Abstract
The recent emergence of various Non-Volatile Memories (NVMs), with attractive characteristics such as low leakage power and high density, offers a new way of addressing the memory power consumption problem. In this article, we target embedded chip multiprocessors (CMPs) and propose a novel Hybrid Scratch Pad Memory (HSPM) architecture that combines SRAM and NVM to exploit the ultra-low leakage power and high density of NVM together with the fast access of SRAM. We also propose a data allocation algorithm and an algorithm that determines the NVM/SRAM ratio for the HSPM architecture. The experimental results show that, compared with a greedy algorithm, the data allocation algorithm reduces the memory access time by 33.51% and the dynamic energy consumption by 16.81% on average for the HSPM architecture. The NVM/SRAM size determination algorithm further reduces the memory access time by 14.7% and the energy consumption by 20.1% on average.
Index Terms
Management and optimization for nonvolatile memory-based hybrid scratchpad memory on multicore embedded processors