
Management and optimization for nonvolatile memory-based hybrid scratchpad memory on multicore embedded processors

Published: 10 March 2014

Abstract

The recent emergence of various Non-Volatile Memories (NVMs), with attractive characteristics such as low leakage power and high density, offers a new way to address the memory power consumption problem. In this article, we target embedded chip multiprocessors (CMPs) and propose a novel Hybrid Scratch Pad Memory (HSPM) architecture that combines SRAM and NVM to exploit the ultra-low leakage power and high density of NVM together with the fast access of SRAM. We also propose a data allocation algorithm for the HSPM architecture and an algorithm to determine its NVM/SRAM ratio. Experimental results show that, compared with a greedy algorithm, the data allocation algorithm reduces memory access time by 33.51% and dynamic energy consumption by 16.81% on average for the HSPM architecture. The NVM/SRAM size determination algorithm further reduces memory access time by 14.7% and energy consumption by 20.1% on average.
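The abstract does not specify the greedy algorithm used as the baseline. As a rough illustration only, a greedy allocator for a hybrid scratchpad might look like the sketch below. Everything in it is an assumption made for the example: the latency figures, the `Block` fields, the `MAIN` off-chip fallback, and the ordering heuristic are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass

# Hypothetical per-access latencies in cycles as (read, write).
# These numbers are illustrative assumptions, not values from the article.
LATENCY = {
    "SRAM": (1, 1),     # fast reads and writes
    "NVM": (2, 10),     # reads near SRAM speed, writes much slower
    "MAIN": (50, 50),   # assumed off-chip fallback, unlimited capacity
}


@dataclass
class Block:
    name: str
    size: int    # size in arbitrary capacity units
    reads: int   # profiled read count
    writes: int  # profiled write count


def access_cost(block, mem):
    """Total access time of a block if it is placed in memory `mem`."""
    read_lat, write_lat = LATENCY[mem]
    return block.reads * read_lat + block.writes * write_lat


def greedy_allocate(blocks, sram_capacity, nvm_capacity):
    """Place the most-accessed blocks first, each in its cheapest memory that still fits."""
    free = {"SRAM": sram_capacity, "NVM": nvm_capacity, "MAIN": float("inf")}
    placement = {}
    for block in sorted(blocks, key=lambda b: b.reads + b.writes, reverse=True):
        candidates = [m for m in free if free[m] >= block.size]
        best = min(candidates, key=lambda m: access_cost(block, m))
        free[best] -= block.size
        placement[block.name] = best
    return placement


def total_cost(blocks, placement):
    """Sum of access times over all blocks under a given placement."""
    return sum(access_cost(b, placement[b.name]) for b in blocks)
```

Under these assumed latencies, a write-heavy block competes for SRAM while a read-mostly block tolerates NVM, which is the trade-off a hybrid scratchpad is meant to exploit. Because a greedy pass commits each block without reconsidering earlier choices, it can leave capacity poorly used, which is presumably why it serves as a baseline rather than as the proposed method.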

