research-article

An Energy-Efficient DRAM Cache Architecture for Mobile Platforms With PCM-Based Main Memory

Published: 14 January 2022

Abstract

Long battery life is a first-class design objective for mobile devices, and main memory accounts for a major portion of total energy consumption. Moreover, memory energy consumption is expected to increase further with ever-growing demands for bandwidth and capacity. A hybrid memory system that combines DRAM and phase-change memory (PCM) is an attractive way to provide additional capacity and reduce standby energy. However, although PCM offers much greater density than DRAM, its longer access latency and limited write endurance make it challenging to architect as main memory.

To address this challenge, this article introduces CAMP, a novel DRAM cache architecture for mobile platforms with PCM-based main memory. A DRAM cache in this environment must filter most of the writes to PCM to extend its lifetime, and must deliver high efficiency even at the relatively small DRAM cache sizes that mobile platforms can afford. To meet these requirements, CAMP divides the DRAM space into two regions: a page cache that exploits spatial locality in a bandwidth-efficient manner, and a dirty block buffer that maximally filters writes. CAMP improves performance and energy-delay product by 29.2% and 45.2%, respectively, over the baseline PCM-oblivious DRAM cache, while increasing PCM lifetime by 2.7×. CAMP also improves performance and energy-delay product by 29.3% and 41.5%, respectively, over the state-of-the-art design with a dirty block buffer, while increasing PCM lifetime by 2.5×.
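The two-region idea can be illustrated with a toy model. The sketch below is not CAMP itself: the abstract does not specify replacement policies, region sizes, or granularities, so the class name TwoRegionDramCache, the LRU policy in both regions, the 64-blocks-per-page constant, and the counters are all assumptions made for illustration. It only shows how a page-granularity region can serve reads while a block-granularity dirty buffer absorbs repeated writes before they reach PCM.

```python
from collections import OrderedDict

BLOCKS_PER_PAGE = 64  # assumption: 4 KiB pages made of 64 B blocks


class TwoRegionDramCache:
    """Toy model: an LRU page cache for reads plus an LRU dirty block buffer.

    Hypothetical simplification for illustration only. It tracks hit/miss
    outcomes and PCM traffic counts, not the data itself.
    """

    def __init__(self, page_slots, dirty_slots):
        self.page_slots = page_slots
        self.dirty_slots = dirty_slots
        self.pages = OrderedDict()  # page number -> None, oldest first
        self.dirty = OrderedDict()  # block number -> None, oldest first
        self.pcm_reads = 0          # page fills fetched from PCM
        self.pcm_writes = 0         # dirty blocks written back to PCM

    def read(self, block):
        # The dirty buffer holds the newest copy, so it is checked first.
        if block in self.dirty:
            self.dirty.move_to_end(block)
            return "dirty-hit"
        page = block // BLOCKS_PER_PAGE
        if page in self.pages:
            self.pages.move_to_end(page)
            return "page-hit"
        # Miss: fetch the whole page from PCM to exploit spatial locality.
        self.pcm_reads += 1
        self.pages[page] = None
        if len(self.pages) > self.page_slots:
            # Pages in this region are clean, so eviction costs no PCM write.
            self.pages.popitem(last=False)
        return "miss"

    def write(self, block):
        # A write to a block already buffered is absorbed in DRAM.
        if block in self.dirty:
            self.dirty.move_to_end(block)
            return
        self.dirty[block] = None
        if len(self.dirty) > self.dirty_slots:
            # Only an evicted dirty block generates a PCM write.
            # (A real design would also update or invalidate any stale
            # page-cache copy; this model does not track data.)
            self.dirty.popitem(last=False)
            self.pcm_writes += 1


cache = TwoRegionDramCache(page_slots=2, dirty_slots=2)
for blk in (5, 5, 6, 5, 7):  # repeated writes to block 5 stay in DRAM
    cache.write(blk)
print(cache.pcm_writes)  # 5 writes issued, 1 reaches PCM
```

Because the dirty buffer is managed per block rather than per page, a small buffer can hold many hot dirty blocks from different pages, which is the intuition behind filtering writes with limited DRAM capacity.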


      • Published in

        ACM Transactions on Embedded Computing Systems, Volume 21, Issue 1
        January 2022
        288 pages
        ISSN:1539-9087
        EISSN:1558-3465
        DOI:10.1145/3505211

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 14 January 2022
        • Accepted: 1 February 2021
        • Revised: 1 January 2021
        • Received: 1 November 2019

        Qualifiers

        • research-article
        • Refereed
