Rethinking the Interactivity of OS and Device Layers in Memory Management

Published: 23 August 2022

Abstract

In the big data era, a huge number of services have placed a fast-growing demand on the capacity of DRAM-based main memory. However, because of its high hardware cost and serious leakage power/energy consumption, the capacity of DRAM cannot grow as fast as the demand for main memory space when energy or hardware cost is a critical concern. To tackle this issue, hybrid main-memory modules have been proposed to replace pure DRAM main memory: a hybrid module provides a large main memory space by integrating a small DRAM and a large non-volatile memory (NVM) in the same memory module. Although NVMs offer high density and low cost, they suffer from lower read/write performance and lower endurance than DRAM. Thus, a hybrid main-memory module also includes a memory management design that uses DRAM as a cache for the NVM to enhance performance and lifetime. However, this introduces new design challenges in both the OS and the memory module. In this work, we rethink the interactivity of the OS and the hybrid main-memory module, and propose a cross-layer cache design that (1) utilizes information from the operating system to optimize the hit ratio of the DRAM cache inside the memory module, and (2) takes advantage of the bulk-size (or block-based) read/write feature of NVM to minimize the time overhead of data movement between DRAM and NVM. At the same time, this cross-layer cache design is very lightweight and introduces only limited runtime management overheads. A series of experiments was conducted to evaluate the effectiveness of the proposed cross-layer cache design. The results show that the proposed design improves access performance by up to 88%, compared with well-known page replacement algorithms.
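To make the cross-layer idea concrete, the following is a minimal sketch of a DRAM cache in front of an NVM backing store whose eviction policy consults OS-provided coldness hints before falling back to LRU order. This is an illustration of the general technique, not the paper's actual algorithm: the `HintedDramCache` class, the `os_hint_cold` interface, and the per-victim writeback (which a real module would batch into bulk NVM writes) are all assumptions made for this example.

```python
from collections import OrderedDict

class HintedDramCache:
    """Toy DRAM cache in front of NVM.

    Eviction prefers pages the OS has hinted as cold; dirty pages
    are written back to NVM on eviction (a real module would batch
    these writebacks to exploit NVM's block-based write feature).
    All names and interfaces are illustrative.
    """

    def __init__(self, capacity, nvm):
        self.capacity = capacity
        self.nvm = nvm                  # backing store: page -> data
        self.cache = OrderedDict()      # page -> (data, dirty); insertion order ~ LRU
        self.cold_hints = set()         # pages the OS marked as cold
        self.hits = 0
        self.misses = 0

    def os_hint_cold(self, page):
        """OS-to-device hint: this page is unlikely to be reused soon."""
        self.cold_hints.add(page)

    def _pick_victim(self):
        # Prefer an OS-hinted cold page; otherwise evict the LRU page.
        for page in self.cache:
            if page in self.cold_hints:
                return page
        return next(iter(self.cache))

    def access(self, page, write=False, data=None):
        if page in self.cache:
            self.hits += 1
            self.cache.move_to_end(page)      # mark as most recently used
            if write:
                self.cache[page] = (data, True)
            return self.cache[page][0]
        self.misses += 1
        if len(self.cache) >= self.capacity:
            victim = self._pick_victim()
            vdata, vdirty = self.cache.pop(victim)
            if vdirty:
                self.nvm[victim] = vdata      # writeback (batched in practice)
            self.cold_hints.discard(victim)
        value = data if write else self.nvm.get(page)
        self.cache[page] = (value, write)
        return value
```

In this sketch, a page that plain LRU would keep can still be chosen as the victim once the OS hints it is cold, which is one simple way device-side replacement can benefit from OS-level knowledge.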



• Published in

  ACM Transactions on Embedded Computing Systems, Volume 21, Issue 4 (July 2022), 330 pages
  ISSN: 1539-9087
  EISSN: 1558-3465
  DOI: 10.1145/3551651
  Editor: Tulika Mitra


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 23 August 2022
• Online AM: 27 April 2022
• Accepted: 1 April 2022
• Revised: 1 March 2022
• Received: 1 November 2021
