Abstract
In the big data era, a huge number of services have placed a fast-growing demand on the capacity of DRAM-based main memory. However, because of DRAM's high hardware cost and serious leakage power consumption, DRAM capacity cannot grow as fast as the required main-memory space when energy or hardware cost is a critical concern. To tackle this issue, hybrid main-memory modules have been proposed to replace pure DRAM main memory; such a module provides a large main-memory space by integrating a small DRAM and a large non-volatile memory (NVM) into the same memory module. Although NVMs offer high density and low cost, they suffer from lower read/write performance and lower endurance than DRAM. Thus, the hybrid main-memory module also includes a memory management design that uses DRAM as a cache for the NVM to improve performance and lifetime. However, this arrangement introduces new design challenges in both the OS and the memory module. In this work, we rethink the interactivity of the OS and the hybrid main-memory module, and propose a cross-layer cache design that (1) utilizes information from the operating system to optimize the hit ratio of the DRAM cache inside the memory module, and (2) takes advantage of the bulk-size (or block-based) read/write feature of NVM to minimize the time overhead of data movement between DRAM and NVM. At the same time, this cross-layer cache design is lightweight and introduces only limited runtime management overhead. A series of experiments was conducted to evaluate the effectiveness of the proposed cross-layer cache design. The results show that the proposed design can improve access performance by up to 88% compared with the investigated well-known page replacement algorithms.
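The abstract does not spell out the replacement policy or the interface between the OS and the module, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. The class `HybridCacheSketch`, the `os_hint` call, the "hot page" hint, and the LRU-style eviction are all hypothetical choices made for illustration; they are not the design proposed in the paper. The sketch shows a DRAM cache that (1) prefers to evict pages the OS has not marked as hot and (2) fills the cache one whole NVM block at a time, so that a single bulk read serves several neighboring pages.

```python
from collections import OrderedDict


class HybridCacheSketch:
    """Toy DRAM cache in front of a block-based NVM (illustrative only)."""

    def __init__(self, dram_pages, pages_per_block):
        # Assumption: dram_pages >= pages_per_block.
        self.dram_pages = dram_pages
        self.pages_per_block = pages_per_block
        self.dram = OrderedDict()    # cached pages, least recently used first
        self.os_hot = set()          # pages the OS hinted as hot (hypothetical)
        self.hits = 0
        self.misses = 0
        self.nvm_block_reads = 0

    def os_hint(self, page, hot=True):
        """Record a hypothetical OS hint that `page` is (or is no longer) hot."""
        if hot:
            self.os_hot.add(page)
        else:
            self.os_hot.discard(page)

    def _evict_one(self):
        # Prefer the least recently used page that carries no hot hint.
        victim = next((p for p in self.dram if p not in self.os_hot), None)
        if victim is None:
            self.dram.popitem(last=False)   # every page is hot: plain LRU
        else:
            del self.dram[victim]

    def access(self, page):
        """Return True on a DRAM hit, False on a miss served from NVM."""
        if page in self.dram:
            self.dram.move_to_end(page)
            self.hits += 1
            return True
        self.misses += 1
        self.nvm_block_reads += 1           # one bulk read per miss
        first = (page // self.pages_per_block) * self.pages_per_block
        neighbors = [p for p in range(first, first + self.pages_per_block)
                     if p != page]
        # Insert the requested page last so it ends up most recently used.
        for p in neighbors + [page]:
            if p in self.dram:
                continue
            while len(self.dram) >= self.dram_pages:
                self._evict_one()
            self.dram[p] = None
        return False
```

In this sketch a miss on page 0 with two pages per block also brings in page 1, so a later access to page 1 hits without another NVM read. Marking page 0 as hot keeps it resident while colder pages from other blocks are evicted first; a plain LRU policy would have evicted it.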
Index Terms
Rethinking the Interactivity of OS and Device Layers in Memory Management