Abstract
We present Nap, a black-box approach that converts concurrent persistent memory (PM) indexes into non-uniform memory access (NUMA)-aware counterparts. Based on the observation that real-world workloads typically feature skewed access patterns, Nap introduces a NUMA-aware layer (NAL) on top of existing concurrent PM indexes and steers accesses to hot items to this layer. The NAL maintains (1) per-node partial views in PM, which serve insert/update/delete operations with failure atomicity, and (2) a global view in DRAM, which serves lookup operations. The NAL eliminates remote PM accesses to hot items without inducing extra local PM accesses. Moreover, to handle dynamic workloads, Nap adopts a fast NAL switch mechanism. We convert five state-of-the-art PM indexes using Nap. Evaluation on a four-node machine with Optane DC Persistent Memory shows that Nap improves throughput by up to 2.3× under write-intensive workloads and by up to 1.56× under read-intensive workloads.
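To make the routing idea concrete, the following is a minimal toy sketch of a NUMA-aware layer and is not the authors' implementation. Plain Python dictionaries stand in for the underlying PM index, the per-node partial views, and the DRAM global view. The class name `ToyNAL`, the method names, the tombstone marker, and the choice to cache values directly in the global view are all assumptions made for illustration. The `switch_hot_set` method is a naive flush-and-replace placeholder and does not model the fast switch mechanism mentioned in the abstract. Persistence, failure atomicity, and concurrency control are omitted.

```python
from typing import Dict, List, Optional, Set

_TOMBSTONE = object()  # hypothetical marker for a deleted hot key


class ToyNAL:
    """Toy model of a layer that absorbs accesses to hot keys (illustrative only)."""

    def __init__(self, base_index: Dict[str, str], num_nodes: int, hot_keys: Set[str]):
        self.base_index = base_index      # stands in for the existing PM index
        self.hot_keys = set(hot_keys)     # assumed to come from some hotness detector
        # One partial view per NUMA node (stands in for node-local PM).
        self.partial_views: List[dict] = [{} for _ in range(num_nodes)]
        # Global view (stands in for DRAM); assumption: it caches the latest value.
        self.global_view: dict = {}

    def put(self, node: int, key: str, value: str) -> None:
        if key in self.hot_keys:
            self.partial_views[node][key] = value   # write goes to the local view only
            self.global_view[key] = value
        else:
            self.base_index[key] = value            # cold keys bypass the layer

    def delete(self, node: int, key: str) -> None:
        if key in self.hot_keys:
            self.partial_views[node][key] = _TOMBSTONE
            self.global_view[key] = _TOMBSTONE
        else:
            self.base_index.pop(key, None)

    def get(self, key: str) -> Optional[str]:
        # Hot keys that have been written are served from the global view.
        if key in self.hot_keys and key in self.global_view:
            value = self.global_view[key]
            return None if value is _TOMBSTONE else value
        return self.base_index.get(key)

    def switch_hot_set(self, new_hot_keys: Set[str]) -> None:
        # Naive switch: flush the latest hot values to the base index, then reset.
        for key, value in self.global_view.items():
            if value is _TOMBSTONE:
                self.base_index.pop(key, None)
            else:
                self.base_index[key] = value
        self.partial_views = [{} for _ in self.partial_views]
        self.global_view = {}
        self.hot_keys = set(new_hot_keys)
```

In this sketch, a write to a hot key issued on node 2 touches only `partial_views[2]` and the global view, so the underlying index is left unchanged until the hot set is switched.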
Nap: Persistent Memory Indexes for NUMA Architectures