Abstract
Computer systems that use byte-addressable Non-Volatile Memory (NVM) as memory or storage can provide low-latency data persistence. Widely used key-value stores based on the Log-Structured Merge Tree (LSM-Tree) remain beneficial on NVM systems in terms of space and write efficiency. However, the significant write amplification introduced by the leveled compaction of the LSM-Tree degrades the write performance of the key-value store and shortens the lifetime of NVM devices. Existing studies propose new compaction methods to reduce write amplification, but they incur relatively large read amplification. In this article, we propose NVLSM, a key-value store for NVM systems that uses an LSM-Tree with a new accumulative compaction method. By fully exploiting the byte-addressability of NVM, accumulative compaction uses pointers to accumulate data into multiple floors within a logically sorted run, reducing the number of compactions required. We also propose a cascading search scheme for reads across the multiple floors to reduce read amplification. As a result, NVLSM reduces write amplification with only a small increase in read amplification. We compare NVLSM with LSM-Tree key-value stores that use two other compaction methods: leveled compaction and fragmented compaction. Our evaluations show that NVLSM reduces write amplification by up to 67% compared with an LSM-Tree using leveled compaction, without significantly increasing read amplification. In write-intensive workloads, NVLSM reduces average latency by 15.73%–41.2% compared with the other key-value stores.
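To make the read-side cost of multiple floors concrete, the following is a minimal Python sketch of a naive lookup over a run split into several sorted floors. It is not NVLSM's cascading search scheme, whose details the abstract does not give. The names `lookup` and `floors`, the newest-first ordering, and the representation of a floor as parallel key and value lists are all assumptions made for illustration. The sketch only shows why a key found in an older floor costs more probes, which is the read amplification the cascading scheme aims to reduce.

```python
from bisect import bisect_left

def lookup(floors, key):
    """Naive multi-floor lookup (illustrative, not NVLSM's scheme).

    `floors` is a hypothetical list of (keys, values) pairs, where each
    `keys` list is sorted and floors[0] is assumed to be the newest floor.
    Returns (value, floors_probed); value is None if the key is absent.
    """
    probed = 0
    for keys, values in floors:
        probed += 1
        # Binary search inside one sorted floor.
        i = bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return values[i], probed
    return None, probed

# Three hypothetical floors of one run, newest first.
floors = [
    (["b", "d"], ["b2", "d2"]),
    (["a", "b", "e"], ["a1", "b1", "e1"]),
    (["c", "f"], ["c0", "f0"]),
]

newest_hit = lookup(floors, "b")   # found in the newest floor
oldest_hit = lookup(floors, "c")   # found only after probing every floor
miss = lookup(floors, "z")         # absent key probes every floor
```

In this sketch each floor is searched independently, so the worst case performs one binary search per floor; a cascading approach would instead reuse information from one floor's search to narrow the next.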
Index Terms
NVLSM: A Persistent Memory Key-Value Store Using Log-Structured Merge Tree with Accumulative Compaction