FlatLSM: Write-Optimized LSM-Tree for PM-Based KV Stores

Published: 6 March 2023

Abstract

The Log-Structured Merge Tree (LSM-Tree) is widely used in key-value (KV) stores because of its excellent write performance. However, LSM-Tree-based KV stores still suffer from the overhead of the write-ahead log and from write stalls caused by slow L0 flushes and L0-L1 compaction. New byte-addressable persistent memory (PM) devices bring an opportunity to improve the write performance of the LSM-Tree. Previous studies on PM-based LSM-Trees have not fully exploited PM’s “dual role” as both main memory and external storage. In this article, we analyze two strategies for PM-based memtables and the root causes of the write stall problem. Inspired by this analysis, we propose FlatLSM, a specially designed flat LSM-Tree for non-volatile memory-based KV stores. First, we propose PMTable, which separates index and data; the PM Log utilizes a Buffer Log to store KVs smaller than 256 B. Second, to solve the write stall problem, FlatLSM merges the volatile memtables and the persistent L0 into large PMTables, which reduces the depth of the LSM-Tree and concentrates I/O bandwidth on L0-L1 compaction. To mitigate write stalls caused by flushing large PMTables to SSD, we propose a parallel flush/compaction algorithm based on KV separation. We implemented FlatLSM on top of RocksDB and evaluated it on Intel’s latest PM device, the Intel Optane DC PMM. Compared with state-of-the-art PM-based LSM-Tree KV stores, FlatLSM improves throughput by up to 5.2× on a random write workload and 2.55× on YCSB-A.
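To make the abstract's design concrete, the following is a minimal, purely illustrative sketch of the size-based write routing it describes: KVs smaller than 256 B go into a Buffer Log, larger values are stored out of place, and a separated index records where each key lives. The class and field names (`PMTableSketch`, `buffer_log`, `pm_log`) are hypothetical and do not reflect FlatLSM's actual implementation.

```python
SMALL_KV_THRESHOLD = 256  # bytes; the threshold named in the abstract

class PMTableSketch:
    """Toy model of size-based write routing with a separated index."""

    def __init__(self):
        self.buffer_log = []   # small KV records, batched together (Buffer Log)
        self.pm_log = []       # large values, written out of place (PM Log)
        self.index = {}        # key -> (area, offset); stands in for the separated index

    def put(self, key: bytes, value: bytes) -> None:
        # Route by record size: small KVs are batched in the Buffer Log,
        # large values go to the PM Log and only their location is indexed.
        if len(key) + len(value) < SMALL_KV_THRESHOLD:
            self.buffer_log.append((key, value))
            self.index[key] = ("buffer", len(self.buffer_log) - 1)
        else:
            self.pm_log.append(value)
            self.index[key] = ("pmlog", len(self.pm_log) - 1)

    def get(self, key: bytes):
        entry = self.index.get(key)
        if entry is None:
            return None
        area, offset = entry
        if area == "buffer":
            return self.buffer_log[offset][1]
        return self.pm_log[offset]
```

Under KV separation, a flush or compaction of such a table would only need to move the index entries and small records, leaving large values in place, which is what lets FlatLSM parallelize flush and L0-L1 compaction.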


• Published in

  ACM Transactions on Storage, Volume 19, Issue 2 (May 2023), 269 pages
  ISSN: 1553-3077
  EISSN: 1553-3093
  DOI: 10.1145/3585541


Publisher

Association for Computing Machinery, New York, NY, United States

          Publication History

          • Published: 6 March 2023
          • Online AM: 21 January 2023
          • Accepted: 22 November 2022
          • Revised: 26 September 2022
          • Received: 26 April 2022
