Building GC-free Key-value Store on HM-SMR Drives with ZoneFS

Published: 24 August 2022

Abstract

Host-managed shingled magnetic recording (HM-SMR) drives offer a capacity advantage for harnessing the explosive growth of data. For key-value (KV) stores based on log-structured merge trees (LSM-trees), the HM-SMR drive is an ideal solution owing to its capacity, predictable performance, and economical cost. However, building an LSM-tree-based KV store on HM-SMR drives poses severe challenges to performance and space utilization because of the redundant cleaning processes run by the application and the storage device (i.e., compaction and garbage collection). To eliminate the overhead of on-disk garbage collection (GC) and improve compaction efficiency, this article presents GearDB, a GC-free KV store tailored for HM-SMR drives. GearDB improves write performance and space efficiency through three new techniques: a new on-disk data layout, compaction windows, and a novel gear compaction algorithm. We further augment the read performance of GearDB with a new SSTable layout and a read-ahead mechanism. We implement GearDB on LevelDB and use zonefs to access a real HM-SMR drive. Our extensive experiments confirm that GearDB achieves both high performance and space efficiency: on average, 1.7× and 1.5× better than LevelDB in random writes and reads, respectively, with up to 86.9% space efficiency.
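The abstract notes that GearDB reaches the HM-SMR drive through zonefs, which exposes each sequential-write zone as a file that only accepts appends at its write pointer. A minimal sketch of that access pattern is shown below; it is not code from the paper. An ordinary temporary file stands in for a zone file such as `/mnt/zonefs/seq/0` (the path and record format are illustrative assumptions), and the write pointer is modeled as the file size, which is how zonefs reports it for sequential zone files.

```python
import os
import tempfile

# A regular temp file stands in for a zonefs sequential zone file
# (e.g. /mnt/zonefs/seq/0); the path here is illustrative only.
zone_path = os.path.join(tempfile.mkdtemp(), "seq_zone_0")

def zone_append(path, record: bytes) -> int:
    """Append a record at the zone's write pointer; return the new pointer.

    Sequential zone files reject in-place updates, so writes go through
    O_APPEND, mirroring the append-only discipline zonefs enforces.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    try:
        os.write(fd, record)
    finally:
        os.close(fd)
    # zonefs surfaces the zone's write pointer as the file size.
    return os.path.getsize(path)

wp = zone_append(zone_path, b"key1=value1\n")
wp = zone_append(zone_path, b"key2=value2\n")
# The write pointer only advances; rewriting earlier offsets is not allowed.
```

Because all writes land at the write pointer, a KV store laid out this way never needs the device to relocate live data, which is the property GearDB exploits to stay GC-free.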


Published in

ACM Transactions on Storage, Volume 18, Issue 3
August 2022, 244 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3555792
Editor: Sam H. Noh
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 24 August 2022
        • Online AM: 22 July 2022
        • Accepted: 1 November 2021
        • Revised: 1 October 2021
        • Received: 1 December 2020


        Qualifiers

        • research-article
        • Refereed
