FlashKV: Accelerating KV Performance with Open-Channel SSDs

Published: 27 September 2017

Abstract

As the cost per bit of solid-state disks falls rapidly, SSDs are supplanting HDDs in many settings, including as the primary storage of key-value stores. However, simply deploying LSM-tree-based key-value stores on commercial SSDs is inefficient: it induces heavy write amplification and severe garbage collection overhead under write-intensive conditions. The main cause is that management functionality is triplicated across the LSM-tree, the file system, and the flash translation layer, and these layers hide the key-value store and the flash device from each other. Furthermore, we observe that eliminating these redundant layers alone improves the performance of LSM-tree-based key-value stores only slightly, because the I/O stack, including the cache and scheduler, is not optimized for the LSM-tree’s unique I/O patterns.
To address these issues, we propose FlashKV, an LSM-tree-based key-value store running on open-channel SSDs. FlashKV eliminates the redundant management and semantic isolation by managing the raw flash device directly in the application layer. Using domain knowledge of the LSM-tree together with the open-channel information, FlashKV employs a parallel data layout to exploit the internal parallelism of the flash device, and it optimizes the compaction, caching, and I/O scheduling mechanisms for this setting. Evaluations show that, compared to LevelDB, FlashKV improves system performance by 1.5× to 4.5× and reduces write traffic by up to 50% under heavy write conditions.
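The abstract does not spell out the parallel data layout, so the following is only a minimal sketch of the general idea of striping a file's pages across flash channels so that they can be accessed concurrently. The round-robin policy and the names `stripe_pages` and `pages_per_channel` are assumptions made for illustration; they are not taken from FlashKV.

```python
from typing import List, Tuple


def stripe_pages(num_pages: int, num_channels: int) -> List[Tuple[int, int]]:
    """Map each logical page to (channel, offset within that channel).

    Hypothetical round-robin striping: consecutive pages land on
    consecutive channels, so a sequential read or write of the file
    can keep every channel busy at once.
    """
    if num_channels <= 0:
        raise ValueError("num_channels must be positive")
    return [(page % num_channels, page // num_channels)
            for page in range(num_pages)]


def pages_per_channel(num_pages: int, num_channels: int) -> List[int]:
    """Count how many pages of the file each channel holds."""
    counts = [0] * num_channels
    for channel, _offset in stripe_pages(num_pages, num_channels):
        counts[channel] += 1
    return counts


if __name__ == "__main__":
    # A 10-page file on a device assumed to expose 4 channels.
    for page, (channel, offset) in enumerate(stripe_pages(10, 4)):
        print(f"page {page} -> channel {channel}, offset {offset}")
    print(pages_per_channel(10, 4))
```

With this policy the per-channel page counts differ by at most one, which is the property that lets a large sequential transfer use all channels evenly. A real layout would also have to account for flash blocks, erase units, and garbage collection, which this sketch ignores.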

Information

Published In

ACM Transactions on Embedded Computing Systems  Volume 16, Issue 5s
Special Issue ESWEEK 2017, CASES 2017, CODES + ISSS 2017 and EMSOFT 2017
October 2017
1448 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/3145508
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 September 2017
Accepted: 01 June 2017
Revised: 01 May 2017
Received: 01 April 2017
Published in TECS Volume 16, Issue 5s

Author Tags

  1. LSM-tree-based key-value store
  2. application-managed flash
  3. hardware-software co-design
  4. open-channel SSD

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2024) Optimizing Read Performance of HBase through Dynamic Control of Data Block Sizes and KVCache. Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, 1495-1503. DOI: 10.1145/3605098.3635898. Online publication date: 8-Apr-2024
  • (2024) STEM: Streaming-Based FPGA Acceleration for Large-Scale Compactions in LSM KV. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 3893-3905. DOI: 10.1109/ICDE60146.2024.00298. Online publication date: 13-May-2024
  • (2024) Dynamic zone redistribution for key-value stores on zoned namespaces SSDs. Journal of Systems Architecture 152, 103159. DOI: 10.1016/j.sysarc.2024.103159. Online publication date: Jul-2024
  • (2023) ADOC. Proceedings of the 21st USENIX Conference on File and Storage Technologies, 65-80. DOI: 10.5555/3585938.3585943. Online publication date: 21-Feb-2023
  • (2023) A survey on design and application of open-channel solid-state drives (开放通道固态硬盘的设计与应用研究综述). Frontiers of Information Technology & Electronic Engineering 24:5, 637-658. DOI: 10.1631/FITEE.2200317. Online publication date: 2-Jun-2023
  • (2023) ZNSwap: un-Block your Swap. ACM Transactions on Storage 19:2, 1-25. DOI: 10.1145/3582434. Online publication date: 6-Mar-2023
  • (2023) ConfZNS: A Novel Emulator for Exploring Design Space of ZNS SSDs. Proceedings of the 16th ACM International Conference on Systems and Storage, 71-82. DOI: 10.1145/3579370.3594772. Online publication date: 5-Jun-2023
  • (2023) Tidal-Tree-Mem: Toward Read-Intensive Key-Value Stores With Tidal Structure Based on LSM-Tree. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:2, 423-436. DOI: 10.1109/TCAD.2022.3177575. Online publication date: Feb-2023
  • (2023) ZenFS+: Nurturing Performance and Isolation to ZenFS. IEEE Access 11, 26344-26357. DOI: 10.1109/ACCESS.2023.3257354. Online publication date: 2023
  • (2022) Accelerating range queries of primary and secondary indices for key-value separation. Proceedings of the 13th Symposium on Cloud Computing, 226-239. DOI: 10.1145/3542929.3563479. Online publication date: 7-Nov-2022
