research-article

Bringing Order to Chaos: Barrier-Enabled I/O Stack for Flash Storage

Published: 03 October 2018

Abstract

This work is dedicated to eliminating the overhead of guaranteeing storage order in the modern I/O stack. The existing block device adopts a prohibitively expensive approach to ensuring the storage order among write requests: interleaving the write requests with Transfer-and-Flush. To exploit the cache barrier command for flash storage, we overhaul the I/O scheduler, the dispatch module, and the filesystem so that these layers are orchestrated to preserve the order, imposed by the application, in which the associated data blocks are made durable. The key ingredients of the Barrier-Enabled I/O stack are Epoch-based I/O scheduling, Order-Preserving Dispatch, and Dual-Mode Journaling. The Barrier-Enabled I/O stack can control the storage order without the Transfer-and-Flush overhead. We implement the Barrier-Enabled I/O stack on server as well as mobile platforms. SQLite performance increases by 270% on the server and by 75% on the smartphone. On server storage, BarrierFS delivers up to 43× and 73× performance gains over EXT4 in MySQL and SQLite, respectively, by relaxing the durability of a transaction.
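The ordering contract described in the abstract can be made concrete with a small model. The sketch below is a hypothetical simulation, not the authors' kernel implementation: `Write`, `schedule`, `persist_barrier_device`, `preserves_storage_order`, and `flushes_needed` are illustrative names, and the block addresses are made up. Writes are grouped into epochs separated by barriers; the scheduler may reorder writes inside an epoch but never across a barrier, and a barrier-compliant write cache may persist an epoch's writes in any order as long as it finishes one epoch before starting the next. The flush count is deliberately simplified: it tallies only the flushes Transfer-and-Flush issues at epoch boundaries to enforce ordering, and ignores its wait-for-transfer step and any final flush for durability.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Write:
    block: int  # logical block address (illustrative values only)
    epoch: int  # index of the epoch (writes between two barriers)


def build_epochs(groups: List[List[int]]) -> List[Write]:
    """Tag each write with its epoch; each inner list is one epoch and
    an implicit barrier separates consecutive epochs."""
    return [Write(b, e) for e, blocks in enumerate(groups) for b in blocks]


def schedule(writes: List[Write]) -> List[Write]:
    """Epoch-based scheduling sketch: reorder freely inside an epoch
    (here: ascending block address), never across a barrier."""
    return sorted(writes, key=lambda w: (w.epoch, w.block))


def persist_barrier_device(writes: List[Write], rng: random.Random) -> List[Write]:
    """Hypothetical barrier-compliant write cache: the shuffle models that
    writes within one epoch may become durable in any order, but every
    write of an epoch is persisted before any write of the next epoch."""
    persisted: List[Write] = []
    for e in sorted({w.epoch for w in writes}):
        batch = [w for w in writes if w.epoch == e]
        rng.shuffle(batch)
        persisted.extend(batch)
    return persisted


def preserves_storage_order(persisted: List[Write]) -> bool:
    """True if no write became durable before a write from an earlier epoch."""
    epochs = [w.epoch for w in persisted]
    return epochs == sorted(epochs)


def flushes_needed(groups: List[List[int]], use_barrier: bool) -> int:
    """Simplified count of flushes issued only to enforce ordering:
    one per epoch boundary with Transfer-and-Flush, none with barriers."""
    return 0 if use_barrier else max(len(groups) - 1, 0)


rng = random.Random(0)
groups = [[7, 3, 5], [2], [9, 1]]  # three epochs of made-up block addresses
dispatched = schedule(build_epochs(groups))
persisted = persist_barrier_device(dispatched, rng)

print([w.block for w in dispatched])  # [3, 5, 7, 2, 1, 9]
print(preserves_storage_order(persisted))  # True
# A cache that ignores barriers, e.g. persisting in reverse, breaks the order:
print(preserves_storage_order(list(reversed(dispatched))))  # False
print(flushes_needed(groups, use_barrier=False))  # 2
print(flushes_needed(groups, use_barrier=True))  # 0
```

In this model the storage order survives even though the device is free to reorder within each epoch, which is the freedom the barrier approach retains and Transfer-and-Flush gives up by draining and flushing the cache at every boundary.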



    • Published in

      ACM Transactions on Storage, Volume 14, Issue 3
      Special Issue on FAST 2018 and Regular Papers
      August 2018
      210 pages
      ISSN:1553-3077
      EISSN:1553-3093
      DOI:10.1145/3282875
      • Editor: Sam H. Noh

      Copyright © 2018 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 3 October 2018
      • Accepted: 1 July 2018
      • Received: 1 June 2018

      Qualifiers

      • research-article
      • Research
      • Refereed
