Abstract
This work is dedicated to eliminating the overhead of guaranteeing the storage order in the modern IO stack. The existing block device takes a prohibitively expensive approach to ensuring the storage order among write requests: it interleaves the write requests with Transfer-and-Flush. To exploit the cache barrier command for flash storage, we overhaul the IO scheduler, the dispatch module, and the filesystem so that these layers together preserve the order, imposed by the application, in which the associated data blocks are made durable. The key ingredients of the Barrier-Enabled IO stack are Epoch-based IO scheduling, Order-Preserving Dispatch, and Dual-Mode Journaling. The Barrier-Enabled IO stack can control the storage order without the Transfer-and-Flush overhead. We implement the Barrier-Enabled IO stack on both server and mobile platforms. SQLite performance increases by 270% on the server and by 75% on the smartphone. On server storage, BarrierFS improves performance over EXT4 by as much as 43× in MySQL and 73× in SQLite by relaxing the durability of a transaction.
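To make the cost described above concrete, the following is a minimal sketch of how an application on a conventional IO stack typically forces one write to reach storage before another: it issues the first write, blocks on a flush, and only then issues the second. It illustrates the baseline behaviour the abstract criticizes, not the barrier-enabled interface of this work. The function name `ordered_append`, the file name `journal.log`, and the record contents are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def ordered_append(path, first, second):
    """Append `first`, then `second`, so that `first` is durable before `second` is issued.

    The ordering is enforced the expensive way: the caller blocks on a flush
    between the two writes (assumption: small writes that complete in one call).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, first)
        os.fsync(fd)          # block until the first write is transferred and flushed
        os.write(fd, second)
        os.fsync(fd)          # make the second write durable as well
    finally:
        os.close(fd)


def demo():
    """Write two ordered records to a temporary file and return the file contents."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "journal.log")  # hypothetical file name
        ordered_append(path, b"log-record\n", b"commit-record\n")
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo())
```

Each `os.fsync` call stalls the caller until the device reports completion, so the application pays for durability even when it only needs ordering. That coupling of ordering and durability is the overhead the abstract sets out to remove.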
Bringing Order to Chaos: Barrier-Enabled I/O Stack for Flash Storage