Abstract
In this article, we propose a simple, practical, and efficient optimization scheme for journaling in ext4, called lightweight data journaling (LDJ). By compressing journaled data before writing it, LDJ can perform comparably to, or even faster than, the default ordered journaling (OJ) mode in ext4 on both HDDs and flash storage devices, while still guaranteeing the version consistency of the data journaling (DJ) mode. This surprising result has three main explanations. First, on modern storage devices, the sequential write pattern that dominates in DJ mode increasingly outperforms the random write pattern of OJ mode. Second, compression significantly reduces the volume of journal writes, which in turn makes writes complete faster and prolongs the lifespan of storage devices. Third, compression also makes each journal write atomic without an intervening FLUSH command between the journal data blocks and the commit block, thus halving the number of costly FLUSH calls in LDJ. We prototyped LDJ by slightly modifying the existing ext4 with jbd2 for journaling and e2fsck for recovery; fewer than 300 lines of source code were changed. We also carried out a comprehensive evaluation using four standard benchmarks and three real applications. The results clearly show that LDJ outperforms the OJ mode by up to 9.6× on the real applications.
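The second and third points can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the compression algorithm nor the on-disk format, so `zlib`, the 4 KiB `BLOCK_SIZE`, and the helper names `pack_transaction`, `journal_blocks_needed`, and `recover_transaction` are all assumptions made for illustration. The sketch also assumes one plausible reading of the atomicity claim, namely that a torn journal write can be detected at recovery time because the compressed payload no longer decompresses cleanly.

```python
import zlib

BLOCK_SIZE = 4096  # assumed journal block size in bytes (not stated in the abstract)


def pack_transaction(blocks):
    """Compress the data blocks of one transaction into a single payload.

    zlib is a stand-in here; the abstract does not name the algorithm LDJ uses.
    """
    for block in blocks:
        if len(block) != BLOCK_SIZE:
            raise ValueError("each block must be exactly BLOCK_SIZE bytes")
    return zlib.compress(b"".join(blocks))


def journal_blocks_needed(payload):
    """Number of journal blocks the payload occupies (ceiling division)."""
    return -(-len(payload) // BLOCK_SIZE)


def recover_transaction(payload):
    """Return the original blocks, or None if the payload is torn or corrupt.

    A zlib stream carries its own checksum, so an incomplete write fails to
    decompress and the transaction can be discarded as a whole.
    """
    try:
        raw = zlib.decompress(payload)
    except zlib.error:
        return None
    if len(raw) % BLOCK_SIZE != 0:
        return None
    return [raw[i:i + BLOCK_SIZE] for i in range(0, len(raw), BLOCK_SIZE)]
```

In this sketch, a transaction of compressible blocks occupies fewer journal blocks than its uncompressed size would require, and passing a truncated payload to `recover_transaction` returns `None` rather than partial data. How much space is saved depends entirely on how compressible the journaled data is.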
LDJ: Version Consistency Is Almost Free on Commercial Storage Devices