Abstract
We introduce TxFS, a transactional file system that builds upon a file system’s atomic-update mechanism, such as journaling. Though prior work has explored a number of transactional file systems, TxFS has a unique set of properties: a simple API, portability across different hardware, high performance, low complexity (by building on the file-system journal), and full ACID transactions. We port SQLite, OpenLDAP, and Git to use TxFS and show experimentally that TxFS provides strong crash consistency with equal or better performance.
TxFS: Leveraging File-system Crash Consistency to Provide ACID Transactions