Can Applications Recover from fsync Failures?

Published: 15 June 2021

Abstract

We analyze how file systems and modern data-intensive applications react to fsync failures. First, we characterize how three Linux file systems (ext4, XFS, Btrfs) behave in the presence of failures. We find commonalities across file systems (pages are always marked clean, certain block writes always lead to unavailability) as well as differences (page content and failure reporting vary). Next, we study how five widely used applications (PostgreSQL, LMDB, LevelDB, SQLite, Redis) handle fsync failures. We show that although applications use many failure-handling strategies, none are sufficient: fsync failures can cause catastrophic outcomes such as data loss and corruption. Our findings have strong implications for the design of file systems and applications that intend to provide strong durability guarantees.
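To make the failure mode concrete, here is a minimal sketch of an application-side write path that treats an fsync error as fatal instead of retrying. It is not code from the article or from any of the studied applications. The names `durable_append` and `DurabilityError`, and the injectable `fsync` parameter, are hypothetical and exist only for illustration. The sketch assumes the behavior the abstract describes: after a failed fsync the affected pages are marked clean, so a later fsync may report success even though the data never reached disk.

```python
import os
import tempfile


class DurabilityError(Exception):
    """Raised when a write could not be confirmed durable (hypothetical name)."""


def durable_append(path, data, fsync=os.fsync):
    """Append data to path and fsync it, never retrying a failed fsync.

    The fsync argument is injectable only so that the failure path can be
    exercised without a faulty disk; this is an assumption of the sketch.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        try:
            fsync(fd)
        except OSError as exc:
            # Do not call fsync again: the dirty pages may already be
            # marked clean, so a retry could succeed without persisting.
            raise DurabilityError(f"fsync failed on {path}: {exc}") from exc
    finally:
        os.close(fd)
```

A caller that catches `DurabilityError` should treat the in-memory state of the file as untrusted and recover from a source known to be durable, for example by restarting and replaying a log, because the page cache may no longer match what is on disk.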

Published in

ACM Transactions on Storage, Volume 17, Issue 2
May 2021
202 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3465461
Editor: Sam H. Noh

Copyright © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 15 June 2021
• Accepted: 1 February 2021
• Received: 1 November 2020

Qualifiers

• research-article
• Refereed