research-article

Copy-on-Abundant-Write for Nimble File System Clones

Published: 29 January 2021

Abstract

Making logical copies, or clones, of files and directories is critical to many real-world applications and workflows, including backups, virtual machines, and containers. An ideal clone implementation meets the following performance goals: (1) creating the clone has low latency; (2) reads are fast in all versions (i.e., spatial locality is always maintained, even after modifications); (3) writes are fast in all versions; (4) the overall system is space efficient. Implementing a clone operation that realizes all four properties, which we call a nimble clone, is a long-standing open problem.

This article describes nimble clones in the Bε-tree File System (BetrFS), an open-source, full-path-indexed, write-optimized file system. The key observation behind our work is that standard copy-on-write heuristics can be too coarse to be space efficient or too fine-grained to preserve locality. In contrast, a write-optimized key-value store, such as a Bε-tree or a log-structured merge-tree (LSM-tree), can decouple the logical application of updates from the granularity at which data is physically copied. In our write-optimized clone implementation, data sharing among clones is broken only when a clone has changed enough to warrant making a copy, a policy we call copy-on-abundant-write.

We demonstrate that the algorithmic work needed to batch and amortize the cost of BetrFS clone operations does not erode the performance advantages of baseline BetrFS; BetrFS performance even improves in a few cases. BetrFS cloning is efficient: for example, when the clone operation is used for container creation, BetrFS outperforms a simple recursive copy by up to two orders of magnitude and outperforms file systems that have specialized Linux Containers (LXC) backends by 3-4×.
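The copy-on-abundant-write policy described above can be illustrated with a toy in-memory model. This is only a sketch of the general idea, not the BetrFS implementation: the names `SharedNode`, `Clone`, and the `threshold` parameter are hypothetical, and a real Bε-tree buffers updates as messages inside tree nodes rather than in a per-clone dictionary. Writes to a clone are applied logically right away by buffering them, and the shared data is physically copied only once enough writes have accumulated.

```python
from dataclasses import dataclass, field


@dataclass
class SharedNode:
    """A block of key-value pairs that several clones may reference."""
    data: dict


@dataclass
class Clone:
    """A logical copy that shares `base` until it has buffered enough writes."""
    base: SharedNode
    threshold: int = 4  # hypothetical: distinct buffered keys that count as "abundant"
    pending: dict = field(default_factory=dict)  # buffered, logically applied updates
    private: bool = False  # True once this clone owns its own physical copy

    def read(self, key):
        # Buffered updates shadow the shared data, so reads see the latest write.
        if key in self.pending:
            return self.pending[key]
        return self.base.data.get(key)

    def write(self, key, value):
        if self.private:
            # Sharing is already broken; update the private copy in place.
            self.base.data[key] = value
            return
        # Logical application: buffer the update without copying anything.
        self.pending[key] = value
        if len(self.pending) >= self.threshold:
            self._break_sharing()

    def _break_sharing(self):
        # Physical copy: merge the buffered updates into a private node.
        merged = dict(self.base.data)
        merged.update(self.pending)
        self.base = SharedNode(merged)
        self.pending = {}
        self.private = True


root = SharedNode({"a": 1, "b": 2, "c": 3})
first = Clone(root, threshold=2)
second = Clone(root, threshold=2)

first.write("a", 10)   # buffered; `first` still shares `root`
first.write("b", 20)   # second distinct key: sharing is broken here
```

In this sketch a single small write costs no copy at all, while a heavily modified clone ends up with its own contiguous copy, which is the trade-off between space efficiency and locality that the abstract attributes to the policy.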

Published in

ACM Transactions on Storage, Volume 17, Issue 1
Special Section on Usenix Fast 2020
February 2021
165 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3446939
Editor: Sam H. Noh

Copyright © 2021 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 29 January 2021
• Accepted: 1 September 2020
• Received: 1 June 2020
          Qualifiers

          • research-article
          • Research
          • Refereed