Efficient Directory Mutations in a Full-Path-Indexed File System

Published: 26 November 2018

Abstract

Full-path indexing can improve I/O efficiency for workloads that operate on data organized using traditional, hierarchical directories, because data is placed on persistent storage in scan order. Prior results indicate, however, that renames in a local file system with full-path indexing are prohibitively expensive.
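As a rough illustration of why scan order matters, the sketch below treats a file system as a sorted list of full-path keys and shows that everything under a directory occupies one contiguous key range. The helper name `subtree_range`, the sample paths, and the use of plain code-point string ordering are assumptions made for this example; they are not taken from the article or from BetrFS.

```python
import bisect

def subtree_range(keys, directory):
    """Return (lo, hi) so that keys[lo:hi] holds every path under `directory`.

    `keys` must be sorted. Hypothetical helper for illustration only.
    """
    prefix = directory.rstrip("/") + "/"
    lo = bisect.bisect_left(keys, prefix)
    # "0" is the character right after "/" in code-point order, so this
    # upper bound sorts after every key that starts with `prefix`.
    upper = prefix[:-1] + "0"
    hi = bisect.bisect_left(keys, upper)
    return lo, hi

# Hypothetical full-path keys, kept in sorted order as the index would.
keys = sorted(["/a/x", "/a/y/z", "/ab", "/b/q", "/a"])

lo, hi = subtree_range(keys, "/a")
print(keys[lo:hi])  # the descendants of /a, adjacent in key order
```

Because the descendants are adjacent in key order, a recursive scan of a directory becomes a single range read over the index rather than a walk through scattered per-directory structures.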

This article shows how to use full-path indexing in a file system to realize fast directory scans, writes, and renames. The article introduces a range-rename mechanism for efficient key-space changes in a write-optimized dictionary. This mechanism is encapsulated in the key-value Application Programming Interface (API) and simplifies the overall file system design.
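To make the idea of a key-space change concrete, the following sketch shows only the semantics a range rename has to provide over full-path keys: every key under the old directory reappears under the new one. It is deliberately naive and rewrites keys one at a time, which is the kind of per-key cost an efficient mechanism would need to avoid. The function `range_rename` and the dictionary-backed store are hypothetical and do not reflect the article's implementation or the BetrFS key-value API.

```python
def range_rename(store, old_dir, new_dir):
    """Return a copy of `store` with old_dir and its subtree moved to new_dir.

    Naive per-key rewrite, for illustrating semantics only.
    """
    old_root = old_dir.rstrip("/")
    new_root = new_dir.rstrip("/")
    old_prefix = old_root + "/"
    new_prefix = new_root + "/"

    renamed = {}
    for key, value in store.items():
        if key == old_root:
            # The directory entry itself.
            renamed[new_root] = value
        elif key.startswith(old_prefix):
            # A descendant: swap the old prefix for the new one.
            renamed[new_prefix + key[len(old_prefix):]] = value
        else:
            # Unrelated key, including siblings such as "/ab".
            renamed[key] = value
    return renamed

# Hypothetical store mapping full paths to values.
store = {"/a": "dir", "/a/x": "1", "/a/y/z": "2", "/ab": "3"}
moved = range_rename(store, "/a", "/c")
print(sorted(moved))
```

Matching on the prefix with its trailing slash keeps a sibling such as `/ab` from being mistaken for a child of `/a`.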

We implemented this mechanism in Bε-trees File System (BetrFS), an in-kernel, local file system for Linux. This new version, BetrFS 0.4, performs recursive greps 1.5x faster and random writes 1.2x faster than BetrFS 0.3, while keeping renames competitive with indirection-based file systems across a range of sizes. BetrFS 0.4 outperforms BetrFS 0.3, as well as traditional file systems such as ext4, Extents File System (XFS), and Z File System (ZFS), across a variety of workloads.

Published in

ACM Transactions on Storage, Volume 14, Issue 3
Special Issue on FAST 2018 and Regular Papers
August 2018
210 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3282875
Editor: Sam H. Noh

Copyright © 2018 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Received: 1 May 2018
• Revised: 1 July 2018
• Accepted: 1 July 2018
• Published: 26 November 2018

Qualifiers

• research-article
• Research
• Refereed
