
BTRFS: The Linux B-Tree Filesystem

Abstract

BTRFS is a Linux filesystem that has been adopted as the default filesystem in some popular Linux distributions. It is based on copy-on-write, which allows for efficient snapshots and clones, and it uses B-trees as its main on-disk data structure. The design goal is to work well for a wide range of use cases and workloads. To this end, much effort has been directed at maintaining even performance as the filesystem ages, rather than at optimizing for a particular narrow benchmark.
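
To make the copy-on-write idea concrete, the following Python sketch uses a plain binary search tree as a stand-in for a B-tree. This is an assumption made to keep the example short: BTRFS's B-trees hold many keys per node, and none of their on-disk details are modeled here. An update never modifies a node in place; it writes new copies of the nodes on the path from the root to the changed entry and shares every other node, so a snapshot is simply a saved reference to an old root. The names Node, insert, lookup, and distinct_nodes are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node: once written, it is never modified in place."""
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, value: str) -> Node:
    """Copy-on-write insert: returns a new root, copying only the root-to-leaf path."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    # Same key: write a fresh copy of this node carrying the new value.
    return Node(key, value, root.left, root.right)


def lookup(root: Optional[Node], key: int) -> Optional[str]:
    """Ordinary search; works identically on the live tree and on any snapshot."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


def distinct_nodes(*roots: Optional[Node]) -> int:
    """Count physically distinct nodes reachable from any of the given roots."""
    seen = set()
    stack = [r for r in roots if r is not None]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(c for c in (node.left, node.right) if c is not None)
    return len(seen)


live = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    live = insert(live, k, f"v{k}")

snapshot = live                      # taking a snapshot = keeping the old root
live = insert(live, 40, "changed")   # update writes new copies of 50, 30, 40

print(lookup(snapshot, 40))            # v40
print(lookup(live, 40))                # changed
print(snapshot.right is live.right)    # True: the untouched subtree is shared
print(distinct_nodes(snapshot, live))  # 10 (7 original nodes + 3 copies)
```

Only the three nodes on the updated path are duplicated, so keeping the snapshot costs extra space proportional to the tree depth rather than a full copy. Reclaiming nodes once no root references them is deliberately left out of the sketch.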

Linux filesystems are installed on smartphones as well as enterprise servers. This entails challenges on many different fronts.

---Scalability. The filesystem must scale in many dimensions: disk space, memory, and CPUs.

---Data integrity. Losing data is not an option, and much effort is expended to safeguard the content. This includes checksums, metadata duplication, and RAID support built into the filesystem.

---Disk diversity. The system should work well with both SSDs and hard disks. It is also expected to use arrays of disks of different sizes, which poses challenges to the RAID and striping mechanisms.
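
As an illustration of how checksums and duplicated copies can work together, the sketch below computes CRC-32C (the checksum algorithm BTRFS uses by default) when a block is written and verifies it on read, falling back to another copy when one fails verification. It is a simplified model, not BTRFS code: the block contents, the hypothetical read_verified helper, and passing all copies as an in-memory list are assumptions for illustration.

```python
from typing import Optional, Sequence

# Reflected form of the Castagnoli polynomial used by CRC-32C.
CRC32C_POLY = 0x82F63B78


def crc32c(data: bytes) -> int:
    """Bit-at-a-time CRC-32C: slow, but dependency-free and easy to check."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, fold in the polynomial.
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def read_verified(copies: Sequence[bytes], stored_csum: int) -> Optional[bytes]:
    """Hypothetical read path: return the first copy whose checksum matches.

    Models a block kept in several copies (e.g. duplicated metadata or a
    mirror); returns None when every copy fails verification.
    """
    for candidate in copies:
        if crc32c(candidate) == stored_csum:
            return candidate
    return None


block = b"metadata node contents"
csum = crc32c(block)                    # computed when the block is written
corrupted = b"metadata node c0ntents"   # one copy with a damaged byte

print(hex(crc32c(b"123456789")))        # 0xe3069283, the standard CRC-32C check value
print(read_verified([corrupted, block], csum) == block)  # True: falls back to the good copy
print(read_verified([corrupted], csum))  # None: no valid copy left
```

Production code would use a table-driven or hardware-accelerated CRC-32C; the bit-at-a-time loop here favors readability over speed.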

This article describes the core ideas, data structures, and algorithms of this filesystem. It sheds light on the challenges posed by defragmentation in the presence of snapshots, and the tradeoffs required to maintain even performance in the face of a wide spectrum of workloads.
