
B-trees, shadowing, and clones

Abstract

B-trees are used by many file systems to represent files and directories. They provide guaranteed logarithmic-time key search, insert, and remove. File systems such as WAFL and ZFS use shadowing, or copy-on-write, to implement snapshots, crash recovery, write batching, and RAID. Serious difficulties arise when trying to use B-trees and shadowing in a single system.
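The ripple effect of shadowing can be sketched on a toy, in-memory block store. In the Python below, `Disk`, `Leaf`, `Internal`, `lookup`, and `shadow_update` are hypothetical names, not taken from the article; the tree has a fixed shape (no splits or merges), and the update only overwrites the value of an existing key. Because a block is never rewritten in place, modifying a leaf forces a new copy of every node on the path up to the root, while untouched subtrees are shared by the old and new roots.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    keys: tuple
    values: tuple


@dataclass(frozen=True)
class Internal:
    keys: tuple      # separators: children[i] holds keys k with keys[i-1] <= k < keys[i]
    children: tuple  # block addresses of the children


class Disk:
    """Toy block store: every write goes to a fresh address (hypothetical)."""

    def __init__(self):
        self.blocks = {}
        self.next_addr = 0

    def write(self, node):
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = node
        return addr

    def read(self, addr):
        return self.blocks[addr]


def lookup(disk, root, key):
    node = disk.read(root)
    while isinstance(node, Internal):
        node = disk.read(node.children[bisect_right(node.keys, key)])
    i = bisect_left(node.keys, key)
    return node.values[i] if i < len(node.keys) and node.keys[i] == key else None


def shadow_update(disk, root, key, value):
    """Overwrite the value of an existing key by shadowing the root-to-leaf path.

    Children are written before their parents, and the old root is left intact,
    so it still describes the tree as it was before the update.
    """
    node = disk.read(root)
    if isinstance(node, Leaf):
        i = bisect_left(node.keys, key)
        if i == len(node.keys) or node.keys[i] != key:
            raise KeyError(key)
        values = node.values[:i] + (value,) + node.values[i + 1:]
        return disk.write(Leaf(node.keys, values))
    i = bisect_right(node.keys, key)
    new_child = shadow_update(disk, node.children[i], key, value)
    children = node.children[:i] + (new_child,) + node.children[i + 1:]
    return disk.write(Internal(node.keys, children))


# Usage: a two-level tree with leaves {1, 2} and {5, 7}.
disk = Disk()
left = disk.write(Leaf((1, 2), ("a", "b")))
right = disk.write(Leaf((5, 7), ("e", "g")))
old_root = disk.write(Internal((5,), (left, right)))
new_root = shadow_update(disk, old_root, 7, "G")
print(lookup(disk, old_root, 7), lookup(disk, new_root, 7))  # g G
print(len(disk.blocks))  # 5: three original blocks plus a new leaf and a new root
```

The old root remains a consistent view of the tree, which is why a shadowing file system can keep it as a snapshot or fall back to it after a crash; the price is that every update rewrites a full root-to-leaf path.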

This article is about a set of B-tree algorithms that respects shadowing, achieves good concurrency, and implements cloning (writable snapshots). Our cloning algorithm is efficient and allows the creation of a large number of clones.
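Cloning can build on shadowing if each block carries a reference count. The sketch below is a simplification in that spirit, not the article's exact algorithm: `Store`, `clone`, `shadow`, `update`, and `lookup` are hypothetical names, nodes are plain dicts, the tree has a fixed shape, and an unshared block is freed immediately rather than after the new version is durable. A clone copies only the root and increments the reference counts of the root's children; a shared node is copied lazily, the first time one of the trees shadows it.

```python
from bisect import bisect_right


class Store:
    """Toy block store with a reference count per block (hypothetical)."""

    def __init__(self):
        self.blocks = {}  # address -> node (dict with "keys" and "values" or "children")
        self.refs = {}    # address -> number of parents or roots pointing at the block
        self.next_addr = 0

    def alloc(self, node):
        # Always write to a fresh address; blocks are never overwritten in place.
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = node
        self.refs[addr] = 1
        return addr


def copy_node(node):
    return {k: list(v) for k, v in node.items()}


def clone(store, root):
    """Writable snapshot: copy only the root; everything below it is shared."""
    node = copy_node(store.blocks[root])
    for child in node.get("children", []):
        store.refs[child] += 1
    return store.alloc(node)


def shadow(store, addr):
    """Copy block `addr` to a fresh address so that it can be modified."""
    node = copy_node(store.blocks[addr])
    if store.refs[addr] > 1:
        # Still used by another clone: the copy becomes an extra parent of each child.
        for child in node.get("children", []):
            store.refs[child] += 1
        store.refs[addr] -= 1
    else:
        # Sole owner: the copy inherits the children and the old block is freed.
        del store.blocks[addr]
        del store.refs[addr]
    return store.alloc(node)


def update(store, root, key, value):
    """Overwrite the value of an existing key; returns the new root address."""
    new = shadow(store, root)
    node = store.blocks[new]
    if "children" in node:
        i = bisect_right(node["keys"], key)
        node["children"][i] = update(store, node["children"][i], key, value)
    else:
        node["values"][node["keys"].index(key)] = value  # key must already exist
    return new


def lookup(store, root, key):
    node = store.blocks[root]
    while "children" in node:
        node = store.blocks[node["children"][bisect_right(node["keys"], key)]]
    return node["values"][node["keys"].index(key)]


# Usage: clone tree p into q, then modify q only.
store = Store()
a = store.alloc({"keys": [1, 2], "values": ["a", "b"]})
b = store.alloc({"keys": [5, 7], "values": ["e", "g"]})
p = store.alloc({"keys": [5], "children": [a, b]})
q = clone(store, p)
q = update(store, q, 7, "G")
print(lookup(store, p, 7), lookup(store, q, 7))  # g G
print(store.refs[a], store.refs[b])              # 2 1
```

Creating a clone touches a single block, so its cost does not grow with the size of the tree; the work of separating the two trees is deferred to later updates and limited to the paths they modify. Here leaf `a` stays shared by both trees, while `b` ends up referenced only by `p`.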

We believe that using our B-trees would allow shadowing file systems to better scale their on-disk data structures.
