Abstract
B-trees are used by many file systems to represent files and directories. They provide guaranteed logarithmic time key-search, insert, and remove. File systems like WAFL and ZFS use shadowing, or copy-on-write, to implement snapshots, crash recovery, write-batching, and RAID. Serious difficulties arise when trying to use b-trees and shadowing in a single system.
This article is about a set of b-tree algorithms that respects shadowing, achieves good concurrency, and implements cloning (writeable snapshots). Our cloning algorithm is efficient and allows the creation of a large number of clones.
We believe that using our b-trees would allow shadowing file systems to better scale their on-disk data structures.
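As a rough illustration of shadowing and why it makes snapshots cheap, the sketch below uses a plain binary search tree rather than a b-tree to stay short. It is not the article's algorithm. The names `Node`, `insert`, and `lookup` are hypothetical, and nodes live in memory instead of on disk. An update never overwrites a node: it copies the nodes on the path from the root to the change, and the old root remains a valid snapshot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node (hypothetical; stands in for an on-disk page)."""
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, value: str) -> Node:
    """Return a new root; only nodes on the search path are copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    # Same key: shadow this node with the new value, reuse both subtrees.
    return Node(key, value, root.left, root.right)


def lookup(root: Optional[Node], key: int) -> Optional[str]:
    """Ordinary search; works identically on any root, old or new."""
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


# Build a small tree, then keep its root as a snapshot.
snapshot = None
for k in (50, 30, 70, 20):
    snapshot = insert(snapshot, k, f"v{k}")

# Updating produces a new root; the snapshot is untouched.
live = insert(snapshot, 20, "changed")
```

After this runs, `snapshot` and `live` are two independent views that share every subtree off the modified path. A real shadowing file system must also decide when a shared node can be freed, which this sketch leaves out.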
Index Terms
B-trees, shadowing, and clones