ABSTRACT
Most B-tree papers assume that all N keys have the same size K, that f = B/K keys fit in a disk block of size B, and therefore that the search cost is $O(\log_{f+1} N)$ block transfers. When keys have variable size, however, B-tree operations have no nontrivial performance guarantees.
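As a rough illustration of the fixed-size bound above, the following sketch counts how many levels a tree with fanout B/K + 1 needs to cover N keys. The function name and the sample numbers are hypothetical, and the constant hidden by the O-notation is ignored, so the result is only an order-of-magnitude estimate of block transfers per search.

```python
def uniform_key_search_levels(n_keys: int, block_bytes: int, key_bytes: int) -> int:
    """Smallest h with (B/K + 1)**h >= N, i.e. ceil(log_{B/K + 1} N).

    Assumes every key has the same size and that a block holds
    block_bytes // key_bytes keys (pointers and headers are ignored).
    """
    keys_per_block = block_bytes // key_bytes
    if keys_per_block < 1:
        raise ValueError("a key must fit in one block in the fixed-size model")
    levels, reach = 0, 1
    # Each extra level multiplies the number of reachable keys by the fanout.
    while reach < n_keys:
        reach *= keys_per_block + 1
        levels += 1
    return levels


if __name__ == "__main__":
    # Hypothetical workload: one billion keys, 4 KiB blocks.
    for key_bytes in (16, 512):
        print(key_bytes, uniform_key_search_levels(10**9, 4096, key_bytes))
```

Larger keys reduce the fanout and therefore deepen the tree, which is why the bound depends on the key size K at all.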
This paper provides B-tree-like performance guarantees on dictionaries that contain keys of different sizes in a model in which keys must be stored and compared as opaque objects. The resulting atomic-key dictionaries have performance bounds expressed in terms of the average key size, and these bounds match the B-tree bounds when all keys are the same size. Atomic-key dictionaries can be built with minimal modification to the B-tree structure, simply by choosing the pivot keys properly.
This paper describes both static and dynamic atomic-key dictionaries. In the static case, if there are N keys with average size K, the search cost is $O(\lceil K/B \rceil \log_{1+\lceil B/K \rceil} N)$ expected transfers. The paper proves that it is not possible to transform these expected bounds into worst-case bounds. The cost to build the tree is O(NK) operations and O(NK/B) transfers if all keys are presented in sorted order. If not, the cost is the sorting cost.
For the dynamic dictionaries, the amortized cost to insert a key κ of arbitrary length at an arbitrary rank is dominated by the cost to search for κ. Specifically, the amortized cost to insert a key κ of arbitrary length and random rank is $O(\lceil K/B \rceil \log_{1+\lceil B/K \rceil} N + |\kappa|/B)$ transfers. A dynamic-programming algorithm is shown for constructing a search tree with minimal expected cost.
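To get a feel for how the average-key-size bounds behave, the sketch below evaluates them numerically. It is an illustration only: the function names and sample values are hypothetical, constant factors are dropped, and the logarithm base is taken to be 1 + ⌈B/K⌉, an assumption chosen so that the expression reduces to the equal-size B-tree bound that the abstract says it matches.

```python
import math


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def static_search_bound(n_keys: int, block_bytes: int, avg_key_bytes: int) -> float:
    """ceil(K/B) * log_{1 + ceil(B/K)} N, without the hidden constant.

    ASSUMPTION: the base 1 + ceil(B/K) is chosen so the bound matches the
    classical B-tree bound when all keys have the same size.
    """
    blocks_per_key = ceil_div(avg_key_bytes, block_bytes)
    base = 1 + ceil_div(block_bytes, avg_key_bytes)
    return blocks_per_key * math.log(n_keys) / math.log(base)


def insert_bound(n_keys: int, block_bytes: int, avg_key_bytes: int,
                 new_key_bytes: int) -> float:
    """Search term plus |kappa| / B for writing the inserted key itself."""
    return (static_search_bound(n_keys, block_bytes, avg_key_bytes)
            + new_key_bytes / block_bytes)


if __name__ == "__main__":
    # Hypothetical workload: one million keys, 4 KiB blocks.
    print(static_search_bound(10**6, 4096, 64))    # small average key
    print(static_search_bound(10**6, 4096, 8192))  # average key spans two blocks
    print(insert_bound(10**6, 4096, 64, 8192))     # inserting one large key
```

In this model a single large key adds only its own |κ|/B transfers to an insertion, while the search term depends on the average size K rather than on the largest key.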
Performance guarantees for B-trees with different-sized atomic keys