DOI: 10.1145/1807085.1807125
research-article

Performance guarantees for B-trees with different-sized atomic keys

Published: 6 June 2010

ABSTRACT

Most B-tree papers assume that all N keys have the same size K, that F = B/K keys fit in a disk block, and therefore that the search cost is $O(\log_{F+1} N)$ block transfers. When keys have variable size, however, B-tree operations have no nontrivial performance guarantees.
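As a rough illustration of the equal-size assumption, the sketch below computes the fanout F = B/K and the resulting logarithm with constant factors dropped. The function names and the sample block and key sizes are hypothetical, chosen only for this example.

```python
import math


def fanout(block_size: int, key_size: int) -> int:
    """Number of equal-sized keys that fit in one block (F = B/K, rounded down)."""
    return block_size // key_size


def uniform_search_cost(n_keys: int, block_size: int, key_size: int) -> float:
    """log_{F+1}(N): the block-transfer bound for equal-sized keys, constants dropped."""
    return math.log(n_keys) / math.log(fanout(block_size, key_size) + 1)


# Hypothetical numbers: 4096-byte blocks, 16-byte keys, one billion keys.
print(fanout(4096, 16))
print(uniform_search_cost(10**9, 4096, 16))
```

With these sample values, a search touches only a handful of blocks, which is the behavior that variable-size keys can break.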

This paper provides B-tree-like performance guarantees on dictionaries that contain keys of different sizes in a model in which keys must be stored and compared as opaque objects. The resulting atomic-key dictionaries exhibit performance bounds in terms of the average key size and match the bounds when all keys are the same size. Atomic-key dictionaries can be built with minimal modification to the B-tree structure, simply by choosing the pivot keys properly.

This paper describes both static and dynamic atomic-key dictionaries. In the static case, if there are N keys with average size K, the search cost is $O(\lceil K/B \rceil \log_{1+\lceil B/K \rceil} N)$ expected transfers. The paper proves that it is not possible to transform these expected bounds into worst-case bounds. The cost to build the tree is O(NK) operations and O(NK/B) transfers if all keys are presented in sorted order. If not, the cost is the sorting cost.
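The static search bound can be evaluated numerically to see how it behaves for small and large keys. The sketch below is not taken from the paper: it assumes the logarithm base is 1 + ⌈B/K⌉, the reading under which the bound reduces to the equal-size bound with F = B/K, and it drops the constant hidden by the O-notation. The function name and sample sizes are hypothetical.

```python
import math


def static_search_bound(n_keys: int, avg_key_size: float, block_size: int) -> float:
    """ceil(K/B) * log_{1 + ceil(B/K)}(N), with the O-notation constant dropped.

    Assumption: the base of the logarithm is 1 + ceil(B/K).
    """
    blocks_per_key = math.ceil(avg_key_size / block_size)  # ceil(K/B)
    keys_per_block = math.ceil(block_size / avg_key_size)  # ceil(B/K)
    return blocks_per_key * math.log(n_keys) / math.log(1 + keys_per_block)


# Small keys (K much less than B): the expression is log_{257}(N).
small = static_search_bound(2**20, 16, 4096)

# Large keys (K greater than B): each key spans 3 blocks and the base drops to 2.
large = static_search_bound(2**20, 10_000, 4096)

print(small, large)
```

When keys are larger than a block, the logarithm base falls to 2 and every comparison costs ⌈K/B⌉ transfers, so the expression behaves like a binary search over multi-block keys.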

For the dynamic dictionaries, the amortized cost to insert a key κ of arbitrary length at an arbitrary rank is dominated by the cost to search for κ. Specifically, the amortized cost to insert a key κ of arbitrary length and random rank is $O(\lceil K/B \rceil \log_{1+\lceil B/K \rceil} N + |\kappa|/B)$ transfers. A dynamic-programming algorithm is shown for constructing a search tree with minimal expected cost.


Published in

PODS '10: Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2010
350 pages
ISBN: 9781450300339
DOI: 10.1145/1807085

Copyright © 2010 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 476 of 1,835 submissions, 26%
