DOI: 10.1145/1559795.1559824
research-article

Secondary indexing in one dimension: beyond b-trees and bitmap indexes

Published: 29 June 2009

ABSTRACT

Let $\Sigma$ be a finite, ordered alphabet, and consider a string $x = x_1 x_2 \dots x_n \in \Sigma^n$. A secondary index for $x$ answers alphabet range queries of the form: given a range $[\alpha_\ell, \alpha_r] \subseteq \Sigma$, return the set $I_{[\alpha_\ell, \alpha_r]} = \{i \mid x_i \in [\alpha_\ell, \alpha_r]\}$. Secondary indexes are heavily used in relational databases and scientific data analysis. It is well known that the obvious solution, storing a dictionary for the set $\bigcup_i \{x_i\}$ with a position set associated with each character, does not always give optimal query time. In this paper we give the first theoretically optimal data structure for the secondary indexing problem. In the I/O model, the amount of data read when answering a query is within a constant factor of the minimum space needed to represent the set $I_{[\alpha_\ell, \alpha_r]}$, assuming that the size of internal memory is $(|\Sigma| \lg n)^{\delta}$ blocks, for some constant $\delta > 0$. The space usage of the data structure is $O(n \lg |\Sigma|)$ bits in the worst case, and we further show how to bound the size of the data structure in terms of the 0th order entropy of $x$. We show how to support updates achieving various time-space trade-offs.
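
To make the query concrete, the following is a minimal Python sketch of the "obvious solution" mentioned above: a dictionary over the characters that occur in x, with a position list per character. It is not the paper's data structure. The names `build_naive_index` and `range_query` are hypothetical, and the sketch works in main memory rather than in the I/O model.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_naive_index(x):
    """Map each distinct character of x to the sorted list of its positions."""
    positions = defaultdict(list)
    for i, ch in enumerate(x, start=1):  # 1-based positions, as in the abstract
        positions[ch].append(i)
    keys = sorted(positions)  # ordered dictionary keys: the characters occurring in x
    return keys, dict(positions)


def range_query(index, lo, hi):
    """Return the set {i | lo <= x_i <= hi} as a sorted list of positions."""
    keys, positions = index
    start = bisect_left(keys, lo)   # first key >= lo
    stop = bisect_right(keys, hi)   # one past the last key <= hi
    result = []
    for ch in keys[start:stop]:     # one position list is read per character in range
        result.extend(positions[ch])
    return sorted(result)


index = build_naive_index("banana")
print(range_query(index, "a", "b"))  # [1, 2, 4, 6]
```

The loop reads one list per character in the range. A query covering the whole alphabet returns every position, which can be described very compactly, yet this sketch still reads every list. That gap is one way to see why the approach is not always optimal.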

We also consider an approximate version of the basic secondary indexing problem where a query reports a superset of $I_{[\alpha_\ell, \alpha_r]}$ containing each element not in $I_{[\alpha_\ell, \alpha_r]}$ with probability at most $\varepsilon$, where $\varepsilon > 0$ is the false positive probability. For this problem the amount of data that needs to be read by the query algorithm is reduced to $O(|I_{[\alpha_\ell, \alpha_r]}| \lg(1/\varepsilon))$ bits.
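
As a rough numeric illustration of the two bounds, the sketch below compares the information-theoretic minimum for an exact answer, lg C(n, k) bits for a k-element subset of n positions, with k lg(1/ε) bits for the approximate version. The function names and the sample values of n, k, and ε are assumptions chosen for illustration, and the constant hidden by the O-notation is taken to be 1, which the abstract does not state.

```python
from math import comb, log2


def exact_bits(n, k):
    """Minimum number of bits needed to identify a k-element subset of n positions."""
    return log2(comb(n, k))


def approx_bits(k, eps):
    """k * lg(1/eps): the approximate-query bound, with the hidden constant assumed to be 1."""
    return k * log2(1 / eps)


# Hypothetical example: 1,000 matching positions out of 1,000,000, 1% false positives.
n, k, eps = 1_000_000, 1_000, 0.01
print(f"exact:  about {exact_bits(n, k):.0f} bits")
print(f"approx: about {approx_bits(k, eps):.0f} bits")
```

The saving depends on how sparse the result set is: the exact bound grows roughly as k lg(n/k), while the approximate bound depends only on k and ε.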


Published in

PODS '09: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2009
298 pages
ISBN: 9781605585536
DOI: 10.1145/1559795
General Chair: Jan Paredaens
Program Chair: Jianwen Su

Copyright © 2009 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 476 of 1,835 submissions, 26%
