DOI: 10.1145/2463676.2465306
research-article

Column imprints: a secondary index structure

Online: 22 June 2013

ABSTRACT

Large-scale data warehouses rely heavily on secondary indexes, such as bitmaps and B-trees, to limit access to slow I/O devices. However, with the advent of large main-memory systems, cache-conscious secondary indexes are needed to also improve the transfer bandwidth between memory and CPU. In this paper, we introduce the column imprint, a simple but efficient cache-conscious secondary index. A column imprint is a collection of many small bit vectors, each indexing the data points of a single cache line. An imprint is used during query evaluation to limit data access and thus minimize memory traffic. The compression for imprints is CPU-friendly and exploits the empirical observation that data often exhibits local clustering or partial ordering as a side effect of the construction process. Most importantly, column imprint compression remains effective and robust even in the case of unclustered data, where other state-of-the-art solutions fail. We conducted an extensive experimental evaluation to assess the applicability and the performance impact of column imprints. The storage overhead, when experimenting with real-world datasets, is just a few percent over the size of the columns being indexed. The evaluation time for over 40,000 range queries of varying selectivity revealed the efficiency of the proposed index compared to zonemaps and bitmaps with WAH compression.
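
To make the idea of one small bit vector per cache line concrete, the following is a minimal sketch and not the paper's implementation. It assumes 16 values per cache line (a 64-byte line of 4-byte values), caller-supplied bin boundaries in a hypothetical `bounds` list, and it omits imprint compression entirely. All function names are illustrative.

```python
from bisect import bisect_right

# Assumption: a 64-byte cache line holding sixteen 4-byte values.
VALUES_PER_LINE = 16


def bin_of(bounds, value):
    """Bin index of a value; len(bounds) sorted boundaries give len(bounds)+1 bins."""
    return bisect_right(bounds, value)


def build_imprints(column, bounds):
    """Return one bit vector (a Python int) per cache-line-sized chunk.

    Bit b of a vector is set when at least one value of that chunk falls in bin b.
    """
    imprints = []
    for start in range(0, len(column), VALUES_PER_LINE):
        vec = 0
        for value in column[start:start + VALUES_PER_LINE]:
            vec |= 1 << bin_of(bounds, value)
        imprints.append(vec)
    return imprints


def query_mask(bounds, low, high):
    """Bit vector with one bit set for every bin that overlaps [low, high]."""
    mask = 0
    for b in range(bin_of(bounds, low), bin_of(bounds, high) + 1):
        mask |= 1 << b
    return mask


def range_select(column, imprints, bounds, low, high):
    """Positions with low <= value <= high, plus the number of chunks actually read."""
    mask = query_mask(bounds, low, high)
    hits = []
    lines_read = 0
    for i, vec in enumerate(imprints):
        if vec & mask == 0:
            continue  # no value of this chunk can qualify, so skip it
        lines_read += 1
        start = i * VALUES_PER_LINE
        for pos in range(start, min(start + VALUES_PER_LINE, len(column))):
            if low <= column[pos] <= high:
                hits.append(pos)
    return hits, lines_read


if __name__ == "__main__":
    column = list(range(64))                # a fully ordered toy column
    bounds = [8, 16, 24, 32, 40, 48, 56]    # 8 equi-width bins (assumed)
    imprints = build_imprints(column, bounds)
    hits, lines_read = range_select(column, imprints, bounds, 20, 27)
    print(hits, lines_read)
```

On this ordered toy column only one of the four chunks is read for the query `[20, 27]`. On unclustered data most imprints would have many bits set and fewer chunks could be skipped, which is why the filtering step is only a pre-check: every chunk that passes is still scanned value by value.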


      Published in

        SIGMOD '13: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
        June 2013
        1322 pages
        ISBN:9781450320375
        DOI:10.1145/2463676

        Copyright © 2013 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        SIGMOD '13 Paper Acceptance Rate 76 of 372 submissions, 20%
        Overall Acceptance Rate 678 of 3,582 submissions, 19%
