ABSTRACT
Large-scale data warehouses rely heavily on secondary indexes, such as bitmaps and B-trees, to limit access to slow I/O devices. However, with the advent of large main-memory systems, cache-conscious secondary indexes are also needed to make better use of the transfer bandwidth between memory and CPU. In this paper, we introduce column imprints, a simple but efficient cache-conscious secondary index. A column imprint is a collection of many small bit vectors, each indexing the data points of a single cacheline. An imprint is used during query evaluation to limit data access and thus minimize memory traffic. The compression for imprints is CPU-friendly and exploits the empirical observation that data often exhibits local clustering or partial ordering as a side effect of the construction process. Most importantly, column imprint compression remains effective and robust even in the case of unclustered data, where other state-of-the-art solutions fail. We conducted an extensive experimental evaluation to assess the applicability and the performance impact of column imprints. The storage overhead, when experimenting with real-world datasets, is just a few percent over the size of the columns being indexed. The evaluation time for over 40,000 range queries of varying selectivity revealed the efficiency of the proposed index compared to zonemaps and bitmaps with WAH compression.
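To make the idea of one small bit vector per cacheline concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `build_imprints` and `range_query`, the bin boundaries in `bounds`, and the default of 16 values per cacheline (a 64-byte line of 4-byte integers) are all assumptions for illustration. The imprint compression mentioned in the abstract is not modeled.

```python
from bisect import bisect_right

# Assumption: a 64-byte cacheline holding 4-byte integers.
VALUES_PER_LINE = 16


def build_imprints(column, bounds, per_line=VALUES_PER_LINE):
    """Build one bit vector (stored as an int) per cacheline-sized block.

    `bounds` is a sorted list of bin boundaries (hypothetical choice);
    a value v falls into bin number bisect_right(bounds, v).
    Bit b of a vector is set if some value in the block falls into bin b.
    """
    imprints = []
    for start in range(0, len(column), per_line):
        vec = 0
        for v in column[start:start + per_line]:
            vec |= 1 << bisect_right(bounds, v)
        imprints.append(vec)
    return imprints


def range_query(column, imprints, bounds, low, high, per_line=VALUES_PER_LINE):
    """Return (positions with low <= value <= high, number of blocks read)."""
    lo_bin = bisect_right(bounds, low)
    hi_bin = bisect_right(bounds, high)
    # Mask with bits lo_bin..hi_bin set: every bin the range can overlap.
    mask = ((1 << (hi_bin + 1)) - 1) & ~((1 << lo_bin) - 1)

    hits = []
    touched = 0
    for block, vec in enumerate(imprints):
        if vec & mask == 0:
            continue  # no value of this block can qualify; skip its data
        touched += 1
        start = block * per_line
        end = min(start + per_line, len(column))
        # The imprint may yield false positives, so each value is rechecked.
        for i in range(start, end):
            if low <= column[i] <= high:
                hits.append(i)
    return hits, touched
```

On clustered data most blocks fail the mask test and their values are never read, which is the reduction in memory traffic the abstract refers to. On unclustered data more bits are set per vector, so fewer blocks can be skipped, but the result stays correct because every surviving block is rechecked.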
Index Terms
Column imprints: a secondary index structure