DOI: 10.1145/3318464.3380588
research-article

Tree-Encoded Bitmaps

Published: 31 May 2020

Abstract

We propose a novel method to represent compressed bitmaps. Like existing bitmap compression schemes, our approach exploits the compression potential of bitmaps populated with consecutive identical bits, i.e., 0-runs and 1-runs. Unlike prior work, however, it employs a binary tree structure to represent runs of various lengths: leaf nodes in the upper levels of the tree represent longer runs, and leaf nodes in the lower levels represent shorter runs. The tree-based representation achieves high compression ratios and enables efficient random access, which in turn allows for fast intersection of bitmaps. Our experimental analysis with randomly generated bitmaps shows that our approach significantly improves over state-of-the-art compression techniques when bitmaps are dense, only weakly clustered, or both. We further evaluate our approach on real-world data sets, showing that tree-encoded bitmaps can save up to one third of the space compared with existing techniques.
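To make the idea in the abstract concrete, here is a minimal Python sketch under our own simplifying assumptions; it is not the authors' implementation. We assume the bitmap length is a power of two and use a pointer-based binary tree in which any subtree covering only 0s or only 1s is pruned to a single leaf. The names `Node`, `build`, `get`, `intersect`, and `count_nodes` are hypothetical. The sketch shows why long runs end up as leaves near the root, how random access descends from the root, and how two trees can be intersected without decompressing them. It makes no attempt at a compact physical encoding.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    # Leaf: value is the bit (0 or 1) shared by every position it covers.
    # Inner node: value is None and the two children cover the two halves.
    value: Optional[int] = None
    left: Optional[Node] = None
    right: Optional[Node] = None


def build(bits: Sequence[int], lo: int, hi: int) -> Node:
    """Build a pruned tree over bits[lo:hi]; hi - lo is assumed to be a power of two."""
    if hi - lo == 1:
        return Node(value=bits[lo])
    mid = (lo + hi) // 2
    left, right = build(bits, lo, mid), build(bits, mid, hi)
    # Two leaves holding the same bit merge into one leaf covering a longer run.
    if left.value is not None and left.value == right.value:
        return Node(value=left.value)
    return Node(left=left, right=right)


def get(root: Node, n: int, pos: int) -> int:
    """Random access: descend from the root to the leaf covering pos (n = bitmap length)."""
    node, lo, hi = root, 0, n
    while node.value is None:
        mid = (lo + hi) // 2
        if pos < mid:
            node, hi = node.left, mid
        else:
            node, lo = node.right, mid
    return node.value


def intersect(a: Node, b: Node) -> Node:
    """Bitwise AND of two trees built over bitmaps of the same length."""
    if a.value == 0 or b.value == 0:
        return Node(value=0)  # a 0-run wipes out whatever the other side holds
    if a.value == 1:
        return b  # a 1-run leaves the other subtree unchanged
    if b.value == 1:
        return a
    left, right = intersect(a.left, b.left), intersect(a.right, b.right)
    if left.value is not None and left.value == right.value:
        return Node(value=left.value)  # re-prune if both halves became uniform
    return Node(left=left, right=right)


def count_nodes(node: Node) -> int:
    """Number of tree nodes, a rough proxy for space consumption."""
    if node.value is not None:
        return 1
    return 1 + count_nodes(node.left) + count_nodes(node.right)


bits = [0] * 8 + [1] * 4 + [0, 1, 0, 1]
root = build(bits, 0, len(bits))
print(get(root, len(bits), 13))  # prints 1
print(count_nodes(root))         # prints 11 (an unpruned tree over 16 bits has 31 nodes)
```

In this sketch a pruned leaf can only cover an aligned, power-of-two-sized range, so a run that straddles such a boundary is split across several leaves; a run of length L still needs only O(log L) leaves rather than L bits.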

Supplementary Material

Presentation video (MP4 file: 3318464.3380588.mp4)

Information

Published In

SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN:9781450367356
DOI:10.1145/3318464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bitmap
  2. compression
  3. data structure
  4. indexing
  5. succinct

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '20

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 40
  • Downloads (last 6 weeks): 8
Reflects downloads up to 07 Jan 2025.

Cited By

  • (2024) Revisiting B-tree Compression: An Experimental Study. Proceedings of the ACM on Management of Data 2:3, 1-25. DOI: 10.1145/3654972. Online publication date: 30-May-2024.
  • (2023) An Index for Set Intersection With Post-Filtering. IEEE Transactions on Knowledge and Data Engineering, 1-14. DOI: 10.1109/TKDE.2023.3329145. Online publication date: 2023.
  • (2023) ANTI: An Adaptive Network Traffic Indexing Algorithm for High-Speed Networks. GLOBECOM 2023 - 2023 IEEE Global Communications Conference, 1699-1704. DOI: 10.1109/GLOBECOM54140.2023.10437924. Online publication date: 4-Dec-2023.
  • (2022) In-Place Updates in Tree-Encoded Bitmaps. Proceedings of the 34th International Conference on Scientific and Statistical Database Management, 1-4. DOI: 10.1145/3538712.3538745. Online publication date: 6-Jul-2022.
  • (2021) LES3. Proceedings of the VLDB Endowment 14:11, 2073-2086. DOI: 10.14778/3476249.3476263. Online publication date: 27-Oct-2021.
  • (2021) Energy Efficiency vs. Performance of Analytical Queries: The case of Bitmap Join Indexes. 2021 IEEE International Conference on Big Data (Big Data), 3066-3074. DOI: 10.1109/BigData52589.2021.9671307. Online publication date: 15-Dec-2021.
  • (2020) Cuckoo index. Proceedings of the VLDB Endowment 13:13, 3559-3572. DOI: 10.14778/3424573.3424577. Online publication date: 27-Oct-2020.
  • (2020) MoHA: A Composable System for Efficient In-Situ Analytics on Heterogeneous HPC Systems. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 1-16. DOI: 10.1109/SC41405.2020.00086. Online publication date: Nov-2020.
