CORES: Towards Scan-Optimized Columnar Storage for Nested Records

Published: 26 June 2019

Abstract

The relatively high cost of record deserialization is increasingly becoming the bottleneck of column-based storage systems in tree-structured applications [58]. Because records are transformed in the storage layer, fields and rows irrelevant to a query can incur heavy, unnecessary processing costs in nested schemas, wasting significant computational resources in large-scale analytical workloads. This raises the question of how to reduce both the deserialization and IO costs of queries with highly selective filters that follow arbitrary paths in a nested schema.

We present CORES (Column-Oriented Regeneration Embedding Scheme) to push highly selective filters down into column-based storage engines, where each filter consists of several filtering conditions on a field. By applying highly selective filters in the storage layer, we demonstrate that both the deserialization and IO costs can be significantly reduced. We show how to introduce fine-grained composition on filtering results. We generalize this technique with two pairwise operations, rollup and drilldown, such that a series of conjunctive filters can effectively deliver their payloads in nested schemas. The proposed methods are implemented on an open-source platform. For practical purposes, we highlight how to build a column storage engine and how to drive a query efficiently based on a cost model. We apply this design to the nested relational model, especially when hierarchical entities are frequently required by ad hoc queries. The experiments, including a real workload and the modified TPC-H benchmark, demonstrate that CORES improves performance by 0.7×–26.9× compared to state-of-the-art platforms in scan-intensive workloads.
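The abstract does not spell out how rollup and drilldown are defined, so the following is only a minimal sketch of one plausible reading, not the CORES implementation. It assumes a two-level nested schema (orders owning line items), per-column boolean filter results, and an offsets array that maps each parent to its run of children. All names here (`order_status`, `child_offsets`, `item_qty`, `item_sku`, `scan`, `rollup`, `drilldown`, `conjoin`, `query`) are hypothetical and chosen for illustration.

```python
# Hypothetical toy data: 4 parent records (orders); children (line items)
# are stored contiguously, and order i owns children
# [child_offsets[i], child_offsets[i + 1]).
order_status = ["open", "closed", "open", "open"]  # parent-level column
child_offsets = [0, 2, 3, 3, 6]                    # order 2 has no children
item_qty = [5, 40, 7, 55, 1, 60]                   # child-level filter column
item_sku = ["a", "b", "c", "d", "e", "f"]          # child-level payload column


def scan(column, predicate):
    """Evaluate a predicate against a single column, touching no other field."""
    return [predicate(value) for value in column]


def rollup(child_bits, offsets):
    """Lift a child-level result to the parent level (assumed semantics:
    a parent qualifies if at least one of its children qualifies)."""
    return [any(child_bits[offsets[i]:offsets[i + 1]])
            for i in range(len(offsets) - 1)]


def drilldown(parent_bits, offsets):
    """Push a parent-level result down (assumed semantics: each child
    inherits the bit of the parent that owns it)."""
    child_bits = []
    for i, bit in enumerate(parent_bits):
        child_bits.extend([bit] * (offsets[i + 1] - offsets[i]))
    return child_bits


def conjoin(left, right):
    """Combine two filter results that live at the same nesting level."""
    return [a and b for a, b in zip(left, right)]


def query():
    """Open orders that contain an item with qty > 30, plus those items' SKUs."""
    parent_bits = scan(order_status, lambda s: s == "open")
    child_bits = scan(item_qty, lambda q: q > 30)
    # Align the parent filter with the child level, then intersect.
    final_children = conjoin(drilldown(parent_bits, child_offsets), child_bits)
    # Lift the combined result back up to find the qualifying orders.
    final_parents = rollup(final_children, child_offsets)
    # Only now is the payload column read, and only at surviving positions.
    orders = [i for i, bit in enumerate(final_parents) if bit]
    skus = [item_sku[i] for i, bit in enumerate(final_children) if bit]
    return orders, skus


print(query())  # -> ([0, 3], ['b', 'd', 'f'])
```

In this sketch the payload column `item_sku` is touched only after every filter has been combined, which is the intuition behind saving deserialization and IO for rows and fields that the query never needs. A real engine would presumably use compressed bitmaps and block-level skipping rather than Python lists.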

References

  1. Apache. 2017. Apache Hive. Retrieved June 13, 2019 from https://hive.apache.org.
  2. Apache. 2017. Apache Parquet. Retrieved June 13, 2019 from https://parquet.apache.org.
  3. Apache. 2017. Apache Spark. Retrieved June 13, 2019 from https://spark.apache.org.
  4. Apache. 2017. Apache Tez. Retrieved June 13, 2019 from https://tez.apache.org.
  5. Apache. 2018. Apache AsterixDB. Retrieved June 13, 2019 from https://asterixdb.apache.org.
  6. Apache. 2018. Apache Avro. Retrieved June 13, 2019 from https://avro.apache.org.
  7. Google. 2017. Protocol buffer. Retrieved June 13, 2019 from http://code.google.com/p/protobuf/.
  8. Yang Li. 2018. Cores. Retrieved June 13, 2019 from https://github.com/lwhay/cores.
  9. NCBI. 2018. PubMed. Retrieved June 13, 2019 from http://www.ncbi.nlm.nih.gov.
  10. TPC. 2017. TPC-H benchmark. Retrieved June 13, 2019 from http://www.tpc.org/tpch.
  11. Foto N. Afrati, Dan Delorey, Mosha Pasumansky, and Jeffrey D. Ullman. 2014. Storing and querying tree-structured records in Dremel. PVLDB 7, 12 (2014), 1131–1142.
  12. Anastassia Ailamaki, David J. DeWitt, and Mark D. Hill. 2002. Data Page Layouts for Relational Databases on Deep Memory Hierarchies. Springer-Verlag New York, Inc. 198–215.
  13. Sattam Alsubaiee, Yasser Altowim, et al. 2014. AsterixDB: A scalable, open source BDMS. PVLDB 7, 14 (2014), 1905–1916.
  14. Sattam Alsubaiee, Alexander Behm, Vinayak R. Borkar, et al. 2014. Storage management in AsterixDB. PVLDB 7, 10 (2014), 841–852.
  15. Gopi Attaluri, Shaorong Liu, and Guy M. Lohman. 2013. DB2 with BLU acceleration: So much more than just a column store. Proceedings of the VLDB Endowment 6, 11 (2013), 1080–1091.
  16. François Bancilhon, Philippe Richard, and Michel Scholl. 1982. On line processing of compacted relations. In Proceedings of the 8th International Conference on Very Large Data Bases. 263–269.
  17. Babak Behzad, Huong Vu Thanh Luu, et al. 2013. Taming parallel I/O complexity with auto-tuning. In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis. 68–79.
  18. Kevin Beyer and Raghu Ramakrishnan. 1999. Bottom-up computation of sparse and iceberg CUBEs. SIGMOD Record 28, 2 (1999), 359–370.
  19. Medha Bhadkamkar, Fernando Farfan, Vagelis Hristidis, and Raju Rangaswami. 2009. Storing semi-structured data on disk drives. ACM Trans. Storage 5, 2, Article 6 (June 2009), 35 pages.
  20. Peter Boncz, Torsten Grust, Maurice Van Keulen, Stefan Manegold, Jan Rittinger, and Jens Teubner. 2006. MonetDB/XQuery: A fast XQuery processor powered by a relational engine. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 479–490.
  21. Vinayak Borkar, Michael Carey, Raman Grover, et al. 2011. Hyracks: A flexible and extensible foundation for data-intensive computing. In Proceedings of the IEEE International Conference on Data Engineering. 1151–1162.
  22. C. Chasseur, Yinan Li, and J. M. Patel. 2013. Enabling JSON document stores in relational systems (long version). In Proceedings of the International Workshop on the Web and Databases. 1–16.
  23. Shuo-Han Chen, Tseng-Yi Chen, Yuan-Hao Chang, Hsin-Wen Wei, and Wei-Kuan Shih. 2018. UnistorFS: A union storage file system design for resource sharing between memory and storage on persistent RAM-based systems. ACM Trans. Storage 14, 1, Article 3 (Feb. 2018), 22 pages.
  24. Douglas W. Comer and Philip S. Yu. 1987. A vertical partitioning algorithm for relational databases. In Proceedings of the IEEE International Conference on Data Engineering. 30–35.
  25. Graham Cormode, Minos Garofalakis, et al. 2012. Synopses for massive data: Samples, histograms, wavelets, sketches. Found. Trends Datab. 4, 1 (2012), 1–294.
  26. Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 10.
  27. M. J. Egenhofer. 1994. Spatial SQL: A query and presentation language. IEEE Trans. Knowl. Data Eng. 6, 1 (1994), 86–95.
  28. Avrilia Floratou and Umar Farooq Minhas. 2014. SQL-on-Hadoop: Full circle back to shared-nothing database architectures. Proceedings of the VLDB Endowment 7, 12 (Jan. 2014), 1295–1306.
  29. Raúl Gracia-Tinedo, Josep Sampé, et al. 2017. Crystal: Software-defined storage for multi-tenant object stores. In Proceedings of the USENIX Conference on File and Storage Technologies.
  30. Bin He, Hui I. Hsiao, Ziyang Liu, Yu Huang, and Yi Chen. 2012. Efficient iceberg query evaluation using compressed bitmap index. IEEE Trans. Knowl. Data Eng. 24, 9 (2012), 1570–1583.
  31. Jianfeng Jia, Chen Li, and Michael J. Carey. 2017. Drum: A rhythmic approach to interactive analytics on large data. In Proceedings of the IEEE International Conference on Big Data.
  32. Martin Kaufmann and Donald Kossmann. 2013. Storing and processing temporal data in a main memory column store. Proceedings of the VLDB Endowment 6, 12 (2013), 1444–1449.
  33. Tirthankar Lahiri, Shasank Chavan, Maria Colgan, Dinesh Das, Amit Ganesh, Mike Gleeson, Sanket Hase, Allison Holloway, Jesse Kamp, and Teck Hua Lee. 2015. Oracle database in-memory: A dual format in-memory database. In Proceedings of the IEEE International Conference on Data Engineering. 1253–1258.
  34. Andrew Lamb, Matt Fuller, Ramakrishna Varadarajan, et al. 2012. The Vertica analytic database: C-store 7 years later. PVLDB 5, 12 (2012), 1790–1801.
  35. Eunji Lee and Hyokyung Bahn. 2014. Caching strategies for high-performance storage media. ACM Trans. Storage 10, 3, Article 11 (Aug. 2014), 22 pages.
  36. Daniel Lemire, Robert Godin, et al. 2016. Optimizing Druid with Roaring bitmaps. In Proceedings of the International Database Engineering & Applications Symposium. 77–86.
  37. Hang Liu and H. Howie Huang. 2017. Graphene: Fine-grained IO management for graph computing. In Proceedings of the USENIX Conference on File and Storage Technologies. 285–299.
  38. Zhen Hua Liu, Beda Hammerschmidt, and Doug McMahon. 2014. JSON data management: Supporting schema-less development in RDBMS. In Proceedings of the ACM SIGMOD International Conference on Management of Data 7, 2 (2014), 1247–1258.
  39. Peng Lu, Sai Wu, Lidan Shou, and Kian-Lee Tan. 2013. An efficient and compact indexing scheme for large-scale data store. In Proceedings of the IEEE International Conference on Data Engineering. 326–337.
  40. Sagar S. Mane and M. Emmanuel. 2015. Review and comparative study of bitmap indexing techniques. Data Mining Knowl. Eng. 7, 1 (2015).
  41. Sergey Melnik, Andrey Gubarev, Jing Jing Long, et al. 2010. Dremel: Interactive analysis of web-scale datasets. Commun. ACM 3, 12 (2010), 114–123.
  42. Jan Paredaens and Dirk Van Gucht. 1988. Possibilities and limitations of using flat operators in nested algebra expressions. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. 29–38.
  43. H. B. Paul, H. J. Schek, and M. H. Scholl. 1987. Architecture and implementation of the Darmstadt database kernel system. In Proceedings of the ACM SIGMOD Conference. 196–207.
  44. Mark A. Roth, Henry F. Korth, and Abraham Silberschatz. 1988. Extended algebra and calculus for nested relational databases. ACM Trans. Datab. Syst. 13, 4 (1988), 389–417.
  45. Michael Rys and Gerhard Weikum. 1994. Heuristic optimization of speedup and benefit/cost for parallel database scans on shared-memory multiprocessors. In Proceedings of the International Parallel Processing Symposium. 894–901.
  46. Marc H. Scholl, H.-Bernhard Paul, and Hans-Jörg Schek. 1987. Supporting flat relations by a nested relational kernel. In Proceedings of the International Conference on Very Large Data Bases. 137–146.
  47. Anil Shanbhag, Alekh Jindal, Yi Lu, and Samuel Madden. 2016. Amoeba: A shape changing storage system for big data. PVLDB 9, 13 (2016), 1569–1572.
  48. Jeff Shute, Radek Vingralek, et al. 2013. F1: A distributed SQL database that scales. Proceedings of the VLDB Endowment 6, 11 (2013), 1068–1079.
  49. Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. 2010. The Hadoop distributed file system. In Proceedings of the IEEE Symposium on Mass Storage Systems and Technologies. 1–10.
  50. Laure Soulier and Lynda Tamine. 2017. On the collaboration support in information retrieval. ACM Comput. Surv. 50, 4, Article 51 (2017), 34 pages.
  51. Kurt Stockinger. 2001. Design and implementation of bitmap indices for scientific data. In Proceedings of the International Database Engineering and Applications Symposium. 47–57.
  52. Mike Stonebraker, Daniel J. Abadi, Adam Batkin, et al. 2005. C-store: A column-oriented DBMS. In Proceedings of the International Conference on Very Large Data Bases. 553–564.
  53. Liwen Sun, Sanjay Krishnan, Reynold S. Xin, and Michael J. Franklin. 2014. A partitioning framework for aggressive data skipping. PVLDB 7, 13 (2014), 1617–1620.
  54. Yuliang Sun, Yu Wang, and Huazhong Yang. 2018. Bidirectional database storage and SQL query exploiting RRAM-based process-in-memory structure. ACM Trans. Storage 14, 1, Article 8 (March 2018), 19 pages.
  55. Daniel Tahara, Thaddeus Diamond, and Daniel J. Abadi. 2014. Sinew: A SQL system for multi-structured data. In Proceedings of the ACM SIGMOD Conference. 815–826.
  56. Aubrey L. Tatarowicz, Carlo Curino, Evan P. C. Jones, and Sam Madden. 2012. Lookup tables: Fine-grained partitioning for distributed databases. In Proceedings of the IEEE International Conference on Data Engineering. 102–113.
  57. Sebastian Wandelt, Dong Deng, Stefan Gerdjikov, et al. 2014. State-of-the-art in string similarity search and join. SIGMOD Record 43, 1 (2014), 64–76.
  58. Zhiyi Wang and Shimin Chen. 2017. Exploiting common patterns for tree-structured data. In Proceedings of the ACM SIGMOD Conference. 883–896.
  59. Brent Welch, Marc Unangst, Zainul Abbasi, et al. 2008. Scalable performance of the Panasas parallel file system. In Proceedings of the USENIX Conference on File and Storage Technologies. 2.
  60. Chin-Hsien Wu and Kuo-Yi Huang. 2015. Data sorting in flash memory. ACM Trans. Storage 11, 2, Article 7 (March 2015), 25 pages.
  61. Pengfei Xuan, Walter B. Ligon, Pradip K. Srimani, Rong Ge, and Feng Luo. 2016. Accelerating big data analytics on HPC clusters using two-level storage. Parallel Comput. 61 (2016).
  62. Atsuo Yoshitaka and Tadao Ichikawa. 1999. A survey on content-based retrieval for multimedia databases. IEEE Trans. Knowl. Data Eng. 11, 1 (1999), 81–93.
  63. Yuan Yu, Michael Isard, Dennis Fetterly, et al. 2009. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 1–14.
  64. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, et al. 2012. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the USENIX Symposium on Operating Systems Design and Implementation. 2.
  65. Yansong Zhang, Xuan Zhou, Ying Zhang, et al. 2016. Virtual denormalization via array index reference for main memory OLAP. IEEE Trans. Knowl. Data Eng. 28, 4 (2016), 1061–1074.
