ABSTRACT
Automatically recycling (intermediate) results to improve both query response time and throughput is a grand challenge for state-of-the-art databases. Tuples are loaded and streamed through a tuple-at-a-time processing pipeline, avoiding materialization of intermediates as much as possible. This limits the opportunities for reuse of overlapping computations to DBA-defined materialized views and function/result cache tuning.
In contrast, the operator-at-a-time execution paradigm produces fully materialized results in each step of the query plan. To avoid resource contention, these intermediates are evicted as soon as possible.
In this paper we study an architecture that harvests the by-products of the operator-at-a-time paradigm in a column store system using a lightweight mechanism, the recycler. The key challenge then becomes selection of the policies to admit intermediates to the resource pool, their retention period, and the eviction strategy when facing resource limitations.
The proposed recycling architecture has been implemented in an open-source system. An experimental analysis against the TPC-H ad-hoc decision support benchmark and a complex, real-world application (SkyServer) demonstrates its effectiveness in terms of self-organizing behavior and its significant performance gains. The results indicate the potential of recycling intermediates and chart a route for further development of database kernels.
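To make the three policy questions named above (admission, retention, eviction) concrete, the following is a minimal Python sketch of a recycle pool for materialized intermediates. It is an illustration under stated assumptions, not the mechanism implemented in the paper: the names `RecyclePool`, `Entry`, `run`, and `min_cost`, the cost-threshold admission rule, and the benefit formula `cost * (hits + 1) / size` are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable, Optional


@dataclass
class Entry:
    """One retained intermediate result (hypothetical bookkeeping)."""
    result: Any
    size: int       # resource footprint, e.g. number of values
    cost: float     # estimated cost of recomputing the result
    hits: int = 0   # how often the result has been reused


class RecyclePool:
    """Bounded pool of intermediates with simple admission and eviction."""

    def __init__(self, capacity: int, min_cost: float = 0.0) -> None:
        self.capacity = capacity
        self.min_cost = min_cost  # assumed admission threshold
        self.used = 0
        self.entries: Dict[Hashable, Entry] = {}

    def lookup(self, key: Hashable) -> Optional[Entry]:
        entry = self.entries.get(key)
        if entry is not None:
            entry.hits += 1
        return entry

    def admit(self, key: Hashable, result: Any, size: int, cost: float) -> bool:
        # Admission policy (assumed): skip cheap or oversized intermediates.
        if cost < self.min_cost or size > self.capacity:
            return False
        if key in self.entries:
            return True
        # Eviction policy (assumed): drop lowest-benefit entries until it fits.
        while self.used + size > self.capacity:
            self._evict_one()
        self.entries[key] = Entry(result, size, cost)
        self.used += size
        return True

    def _evict_one(self) -> None:
        def benefit(k: Hashable) -> float:
            e = self.entries[k]
            return e.cost * (e.hits + 1) / e.size

        victim = min(self.entries, key=benefit)
        self.used -= self.entries.pop(victim).size


def run(pool: RecyclePool, key: Hashable,
        compute: Callable[[], list], cost: float) -> list:
    """Execute one operator, reusing a retained intermediate if present."""
    entry = pool.lookup(key)
    if entry is not None:
        return entry.result
    result = compute()
    pool.admit(key, result, size=len(result), cost=cost)
    return result
```

In this sketch the key would be something like the operator name plus its arguments, so that two queries sharing a sub-plan step map to the same pool entry. The retention period is implicit: an entry stays until the eviction rule selects it.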
Index Terms
An architecture for recycling intermediates in a column-store