Abstract
We present the design and implementation of a SQL query processor that outperforms existing database systems and is written in just about 500 lines of Scala code: a convincing case study that high-level functional programming can handily beat C for systems-level programming where the last drop of performance matters. The key enabler is a shift in perspective towards generative programming. The core of the query engine is an interpreter for relational algebra operations, written in Scala. Using the open-source Lightweight Modular Staging (LMS) framework, we turn this interpreter into a query compiler with very low effort. To do so, we capitalize on an old and widely known result from partial evaluation, the Futamura projections, which state that a program that can specialize an interpreter to any given input program is equivalent to a compiler. In this pearl, we discuss LMS programming patterns such as mixed-stage data structures (e.g., data records with static schema and dynamic field components) and techniques to generate low-level C code, including specialized data structures and data loading primitives.
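To make the interpreter-to-compiler idea concrete, here is a minimal sketch in plain Scala. It does not use the LMS API and is not the paper's code: the names `Code`, `Record`, `Scan`, `Filter`, `Print`, and `compile` are hypothetical, staging is imitated by building C text as strings, and the generated C assumes column arrays such as `emp_id` and a length `emp_len` are already in scope. The point it illustrates is the mixed-stage record: the schema is ordinary Scala data resolved while generating code, while the field values exist only as C expressions.

```scala
// Hypothetical sketch: string-based staging, NOT the LMS API or the paper's code.
object StagedQuerySketch {
  // A dynamic (next-stage) value, represented by the C expression that computes it.
  final case class Code(expr: String)

  // Mixed-stage record: static schema (known now), dynamic fields (known when the C runs).
  final case class Record(schema: Vector[String], fields: Vector[Code]) {
    // Field lookup by name happens at generation time, so it costs nothing in the output.
    def apply(name: String): Code = fields(schema.indexOf(name))
  }

  // A tiny subset of relational algebra, chosen for illustration only.
  sealed trait Operator
  final case class Scan(table: String, schema: Vector[String]) extends Operator
  final case class Filter(field: String, value: Int, parent: Operator) extends Operator
  final case class Print(parent: Operator) extends Operator

  // Shaped like a callback-based interpreter, but every action on dynamic data
  // emits C text instead of executing. Running it on one query plan therefore
  // yields a C fragment specialized to that plan.
  def compile(plan: Operator): String = {
    val out = new StringBuilder
    def emit(line: String): Unit = out.append(line).append('\n')

    def exec(op: Operator)(yld: Record => Unit): Unit = op match {
      case Scan(table, schema) =>
        // Assumes int columns named <table>_<field> and a length <table>_len.
        emit(s"for (int i = 0; i < ${table}_len; i++) {")
        yld(Record(schema, schema.map(f => Code(s"${table}_$f[i]"))))
        emit("}")
      case Filter(field, value, parent) =>
        exec(parent) { rec =>
          emit(s"if (${rec(field).expr} == $value) {")
          yld(rec)
          emit("}")
        }
      case Print(parent) =>
        exec(parent) { rec =>
          val fmt = rec.schema.map(_ => "%d").mkString(",")
          val args = rec.fields.map(_.expr).mkString(", ")
          emit("printf(\"" + fmt + "\\n\", " + args + ");")
        }
    }

    exec(plan)(_ => ())
    out.toString
  }
}
```

Calling `StagedQuerySketch.compile(Print(Filter("dept", 3, Scan("emp", Vector("id", "dept")))))` returns a C loop over `emp_len` rows with the filter inlined as a plain `if`; no operator tree, schema lookup, or callback survives in the generated text. A real staging framework would track types and generate whole programs, which this string-based sketch does not attempt.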
Functional pearl: a SQL to C compiler in 500 lines of code
ICFP 2015: Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming