skip to main content
research-article

Functional pearl: a SQL to C compiler in 500 lines of code

Published: 29 August 2015 Publication History

Abstract

We present the design and implementation of a SQL query processor that outperforms existing database systems and is written in just about 500 lines of Scala code -- a convincing case study that high-level functional programming can handily beat C for systems-level programming where the last drop of performance matters. The key enabler is a shift in perspective towards generative programming. The core of the query engine is an interpreter for relational algebra operations, written in Scala. Using the open-source LMS Framework (Lightweight Modular Staging), we turn this interpreter into a query compiler with very low effort. To do so, we capitalize on an old and widely known result from partial evaluation known as Futamura projections, which state that a program that can specialize an interpreter to any given input program is equivalent to a compiler. In this pearl, we discuss LMS programming patterns such as mixed-stage data structures (e.g. data records with static schema and dynamic field components) and techniques to generate low-level C code, including specialized data structures and data loading primitives.

References

[1]
E. Axelsson, K. Claessen, M. Sheeran, J. Svenningsson, D. Engdal, and A. Persson. The design and implementation of feldspar: An embedded language for digital signal processing. IFL’10, 2011.
[2]
B. Catanzaro, M. Garland, and K. Keutzer. Copperhead: compiling an embedded data parallel language. PPoPP, 2011.
[3]
C. Consel and O. Danvy. Tutorial notes on partial evaluation. In POPL, 1993.
[4]
Z. DeVito, J. Hegarty, A. Aiken, P. Hanrahan, and J. Vitek. Terra: a multi-stage language for high-performance computing. In PLDI, 2013.
[5]
Z. DeVito, D. Ritchie, M. Fisher, A. Aiken, and P. Hanrahan. Firstclass runtime generation of high-performance types using exotypes. In PLDI, 2014.
[6]
Y. Futamura. Partial evaluation of computation process, revisited. Higher-Order and Symbolic Computation, 12(4):377–380, 1999.
[7]
G. Graefe. Volcano - an extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng., 6(1):120–135, 1994.
[8]
N. D. Jones, C. K. Gomard, and P. Sestoft. Partial evaluation and automatic program generation. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993.
[9]
U. Jørring and W. L. Scherlis. Compilers and staging transformations. In POPL, 1986.
[10]
Y. Klonatos, C. Koch, T. Rompf, and H. Chafi. Building efficient query engines in a high-level language. PVLDB, 7(10):853–864, 2014.
[11]
G. Mainland and G. Morrisett. Nikola: embedding compiled GPU functions in Haskell. Haskell, 2010.
[12]
T. L. McDonell, M. M. Chakravarty, G. Keller, and B. Lippmeier. Optimising purely functional GPU programs. ICFP, 2013.
[13]
T. Neumann. Efficiently compiling efficient query plans for modern hardware. PVLDB, 4(9):539–550, 2011.
[14]
J. C. Reynolds. Definitional interpreters for higher-order programming languages. Higher-Order and Symbolic Computation, 11(4):363–397, 1998.
[15]
T. Rompf, N. Amin, A. Moors, P. Haller, and M. Odersky. Scalavirtualized: Linguistic reuse for deep embeddings. Higher-Order and Symbolic Computation (Special issue for PEPM’12).
[16]
T. Rompf, K. J. Brown, H. Lee, A. K. Sujeeth, M. Jonnalagedda, N. Amin, G. Ofenbeck, A. Stojanov, Y. Klonatos, M. Dashti, C. Koch, M. Püschel, and K. Olukotun. Go meta! A case for generative programming and dsls in performance critical systems. In SNAPL, 2015.
[17]
T. Rompf and M. Odersky. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled dsls. Commun. ACM, 55(6):121–130, 2012.
[18]
T. Rompf, A. K. Sujeeth, N. Amin, K. Brown, V. Jovanovic, H. Lee, M. Jonnalagedda, K. Olukotun, and M. Odersky. Optimizing data structures in high-level programs. POPL, 2013.
[19]
M. Stonebraker and U. Çetintemel. "One Size Fits All": An idea whose time has come and gone (abstract). In ICDE, pages 2–11, 2005.
[20]
M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era (it’s time for a complete rewrite). In VLDB, pages 1150–1160, 2007.
[21]
J. Svenningsson and E. Axelsson. Combining deep and shallow embedding for EDSL. In TFP, 2012.
[22]
W. Taha and T. Sheard. Metaml and multi-stage programming with explicit annotations. Theor. Comput. Sci., 248(1-2):211–242, 2000.
[23]
S. Tobin-Hochstadt, V. St-Amour, R. Culpepper, M. Flatt, and M. Felleisen. Languages as libraries. PLDI, 2011.
[24]
M. Zukowski, P. A. Boncz, N. Nes, and S. Héman. MonetDB/X100 - A DBMS In The CPU Cache. IEEE Data Eng. Bull., 28(2):17–22, 2005.

Cited By

View all
  • (2024)Flan: An Expressive and Efficient Datalog Compiler for Program AnalysisProceedings of the ACM on Programming Languages10.1145/36329288:POPL(2577-2609)Online publication date: 5-Jan-2024
  • (2023)Graph IRs for Impure Higher-Order Languages: Making Aggressive Optimizations Affordable with Precise Effect DependenciesProceedings of the ACM on Programming Languages10.1145/36228137:OOPSLA2(400-430)Online publication date: 16-Oct-2023
  • (2022)GraphIt to CUDA Compiler in 2021 LOC: A Case for High-Performance DSL Implementation via Staging with BuilDSL2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO53902.2022.9741280(53-65)Online publication date: 2-Apr-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 50, Issue 9
ICFP '15
September 2015
436 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2858949
  • Editor:
  • Andy Gill
Issue’s Table of Contents
  • cover image ACM Conferences
    ICFP 2015: Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming
    August 2015
    436 pages
    ISBN:9781450336697
    DOI:10.1145/2784731
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 August 2015
Published in SIGPLAN Volume 50, Issue 9

Check for updates

Author Tags

  1. Futamura Projections
  2. Generative Programming
  3. Query Compilation
  4. SQL
  5. Staging

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)65
  • Downloads (Last 6 weeks)2
Reflects downloads up to 13 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Flan: An Expressive and Efficient Datalog Compiler for Program AnalysisProceedings of the ACM on Programming Languages10.1145/36329288:POPL(2577-2609)Online publication date: 5-Jan-2024
  • (2023)Graph IRs for Impure Higher-Order Languages: Making Aggressive Optimizations Affordable with Precise Effect DependenciesProceedings of the ACM on Programming Languages10.1145/36228137:OOPSLA2(400-430)Online publication date: 16-Oct-2023
  • (2022)GraphIt to CUDA Compiler in 2021 LOC: A Case for High-Performance DSL Implementation via Staging with BuilDSL2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)10.1109/CGO53902.2022.9741280(53-65)Online publication date: 2-Apr-2022
  • (2016)On supporting compilation in spatial query enginesProceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems10.1145/2996913.2996945(1-4)Online publication date: 31-Oct-2016
  • (2016)Challenges of Modern Query ProcessingProceedings of the First International Conference on Computational Intelligence and Informatics10.1007/978-981-10-2471-9_41(423-432)Online publication date: 1-Dec-2016
  • (2024)Specializing Data Access in a Distributed File System (Generative Pearl)Proceedings of the 23rd ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences10.1145/3689484.3690736(44-52)Online publication date: 21-Oct-2024
  • (2021)Practical normalization by evaluation for EDSLsProceedings of the 14th ACM SIGPLAN International Symposium on Haskell10.1145/3471874.3472983(56-70)Online publication date: 18-Aug-2021
  • (2021)Multiscale Cholesky preconditioning for ill-conditioned problemsACM Transactions on Graphics10.1145/3450626.345985140:4(1-13)Online publication date: 19-Jul-2021
  • (2021)Neural monocular 3D human motion capture with physical awarenessACM Transactions on Graphics10.1145/3450626.345982540:4(1-15)Online publication date: 19-Jul-2021
  • (2021)MoCap-solverACM Transactions on Graphics10.1145/3450626.345968140:4(1-11)Online publication date: 19-Jul-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media