Abstract
Fast compilation is important when compilation occurs at runtime, as in query compilers in modern database systems and WebAssembly virtual machines in modern browsers. We present copy-and-patch, an extremely fast compilation technique that also produces good-quality code. It can lower both high-level languages and low-level bytecode programs to binary code by stitching together code from a large library of binary implementation variants. We call these binary implementations stencils because they have holes where missing values must be inserted during code generation. We show how to construct a stencil library and describe the copy-and-patch algorithm that generates optimized binary code.
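The core step described above can be illustrated with a minimal sketch under our own assumptions. The four x86-64 stencils are hand-encoded, and their hole offsets come directly from the instruction encodings; a real stencil library is far larger and would more likely be derived from compiler output (for example, from object-file relocation records) than written by hand. All names here (`Stencil`, `copy_and_patch`, `select_add`, `compile_sum`) are hypothetical. The sketch only produces bytes and never executes them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Stencil:
    """Pre-compiled machine code whose holes are patched at codegen time."""
    name: str
    code: bytes
    holes: Dict[str, Tuple[int, int]]  # hole name -> (byte offset, width in bytes)


# Hand-written x86-64 stencils (illustrative only, not a generated library).
LOAD_IMM32 = Stencil("load_imm32", bytes.fromhex("b800000000"), {"imm": (1, 4)})  # mov eax, imm32
ADD_IMM32 = Stencil("add_imm32", bytes.fromhex("0500000000"), {"imm": (1, 4)})    # add eax, imm32
ADD_IMM8 = Stencil("add_imm8", bytes.fromhex("83c000"), {"imm": (2, 1)})          # add eax, imm8 (sign-extended)
RET = Stencil("ret", bytes.fromhex("c3"), {})                                     # ret


def copy_and_patch(stencil: Stencil, values: Dict[str, int], out: bytearray) -> None:
    """Append a stencil's bytes to `out` (copy), then fill each of its holes (patch)."""
    start = len(out)
    out.extend(stencil.code)  # copy: no instruction encoding happens here
    for hole, (offset, width) in stencil.holes.items():
        pos = start + offset
        # patch: write the little-endian, two's-complement value into the hole
        out[pos:pos + width] = values[hole].to_bytes(width, "little", signed=True)


def select_add(k: int) -> Stencil:
    """Choose the shorter binary variant when the constant fits in a signed byte."""
    return ADD_IMM8 if -128 <= k <= 127 else ADD_IMM32


def compile_sum(first: int, addends: List[int]) -> bytes:
    """Stitch stencils into code that leaves first + sum(addends) in eax and returns."""
    out = bytearray()
    copy_and_patch(LOAD_IMM32, {"imm": first}, out)
    for k in addends:
        copy_and_patch(select_add(k), {"imm": k}, out)
    copy_and_patch(RET, {}, out)
    return bytes(out)


if __name__ == "__main__":
    # Prints the generated bytes as hex; the code itself is never run.
    print(compile_sum(40, [2]).hex())  # b82800000083c002c3
```

The choice between `ADD_IMM8` and `ADD_IMM32` is a miniature version of selecting among binary implementation variants. Code generation reduces to copying bytes and writing values at fixed offsets, which conveys the intuition behind the technique's speed.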
We demonstrate two use cases of copy-and-patch: a compiler for a high-level C-like language intended for metaprogramming and a compiler for WebAssembly. Our high-level language compiler has negligible compilation cost: it produces code from an AST in less time than it takes to construct the AST. We have implemented an SQL database query compiler on top of this metaprogramming system and show that on TPC-H database benchmarks, copy-and-patch generates code two orders of magnitude faster than LLVM -O0 and three orders of magnitude faster than LLVM at higher optimization levels. The generated code runs an order of magnitude faster than interpretation and 14% faster than code compiled with LLVM -O0. Our WebAssembly compiler generates code 4.9×–6.5× faster than Liftoff, the WebAssembly baseline compiler in Google Chrome, and the generated code outperforms Liftoff's by 39%–63% on the CoreMark and PolyBenchC WebAssembly benchmarks.
Copy-and-patch compilation: a fast compilation algorithm for high-level languages and bytecode