
Copy-and-patch compilation: a fast compilation algorithm for high-level languages and bytecode

Published: 15 October 2021

Abstract

Fast compilation is important when compilation occurs at runtime, such as query compilers in modern database systems and WebAssembly virtual machines in modern browsers. We present copy-and-patch, an extremely fast compilation technique that also produces good quality code. It is capable of lowering both high-level languages and low-level bytecode programs to binary code, by stitching together code from a large library of binary implementation variants. We call these binary implementations stencils because they have holes where missing values must be inserted during code generation. We show how to construct a stencil library and describe the copy-and-patch algorithm that generates optimized binary code.
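The copy-and-patch idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Stencil` class, the three hand-written x86-64 stencils, and the `copy_and_patch` function are hypothetical names chosen for illustration, and the sketch only assembles bytes without executing them. Each stencil is a precompiled byte sequence with holes; code generation copies the bytes and writes the missing values into the holes.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Stencil:
    """A binary template with holes (hypothetical structure for illustration)."""
    code: bytes    # precompiled machine code, holes filled with zero bytes
    holes: tuple   # one (byte offset, struct format) pair per hole


# Toy stencil library, written by hand for this sketch.
# x86-64 encodings: B8 imm32 = mov eax, imm32; 05 imm32 = add eax, imm32; C3 = ret.
LOAD_CONST = Stencil(code=b"\xb8" + b"\x00" * 4, holes=((1, "<i"),))
ADD_CONST = Stencil(code=b"\x05" + b"\x00" * 4, holes=((1, "<i"),))
RETURN = Stencil(code=b"\xc3", holes=())


def copy_and_patch(program):
    """Concatenate stencils, filling each hole with its runtime value."""
    out = bytearray()
    for stencil, values in program:
        base = len(out)
        out += stencil.code                      # copy step
        for (offset, fmt), value in zip(stencil.holes, values):
            struct.pack_into(fmt, out, base + offset, value)  # patch step
    return bytes(out)


# Machine code for "return 40 + 2", built from three stencils.
code = copy_and_patch([(LOAD_CONST, (40,)), (ADD_CONST, (2,)), (RETURN, ())])
print(code.hex())  # b8280000000502000000c3
```

Code generation here is a memory copy plus a few fixed-offset writes per stencil, which suggests why the approach can be fast. The sketch leaves out how a stencil library is constructed and how a variant is selected.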

We demonstrate two use cases of copy-and-patch: a compiler for a high-level C-like language intended for metaprogramming and a compiler for WebAssembly. Our high-level language compiler has negligible compilation cost: it produces code from an AST in less time than it takes to construct the AST. We have implemented an SQL database query compiler on top of this metaprogramming system and show that on TPC-H database benchmarks, copy-and-patch generates code two orders of magnitude faster than LLVM -O0 and three orders of magnitude faster than higher optimization levels. The generated code runs an order of magnitude faster than interpretation and 14% faster than LLVM -O0. Our WebAssembly compiler generates code 4.9X-6.5X faster than Liftoff, the WebAssembly baseline compiler in Google Chrome. The generated code also outperforms Liftoff's by 39%-63% on the Coremark and PolyBenchC WebAssembly benchmarks.



