Abstract
We present a novel framework, Duplo, for the low-level post-link optimisation of OCaml programs, achieving a speedup of 7% and a code-size reduction of at least 15% on widely used OCaml applications. Unlike existing post-link optimisers, which typically operate on target-specific machine code, our framework operates on a Low-Level Intermediate Representation (LLIR) capable of representing both the OCaml programs and any C dependencies they invoke through the foreign-function interface (FFI). LLIR is analysed, transformed and lowered to machine code by our post-link optimiser, LLIR-OPT. Most importantly, LLIR allows the optimiser to cross the OCaml-C language boundary, mitigating the overhead incurred by the FFI and enabling analyses and transformations in a previously unavailable context. The optimised IR is then lowered to amd64 machine code through the existing target-specific code generator of LLVM, modified to handle garbage collection just as effectively as the native OCaml backend. We equip our optimiser with a suite of SSA-based transformations and points-to analyses capable of capturing the semantics and representing the memory models of both languages, along with a cross-language inliner to embed C functions into OCaml callers. We evaluate the gains of our framework, which can be attributed to both our optimiser and the more sophisticated amd64 backend of LLVM, on a wide range of widely used OCaml applications, as well as an existing suite of micro- and macro-benchmarks used to track the performance of the OCaml compiler.
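To make the OCaml-C boundary mentioned above concrete, here is a minimal sketch of an FFI call site of the kind a cross-language optimiser would see. It is an illustration only and is not taken from the paper: `render_all` is a hypothetical caller, while `caml_format_int` is a C function shipped in the OCaml runtime, bound here with the same `external` declaration the standard library uses.

```ocaml
(* Illustrative only: not code from the paper. *)

(* [caml_format_int] is implemented in C inside the OCaml runtime.
   The [external] declaration binds it through the FFI. *)
external format_int : string -> int -> string = "caml_format_int"

(* Hypothetical OCaml caller: every element triggers one call across
   the OCaml-C boundary. *)
let render_all (xs : int list) : string list =
  List.map (fun x -> format_int "%d" x) xs

let () =
  render_all [1; 20; 300]
  |> String.concat ","
  |> print_endline
```

A compiler that only sees the OCaml side has to treat `format_int` as an opaque call. An optimiser working on a representation that also contains the C body could, in principle, analyse or inline across such a call, which is the opportunity the abstract describes.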
Duplo: a framework for OCaml post-link optimisation