Duplo: a framework for OCaml post-link optimisation

Published: 03 August 2020

Abstract

We present a novel framework, Duplo, for the low-level post-link optimisation of OCaml programs, achieving a speedup of 7% and a code-size reduction of at least 15% on widely used OCaml applications. Unlike existing post-link optimisers, which typically operate on target-specific machine code, our framework operates on a Low-Level Intermediate Representation (LLIR) capable of representing both OCaml programs and any C dependencies they invoke through the foreign-function interface (FFI). LLIR is analysed, transformed and lowered to machine code by our post-link optimiser, LLIR-OPT. Most importantly, LLIR allows the optimiser to cross the OCaml-C language boundary, mitigating the overhead incurred by the FFI and enabling analyses and transformations in a previously unavailable context. The optimised IR is then lowered to amd64 machine code through the existing target-specific code generator of LLVM, modified to handle garbage collection just as effectively as the native OCaml backend. We equip our optimiser with a suite of SSA-based transformations and points-to analyses capable of capturing the semantics and representing the memory models of both languages, along with a cross-language inliner to embed C functions into OCaml callers. We evaluate the gains of our framework, which can be attributed to both our optimiser and the more sophisticated amd64 backend of LLVM, on a wide range of widely used OCaml applications, as well as an existing suite of micro- and macro-benchmarks used to track the performance of the OCaml compiler.
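To make the FFI overhead mentioned above concrete, the following is a minimal sketch in C and is not taken from the paper. It assumes the usual OCaml convention that an immediate integer n is stored as the machine word 2n + 1. The names `tag_int`, `untag_int`, `is_immediate`, `add_c`, `stub_add` and `stub_add_inlined` are hypothetical stand-ins written for this illustration, not Duplo or OCaml runtime APIs. The sketch contrasts a stub that untags, calls and retags with the arithmetic that an optimiser able to see both sides of the boundary could, in principle, reduce it to.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for an OCaml value: one machine word.
   Assumption: an immediate integer n is encoded as 2n + 1. */
typedef intptr_t value;

/* Hypothetical helpers mirroring the tagging convention above. */
static inline value tag_int(intptr_t n) {
    return (value)(((uintptr_t)n << 1) | 1);
}
static inline intptr_t untag_int(value v) {
    /* Relies on arithmetic right shift for negative values, which is
       what mainstream compilers provide. */
    return v >> 1;
}
static inline int is_immediate(value v) {
    return (int)(v & 1);
}

/* The plain C function that the OCaml side wants to call. */
static intptr_t add_c(intptr_t a, intptr_t b) {
    return a + b;
}

/* Hypothetical FFI stub: untag the arguments, call C, retag the result.
   Without cross-language optimisation this is an opaque call. */
value stub_add(value a, value b) {
    assert(is_immediate(a) && is_immediate(b));
    return tag_int(add_c(untag_int(a), untag_int(b)));
}

/* What inlining plus simplification could leave behind:
   (2a + 1) + (2b + 1) - 1 = 2(a + b) + 1, with no call and no shifts. */
value stub_add_inlined(value a, value b) {
    return a + b - 1;
}
```

Both versions return the same tagged word for small integers; the second merely shows the kind of simplification that becomes possible once the C body is visible to the optimiser alongside its OCaml caller.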

Supplemental Material

Presentation at ICFP '20
