ABSTRACT
In the light of evidence that Haskell programs compiled by GHC exhibit large numbers of mispredicted branches on modern processors, we re-examine the "tagless" aspect of the STG-machine that GHC uses as its evaluation model.
We propose two tagging strategies: a simple strategy called semi-tagging that seeks to avoid one common source of unpredictable indirect jumps, and a more complex strategy called dynamic pointer-tagging that uses the spare low bits in a pointer to encode information about the pointed-to object. Both of these strategies have been implemented and exhaustively measured in the context of a production compiler, GHC, and the paper contains detailed descriptions of the implementations. Our measurements demonstrate significant performance improvements (14% for dynamic pointer-tagging with only a 2% increase in code size), and we further demonstrate that much of the improvement can be attributed to the elimination of mispredicted branch instructions.
As part of our investigations we also discovered that one optimisation in the STG-machine, vectored-returns, is no longer worthwhile and we explain why.
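As a rough illustration of what "using the spare low bits in a pointer" can mean, the sketch below models word-aligned addresses as plain Python integers. It assumes 4-byte alignment (two spare bits) and a hypothetical convention in which tag 0 means "not known to be evaluated" and a nonzero tag carries a small piece of information about the object. The function names and the encoding are assumptions made for illustration, not the scheme implemented in GHC.

```python
# Illustrative sketch only: addresses are modelled as plain integers.
TAG_BITS = 2                     # assumption: 4-byte alignment leaves 2 low bits free
TAG_MASK = (1 << TAG_BITS) - 1   # 0b11


def tag_pointer(addr, tag):
    """Store a small tag in the spare low bits of an aligned address."""
    if addr & TAG_MASK:
        raise ValueError("address is not aligned")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in the spare bits")
    return addr | tag


def get_tag(ptr):
    """Read the tag back without touching the pointed-to object."""
    return ptr & TAG_MASK


def untag(ptr):
    """Clear the low bits to recover the real address."""
    return ptr & ~TAG_MASK


def needs_eval(ptr):
    """Hypothetical convention: tag 0 means the object may be unevaluated."""
    return get_tag(ptr) == 0


p = tag_pointer(0x1000, 2)
print(hex(p), get_tag(p), hex(untag(p)), needs_eval(p))
```

In a scheme of this kind, code can inspect the low bits before deciding whether to dereference the pointer at all. How tags are actually assigned and maintained is covered in the paper itself and is not reproduced here.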
Faster laziness using dynamic pointer tagging
Proceedings of the ICFP '07 conference