Abstract
Program generators for high-performance libraries are an appealing solution to the recurring problem of porting and optimizing code with every new processor generation, but only a few such generators exist to date. This is due not only to the difficulty of the design, but also to the difficulty of the actual implementation, which often results in an ad-hoc collection of standalone programs and scripts that are hard to extend, maintain, or reuse. In this paper we ask whether, and which, programming language concepts and features are needed to enable a more systematic construction of such generators. The systematic approach we advocate extrapolates from existing generators: a) describing the problem and algorithmic knowledge using one or several domain-specific languages (DSLs), b) expressing optimizations and choices as rewrite rules on DSL programs, c) designing data structures that can be configured to control the type of code that is generated and the data representation used, and d) using autotuning to select the best-performing alternative. As a case study, we implement a small but representative subset of Spiral in Scala using the Lightweight Modular Staging (LMS) framework. The first main contribution of this paper is the realization of c) using type classes to abstract over staging decisions, i.e., which pieces of a computation are performed immediately and for which pieces code is generated. Specifically, we abstract over different complex data representations jointly with different code representations, including generating loops versus unrolled code with scalar replacement, a crucial and usually tedious performance transformation. The second main contribution is to provide full support for a) and d) within the LMS framework: we extend LMS to support translation between different DSLs and autotuning through search.
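To make contribution c) more concrete, the following is a minimal sketch of abstracting over staging decisions with a type class. It does not use LMS or Spiral; it assumes a hypothetical `Arith` type class with one instance that computes on `Double` values immediately and one that emits straight-line Scala source through a toy `CodeGen` buffer. The names `Arith`, `Code`, `CodeGen`, `codeArith`, `Complex`, `complexArith`, and `dft2` are illustrative only. The same 2-point DFT kernel then either computes a result or generates code for it, and the complex data representation is layered independently over either choice. It is written as a Scala script (top-level statements), e.g. for the REPL or a `.sc` file.

```scala
// Sketch: a type class abstracts over *when* arithmetic happens.
// All names are hypothetical; this is not the LMS or Spiral API.
import scala.collection.mutable.ArrayBuffer

// Operations a transform kernel needs, independent of representation.
trait Arith[T] {
  def plus(a: T, b: T): T
  def minus(a: T, b: T): T
  def times(a: T, b: T): T
}

// Unstaged instance: arithmetic is performed immediately.
implicit object DoubleArith extends Arith[Double] {
  def plus(a: Double, b: Double): Double = a + b
  def minus(a: Double, b: Double): Double = a - b
  def times(a: Double, b: Double): Double = a * b
}

// A staged value is just the name of a generated scalar variable.
final case class Code(name: String)

// Toy code buffer: every staged operation emits one `val` binding.
final class CodeGen {
  private val stmts = ArrayBuffer.empty[String]
  private var counter = 0
  def emit(rhs: String): Code = {
    val name = s"x$counter"
    counter += 1
    stmts += s"val $name = $rhs"
    Code(name)
  }
  def lines: List[String] = stmts.toList
}

// Staged instance: operations generate code instead of computing.
def codeArith(gen: CodeGen): Arith[Code] = new Arith[Code] {
  def plus(a: Code, b: Code): Code = gen.emit(s"${a.name} + ${b.name}")
  def minus(a: Code, b: Code): Code = gen.emit(s"${a.name} - ${b.name}")
  def times(a: Code, b: Code): Code = gen.emit(s"${a.name} * ${b.name}")
}

// Complex numbers as a pair of components of any representation T,
// so the data layout is chosen independently of the staging choice.
final case class Complex[T](re: T, im: T)

def complexArith[T](implicit A: Arith[T]): Arith[Complex[T]] =
  new Arith[Complex[T]] {
    def plus(a: Complex[T], b: Complex[T]): Complex[T] =
      Complex(A.plus(a.re, b.re), A.plus(a.im, b.im))
    def minus(a: Complex[T], b: Complex[T]): Complex[T] =
      Complex(A.minus(a.re, b.re), A.minus(a.im, b.im))
    def times(a: Complex[T], b: Complex[T]): Complex[T] =
      Complex(
        A.minus(A.times(a.re, b.re), A.times(a.im, b.im)),
        A.plus(A.times(a.re, b.im), A.times(a.im, b.re)))
  }

// One generic kernel: the 2-point DFT (butterfly) y = (x0 + x1, x0 - x1).
def dft2[T](x: Vector[T])(implicit A: Arith[T]): Vector[T] = {
  require(x.length == 2, "dft2 expects exactly two inputs")
  Vector(A.plus(x(0), x(1)), A.minus(x(0), x(1)))
}

// Usage: the same kernel computes or generates, depending on the instance.
println(dft2(Vector(1.0, 2.0)))             // computed immediately
val gen = new CodeGen
val staged = dft2(Vector(Complex(Code("a_re"), Code("a_im")),
                         Complex(Code("b_re"), Code("b_im"))))(
  complexArith[Code](codeArith(gen)))
gen.lines.foreach(println)                  // generated straight-line code
```

Switching the instance passed to `dft2` is the only change between computing a result and generating code for it. In the staged case each element becomes its own named scalar, loosely analogous to unrolled code with scalar replacement. LMS itself represents staged values as `Rep[T]` expressions rather than strings, so this sketch only mirrors the idea, not the framework.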
Spiral in Scala: Towards the Systematic Construction of Generators for Performance Libraries
GPCE '13: Proceedings of the 12th International Conference on Generative Programming: Concepts & Experiences