Spiral in Scala: towards the systematic construction of generators for performance libraries

Published: 27 October 2013

Abstract

Program generators for high performance libraries are an appealing solution to the recurring problem of porting and optimizing code with every new processor generation, but only a few such generators exist to date. This is due not only to the difficulty of the design, but also to that of the actual implementation, which often results in an ad-hoc collection of standalone programs and scripts that are hard to extend, maintain, or reuse. In this paper we ask whether, and which, programming language concepts and features are needed to enable a more systematic construction of such generators. The systematic approach we advocate extrapolates from existing generators: a) describing the problem and algorithmic knowledge using one or several domain-specific languages (DSLs), b) expressing optimizations and choices as rewrite rules on DSL programs, c) designing data structures that can be configured to control the type of code that is generated and the data representation used, and d) using autotuning to select the best-performing alternative. As a case study, we implement a small but representative subset of Spiral in Scala using the Lightweight Modular Staging (LMS) framework. The first main contribution of this paper is the realization of c) using type classes to abstract over staging decisions, i.e., which pieces of a computation are performed immediately and for which pieces code is generated. Specifically, we abstract over different complex data representations jointly with different code representations, including generating loops versus unrolled code with scalar replacement, a crucial and usually tedious performance transformation. The second main contribution is to provide full support for a) and d) within the LMS framework: we extend LMS to support translation between different DSLs and autotuning through search.
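The idea of using type classes to abstract over staging decisions can be illustrated with a minimal sketch in plain Scala. It does not use LMS or reproduce the paper's code: the names `Arith`, `Code`, and `dot` are hypothetical, and the staged representation is simply a string of C-like source text standing in for a real staged type such as LMS's `Rep[T]`. The same generic kernel either computes a result immediately or emits unrolled code, depending only on which type class instance is selected.

```scala
object StagingSketch {
  // Type class (hypothetical): arithmetic available on both immediate
  // and staged values.
  trait Arith[T] {
    def const(d: Double): T
    def plus(a: T, b: T): T
    def times(a: T, b: T): T
  }

  // Assumption: a staged value is modelled as a fragment of source text.
  final case class Code(src: String)

  object Arith {
    // Instance 1: operations on Double are performed immediately.
    implicit val immediate: Arith[Double] = new Arith[Double] {
      def const(d: Double): Double = d
      def plus(a: Double, b: Double): Double = a + b
      def times(a: Double, b: Double): Double = a * b
    }

    // Instance 2: operations on Code emit an expression instead of computing.
    implicit val staged: Arith[Code] = new Arith[Code] {
      def const(d: Double): Code = Code(d.toString)
      def plus(a: Code, b: Code): Code = Code(s"(${a.src} + ${b.src})")
      def times(a: Code, b: Code): Code = Code(s"(${a.src} * ${b.src})")
    }
  }

  // Generic kernel written once. The traversal of the Scala Vector always
  // runs at generation time, so with the staged instance the emitted
  // expression is fully unrolled.
  def dot[T](xs: Vector[T], ys: Vector[T])(implicit ev: Arith[T]): T =
    xs.zip(ys)
      .map { case (x, y) => ev.times(x, y) }
      .foldLeft(ev.const(0.0))((acc, p) => ev.plus(acc, p))
}
```

Calling `StagingSketch.dot(Vector(1.0, 2.0), Vector(3.0, 4.0))` evaluates to `11.0`, whereas passing vectors of `Code("x[0]")`, `Code("x[1]")` and `Code("y[0]")`, `Code("y[1]")` yields the expression text `((0.0 + (x[0] * y[0])) + (x[1] * y[1]))`. In this sketch the staging decision is confined to the choice of instance, which is the kind of separation the abstract describes.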

