Meta-programming for cross-domain tensor optimizations

Published: 07 April 2020

Abstract

Many modern application domains crucially rely on tensor operations. The optimization of programs that operate on tensors poses difficulties that are not adequately addressed by existing languages and tools. Frameworks such as TensorFlow offer good abstractions for tensor operations, but target a specific domain, i.e. machine learning, and their optimization strategies cannot easily be adjusted to other domains. General-purpose optimization tools such as Pluto and existing meta-languages offer more flexibility in applying optimizations but lack abstractions for tensors. This work closes the gap between domain-specific tensor languages and general-purpose optimization tools by proposing the Tensor optimizations Meta-Language (TeML). TeML offers high-level abstractions for both tensor operations and loop transformations, and enables flexible composition of transformations into effective optimization paths. This compositionality is built into TeML's design, as our formal language specification will reveal. We also show that TeML can express tensor computations as comfortably as TensorFlow and that it can reproduce Pluto's optimization paths. Thus, optimized programs generated by TeML execute at least as fast as the corresponding Pluto programs. In addition, TeML enables optimization paths that often allow outperforming Pluto.
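
To make the idea of composing loop transformations into an optimization path more concrete, the following Python sketch may help. It is not TeML syntax and not the authors' implementation: the helper names `loop_nest`, `tile`, `interchange`, `compose`, and `run`, and the encoding of a loop nest as `(axis, extent, stride)` triples, are hypothetical choices made for illustration only. The sketch assumes that tile sizes divide loop extents and performs no dependence analysis; reordering is safe here only because the loop body is an integer sum reduction.

```python
from functools import reduce
from itertools import product


def loop_nest(n):
    """Matrix-multiplication loop nest, outermost loop first.

    Each loop is (axis, extent, stride): it adds counter * stride to `axis`.
    """
    return [("i", n, 1), ("j", n, 1), ("k", n, 1)]


def tile(nest, pos, size):
    """Split the loop at position `pos` into an outer (tile) and inner loop."""
    axis, extent, stride = nest[pos]
    if extent % size != 0:
        raise ValueError("this sketch assumes the tile size divides the extent")
    outer = (axis, extent // size, stride * size)
    inner = (axis, size, stride)
    return nest[:pos] + [outer, inner] + nest[pos + 1:]


def interchange(nest, p, q):
    """Swap the loops at positions `p` and `q`."""
    out = list(nest)
    out[p], out[q] = out[q], out[p]
    return out


def compose(*steps):
    """Chain transformations left to right into one optimization path."""
    return lambda nest: reduce(lambda acc, step: step(acc), steps, nest)


def run(nest, A, B, n):
    """Execute C[i][j] += A[i][k] * B[k][j] in the order given by `nest`."""
    C = [[0] * n for _ in range(n)]
    for counters in product(*(range(extent) for _, extent, _ in nest)):
        idx = {"i": 0, "j": 0, "k": 0}
        for (axis, _, stride), counter in zip(nest, counters):
            idx[axis] += counter * stride
        C[idx["i"]][idx["j"]] += A[idx["i"]][idx["k"]] * B[idx["k"]][idx["j"]]
    return C


# One possible path: tile i, tile j, then move both tile loops outermost.
path = compose(
    lambda nest: tile(nest, 0, 2),         # [i_o, i_i, j, k]
    lambda nest: tile(nest, 2, 2),         # [i_o, i_i, j_o, j_i, k]
    lambda nest: interchange(nest, 1, 2),  # [i_o, j_o, i_i, j_i, k]
)

n = 4
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i * j + 1 for j in range(n)] for i in range(n)]
reference = run(loop_nest(n), A, B, n)
optimized = run(path(loop_nest(n)), A, B, n)
```

Because each transformation maps a loop nest to a loop nest, a path is just function composition, and swapping, adding, or removing a step gives a different candidate schedule for the same computation. A real tool would additionally have to check that each step preserves the program's dependences.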

Published in

ACM SIGPLAN Notices, Volume 53, Issue 9 (GPCE '18)
September 2018, 214 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3393934

GPCE 2018: Proceedings of the 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences
November 2018, 214 pages
ISBN: 9781450360456
DOI: 10.1145/3278122

Copyright © 2018 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
