
Towards compositional and generative tensor optimizations

Published: 23 October 2017

Abstract

Many numerical algorithms are naturally expressed as operations on tensors (i.e., multi-dimensional arrays). Hence, tensor expressions occur in a wide range of application domains, e.g., quantum chemistry and physics, big data analysis and machine learning, and computational fluid dynamics. Each domain has typically developed its own strategies for efficiently generating optimized code, supported by tools such as domain-specific languages, compilers, and libraries. However, strategies and tools are rarely portable between domains, and generic solutions typically act as "black boxes" that offer little control over code generation and optimization. As a consequence, some application domains, such as computational fluid dynamics, lack adequate support for easily generating optimized code. In this paper, we propose a generic and easily extensible intermediate language for expressing tensor computations and code transformations in a modular and generative fashion. Beyond being an intermediate language, our solution also offers meta-programming capabilities for experts in code optimization. While applications from the domain of computational fluid dynamics serve to illustrate our proposed solution, we believe that our general approach can help unify research in tensor optimizations and make solutions more portable between domains.

