Research article

PetaBricks: a language and compiler for algorithmic choice

Authors Info & Claims
Published:15 June 2009Publication History
Skip Abstract Section

Abstract

It is often impossible to obtain a one-size-fits-all solution for high-performance algorithms when considering different choices for data distributions, parallelism, transformations, and blocking. The best combination of these choices is often tightly coupled to a particular architecture, problem size, data set, and the available system resources. In some cases, a completely different algorithm may provide the best performance. Current compiler and programming language techniques can change some of these parameters, but today there is no simple way for the programmer to express, or for the compiler to choose, different algorithms to handle different parts of the data. Existing solutions can normally handle only coarse-grained, library-level selections or hand-coded cutoffs between base cases and recursive cases.
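The hand-coded cutoffs the paragraph refers to typically look like the following sketch (plain Python for illustration, not PetaBricks code): a recursive merge sort that switches to insertion sort below a hard-wired threshold that some programmer guessed once and that may be far from optimal on another machine or data distribution.

```python
# A hand-coded cutoff between a recursive case (merge sort) and a
# base case (insertion sort). The constant 16 is an arbitrary guess;
# the optimal value depends on the machine and the data.
CUTOFF = 16

def insertion_sort(a):
    """In-place insertion sort; fast for very small inputs."""
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

def hybrid_sort(a):
    if len(a) <= CUTOFF:            # hard-wired algorithmic choice
        return insertion_sort(list(a))
    mid = len(a) // 2
    left, right = hybrid_sort(a[:mid]), hybrid_sort(a[mid:])
    # Merge the two sorted halves.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]
```

Because the cutoff is baked into the source, retuning it for a new platform means editing and re-validating the code by hand, which is exactly the burden PetaBricks aims to remove.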

We present PetaBricks, a new implicitly parallel language and compiler in which providing multiple implementations of multiple algorithms to solve a problem is the natural way of programming. We make algorithmic choice a first-class construct of the language. Choices are expressed in a way that also allows our compiler to tune at a finer granularity. The PetaBricks compiler autotunes programs by making both fine-grained and algorithmic choices. Choices also include different automatic parallelization techniques, data distributions, algorithmic parameters, transformations, and blocking.
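The kind of decision the autotuner automates can be pictured with a toy offline tuner (an illustrative sketch in plain Python, not the PetaBricks implementation): given several interchangeable implementations of the same contract, time each one at a range of input sizes and record the winner per size.

```python
import random
import time

# Two interchangeable implementations of the same "sort" contract.
def insertion_sort(a):
    a = list(a)
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a

def builtin_sort(a):
    return sorted(a)

CANDIDATES = {"insertion": insertion_sort, "builtin": builtin_sort}

def autotune(sizes, trials=3):
    """Return a map: input size -> name of the fastest candidate."""
    best = {}
    for n in sizes:
        data = [random.random() for _ in range(n)]
        timings = {}
        for name, fn in CANDIDATES.items():
            t0 = time.perf_counter()
            for _ in range(trials):
                fn(data)
            timings[name] = time.perf_counter() - t0
        best[n] = min(timings, key=timings.get)
    return best

print(autotune([8, 256]))
```

The point of the sketch is the shape of the search, not the numbers: a real autotuner such as the one in PetaBricks explores a much larger space (parallelization strategies, data distributions, blocking, cutoffs) rather than a single sort decision.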

Additionally, we introduce novel techniques to autotune algorithms for different convergence criteria. When choosing between various direct and iterative methods, the PetaBricks compiler can tune a program so that it delivers near-optimal efficiency for any desired level of accuracy. The compiler can use different convergence criteria for the various components within a single algorithm, giving the user accuracy choice alongside algorithmic choice.
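The accuracy-versus-speed trade-off for iterative methods can be made concrete with a toy tuner (again an illustrative Python sketch, not PetaBricks code) that finds the fewest Jacobi sweeps meeting a requested residual tolerance on a small diagonally dominant system; a looser tolerance needs fewer sweeps.

```python
def jacobi(A, b, iters):
    """Run `iters` Jacobi sweeps on Ax = b starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i))
             / A[i][i] for i in range(n)]
    return x

def residual(A, b, x):
    """Max-norm residual ||Ax - b||_inf."""
    n = len(b)
    return max(abs(sum(A[i][j] * x[j] for j in range(n)) - b[i])
               for i in range(n))

def tune_iterations(A, b, tol, max_iters=1000):
    """Smallest sweep count whose residual meets `tol` -- a stand-in
    for tuning a convergence criterion to a desired accuracy level."""
    for k in range(1, max_iters + 1):
        if residual(A, b, jacobi(A, b, k)) <= tol:
            return k
    return max_iters

# A 2x2 diagonally dominant system, so Jacobi is guaranteed to converge.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(tune_iterations(A, b, 1e-2), tune_iterations(A, b, 1e-8))
```

In this framing, "accuracy choice" means the user states the tolerance and the tuner picks the cheapest configuration that achieves it; PetaBricks additionally lets different components of one algorithm carry different criteria.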


Published in: ACM SIGPLAN Notices, Volume 44, Issue 6 (PLDI '09), June 2009, 478 pages. ISSN 0362-1340, EISSN 1558-1160. DOI: 10.1145/1543135.

Also appears in: PLDI '09: Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2009, 492 pages. ISBN 9781605583921. DOI: 10.1145/1542476.

Copyright © 2009 ACM. Published by the Association for Computing Machinery, New York, NY, United States.
