Abstract
Exploiting heterogeneous parallel hardware currently requires mapping application code to multiple disparate programming models. Unfortunately, the general-purpose programming models available today can yield high performance but are too low-level to be accessible to the average programmer. We propose leveraging domain-specific languages (DSLs) to map high-level application code to heterogeneous devices. To demonstrate the potential of this approach we present OptiML, a DSL for machine learning. OptiML programs are implicitly parallel and can achieve high performance on heterogeneous hardware with no modification required to the source code. For such a DSL-based approach to be tractable at large scales, DSL authors need better tools that simplify language creation and parallelization. To address this concern, we introduce Delite, a system designed specifically for DSLs that is both a framework for creating an implicitly parallel DSL and a dynamic runtime providing automated targeting to heterogeneous parallel hardware. We show that OptiML running on Delite achieves single-threaded, parallel, and GPU performance superior to explicitly parallelized MATLAB code in nearly all cases.
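The idea of implicit parallelism described in the abstract can be illustrated with a minimal sketch. This is not the OptiML or Delite API: the names `Runtime`, `Vector`, and `mean_squared` are hypothetical, and a thread pool stands in for the heterogeneous targets a real runtime would choose between. The point is only that the application function is written once and the execution strategy is chosen elsewhere.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence


class Runtime:
    """Hypothetical stand-in for a DSL runtime that decides how an op executes."""

    def __init__(self, workers: int = 1) -> None:
        self.workers = workers

    def run_map(self, fn: Callable[[float], float], data: Sequence[float]) -> List[float]:
        # Sequential path: a plain loop on the calling thread.
        if self.workers <= 1:
            return [fn(x) for x in data]
        # Parallel path: the same op is handed to a pool of worker threads.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(fn, data))


class Vector:
    """Hypothetical domain type; its ops carry no threading or device details."""

    def __init__(self, values: Iterable[float], runtime: Runtime) -> None:
        self.values = list(values)
        self.runtime = runtime

    def map(self, fn: Callable[[float], float]) -> "Vector":
        return Vector(self.runtime.run_map(fn, self.values), self.runtime)

    def sum(self) -> float:
        return sum(self.values)


def mean_squared(v: Vector) -> float:
    """Application code: written once, with no mention of threads or devices."""
    return v.map(lambda x: x * x).sum() / len(v.values)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    # Only the runtime configuration changes; mean_squared is untouched.
    print(mean_squared(Vector(data, Runtime(workers=1))))
    print(mean_squared(Vector(data, Runtime(workers=4))))
```

Both calls run the same `mean_squared` source and differ only in the runtime configuration. In CPython the thread pool will not speed up this arithmetic, so the sketch shows the separation of concerns rather than any performance result.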
A domain-specific approach to heterogeneous parallelism. PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming.