research-article

A domain-specific approach to heterogeneous parallelism

Published: 12 February 2011

Abstract

Exploiting heterogeneous parallel hardware currently requires mapping application code to multiple disparate programming models. Unfortunately, general-purpose programming models available today can yield high performance but are too low-level to be accessible to the average programmer. We propose leveraging domain-specific languages (DSLs) to map high-level application code to heterogeneous devices. To demonstrate the potential of this approach we present OptiML, a DSL for machine learning. OptiML programs are implicitly parallel and can achieve high performance on heterogeneous hardware with no modification required to the source code. For such a DSL-based approach to be tractable at large scales, better tools are required for DSL authors to simplify language creation and parallelization. To address this concern, we introduce Delite, a system designed specifically for DSLs that is both a framework for creating an implicitly parallel DSL as well as a dynamic runtime providing automated targeting to heterogeneous parallel hardware. We show that OptiML running on Delite achieves single-threaded, parallel, and GPU performance superior to explicitly parallelized MATLAB code in nearly all cases.
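
To make the notion of an implicitly parallel DSL concrete, here is a minimal Python sketch. It is not OptiML or Delite code and does not reproduce their APIs; the `Runtime` and `Vector` classes, the `parallel_threshold` parameter, and the choice of a thread pool as the only "device" are all assumptions made for illustration. The point it shows is the division of labor the abstract describes: application code calls a high-level operation (`map`), and a separate runtime decides how that operation is executed.

```python
from concurrent.futures import ThreadPoolExecutor


class Runtime:
    """Hypothetical runtime that decides how a DSL operation is executed."""

    def __init__(self, workers=4, parallel_threshold=1000):
        self.workers = workers
        self.parallel_threshold = parallel_threshold

    def run_map(self, fn, data):
        # Small inputs run sequentially; the dispatch decision lives here,
        # not in the application code.
        if len(data) < self.parallel_threshold:
            return [fn(x) for x in data]
        # Larger inputs are split into contiguous chunks, one per worker.
        chunk = (len(data) + self.workers - 1) // self.workers
        parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # pool.map preserves the order of the input chunks.
            results = list(pool.map(lambda part: [fn(x) for x in part], parts))
        return [y for part in results for y in part]


class Vector:
    """Hypothetical DSL data type whose operations are implicitly parallel."""

    def __init__(self, values, runtime=None):
        self.values = list(values)
        self.runtime = runtime or Runtime()

    def map(self, fn):
        # The caller never mentions threads or devices.
        return Vector(self.runtime.run_map(fn, self.values), self.runtime)

    def sum(self):
        return sum(self.values)


if __name__ == "__main__":
    v = Vector(range(5000))
    print(v.map(lambda x: x * 2).sum())
```

The same `v.map(...)` call works unchanged whether the runtime chooses the sequential or the chunked path. Note that Python threads do not speed up CPU-bound work because of the global interpreter lock, so this sketch illustrates only the structure of the dispatch, not the performance results reported in the paper.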

Published in

  • ACM SIGPLAN Notices, Volume 46, Issue 8 (PPoPP '11), August 2011, 300 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/2038037
  • PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, February 2011, 326 pages
    ISBN: 9781450301190
    DOI: 10.1145/1941553
    General Chair: Calin Cascaval
    Program Chair: Pen-Chung Yew

Copyright © 2011 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
