research-article
Open Access
Artifacts Available
Artifacts Evaluated & Functional

Static stages for heterogeneous programming

Published: 12 October 2017
Abstract

Heterogeneous hardware is central to modern advances in performance and efficiency. Mainstream programming models for heterogeneous architectures, however, sacrifice safety and expressiveness in favor of low-level control over performance details. The interfaces between hardware units consist of verbose, unsafe APIs; hardware-specific languages make it difficult to move code between units; and brittle preprocessor macros complicate the task of specializing general code for efficient accelerated execution. We propose a unified low-level programming model for heterogeneous systems that offers control over performance, safe communication constructs, cross-device code portability, and hygienic metaprogramming for specialization. The language extends constructs from multi-stage programming to separate code for different hardware units, to communicate between them, and to express compile-time code optimization. We introduce static staging, a different take on multi-stage programming that lets the compiler generate all code and communication constructs ahead of time.
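
As a rough illustration of specialization by staging, the sketch below uses the classic exponentiation example from the multi-stage programming literature. It is plain Python, not Braid syntax, and the names `gen_power` and `power_3` are hypothetical. Note one deliberate difference: here a generator script produces and loads the specialized code itself, whereas under static staging as the abstract describes it, the compiler would emit such code ahead of time.

```python
def gen_power(n: int) -> str:
    """Return Python source for a function computing x**n with the loop unrolled.

    The exponent n is known at generation time, so the emitted code
    contains no loop and no test on n.
    """
    body = " * ".join(["x"] * n) if n > 0 else "1.0"
    return f"def power_{n}(x):\n    return {body}\n"


# "Earlier stage": produce the specialized source text.
source = gen_power(3)

# "Later stage": load and run the specialized code.
namespace = {}
exec(source, namespace)
power_3 = namespace["power_3"]
result = power_3(2.0)
```

The generated text for `n = 3` is a single straight-line multiplication, which is the kind of specialized variant that preprocessor macros are often used to produce by hand.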

To demonstrate our approach, we use static staging to implement BraidGL, a real-time graphics programming language for CPU-GPU systems. Current real-time graphics software in OpenGL uses stringly-typed APIs for communication and unsafe preprocessing to generate specialized GPU code variants. In BraidGL, programmers instead write hybrid CPU-GPU software in a unified language. The compiler statically generates target-specific code and guarantees safe communication between the CPU and the graphics pipeline stages. Example scenes demonstrate the language's productivity advantages: BraidGL eliminates the safety and expressiveness pitfalls of OpenGL and makes common specialization techniques easy to apply. The case study demonstrates how static staging can express core placement and specialization in general heterogeneous programming.
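
The "stringly-typed" pitfall can be sketched without a GPU. The toy class below is an assumption made for illustration, not a real OpenGL binding: `FakeProgram`, `get_uniform_location`, and `set_uniform` are hypothetical names that only mimic the OpenGL convention in which looking up an unknown uniform name yields a sentinel and writing to that sentinel is silently ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FakeProgram:
    """Stand-in for a compiled GPU program (hypothetical, not a real API)."""
    uniforms: Dict[str, float] = field(
        default_factory=lambda: {"brightness": 0.0}
    )

    def get_uniform_location(self, name: str) -> Optional[str]:
        # Unknown names yield a sentinel (None here) instead of an error.
        return name if name in self.uniforms else None

    def set_uniform(self, location: Optional[str], value: float) -> None:
        # Writes to the sentinel location are silently dropped.
        if location is not None:
            self.uniforms[location] = value


program = FakeProgram()

# Typo in the string name: no error is raised, and the value never arrives.
bad_location = program.get_uniform_location("brightnes")
program.set_uniform(bad_location, 0.8)
after_typo = program.uniforms["brightness"]

# Correct name: the value is delivered.
program.set_uniform(program.get_uniform_location("brightness"), 0.8)
after_fix = program.uniforms["brightness"]
```

Because the link between host code and shader code is only a string, nothing checks it before the program runs. The abstract's claim is that BraidGL moves this check to compile time by generating the communication code itself.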
