Abstract
Heterogeneous hardware is central to modern advances in performance and efficiency. Mainstream programming models for heterogeneous architectures, however, sacrifice safety and expressiveness in favor of low-level control over performance details. The interfaces between hardware units consist of verbose, unsafe APIs; hardware-specific languages make it difficult to move code between units; and brittle preprocessor macros complicate the task of specializing general code for efficient accelerated execution. We propose a unified low-level programming model for heterogeneous systems that offers control over performance, safe communication constructs, cross-device code portability, and hygienic metaprogramming for specialization. The language extends constructs from multi-stage programming to separate code for different hardware units, to communicate between them, and to express compile-time code optimization. We introduce static staging, a different take on multi-stage programming that lets the compiler generate all code and communication constructs ahead of time.
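As a rough illustration of the multi-stage idea the abstract builds on, the sketch below stages an exponentiation function: one stage builds the code for a fixed exponent, and a later stage runs it. This is plain Python, not Braid syntax, and the names `quote_power` and `run` are hypothetical. Note also that the sketch generates code at run time with `eval`, which is the traditional dynamic flavor of staging; static staging, per the abstract, instead has the compiler generate all code ahead of time.

```python
# Hypothetical illustration of staging for specialization (not Braid syntax).

def quote_power(n: int, var: str = "x") -> str:
    """Earlier stage: build the source of an expression computing var**n.

    The loop over n runs now, so the generated code contains no loop.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    expr = "1"
    for _ in range(n):
        expr = f"({var} * {expr})"
    return expr


def run(expr: str, var: str = "x"):
    """Later stage: turn the generated expression into a callable."""
    return eval(f"lambda {var}: {expr}", {"__builtins__": {}})


cube_src = quote_power(3)   # the specialized code, as text
cube = run(cube_src)        # the specialized function
print(cube_src)
print(cube(2))
```

The exponent is consumed entirely in the earlier stage, so the generated function is a fixed chain of multiplications. The same split between "code that decides" and "code that runs" is what the abstract reuses to place computation on different hardware units.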
To demonstrate our approach, we use static staging to implement BraidGL, a real-time graphics programming language for CPU-GPU systems. Current real-time graphics software in OpenGL uses stringly-typed APIs for communication and unsafe preprocessing to generate specialized GPU code variants. In BraidGL, programmers instead write hybrid CPU-GPU software in a unified language. The compiler statically generates target-specific code and guarantees safe communication between the CPU and the graphics pipeline stages. Example scenes demonstrate the language's productivity advantages: BraidGL eliminates the safety and expressiveness pitfalls of OpenGL and makes common specialization techniques easy to apply. The case study demonstrates how static staging can express core placement and specialization in general heterogeneous programming.
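The "stringly-typed" hazard can be sketched with a small mock. The class below is a stand-in, not a real OpenGL binding; it only imitates two documented OpenGL behaviors: `glGetUniformLocation` returns -1 for a name that is not an active uniform, and `glUniform*` silently ignores location -1. The uniform names and the `bind_uniform` helper are assumptions made for illustration. Python cannot reject the typo at compile time the way the abstract says BraidGL does, so the helper merely fails eagerly at bind time.

```python
class MockProgram:
    """Stand-in for a linked GPU program; not a real OpenGL binding."""

    def __init__(self, uniforms):
        self._locations = {name: i for i, name in enumerate(uniforms)}
        self.values = {}

    def get_uniform_location(self, name: str) -> int:
        # Imitates glGetUniformLocation: unknown names yield -1, not an error.
        return self._locations.get(name, -1)

    def set_uniform(self, location: int, value) -> None:
        # Imitates glUniform*: location -1 is silently ignored.
        if location == -1:
            return
        self.values[location] = value


def bind_uniform(program: MockProgram, name: str):
    """Hypothetical helper: resolve the name once and fail loudly if absent."""
    location = program.get_uniform_location(name)
    if location == -1:
        raise KeyError(f"no active uniform named {name!r}")
    return lambda value: program.set_uniform(location, value)


program = MockProgram(["uModel", "uLightPos"])

# Misspelled name: the lookup "succeeds" with -1 and the write is dropped.
loc = program.get_uniform_location("uLightPosition")
program.set_uniform(loc, (0.0, 1.0, 0.0))
print(loc, program.values)

# Eager binding surfaces the same mistake as an error instead.
set_light = bind_uniform(program, "uLightPos")
set_light((0.0, 1.0, 0.0))
print(program.values)
```

In the first case nothing fails and nothing is set, which is the kind of silent mismatch between CPU and GPU code that a unified, statically checked language is meant to rule out.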
Static stages for heterogeneous programming