Abstract
This paper advocates programming high-performance code using partial evaluation. We present a clean-slate programming system with a simple, annotation-based, online partial evaluator that operates on a CPS-style intermediate representation. Our system exposes code generation for accelerators (vectorization/parallelization for CPUs and GPUs) via compiler-known higher-order functions that can be subjected to partial evaluation. This way, generic implementations can be instantiated with target-specific code at compile time.
In our experimental evaluation we present three extensive case studies from image processing, ray tracing, and genome sequence alignment. We demonstrate that, using partial evaluation, we obtain high-performance implementations for CPUs and GPUs from one language and one code base in a generic way. The performance of our codes is mostly within 10% of, and often closer to, that of multi-man-year, industry-grade, manually optimized expert codes that are considered to be among the top contenders in their fields.
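To make the idea of online partial evaluation concrete, the following is a minimal sketch in Python. It is not the system described in the abstract and does not use its CPS-style intermediate representation; the names `Const`, `Var`, `Add`, `Mul`, and `peval` are hypothetical and chosen only for illustration. The sketch specializes a small arithmetic expression with respect to the inputs that are already known, deciding during specialization (online) whether each operation can be computed now or must remain in the residual program.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Var, Add, Mul]


def peval(expr: Expr, known: Dict[str, float]) -> Expr:
    """Specialize expr with respect to the statically known variables.

    Hypothetical toy evaluator: operations whose operands are all known
    are computed immediately; everything else stays in the residual
    expression that is returned.
    """
    if isinstance(expr, Const):
        return expr
    if isinstance(expr, Var):
        # A known variable becomes a constant; an unknown one stays dynamic.
        return Const(known[expr.name]) if expr.name in known else expr
    left = peval(expr.left, known)
    right = peval(expr.right, known)
    if isinstance(left, Const) and isinstance(right, Const):
        # Both operands are static, so fold the operation now.
        if isinstance(expr, Add):
            return Const(left.value + right.value)
        return Const(left.value * right.value)
    # At least one operand is dynamic: rebuild the node as residual code.
    return type(expr)(left, right)


# Usage: a and b are known at specialization time, x is not.
generic = Add(Mul(Var("a"), Var("b")), Var("x"))
residual = peval(generic, {"a": 2, "b": 3})
# residual is Add(Const(6), Var("x")): the multiplication has been removed.
```

The same principle, applied to whole functions rather than expressions, is what lets a generic implementation be instantiated with target-specific code at compile time.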
Index Terms
AnyDSL: a partial evaluation framework for programming high-performance libraries