Abstract
Domain-specific languages (DSLs) provide a promising path to automatically compiling high-level code to parallel, heterogeneous, and distributed hardware. In practice, however, high-performance DSLs still require considerable software expertise to develop, and they force users into toolchains that hinder prototyping and debugging. To address these problems, we present Forge, a new meta DSL for declaratively specifying high-performance embedded DSLs. Forge provides DSL authors with high-level abstractions (e.g., data structures, parallel patterns, effects) for specifying their DSL in a way that permits high performance. From this high-level specification, Forge automatically generates both a naïve Scala library implementation of the DSL and a high-performance version using the Delite DSL framework. Users of a Forge-generated DSL can prototype their application using the library version and then switch to the Delite version to run on multicore CPUs, GPUs, and clusters without changing the application code. Forge-generated Delite DSLs perform within 2x of hand-optimized C++ and up to 40x better than Spark, an alternative high-level distributed programming environment. Compared to a manually implemented Delite DSL, Forge reduces lines of code by 3-6x without sacrificing performance. Furthermore, Forge specifications can be generated from existing Scala libraries, are easy to maintain, shield DSL developers from changes in the Delite framework, and enable DSLs to be retargeted to other frameworks transparently.
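The core idea in the abstract, one declarative specification yielding both a simple library backend and an optimized backend behind the same application code, can be illustrated with a toy sketch. This is written in Python rather than Scala and is not Forge syntax or the Delite API. The names `SPEC`, `generate_library`, `generate_parallel`, and `sum_of_squares` are hypothetical, and a thread pool merely stands in for an optimized backend (it shows the structure, not a real speedup).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

# Hypothetical declarative spec (not Forge syntax): each op names a
# parallel pattern plus the scalar function it applies.
SPEC = {
    "square": ("map", lambda x: x * x),
    "total": ("reduce", operator.add),  # reduce functions must be associative
}

def generate_library(spec):
    """Naive backend: sequential loops, convenient for prototyping and debugging."""
    ops = {}
    for name, (pattern, fn) in spec.items():
        if pattern == "map":
            ops[name] = lambda xs, fn=fn: [fn(x) for x in xs]
        elif pattern == "reduce":
            ops[name] = lambda xs, fn=fn: reduce(fn, xs)
        else:
            raise ValueError(f"unknown pattern: {pattern}")
    return ops

def generate_parallel(spec, workers=4):
    """Stand-in for an optimized backend: same ops, executed on a thread pool."""
    def make_map(fn):
        def par_map(xs):
            with ThreadPoolExecutor(max_workers=workers) as pool:
                return list(pool.map(fn, xs))
        return par_map

    def make_reduce(fn):
        def par_reduce(xs):
            xs = list(xs)
            size = max(1, -(-len(xs) // workers))  # ceiling division
            chunks = [xs[i:i + size] for i in range(0, len(xs), size)]
            with ThreadPoolExecutor(max_workers=workers) as pool:
                # Reduce each chunk independently, then combine the partial results.
                partials = list(pool.map(lambda c: reduce(fn, c), chunks))
            return reduce(fn, partials)
        return par_reduce

    makers = {"map": make_map, "reduce": make_reduce}
    ops = {}
    for name, (pattern, fn) in spec.items():
        if pattern not in makers:
            raise ValueError(f"unknown pattern: {pattern}")
        ops[name] = makers[pattern](fn)
    return ops

# Application code is written once against the op names and never
# needs to know which backend it runs on.
def sum_of_squares(dsl, data):
    return dsl["total"](dsl["square"](data))

library_result = sum_of_squares(generate_library(SPEC), range(1, 6))
parallel_result = sum_of_squares(generate_parallel(SPEC), range(1, 6))
print(library_result, parallel_result)  # prints "55 55"
```

Because the application only refers to the ops named in the specification, switching backends is a one-line change at the call site, which mirrors the prototype-then-switch workflow the abstract describes.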
Index Terms
Forge: generating a high performance DSL implementation from a declarative specification