ABSTRACT
Domain-specific languages (DSLs) provide intuitive, high-level abstractions that are easy to work with while attaining better performance than general-purpose languages. Yet implementing a new DSL is a burdensome task, so new DSLs are usually embedded in general-purpose languages. Low-level languages like C or C++ often provide better performance as a host than high-level languages like Python, but high-level languages are becoming more prevalent in many domains because of their ease of use and flexibility. Here, we present Codon, a domain-extensible compiler and DSL framework for high-performance DSLs with Python's syntax and semantics. Codon builds on previous work on ahead-of-time type checking and compilation of Python programs, and leverages a novel intermediate representation to easily incorporate domain-specific optimizations and analyses. We showcase and evaluate several compiler extensions and DSLs for Codon targeting various domains, including bioinformatics, secure multi-party computation, block-based data compression, and parallel programming. The results show that Codon DSLs can provide the benefits of familiar high-level languages while achieving performance typically seen only with low-level languages, thus bridging the gap between performance and usability.
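To make the idea of a domain-specific optimization over an intermediate representation concrete, here is a minimal sketch in plain Python. It is not Codon's actual IR or extension API: the `Var` and `Call` node types, the `simplify_revcomp` pass, and the `revcomp` and `evaluate` helpers are all hypothetical names invented for illustration. The pass encodes one piece of bioinformatics domain knowledge that a general-purpose compiler would not know: reverse-complementing a DNA sequence twice yields the original sequence.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Toy IR (hypothetical, not Codon's): a variable or a one-argument call.
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Call:
    func: str
    arg: "Node"


Node = Union[Var, Call]


def simplify_revcomp(node: Node) -> Node:
    """Domain-specific rewrite: revcomp(revcomp(x)) becomes x (applied bottom-up)."""
    if isinstance(node, Var):
        return node
    arg = simplify_revcomp(node.arg)
    if node.func == "revcomp" and isinstance(arg, Call) and arg.func == "revcomp":
        return arg.arg
    return Call(node.func, arg)


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def evaluate(node: Node, env: Dict[str, str]):
    """Tiny interpreter, used only to check that the rewrite preserves results."""
    if isinstance(node, Var):
        return env[node.name]
    functions = {"revcomp": revcomp, "len": len}
    return functions[node.func](evaluate(node.arg, env))


# len(revcomp(revcomp(s))) simplifies to len(s).
tree = Call("len", Call("revcomp", Call("revcomp", Var("s"))))
optimized = simplify_revcomp(tree)
```

In this sketch the rewritten tree computes the same value as the original while skipping two passes over the sequence. A real compiler would apply such rewrites to a typed IR before lowering to machine code.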
Codon: A Compiler for High-Performance Pythonic Applications and DSLs