DOI: 10.1145/3578360.3580275
research-article
Open Access

Codon: A Compiler for High-Performance Pythonic Applications and DSLs

Published: 17 February 2023

ABSTRACT

Domain-specific languages (DSLs) are able to provide intuitive high-level abstractions that are easy to work with while attaining better performance than general-purpose languages. Yet, implementing new DSLs is a burdensome task. As a result, new DSLs are usually embedded in general-purpose languages. While low-level languages like C or C++ often provide better performance as a host than high-level languages like Python, high-level languages are becoming more prevalent in many domains due to their ease and flexibility. Here, we present Codon, a domain-extensible compiler and DSL framework for high-performance DSLs with Python's syntax and semantics. Codon builds on previous work on ahead-of-time type checking and compilation of Python programs and leverages a novel intermediate representation to easily incorporate domain-specific optimizations and analyses. We showcase and evaluate several compiler extensions and DSLs for Codon targeting various domains, including bioinformatics, secure multi-party computation, block-based data compression and parallel programming, showing that Codon DSLs can provide benefits of familiar high-level languages and achieve performance typically only seen with low-level languages, thus bridging the gap between performance and usability.
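
As a rough illustration of the kind of program the abstract has in mind, the sketch below is ordinary Python in which every variable keeps a single type for its whole lifetime, the property that makes ahead-of-time type checking feasible. The function name `count_kmers` and the sample sequence are hypothetical and are not taken from the paper. The snippet runs under standard Python 3.9 or later; whether a given Codon release compiles it unmodified is an assumption.

```python
def count_kmers(seq: str, k: int) -> dict[str, int]:
    """Count every length-k substring (k-mer) of seq."""
    counts: dict[str, int] = {}
    # Slide a window of width k across the sequence.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical input chosen only for illustration.
    print(count_kmers("ACGTACGT", 4))
```

Nothing here relies on dynamic features such as rebinding a name to a value of a different type, so a static compiler could in principle assign concrete types to `counts`, `kmer`, and `i` before the program runs.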
