Abstract
A transpiler converts code from one programming language to another. Many practical uses of transpilers require that the user be able to guide or customize the program produced from a given input program. This customizability is important for satisfying application-specific goals for the produced code, such as performance, readability, ease of exposition or maintainability, and compatibility with an external environment or analysis tools. Conventional transpilers are deterministic, rule-driven systems that are often written without offering customizability per user and per program. Recent transpilers based on neural networks offer users some customizability, e.g., through interactive prompts, but it remains difficult to precisely control the output they produce. Both conventional and neural transpilation also suffer from the "last mile" problem: they produce correct code on average, i.e., on most parts of a given program, but not necessarily on all parts of it. We propose a new transpilation approach that offers fine-grained customizability and reusability of transpilation rules created by others, without requiring the user to understand the global semantics of the given source program. Our approach is mostly automatic and incremental, i.e., it constructs the translation rules needed to transpile the given program piece by piece, following the user's guidance. Users can rely on existing transpilation rules to translate most of the program correctly while focusing their effort locally, only on the parts that are incorrect or need customization. This improves the correctness of the end result. We implement the transpiler as a tool called DuoGlot, which translates Python programs to JavaScript, and evaluate it on the popular GeeksForGeeks benchmarks. DuoGlot achieves 90% translation accuracy, outperforming all existing translators (both handcrafted and neural-based), while producing readable code.
We evaluate DuoGlot on two additional benchmarks, containing more challenging and longer programs, and similarly observe improved accuracy compared to the other transpilers.
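To make the idea of reusable rules plus local customization concrete, the following is a minimal sketch and not DuoGlot's actual rule format or algorithm. It assumes a toy rule table keyed by Python AST node type; the names `default_rules`, `transpile`, and `floor_div_binop` are hypothetical and chosen only for illustration. Constructs without a rule are reported instead of being translated silently, so the user can focus on just those parts.

```python
import ast
import json

# Hypothetical shared rule table: Python AST node type -> JavaScript emitter.
# Each emitter receives the node and a callback `t` that translates children.
def default_rules():
    return {
        ast.Module: lambda n, t: "\n".join(t(s) for s in n.body),
        ast.Assign: lambda n, t: f"let {t(n.targets[0])} = {t(n.value)};",
        ast.Name: lambda n, t: n.id,
        ast.Constant: lambda n, t: json.dumps(n.value),
        ast.BinOp: lambda n, t: f"({t(n.left)} {t(n.op)} {t(n.right)})",
        ast.Add: lambda n, t: "+",
        ast.Mult: lambda n, t: "*",
    }

def transpile(source, rules):
    """Translate `source` with `rules`; return (js_text, unhandled_node_types)."""
    missing = []

    def t(node):
        rule = rules.get(type(node))
        if rule is None:
            # No rule yet: leave a marker and record the gap for the user.
            missing.append(type(node).__name__)
            return f"/* TODO: {type(node).__name__} */"
        return rule(node, t)

    return t(ast.parse(source)), missing

# A user-supplied rule that overrides only the binary-operator case,
# mapping Python's floor division onto Math.floor in JavaScript.
def floor_div_binop(n, t):
    if isinstance(n.op, ast.FloorDiv):
        return f"Math.floor({t(n.left)} / {t(n.right)})"
    return f"({t(n.left)} {t(n.op)} {t(n.right)})"

if __name__ == "__main__":
    program = "x = 7 // 2\ny = x + 1"
    print(transpile(program, default_rules()))  # reports FloorDiv as unhandled
    custom = {**default_rules(), ast.BinOp: floor_div_binop}
    print(transpile(program, custom)[0])
```

In this sketch the user never touches the rules for assignments, names, or constants; a single local override is enough to resolve the one construct the shared rules could not handle.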
Index Terms
User-Customizable Transpilation of Scripting Languages





