Abstract
While many mainstream languages such as Java, Python, and C# increasingly incorporate functional APIs to simplify programming and to improve parallelization and performance, there are no effective techniques for automatically translating existing imperative code into functional variants that use these APIs. Motivated by this problem, this paper presents a transpilation approach based on inductive program synthesis for modernizing existing code. Our method is based on the observation that the overwhelming majority of source and target programs in this setting satisfy an assumption that we call trace-compatibility: not only do the programs share syntactically identical low-level expressions, but these expressions also take the same values in corresponding execution traces. Our method leverages this observation to design a new neural-guided synthesis algorithm that (1) uses a novel neural architecture called cognate grammar network (CGN) and (2) leverages a form of concolic execution to prune partial programs based on intermediate values that arise during a computation. We have implemented our approach in a tool called NGST2 and used it to translate imperative Java and Python code into functional variants that use the Stream and functools APIs, respectively. Our experiments show that NGST2 significantly outperforms several baselines and that our proposed neural architecture and pruning techniques are vital for achieving good results.
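To make trace-compatibility concrete, here is a small Python sketch. It assumes a hypothetical imperative source (sum of squares of the even elements) and a hand-written `functools`-based target; the function names, the trace format of `(expression, value)` pairs, and the use of exact ordered equality are illustrative assumptions, not the paper's formal definition. Both versions contain the syntactically identical expressions `x * x` and `acc + sq`, and the check confirms that these take the same values, in the same order, on a given input.

```python
from functools import reduce


def imperative(xs):
    """Imperative source: sum of squares of the even elements, with a trace."""
    trace = []
    acc = 0
    for x in xs:
        if x % 2 == 0:
            sq = x * x
            trace.append(("x * x", sq))       # shared low-level expression
            acc = acc + sq
            trace.append(("acc + sq", acc))   # shared accumulator update
    return acc, trace


def functional(xs):
    """Functional target using filter/map and functools.reduce, with a trace."""
    trace = []

    def square(x):
        sq = x * x
        trace.append(("x * x", sq))
        return sq

    def add(acc, sq):
        acc = acc + sq
        trace.append(("acc + sq", acc))
        return acc

    # filter and map are lazy, so reduce pulls one element at a time and the
    # calls to square and add interleave in the same order as the loop body.
    result = reduce(add, map(square, filter(lambda x: x % 2 == 0, xs)), 0)
    return result, trace


def trace_compatible(xs):
    """True if both versions agree on the output and on the ordered values
    taken by their shared expressions (a simplified reading of the idea)."""
    r1, t1 = imperative(xs)
    r2, t2 = functional(xs)
    return r1 == r2 and t1 == t2


if __name__ == "__main__":
    print(imperative([1, 2, 3, 4]))
    print(trace_compatible([1, 2, 3, 4]))
```

An eager target (for example, one that first builds the full list of squares) would produce the same values in a different order, so whether it counts as compatible depends on how "corresponding" traces are aligned; this sketch uses the strictest alignment.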
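The pruning idea can be illustrated with a deliberately tiny enumerative search, again in Python. This is a sketch under strong simplifying assumptions: the target shape `reduce(op, map(f, filter(p, xs)), init)`, the component tables, and the stage-by-stage checks are hypothetical, and there is no neural guidance (no CGN) and no symbolic reasoning, so it only mimics the concrete half of concolic execution. After each stage is chosen, the partial program is run on the input and its intermediate values are compared with those recorded from the imperative source; any mismatch prunes every completion of that partial program.

```python
from itertools import accumulate

# Hypothetical component tables for a toy filter/map/reduce DSL.
PREDICATES = {
    "x % 2 == 0": lambda x: x % 2 == 0,
    "x % 2 == 1": lambda x: x % 2 == 1,
    "x > 0": lambda x: x > 0,
}
MAPPERS = {
    "x": lambda x: x,
    "x * x": lambda x: x * x,
    "-x": lambda x: -x,
}
REDUCERS = {
    "acc + y": lambda acc, y: acc + y,
    "acc * y": lambda acc, y: acc * y,
}
INITS = [0, 1]


def source_trace(xs):
    """Run the imperative source and record the values seen at each stage."""
    kept, mapped, accs = [], [], []
    acc = 0
    for x in xs:
        if x % 2 == 0:
            kept.append(x)       # values that pass the branch condition
            sq = x * x
            mapped.append(sq)    # values of the expression x * x
            acc = acc + sq
            accs.append(acc)     # successive accumulator values
    return kept, mapped, accs


def synthesize(xs):
    """Enumerate reduce(op, map(f, filter(p, xs)), init), pruning any partial
    program whose intermediate values disagree with the source trace."""
    kept, mapped, accs = source_trace(xs)
    explored = 0
    solutions = []
    for p_name, p in PREDICATES.items():
        explored += 1
        stage1 = [x for x in xs if p(x)]
        if stage1 != kept:
            continue  # prune all completions of filter(p, xs)
        for f_name, f in MAPPERS.items():
            explored += 1
            stage2 = [f(x) for x in stage1]
            if stage2 != mapped:
                continue  # prune all completions of map(f, filter(p, xs))
            for r_name, r in REDUCERS.items():
                for init in INITS:
                    explored += 1
                    # Running accumulator values, dropping the initial value.
                    stage3 = list(accumulate(stage2, r, initial=init))[1:]
                    if stage3 == accs:
                        solutions.append((p_name, f_name, r_name, init))
    return solutions, explored


if __name__ == "__main__":
    sols, explored = synthesize([-3, -2, 1, 2, 3, 4])
    for p, f, r, init in sols:
        print(f"reduce(lambda acc, y: {r}, map(lambda x: {f}, filter(lambda x: {p}, xs)), {init})")
    print("candidates explored:", explored)
```

On the input `[-3, -2, 1, 2, 3, 4]`, only the even filter survives the first check and only `x * x` survives the second, so the search visits 10 partial or complete candidates instead of the 36 complete programs an unpruned enumeration would test. The `initial` argument of `itertools.accumulate` requires Python 3.8 or later.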
Automated transpilation of imperative to functional code using neural-guided program synthesis