Abstract
While editing code, it is common for developers to make multiple related repeated edits that are all instances of a more general program transformation. Since this process can be tedious and error-prone, we study the problem of automatically learning program transformations from past edits, which can then be used to predict future edits. We take a novel view of the problem as a semi-supervised learning problem: apart from the concrete edits that are instances of the general transformation, the learning procedure also exploits access to additional inputs (program subtrees) that are marked as positive or negative depending on whether the transformation applies on those inputs. We present a procedure to solve the semi-supervised transformation learning problem using anti-unification and programming-by-example synthesis technology. To eliminate reliance on access to marked additional inputs, we generalize the semi-supervised learning procedure to a feedback-driven procedure that also generates the marked additional inputs in an iterative loop. We apply these ideas to build and evaluate three applications that use different mechanisms for generating feedback. Compared to existing tools that learn program transformations from edits, our feedback-driven semi-supervised approach is vastly more effective in successfully predicting edits with significantly lesser amounts of past edit data.
Supplemental Material
- R. Alur, R. Bodik, G. Juniwal, M. M. K. Martin, M. Raghothaman, S. A. Seshia, R. Singh, A. Solar-Lezama, E. Torlak, and A. Udupa. Syntax-guided synthesis. In 2013 Formal Methods in Computer-Aided Design, pages 1-8, 2013.Google Scholar
Cross Ref
- R. Alur, M. M. K. Martin, M. Raghothaman, C. Stergiou, S. Tripakis, and A. Udupa. Synthesizing finite-state protocols from scenarios and requirements. In E. Yahav, editor, 10th International Haifa Verification Conference, volume 8855 of Lecture Notes in Computer Science, pages 75-91. Springer, 2014. doi: 10.1007/978-3-319-13338-6_7. URL https: //doi.org/10.1007/978-3-319-13338-6_7. Google Scholar
Cross Ref
- R. Alur, P. Černy`, and A. Radhakrishna. Synthesis through unification. In International Conference on Computer Aided Verification, pages 163-179. Springer, 2015.Google Scholar
Cross Ref
- R. Alur, A. Radhakrishna, and A. Udupa. Scaling enumerative program synthesis via divide and conquer. In International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 319-336. Springer, 2017.Google Scholar
Cross Ref
- S. An, R. Singh, S. Misailovic, and R. Samanta. Augmented example-based synthesis using relational perturbation properties. Proceedings of the ACM on Programming Languages, 4 (POPL): 1-24, 2019.Google Scholar
- J. Bader, A. Scott, M. Pradel, and S. Chandra. Getafix: Learning to fix bugs automatically. Proc. ACM Program. Lang., 3 (OOPSLA), Oct. 2019. doi: 10.1145/3360585. URL https://doi.org/10.1145/3360585. Google Scholar
Digital Library
- A. Bessey, K. Block, B. Chelf, A. Chou, B. Fulton, S. Hallem, C. Henri-Gros, A. Kamsky, S. McPeak, and D. Engler. A few billion lines of code later. Commun. ACM, 53 ( 2 ), 2010. ISSN 0001-0782. doi: 10.1145/1646353.1646374. URL https://doi.org/10.1145/1646353.1646374. Google Scholar
Digital Library
- J. R. Buchi and L. H. Landweber. Solving sequential conditions by finite-state strategies. Transactions of the American Mathematical Society, 138 : 295-311, 1969. ISSN 00029947. URL http://www.jstor.org/stable/1994916.Google Scholar
Cross Ref
- P. Cerný, K. Chatterjee, T. A. Henzinger, A. Radhakrishna, and R. Singh. Quantitative synthesis for concurrent programs. In G. Gopalakrishnan and S. Qadeer, editors, 23rd International Conference on Computer Aided Verification (CAV), volume 6806 of Lecture Notes in Computer Science, pages 243-259. Springer, 2011. doi: 10.1007/978-3-642-22110-1_20. URL https://doi.org/10.1007/978-3-642-22110-1_20. Google Scholar
Cross Ref
- P. Cerný, T. A. Henzinger, A. Radhakrishna, L. Ryzhyk, and T. Tarrach. Eficient synthesis for concurrency by semanticspreserving transformations. In N. Sharygina and H. Veith, editors, 25th International Conference on Computer Aided Verification (CAV), volume 8044 of Lecture Notes in Computer Science, pages 951-967. Springer, 2013. doi: 10.1007/978-3-642-39799-8_68. URL https://doi.org/10.1007/978-3-642-39799-8_68. Google Scholar
Cross Ref
- Eclipse Foundation. Eclipse. At https://www.eclipse.org/, 2020.Google Scholar
- W. S. Evans, C. W. Fraser, and F. Ma. Clone detection via structural abstraction. Software Quality Journal, 17 ( 4 ): 309-330, 2009.Google Scholar
Digital Library
- J. K. Feser, S. Chaudhuri, and I. Dillig. Synthesizing data structure transformations from input-output examples. In D. Grove and S. Blackburn, editors, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015, pages 229-239. ACM, 2015. doi: 10.1145/2737924.2737977. URL https://doi.org/10.1145/2737924.2737977. Google Scholar
Digital Library
- J. Frankle, P. Osera, D. Walker, and S. Zdancewic. Example-directed synthesis: a type-theoretic interpretation. In R. Bodík and R. Majumdar, editors, Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20-22, 2016, pages 802-815. ACM, 2016. doi: 10.1145/2837614. 2837629. URL https://doi.org/10.1145/2837614.2837629. Google Scholar
Digital Library
- S. Gulwani. Automating string processing in spreadsheets using input-output examples. In Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '11). ACM New York, NY, USA, 2011.Google Scholar
Digital Library
- T. Gvero, V. Kuncak, I. Kuraj, and R. Piskac. Complete completion using types and weights. In Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation, pages 27-38, 2013.Google Scholar
Digital Library
- K. Huang, X. Qiu, P. Shen, and Y. Wang. Reconciling enumerative and deductive program synthesis. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 1159-1174, 2020.Google Scholar
Digital Library
- JetBrains. IntelliJ. At https://www.jetbrains.com/idea/, 2020a.Google Scholar
- JetBrains. ReSharper. At https://www.jetbrains.com/resharper/, 2020b.Google Scholar
- S. Jha, S. Gulwani, S. A. Seshia, and A. Tiwari. Oracle-guided component-based program synthesis. In 2010 ACM/IEEE 32nd International Conference on Software Engineering, volume 1, pages 215-224. IEEE, 2010.Google Scholar
Digital Library
- L. Jiang, G. Misherghi, Z. Su, and S. Glondu. Deckard: Scalable and accurate tree-based detection of code clones. In 29th International Conference on Software Engineering (ICSE'07), pages 96-105. IEEE, 2007.Google Scholar
Digital Library
- M. Kim, T. Zimmermann, and N. Nagappan. A field study of refactoring challenges and benefits. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, FSE '12, pages 50 : 1-50 : 11, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1614-9. doi: 10.1145/2393596.2393655. URL http://doi.acm.org/10.1145/2393596.2393655. Google Scholar
Digital Library
- V. Le, D. Perelman, O. Polozov, M. Raza, A. Udupa, and S. Gulwani. Interactive program synthesis. arXiv preprint arXiv:1703.03539, 2017.Google Scholar
- Z. Manna and R. J. Waldinger. A deductive approach to program synthesis. ACM Transactions on Programming Languages and Systems, 2 ( 1 ): 90-121, 1980. doi: 10.1145/357084.357090. URL https://doi.org/10.1145/357084.357090. Google Scholar
Digital Library
- M. Mayer, G. Soares, M. Grechkin, V. Le, M. Marron, O. Polozov, R. Singh, B. Zorn, and S. Gulwani. User interaction models for disambiguation in programming by example. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, pages 291-301, 2015.Google Scholar
Digital Library
- N. Meng, M. Kim, and K. S. McKinley. Systematic editing: generating program transformations from an example. ACM SIGPLAN Notices, 46 ( 6 ): 329-342, 2011.Google Scholar
- N. Meng, M. Kim, and K. S. McKinley. Lase: locating and applying systematic edits by learning from examples. In 2013 35th International Conference on Software Engineering (ICSE), pages 502-511. IEEE, 2013.Google Scholar
Digital Library
- T. Mens and T. Tourwe. A survey of software refactoring. IEEE Transactions on Software Engineering, 30 ( 2 ): 126-139, Feb 2004. ISSN 0098-5589. doi: 10.1109/TSE. 2004. 1265817.Google Scholar
Digital Library
- Microsoft. Visual Studio. At https://www.visualstudio.com, 2019.Google Scholar
- Microsoft. Intellicode suggestions. At https://docs.microsoft.com/en-us/visualstudio/intellicode/intellicode-suggestions, 2020.Google Scholar
- A. Miltner, S. Gulwani, V. Le, A. Leung, A. Radhakrishna, G. Soares, A. Tiwari, and A. Udupa. On the fly synthesis of edit suggestions. Proceedings of the ACM on Programming Languages, 3 (OOPSLA): 1-29, 2019.Google Scholar
Digital Library
- T. M. Mitchell. Generalization as search. Artificial intelligence, 18 ( 2 ): 203-226, 1982.Google Scholar
- H. A. Nguyen, A. T. Nguyen, T. T. Nguyen, T. N. Nguyen, and H. Rajan. A study of repetitiveness of code changes in software evolution. In 2013 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 180-190. IEEE, 2013.Google Scholar
Digital Library
- W. F. Opdyke. Refactoring Object-oriented Frameworks. PhD thesis, Champaign, IL, USA, 1992. UMI Order No. GAX93-05645.Google Scholar
- D. Perelman, S. Gulwani, T. Ball, and D. Grossman. Type-directed completion of partial expressions. In Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation, pages 275-286, 2012.Google Scholar
Digital Library
- A. Pnueli and R. Rosner. On the synthesis of a reactive module. In Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 179-190, 1989.Google Scholar
Digital Library
- O. Polozov and S. Gulwani. Flashmeta: a framework for inductive program synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 107-126, 2015.Google Scholar
Digital Library
- V. Raychev, M. Vechev, and E. Yahav. Code completion with statistical language models. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 419-428, 2014.Google Scholar
Digital Library
- A. Reynolds, M. Deters, V. Kuncak, C. Tinelli, and C. Barrett. Counterexample-guided quantifier instantiation for synthesis in smt. In International Conference on Computer Aided Verification, pages 198-216. Springer, 2015.Google Scholar
Cross Ref
- R. Rolim, G. Soares, L. D'Antoni, O. Polozov, S. Gulwani, R. Gheyi, R. Suzuki, and B. Hartmann. Learning syntactic program transformations from examples. In 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), pages 404-415. IEEE, 2017.Google Scholar
Digital Library
- R. Rolim, G. Soares, R. Gheyi, T. Barik, and L. D'Antoni. Learning quick fixes from code repositories, 2018.Google Scholar
- R. Singh. Blinkfill: Semi-supervised programming by example for syntactic string transformations. Proc. VLDB Endow., 9 ( 10 ): 816-827, June 2016. ISSN 2150-8097. doi: 10.14778/2977797.2977807. URL https://doi.org/10.14778/2977797.2977807. Google Scholar
Digital Library
- R. Singh and A. Solar-Lezama. Synthesizing data structure manipulations from storyboards. In T. Gyimóthy and A. Zeller, editors, SIGSOFT/FSE'11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC'11: 13th European Software Engineering Conference (ESEC-13), Szeged, Hungary, September 5-9, 2011, pages 289-299. ACM, 2011. doi: 10.1145/2025113.2025153. URL https://doi.org/10.1145/2025113.2025153. Google Scholar
Digital Library
- A. Solar-Lezama, R. M. Rabbah, R. Bodík, and K. Ebcioglu. Programming by sketching for bit-streaming programs. In V. Sarkar and M. W. Hall, editors, Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, June 12-15, 2005, pages 281-294. ACM, 2005. doi: 10.1145/1065010.1065045. URL https://doi.org/10.1145/1065010.1065045. Google Scholar
Digital Library
- A. Solar-Lezama, L. Tancau, R. Bodik, S. Seshia, and V. Saraswat. Combinatorial sketching for finite programs. In Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, pages 404-415, 2006.Google Scholar
Digital Library
- A. Solar-Lezama, L. Tancau, R. Bodík, S. A. Seshia, and V. A. Saraswat. Combinatorial sketching for finite programs. In J. P. Shen and M. Martonosi, editors, Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2006, San Jose, CA, USA, October 21-25, 2006, pages 404-415. ACM, 2006. doi: 10.1145/1168857.1168907. URL https://doi.org/10.1145/1168857.1168907. Google Scholar
Digital Library
- A. Solar-Lezama, G. Arnold, L. Tancau, R. Bodík, V. A. Saraswat, and S. A. Seshia. Sketching stencils. In J. Ferrante and K. S. McKinley, editors, Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, California, USA, June 10-13, 2007, pages 167-178. ACM, 2007. doi: 10.1145/1250734.1250754. URL https://doi.org/10.1145/1250734.1250754. Google Scholar
Digital Library
- A. Solar-Lezama, C. G. Jones, and R. Bodík. Sketching concurrent data structures. In R. Gupta and S. P. Amarasinghe, editors, Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, Tucson, AZ, USA, June 7-13, 2008, pages 136-148. ACM, 2008. doi: 10.1145/1375581.1375599. URL https://doi.org/10.1145/1375581.1375599. Google Scholar
Digital Library
- F. Steimann and J. von Pilgrim. Refactorings without names. In Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering, ASE 2012, pages 290-293, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1204-2. doi: 10.1145/2351676.2351726. URL http://doi.acm.org/10.1145/2351676.2351726. Google Scholar
Digital Library
- A. Udupa, A. Raghavan, J. V. Deshmukh, S. Mador-Haim, M. M. Martin, and R. Alur. Transit: specifying protocols with concolic snippets. ACM SIGPLAN Notices, 48 ( 6 ): 287-296, 2013.Google Scholar
- M. Vakilian, N. Chen, S. Negara, B. A. Rajkumar, B. P. Bailey, and R. E. Johnson. Use, disuse, and misuse of automated refactorings. In 2012 34th International Conference on Software Engineering (ICSE), pages 233-243. IEEE, 2012.Google Scholar
Cross Ref
- M. T. Vechev, E. Yahav, and G. Yorsh. Abstraction-guided synthesis of synchronization. In M. V. Hermenegildo and J. Palsberg, editors, Proceedings of the 37th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2010, Madrid, Spain, January 17-23, 2010, pages 327-338. ACM, 2010. doi: 10.1145/1706299.1706338. URL https: //doi.org/10.1145/1706299.1706338. Google Scholar
Digital Library
- N. Yaghmazadeh, X. Wang, and I. Dillig. Automated migration of hierarchical data to relational tables using programmingby-example. Proc. VLDB Endow., 11 ( 5 ): 580-593, 2018. doi: 10.1145/3187009.3177735. URL http://www.vldb.org/pvldb/ vol11/p580-yaghmazadeh.pdf.Google Scholar
Digital Library
- K. Yessenov, Z. Xu, and A. Solar-Lezama. Data-driven synthesis for object-oriented frameworks. ACM SIGPLAN Notices, 46 ( 10 ): 65-82, 2011.Google Scholar
- H. Zhang, A. Jain, G. Khandelwal, C. Kaushik, S. Ge, and W. Hu. Bing developer assistant: improving developer productivity by recommending sample code. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 956-961, 2016.Google Scholar
Digital Library
- X. Zhu and A. B. Goldberg. Introduction to semi-supervised learning. Synthesis lectures on artificial intelligence and machine learning, 3 ( 1 ): 1-130, 2009.Google Scholar
- X. J. Zhu. Semi-supervised learning literature survey. Technical report, University of Wisconsin-Madison Department of Computer Sciences, 2005.Google Scholar
Index Terms
Feedback-driven semi-supervised synthesis of program transformations
Recommendations
Learning syntactic program transformations from examples
ICSE '17: Proceedings of the 39th International Conference on Software EngineeringAutomatic program transformation tools can be valuable for programmers to help them with refactoring tasks, and for Computer Science students in the form of tutoring systems that suggest repairs to programming assignments. However, manually creating ...
Spreadsheet table transformations from examples
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and ImplementationEvery day, millions of computer end-users need to perform tasks over large, tabular data, yet lack the programming knowledge to do such tasks automatically. In this work, we present an automatic technique that takes from a user an example of how the ...
Synthesizing transformations on hierarchically structured data
PLDI '16This paper presents a new approach for synthesizing transformations on tree-structured data, such as Unix directories and XML documents. We consider a general abstraction for such data, called hierarchical data trees (HDTs) and present a novel example-...






Comments