skip to main content
research-article
Open Access
Artifacts Available / v1.1

Generalizable synthesis through unification

Published:15 October 2021Publication History
Skip Abstract Section

Abstract

The generalizability of PBE solvers is the key to the empirical synthesis performance. Despite the importance of generalizability, related studies on PBE solvers are still limited. In theory, few existing solvers provide theoretical guarantees on generalizability, and in practice, there is a lack of PBE solvers with satisfactory generalizability on important domains such as conditional linear integer arithmetic (CLIA). In this paper, we adopt a concept from the computational learning theory, Occam learning, and perform a comprehensive study on the framework of synthesis through unification (STUN), a state-of-the-art framework for synthesizing programs with nested if-then-else operators. We prove that Eusolver, a state-of-the-art STUN solver, does not satisfy the condition of Occam learning, and then we design a novel STUN solver, PolyGen, of which the generalizability is theoretically guaranteed by Occam learning. We evaluate PolyGen on the domains of CLIA and demonstrate that PolyGen significantly outperforms two state-of-the-art PBE solvers on CLIA, Eusolver and Euphony, on both generalizability and efficiency.

Skip Supplemental Material Section

Supplemental Material

Auxiliary Presentation Video

In this paper, we adopt a concept from the computational learning theory, Occam learning, and perform a comprehensive study on the framework of synthesis through unification (STUN), a state-of-the-art framework for synthesizing programs with nested if-then-else operators. We prove that Eusolver, a state-of-the-art STUN solver, does not satisfy the condition of Occam learning, and then we design a novel STUN solver, PolyGen, of which the generalizability is theoretically guaranteed by Occam learning. We evaluate PolyGen on the domains of CLIA and demonstrate that PolyGen significantly outperforms two state-of-the-art PBE solvers on CLIA, Eusolver and Euphony, on both generalizability and efficiency.

References

  1. David J. Aldous and Umesh V. Vazirani. 1995. A Markovian Extension of Valiant’s Learning Model. Inf. Comput., 117, 2 (1995), 181–186. https://doi.org/10.1006/inco.1995.1037 Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Rajeev Alur, Rastislav Bodík, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. 2013. Syntax-guided synthesis. In Formal Methods in Computer-Aided Design, FMCAD 2013, Portland, OR, USA, October 20-23, 2013. 1–8. http://ieeexplore.ieee.org/document/6679385/Google ScholarGoogle Scholar
  3. Rajeev Alur, Pavol Cerný, and Arjun Radhakrishna. 2015. Synthesis Through Unification. In Computer Aided Verification - 27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part II. 163–179. https://doi.org/10.1007/978-3-319-21668-3_10 Google ScholarGoogle ScholarCross RefCross Ref
  4. Rajeev Alur, Dana Fisman, Saswat Padhi, Rishabh Singh, and Abhishek Udupa. 2019. SyGuS-Comp 2018: Results and Analysis. CoRR, abs/1904.07146 (2019), arxiv:1904.07146. arxiv:1904.07146Google ScholarGoogle Scholar
  5. Rajeev Alur, Arjun Radhakrishna, and Abhishek Udupa. 2017. Scaling Enumerative Program Synthesis via Divide and Conquer. In Tools and Algorithms for the Construction and Analysis of Systems - 23rd International Conference, TACAS 2017, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings, Part I. 319–336. https://doi.org/10.1007/978-3-662-54577-5_18 Google ScholarGoogle Scholar
  6. Dana Angluin and Philip D. Laird. 1987. Learning From Noisy Examples. Mach. Learn., 2, 4 (1987), 343–370. https://doi.org/10.1007/BF00116829 Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2017. DeepCoder: Learning to Write Programs. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. https://openreview.net/forum?id=ByldLrqlxGoogle ScholarGoogle Scholar
  8. Tim Blazytko, Moritz Contag, Cornelius Aschermann, and Thorsten Holz. 2017. Syntia: Synthesizing the Semantics of Obfuscated Code. In 26th USENIX Security Symposium, USENIX Security 2017, Vancouver, BC, Canada, August 16-18, 2017. 643–659. https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/blazytko Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. 1987. Occam’s Razor. Inf. Process. Lett., 24, 6 (1987), 377–380. https://doi.org/10.1016/0020-0190(87)90114-1 Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Yanju Chen, Ruben Martins, and Yu Feng. 2019. Maximal multi-layer specification synthesis. In Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2019, Tallinn, Estonia, August 26-30, 2019. 602–612. https://doi.org/10.1145/3338906.3338951 Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Vasek Chvátal. 1979. A Greedy Heuristic for the Set-Covering Problem. Math. Oper. Res., 4, 3 (1979), 233–235. https://doi.org/10.1287/moor.4.3.233 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. William W. Cohen. 1995. Pac-Learning Recursive Logic Programs: Efficient Algorithms. J. Artif. Intell. Res., 2 (1995), 501–539. https://doi.org/10.1613/jair.97 Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. William W. Cohen. 1995. Pac-learning Recursive Logic Programs: Negative Results. J. Artif. Intell. Res., 2 (1995), 541–573. https://doi.org/10.1613/jair.1917 Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Robin David, Luigi Coniglio, and Mariano Ceccato. 2020. QSynth-A Program Synthesis based Approach for Binary Code Deobfuscation. In BAR 2020 Workshop.Google ScholarGoogle ScholarCross RefCross Ref
  15. Leonardo Mendonça de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Tools and Algorithms for the Construction and Analysis of Systems, 14th International Conference, TACAS 2008, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2008, Budapest, Hungary, March 29-April 6, 2008. Proceedings. 337–340. https://doi.org/10.1007/978-3-540-78800-3_24 Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, and Pushmeet Kohli. 2017. RobustFill: Neural Program Learning under Noisy I/O. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. 990–998. http://proceedings.mlr.press/v70/devlin17a.html Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Samuel Drews, Aws Albarghouthi, and Loris D’Antoni. 2019. Efficient Synthesis with Probabilistic Constraints. In Computer Aided Verification - 31st International Conference, CAV 2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part I. 278–296. https://doi.org/10.1007/978-3-030-25540-4_15 Google ScholarGoogle ScholarCross RefCross Ref
  18. Saso Dzeroski, Stephen Muggleton, and Stuart J. Russell. 1992. PAC-Learnability of Determinate Logic Programs. In Proceedings of the Fifth Annual ACM Conference on Computational Learning Theory, COLT 1992, Pittsburgh, PA, USA, July 27-29, 1992. 128–135. https://doi.org/10.1145/130385.130399 Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Michael D. Ernst, Jake Cockrell, William G. Griswold, and David Notkin. 2001. Dynamically Discovering Likely Program Invariants to Support Program Evolution. IEEE Trans. Software Eng., 27, 2 (2001), 99–123. https://doi.org/10.1109/32.908957 Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Azadeh Farzan and Victor Nicolet. 2017. Synthesis of divide and conquer parallelism for loops. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. 540–555. https://doi.org/10.1145/3062341.3062355 Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Sumit Gulwani. 2011. Automating string processing in spreadsheets using input-output examples. In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, January 26-28, 2011. 317–330. https://doi.org/10.1145/1926385.1926423 Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. LLC Gurobi Optimization. 2021. Gurobi Optimizer Reference Manual. http://www.gurobi.comGoogle ScholarGoogle Scholar
  23. Thomas R. Hancock, Tao Jiang, Ming Li, and John Tromp. 1995. Lower Bounds on Learning Decision Lists and Trees (Extended Abstract). In STACS 95, 12th Annual Symposium on Theoretical Aspects of Computer Science, Munich, Germany, March 2-4, 1995, Proceedings. 527–538. https://doi.org/10.1007/3-540-59042-0_102 Google ScholarGoogle ScholarCross RefCross Ref
  24. Qinheping Hu, John Cyphert, Loris D’Antoni, and Thomas W. Reps. 2020. Exact and approximate methods for proving unrealizability of syntax-guided synthesis problems. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020. 1128–1142. https://doi.org/10.1145/3385412.3385979 Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Kangjing Huang, Xiaokang Qiu, Peiyuan Shen, and Yanjun Wang. 2020. Reconciling enumerative and deductive program synthesis. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020. 1159–1174. https://doi.org/10.1145/3385412.3386027 Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Susmit Jha, Sumit Gulwani, Sanjit A Seshia, and Ashish Tiwari. 2010. Oracle-guided component-based program synthesis. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1. 215–224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Susmit Jha and Sanjit A. Seshia. 2017. A theory of formal synthesis via inductive learning. Acta Informatica, 54, 7 (2017), 693–726. https://doi.org/10.1007/s00236-017-0294-5 Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Ruyi Ji, Jingjing Liang, Yingfei Xiong, Lu Zhang, and Zhenjiang Hu. 2020. Question selection for interactive program synthesis. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2020, London, UK, June 15-20, 2020, Alastair F. Donaldson and Emina Torlak (Eds.). ACM, 1143–1158. https://doi.org/10.1145/3385412.3386025 Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Ruyi Ji, Yican Sun, Yingfei Xiong, and Zhenjiang Hu. 2020. Guiding dynamic programing via structural probability for accelerating programming by example. Proc. ACM Program. Lang., 4, OOPSLA (2020), 224:1–224:29. https://doi.org/10.1145/3428292 Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Ruyi Ji, Jingtao Xia, Yingfei Xiong, and Zhenjiang Hu. 2021. Artifact for OOPSLA’21: Generalizable Synthesis Through Unification. https://doi.org/10.5281/zenodo.5499720 Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Ashwin Kalyan, Abhishek Mohta, Oleksandr Polozov, Dhruv Batra, Prateek Jain, and Sumit Gulwani. 2018. Neural-Guided Deductive Search for Real-Time Program Synthesis from Examples. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. https://openreview.net/forum?id=rywDjg-RWGoogle ScholarGoogle Scholar
  32. Michael J. Kearns and Ming Li. 1988. Learning in the Presence of Malicious Errors (Extended Abstract). In Proceedings of the 20th Annual ACM Symposium on Theory of Computing, May 2-4, 1988, Chicago, Illinois, USA. 267–280. https://doi.org/10.1145/62212.62238 Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Michael J. Kearns and Robert E. Schapire. 1994. Efficient Distribution-Free Learning of Probabilistic Concepts. J. Comput. Syst. Sci., 48, 3 (1994), 464–497. https://doi.org/10.1016/S0022-0000(05)80062-5 Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Jinwoo Kim, Qinheping Hu, Loris D’Antoni, and Thomas W. Reps. 2021. Semantics-guided synthesis. Proc. ACM Program. Lang., 5, POPL (2021), 1–32. https://doi.org/10.1145/3434311 Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Tessa A. Lau, Steven A. Wolfman, Pedro M. Domingos, and Daniel S. Weld. 2003. Programming by Demonstration Using Version Space Algebra. Mach. Learn., 53, 1-2 (2003), 111–156. https://doi.org/10.1023/A:1025671410623 Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Xuan-Bach D. Le, Duc-Hiep Chu, David Lo, Claire Le Goues, and Willem Visser. 2017. S3: syntax- and semantic-guided repair synthesis via programming by examples. In ESEC/FSE. 593–604. https://doi.org/10.1145/3106237.3106309 Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Woosuk Lee, Kihong Heo, Rajeev Alur, and Mayur Naik. 2018. Accelerating search-based program synthesis using learned probabilistic models. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, Philadelphia, PA, USA, June 18-22, 2018. 436–449. https://doi.org/10.1145/3192366.3192410 Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Percy Liang, Michael I. Jordan, and Dan Klein. 2010. Learning Programs: A Hierarchical Bayesian Approach. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel. 639–646. https://icml.cc/Conferences/2010/papers/568.pdf Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Mikaël Mayer, Gustavo Soares, Maxim Grechkin, Vu Le, Mark Marron, Oleksandr Polozov, Rishabh Singh, Benjamin G. Zorn, and Sumit Gulwani. 2015. User Interaction Models for Disambiguation in Programming by Example. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, UIST 2015, Charlotte, NC, USA, November 8-11, 2015, Celine Latulipe, Bjoern Hartmann, and Tovi Grossman (Eds.). ACM, 291–301. https://doi.org/10.1145/2807442.2807459 Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Sergey Mechtaev, Alberto Griggio, Alessandro Cimatti, and Abhik Roychoudhury. 2018. Symbolic execution with existential second-order constraints. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 389–399. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2015. DirectFix: Looking for Simple Program Repairs. In ICSE. 448–458. https://doi.org/10.1109/ICSE.2015.63 Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Sergey Mechtaev, Jooyong Yi, and Abhik Roychoudhury. 2015. DirectFix: Looking for Simple Program Repairs. In 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 1. 448–458. https://doi.org/10.1109/ICSE.2015.63 Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Aditya Krishna Menon, Omer Tamuz, Sumit Gulwani, Butler W. Lampson, and Adam Kalai. 2013. A Machine Learning Framework for Programming by Example. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013. 187–195. http://proceedings.mlr.press/v28/menon13.html Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Kazutaka Morita, Akimasa Morihata, Kiminori Matsuzaki, Zhenjiang Hu, and Masato Takeichi. 2007. Automatic inversion generates divide-and-conquer parallel programs. In Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, San Diego, California, USA, June 10-13, 2007. 146–155. https://doi.org/10.1145/1250734.1250752 Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Dana Moshkovitz. 2011. The Projection Games Conjecture and The NP-Hardness of ln n-Approximating Set-Cover. Electron. Colloquium Comput. Complex., 18 (2011), 112. http://eccc.hpi-web.de/report/2011/112Google ScholarGoogle Scholar
  46. B. K. Natarajan. 1993. Occam’s Razor for Functions. In Proceedings of the Sixth Annual ACM Conference on Computational Learning Theory, COLT 1993, Santa Cruz, CA, USA, July 26-28, 1993. 370–376. https://doi.org/10.1145/168304.168380 Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Saswat Padhi, Prateek Jain, Daniel Perelman, Oleksandr Polozov, Sumit Gulwani, and Todd D. Millstein. 2018. FlashProfile: a framework for synthesizing data profiles. PACMPL, 2, OOPSLA (2018), 150:1–150:28. https://doi.org/10.1145/3276520 Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Saswat Padhi and Todd D. Millstein. 2017. Data-Driven Loop Invariant Inference with Automatic Feature Synthesis. CoRR, abs/1707.02029 (2017), arxiv:1707.02029. arxiv:1707.02029Google ScholarGoogle Scholar
  49. J. Ross Quinlan. 1986. Induction of Decision Trees. Mach. Learn., 1, 1 (1986), 81–106. https://doi.org/10.1023/A:1022643204877 Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Veselin Raychev, Pavol Bielik, Martin T. Vechev, and Andreas Krause. 2016. Learning programs from noisy data. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22, 2016. 761–774. https://doi.org/10.1145/2837614.2837671 Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Mohammad Raza and Sumit Gulwani. 2018. Disjunctive Program Synthesis: A Robust Approach to Programming by Example. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, Sheila A. McIlraith and Kilian Q. Weinberger (Eds.). AAAI Press, 1403–1412. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17055Google ScholarGoogle Scholar
  52. Andrew Reynolds, Haniel Barbosa, Andres Nötzli, Clark W. Barrett, and Cesare Tinelli. 2019. cvc4sy: Smart and Fast Term Enumeration for Syntax-Guided Synthesis. In Computer Aided Verification - 31st International Conference, CAV 2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part II. 74–83. https://doi.org/10.1007/978-3-030-25543-5_5 Google ScholarGoogle ScholarCross RefCross Ref
  53. Ronald L. Rivest. 1987. Learning Decision Lists. Mach. Learn., 2, 3 (1987), 229–246. https://doi.org/10.1007/BF00058680 Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. David E. Shaw, William R. Swartout, and C. Cordell Green. 1975. Inferring LISP Programs From Examples. In Advance Papers of the Fourth International Joint Conference on Artificial Intelligence, Tbilisi, Georgia, USSR, September 3-8, 1975. 260–267. http://ijcai.org/Proceedings/75/Papers/037.pdf Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Rishabh Singh and Sumit Gulwani. 2015. Predicting a Correct Program in Programming by Example. In Computer Aided Verification - 27th International Conference, CAV 2015, San Francisco, CA, USA, July 18-24, 2015, Proceedings, Part I. 398–414. https://doi.org/10.1007/978-3-319-21690-4_23 Google ScholarGoogle ScholarCross RefCross Ref
  56. Armando Solar-Lezama, Liviu Tancau, Rastislav Bodík, Sanjit A. Seshia, and Vijay A. Saraswat. 2006. Combinatorial sketching for finite programs. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2006, San Jose, CA, USA, October 21-25, 2006. 404–415. https://doi.org/10.1145/1168857.1168907 Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Leslie G. Valiant. 1984. A Theory of the Learnable. Commun. ACM, 27, 11 (1984), 1134–1142. https://doi.org/10.1145/1968.1972 Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Chenglong Wang, Alvin Cheung, and Rastislav Bodík. 2017. Interactive Query Synthesis from Input-Output Examples. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017, Semih Salihoglu, Wenchao Zhou, Rada Chirkova, Jun Yang, and Dan Suciu (Eds.). ACM, 1631–1634. https://doi.org/10.1145/3035918.3058738 Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Juan Zhai, Jianjun Huang, Shiqing Ma, Xiangyu Zhang, Lin Tan, Jianhua Zhao, and Feng Qin. 2016. Automatic model generation from documentation for Java API functions. In ICSE. 380–391. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Generalizable synthesis through unification

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader
    About Cookies On This Site

    We use cookies to ensure that we give you the best experience on our website.

    Learn more

    Got it!