Data-driven abductive inference of library specifications

Published: 15 October 2021
Abstract

Programmers often leverage data structure libraries that provide useful and reusable abstractions. Modular verification of programs that make use of these libraries naturally relies on specifications that capture important properties about how the library expects these data structures to be accessed and manipulated. However, these specifications are often missing or incomplete, making it hard for clients to be confident they are using the library safely. When library source code is also unavailable, as is often the case, the challenge of inferring meaningful specifications is exacerbated further. In this paper, we present a novel data-driven abductive inference mechanism that infers specifications for library methods sufficient to enable verification of the library's clients. Our technique combines a data-driven, learning-based framework that postulates candidate specifications with SMT-provided counterexamples that refine these candidates, taking special care to avoid generating specifications that overfit to sampled tests. The resulting specifications form a minimal set of requirements on the behavior of library implementations that ensures the safety of a particular client program. Our solution thus provides a new multi-abduction procedure for precise specification inference of data structure libraries guided by client-side verification tasks. Experimental results on a wide range of realistic OCaml data structure programs demonstrate the effectiveness of the approach.
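To make the propose-and-refine idea concrete, here is a toy sketch in Python. It is not the paper's algorithm or implementation. Everything in it is an assumption made for illustration: the black-box method `lib_abs`, the client property `client_safe`, the three-predicate feature pool, and the bounded enumeration over `DOMAIN`, which stands in for a real SMT query. The learner proposes the smallest conjunction of features consistent with the sampled library behavior, preferring fewer conjuncts as a rough proxy for a weaker specification, and each counterexample from the checker becomes a negative example that the next candidate must exclude.

```python
from itertools import combinations

# Hypothetical black-box library method: we may call it, but not read its source.
def lib_abs(x):
    return abs(x)

# Hypothetical client safety property: the client computes lib(x) + 1 and
# asserts the result is positive. y stands for the library's output.
def client_safe(x, y):
    return y + 1 > 0

# Assumed feature pool over an (input, output) pair of the library method.
FEATURES = [
    ("y >= x", lambda x, y: y >= x),
    ("y > 0", lambda x, y: y > 0),
    ("y >= 0", lambda x, y: y >= 0),
]

DOMAIN = range(-3, 4)  # small bounded domain standing in for an SMT query


def holds(spec, x, y):
    """A spec is a conjunction: every chosen feature must hold."""
    return all(pred(x, y) for _, pred in spec)


def learn(positives, negatives):
    """Return the smallest conjunction that keeps all positives and
    rejects all negatives, or None if the feature pool is too weak."""
    for size in range(len(FEATURES) + 1):
        for spec in combinations(FEATURES, size):
            keeps_pos = all(holds(spec, x, y) for x, y in positives)
            drops_neg = all(not holds(spec, x, y) for x, y in negatives)
            if keeps_pos and drops_neg:
                return list(spec)
    return None


def find_counterexample(spec):
    """Find a behavior the spec allows that would break the client."""
    for x in DOMAIN:
        for y in DOMAIN:
            if holds(spec, x, y) and not client_safe(x, y):
                return (x, y)
    return None


def infer_spec(samples=(-2, 0, 3)):
    # Positive examples come from running the black-box library on samples.
    positives = [(x, lib_abs(x)) for x in samples]
    negatives = []
    while True:
        spec = learn(positives, negatives)
        if spec is None:
            return None
        cex = find_counterexample(spec)
        if cex is None:
            return [name for name, _ in spec]
        negatives.append(cex)  # refine: the next candidate must exclude it


print(infer_spec())
```

Run as written, this prints `['y >= 0']`. The first candidate is the empty conjunction, which the checker refutes with the pair `(-3, -3)`. The learner then skips `y >= x` because it still admits that pair, skips `y > 0` because it contradicts the observed sample `(0, 0)` and would overfit, and settles on `y >= 0`, which is enough for the client's assertion.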

Supplemental Material

Auxiliary Presentation Video

This is a presentation video of my talk at OOPSLA 2021 on our paper accepted in the research track.
