Abstract
Programmers often leverage data structure libraries that provide useful and reusable abstractions. Modular verification of programs that make use of these libraries naturally relies on specifications that capture important properties about how the library expects these data structures to be accessed and manipulated. However, these specifications are often missing or incomplete, making it hard for clients to be confident they are using the library safely. When library source code is also unavailable, as is often the case, the challenge of inferring meaningful specifications is further exacerbated. In this paper, we present a novel data-driven abductive inference mechanism that infers specifications for library methods sufficient to enable verification of the library's clients. Our technique combines a data-driven, learning-based framework that postulates candidate specifications with SMT-provided counterexamples that refine these candidates, taking special care to avoid generating specifications that overfit to sampled tests. The resulting specifications form a minimal set of requirements on the behavior of library implementations that ensures the safety of a particular client program. Our solution thus provides a new multi-abduction procedure for precise specification inference of data structure libraries guided by client-side verification tasks. Experimental results on a wide range of realistic OCaml data structure programs demonstrate the effectiveness of the approach.
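The guess-and-refine idea described in the abstract can be illustrated with a small toy sketch. This is not the paper's algorithm: the black-box function `lib_abs`, the feature set `FEATURES`, the client property `safe`, and the bounded domain are all hypothetical, and an exhaustive search over a small domain stands in for the SMT solver. The sketch only shows the general pattern of keeping predicates that agree with sampled library behavior and adding one whenever a counterexample to client safety is found.

```python
from itertools import product

# Hypothetical black-box library method: we may run it, but not read it.
def lib_abs(x):
    return x if x >= 0 else -x

# Hypothetical candidate predicates over an (input, output) pair.
FEATURES = {
    "y == x": lambda x, y: y == x,
    "y >= x": lambda x, y: y >= x,
    "y >= 0": lambda x, y: y >= 0,
}

# Hypothetical client safety property that the inferred spec must imply.
def safe(x, y):
    return y + 1 > 0

# Small bounded domain; exhaustive search over it stands in for an SMT query.
DOMAIN = range(-3, 4)

def holds(spec, x, y):
    """A spec is a conjunction of feature names; the empty spec is 'true'."""
    return all(FEATURES[name](x, y) for name in spec)

def find_counterexample(spec):
    """Return a pair allowed by the spec that violates client safety, if any."""
    for x, y in product(DOMAIN, DOMAIN):
        if holds(spec, x, y) and not safe(x, y):
            return (x, y)
    return None

def infer_spec(samples):
    """Strengthen the weakest spec until client safety holds on DOMAIN."""
    # Observed behavior of the library on the sampled inputs.
    positives = [(x, lib_abs(x)) for x in samples]
    # Only features consistent with every observation may be used.
    usable = [n for n, f in FEATURES.items()
              if all(f(x, y) for x, y in positives)]
    spec = []
    while True:
        cex = find_counterexample(spec)
        if cex is None:
            return spec
        # Pick a usable feature that rules out the counterexample.
        fix = next((n for n in usable
                    if n not in spec and not FEATURES[n](*cex)), None)
        if fix is None:
            return None  # the feature set cannot express a sufficient spec
        spec.append(fix)

print(infer_spec([-2, 0, 3]))
```

Starting from the empty conjunction and adding a predicate only when a counterexample demands it keeps the toy specification weak, which loosely mirrors the abstract's aim of a minimal set of requirements; the actual minimality and overfitting guarantees in the paper are not reproduced here.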
Index Terms
Data-driven abductive inference of library specifications