ABSTRACT
The reliability and correctness of complex software systems can be significantly enhanced through well-defined specifications that dictate the use of various units of abstraction (e.g., modules, or procedures). Often times, however, specifications are either missing, imprecise, or simply too complex to encode within a signature, necessitating specification inference. The process of inferring specifications from complex software systems forms the focus of this paper. We describe a static inference mechanism for identifying the preconditions that must hold whenever a procedure is called. These preconditions may reflect both data flow properties (e.g., whenever p is called, variable x must be non-null) as well as control-flow properties (e.g., every call to p must bepreceded by a call to q). We derive these preconditions using a ninter-procedural path-sensitive dataflow analysis that gathers predicates at each program point. We apply mining techniques to these predicates to make specification inference robust to errors. This technique also allows us to derive higher-level specifications that abstract structural similarities among predicates (e.g., procedure p is called immediately after a conditional test that checks whether some variable v is non-null.) We describe an implementation of these techniques, and validate the effectiveness of the approach on a number of large open-source benchmarks. Experimental results confirm that our mining algorithms are efficient, and that the specifications derived are both precise and useful-the implementation discovers several critical, yet previously, undocumented preconditions for well-tested libraries.
- R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB, pages 487--499, 1994. Google Scholar
Digital Library
- R. Agrawal and R. Srikant. Mining sequential patterns. In Eleventh International Conference on Data Engineering, pages 3--14, 1995. Google Scholar
Digital Library
- R. Alur, P. Cerny, P. Madhusudan, and W. Nam. Synthesis of interface specidications for java classes. In POPL '05: Proceedings of the 32nd ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages, pages 98--109, 2005. Google Scholar
Digital Library
- G. Ammons, R. Bodik, and J. Larus. Mining specifications. In POPL'02: Proceedings of the 29th ACM SIGPLAN--SIGACT Symposium on Principles of Programming Languages, pages 4--16, 2002. Google Scholar
Digital Library
- G. Ammons, D. Mandelin, R. Bodik, and J. Larus. Debugging temporal specifications with concept analysis. In PLDI '03: Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, pages 182--195, 2003. Google Scholar
Digital Library
- P. Anderson, T. Reps, and T. Teitelbaum. Design and implementation of a fine-grained software inspection tool. IEEE Trans. on Software Engineering, 29(8):721--733, August 2003. Google Scholar
Digital Library
- T. Ball and S.K. Rajamani. Automatically validating temporal safety properties of interfaces. In SPIN 2001, Workshop on Model Checking of Software, LNCS 2057, pages 103--122, May 2001. Google Scholar
Digital Library
- B. Blanchet. A computationally sound mechanized prover for security protocols. In IEEE Symposium on Security and Privacy, pages 140--154, 2006. Google Scholar
Digital Library
- D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu. Mafia: A performance study of mining maximal frequent itemsets. In Workshop on Frequent Itemset Mining Implementations (FIMI'03), 2003.Google Scholar
- M. Castro, M. Costa, and T. Harris. Securing software by enforcing data-flow integrity. In OSDI '06: Proceedings of the 7th Usenix Symposium on Operating Systems Design and Implementation, pages 147--160, 2006. Google Scholar
Digital Library
- B. Chin, S. Markstrum, and T. Millstein. Semantic type qualifiers. In PLDI '05: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 85--95, 2005. Google Scholar
Digital Library
- C. Cortes, K. Fisher, D. Pregibon, and A. Rogers. Hancock: a language for extracting signatures from data streams. In KDD '00: Proceedings of the sixth ACM SIGKDD International Conference on Knowledge Discovery and Data mining, pages 9--17, 2000. Google Scholar
Digital Library
- D. Engler, D. Chen, and A. Chou. Bugs as inconsistent behavior: A general approach to inferring errors in systems code. In SOSP'01: Proceedings of the 18th ACM Symposium on Operating Systems Principles, pages 57--72, 2001. Google Scholar
Digital Library
- M. Ernst, J. Cockrell, W. Griswold, and D. Notkin. Dynamically discovering likely program invariants to support program evolution. IEEE TSE, 27(2):1--25, February 2001. Google Scholar
Digital Library
- J. Fischer, R. Jhala, and R. Majumdar. Joining dataflow with predicates. In ESEC--FSE '05: 10th European Software Engineering Conference and 13th ACM SIGSOFT international symposium on Foundations of Software Engineering, pages 227--236, 2005. Google Scholar
Digital Library
- J. Foster, T. Terauchi, and A. Aiken. Flow-sensitive type qualifiers. In PLDI '02: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 1--12, 2002. Google Scholar
Digital Library
- M. Furr and J. Foster. Checking type safety of foreign function calls. In PLDI '05: Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 62--72, 2005. Google Scholar
Digital Library
- P. Godefroid, N. Klarslund, and K. Sen. Dart: Directed automated random testing. In PLDI '05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pages 213--223, Chicago, Il, 2005. Google Scholar
Digital Library
- D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen, and R. Sharma. Discovering all most specific sentences. ACM Transactions on Database Systems, 28:140--174, 2003. Google Scholar
Digital Library
- T. Henzinger, R. Jhala, and R. Majumdar. Permissive interfaces. SIGSOFT Softw. Eng. Notes, 30(5):31--40, 2005. Google Scholar
Digital Library
- G. J. Holzmann. The SPIN Model Checker: Primer and Reference Manual. Addison-Wesley, 2004. Google Scholar
Digital Library
- R. Jhala and R. Majumdar. Path slicing. In PLDI '05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pages 38--47, 2005. Google Scholar
Digital Library
- T. Kremenek, P. Twohey, G. Back, A. Ng, and D. Engler. From uncertainty to belief: Inferring the specification within. In OSDI'06: Proceedings of the 7th Usenix Symposium on Operating Systems Design and Implementation, pages 161--176, 2006. Google Scholar
Digital Library
- P. Lam, V. Kuncak, and M. Rinard. Generalized typestate checking for data structure consistency. In VMCAI '05: Proceedings of 6th International Conference on Verification, Model Checking and Abstract Interpretation, pages 430--447, 2005. Google Scholar
Digital Library
- Z. Li, S. Lu, S. Myagmar, and Y. Zhou. Cp-miner: A tool for finding copy-paste and related bugs in operating system code. In OSDI '04: Proceedings of the 6th Usenix Symposium on Operating Systems Design and Implementation, pages 289--302, 2004. Google Scholar
Digital Library
- Z. Li and Y. Zhou. Pr-miner: Automatically extracting implicit programming rules and detecting violations in large software code. In ESEC-FSE '05: 10th European Software Engineering Conference and 13th ACM SIGSOFT international symposium on Foundations of Software Engineering, pages 306--315, 2005. Google Scholar
Digital Library
- B. Liblit, M. Naik, A. Zheng, A. Aiken, and M. Jordan. Scalable statistical bug isolation. In PLDI '05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pages 15--26, Chicago, Illinois, 2005. Google Scholar
Digital Library
- B. Livshits and T. Zimmermann. Dynamine: Finding common error patterns by mining software revision histories. In ESEC-FSE '05: 10th European Software Engineering Conference and 13th ACM SIGSOFT international symposium on Foundations of Software Engineering, pages 296--305, 2005. Google Scholar
Digital Library
- D. Mandelin, L. Xu, R. Bodik, and D. Kimelman. Jungloid mining: helping to navigate the api jungle. In PLDI '05: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pages 48--61, 2005. Google Scholar
Digital Library
- J. Palsberg. Closure analysis in constraint form. ACM Trans. Program. Lang. Syst., 17(1):47--62, 1995. Google Scholar
Digital Library
- M. K. Ramanathan, A. Grama, and S. Jagannathan. Path-sensitive inference of function precedence protocols. In ICSE '07: Proceedings of the 29th International Conference on Software Engineering, 2007. Google Scholar
Digital Library
- M. Rinard, C. Cadar, D. Dumitran, D. M. Roy, T. Leu, and W.S. Beebee. Enhancing server availability and security through failureoblivious computing. In OSDI '04: Proceedings of the 6th Usenix Symposium on Operating Systems Design and Implementation, 2004. Google Scholar
Digital Library
- O. Shivers. Control-Flow Analysis of Higher-Order Languages. PhD thesis, Carnegie Mellon University, May 1991. Google Scholar
Digital Library
- W. Weimer and G. Necula. Mining temporal specifications for error detection. In TACAS '05: Eleventh International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 461--476, April 2005. Google Scholar
Digital Library
- J. Whaley, M. Martin, and M. Lam. Automatic extraction of objectoriented component interfaces. In Proceedings of the International Symposium of Software Testing and Analysis, ISSTA, 2002. Google Scholar
Digital Library
- Y. Xie and A. Aiken. Scalable error detection using boolean satisfiability. In POPL '05: Proceedings of the 32nd ACM Symposium on Principles of Programming Languages, pages 351--363, 2005. Google Scholar
Digital Library
- G. Yang. The complexity of mining maximal frequent itemsets and maximal frequent patterns. In KDD '04: Proceedings of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data mining, pages 344--353, 2004. Google Scholar
Digital Library
- J. Yang, D. Evans, D. Bhardwaj, T. Bhat, and M. Das. Perracotta: Mining temporal api rules from imperfect traces. In ICSE '06: Proceedings of the International Conference on Software Engineering, 2006. Google Scholar
Digital Library
- J. Yang, P. Twohey, D. Engler, and M. Musuvathi. Using model checking to find serious file system errors. In OSDI '04: Proceedings of the 6th Usenix Symposium on Operating Systems Design and Implementation, pages 273--288, San Francisco, CA, 2004. Google Scholar
Digital Library
Index Terms
Static specification inference using predicate mining
Recommendations
Scalable taint specification inference with big code
PLDI 2019: Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and ImplementationWe present a new scalable, semi-supervised method for inferring taint analysis specifications by learning from a large dataset of programs. Taint specifications capture the role of library APIs (source, sink, sanitizer) and are a critical ingredient of ...
Static specification inference using predicate mining
Proceedings of the 2007 PLDI conferenceThe reliability and correctness of complex software systems can be significantly enhanced through well-defined specifications that dictate the use of various units of abstraction (e.g., modules, or procedures). Often times, however, specifications are ...
Bi-Abduction with Pure Properties for Specification Inference
Proceedings of the 11th Asian Symposium on Programming Languages and Systems - Volume 8301Separation logic is a state-of-the-art logic for dealing with the program heap. Using its frame rule, initial works have strived towards automated modular verification for heap-manipulating programs against user-supplied specifications. Since manually ...







Comments