research-article
Open Access

Exploiting implicit beliefs to resolve sparse usage problem in usage-based specification mining

Published: 12 October 2017

Abstract

Frameworks and libraries provide application programming interfaces (APIs) that serve as building blocks in modern software development. While APIs offer the opportunity for increased productivity, they must also be used correctly to avoid buggy code. Usage-based specification mining techniques have shown great promise in solving this problem through a data-driven approach. These techniques leverage the uses of an API in large code corpora to understand its recurring usages and infer behavioral specifications (preconditions and postconditions) from them. A challenge for such techniques is thus inference in the presence of insufficient usages, in terms of both frequency and richness. We refer to this as the "sparse usage problem." This paper presents the first technique to solve the sparse usage problem in usage-based precondition mining. Our key insight is to leverage implicit beliefs to overcome sparse usage. An implicit belief (IB) is knowledge implicitly derived from facts about the code. An IB about a program is known implicitly to a programmer via the language's constructs and semantics, and is thus not explicitly written or specified in the code. The technical underpinnings of our new precondition mining approach include a technique to analyze the data and control flow in the program leading to API calls in order to infer preconditions that are implicitly present in the code corpus, a catalog of 35 code elements in total that can be used to derive implicit beliefs from a program, and an empirical evaluation of all of these ideas. We have analyzed over 350 million lines of code and 7 libraries that suffer from the sparse usage problem. Our approach realizes 6 implicit beliefs, and we have observed that adding single-level context sensitivity can further improve the results of usage-based precondition mining. The results show that we achieve overall 60% precision and 69% recall, a relative improvement of 32% in precision and 78% in recall compared to the base usage-based mining approach for these libraries.
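
As a rough illustration of the idea described in the abstract, the sketch below shows how implicit beliefs might raise the support of a precondition when explicit guards are sparse. It is a toy model under stated assumptions, not the paper's implementation: the `CallSite` record, the three entries in `BELIEF_RULES`, the `min_support` threshold, and the sample corpus are all hypothetical and are not taken from the paper's catalog of code elements.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical belief rules: a fact observed in the code before the API call
# maps to a predicate the programmer can be assumed to believe implicitly.
BELIEF_RULES = {
    "dereferenced": "{v} != null",   # v.f or v.m() executed earlier, so v is non-null
    "allocated": "{v} != null",      # v = new T(...) yields a non-null value
    "used_as_index": "{v} >= 0",     # a[v] executed earlier, so v is non-negative
}


@dataclass
class CallSite:
    """One call to the API under study (hypothetical record layout)."""
    explicit_guards: set = field(default_factory=set)  # predicates checked in code
    facts: list = field(default_factory=list)          # (fact_kind, variable) pairs


def implicit_beliefs(site):
    """Derive predicates from code facts using the hypothetical rules above."""
    return {
        BELIEF_RULES[kind].format(v=var)
        for kind, var in site.facts
        if kind in BELIEF_RULES
    }


def mine_preconditions(sites, min_support=0.5, use_beliefs=True):
    """Keep predicates that hold at a large enough fraction of call sites."""
    counts = Counter()
    for site in sites:
        predicates = set(site.explicit_guards)
        if use_beliefs:
            predicates |= implicit_beliefs(site)
        counts.update(predicates)  # each predicate counted once per call site
    total = len(sites)
    return {p for p, c in counts.items() if total and c / total >= min_support}


# Toy corpus: only one of four call sites guards the argument explicitly.
SITES = [
    CallSite({"arg != null"}, []),
    CallSite(set(), [("dereferenced", "arg")]),
    CallSite(set(), [("allocated", "arg")]),
    CallSite(set(), []),
]

if __name__ == "__main__":
    print("explicit guards only:", mine_preconditions(SITES, use_beliefs=False))
    print("with implicit beliefs:", mine_preconditions(SITES, use_beliefs=True))
```

In this toy corpus the explicit guard appears at one of four call sites and falls below the threshold, whereas counting the two derived beliefs lifts the same predicate to three of four sites. The sketch assumes argument names have already been normalized to a common formal name (`arg`), which a real miner would have to do itself.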

    • Published in

      Proceedings of the ACM on Programming Languages  Volume 1, Issue OOPSLA
      October 2017
      1786 pages
      EISSN:2475-1421
      DOI:10.1145/3152284
      Copyright © 2017 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
