
Scalable and systematic detection of buggy inconsistencies in source code

Published: 17 October 2010

Abstract

Software developers often duplicate source code to replicate functionality. This practice can hinder the maintenance of a software project: bugs may arise when two identical code segments are edited inconsistently. This paper presents DejaVu, a highly scalable system for detecting these general syntactic inconsistency bugs.

DejaVu operates in two phases. Given a target code base, a parallel /inconsistent clone analysis/ first enumerates all groups of source code fragments that are similar but not identical. Next, an extensible /buggy change analysis/ framework refines these results, separating each group of inconsistent fragments into a fine-grained set of inconsistent changes and classifying each as benign or buggy.
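
The two phases can be illustrated with a small sketch. This is not DejaVu's implementation: it assumes whitespace tokenization, a brute-force pairwise comparison, and Python's standard difflib.SequenceMatcher as the similarity measure. The function names and the 0.8 threshold are hypothetical choices made for this example. The first function lists fragment pairs that are similar but not identical; the second splits one such pair into its individual changes. Classifying each change as benign or buggy is left out, since the abstract does not describe the criteria.

```python
from difflib import SequenceMatcher
from itertools import combinations


def tokenize(fragment):
    # Assumption: whitespace-separated tokens; a real tool would use a lexer.
    return fragment.split()


def inconsistent_pairs(fragments, threshold=0.8):
    """Phase 1 sketch: index pairs of fragments that are similar but not identical."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(fragments), 2):
        ta, tb = tokenize(a), tokenize(b)
        if ta == tb:
            continue  # exact clones are consistent, so they are skipped
        ratio = SequenceMatcher(None, ta, tb, autojunk=False).ratio()
        if ratio >= threshold:  # hypothetical similarity threshold
            pairs.append((i, j))
    return pairs


def fine_grained_changes(a, b):
    """Phase 2 sketch: split one inconsistent pair into token-level changes."""
    ta, tb = tokenize(a), tokenize(b)
    matcher = SequenceMatcher(None, ta, tb, autojunk=False)
    return [
        (tag, ta[i1:i2], tb[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


fragments = [
    "if ( p != NULL ) free ( p ) ;",
    "if ( q != NULL ) free ( p ) ;",  # copy of the first with one identifier renamed
    "return 0 ;",
]

for i, j in inconsistent_pairs(fragments):
    print(i, j, fine_grained_changes(fragments[i], fragments[j]))
```

The pairwise loop is quadratic in the number of fragments, so it only serves to show the shape of the computation and would not scale to a code base of the size reported below.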

On a 75+ million line pre-production commercial code base, DejaVu executed in under five hours and produced a report of over 8,000 potential bugs. Our analysis of a sizable random sample suggests with high likelihood that this report contains at least 2,000 true bugs and 1,000 code smells. These bugs draw from a diverse class of software defects and are often simple to correct: syntactic inconsistencies both indicate problems and suggest solutions.



Published in

ACM SIGPLAN Notices, Volume 45, Issue 10
OOPSLA '10
October 2010
957 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1932682

OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applications
October 2010
984 pages
ISBN: 9781450302036
DOI: 10.1145/1869459

Copyright © 2010 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 17 October 2010

Qualifiers

research-article
