Abstract
Software developers often duplicate source code to replicate functionality. This practice can hinder the maintenance of a software project: bugs may arise when two identical code segments are edited inconsistently. This paper presents DejaVu, a highly scalable system for detecting these general syntactic inconsistency bugs.
DejaVu operates in two phases. Given a target code base, a parallel /inconsistent clone analysis/ first enumerates all groups of source code fragments that are similar but not identical. Next, an extensible /buggy change analysis/ framework refines these results, separating each group of inconsistent fragments into a fine-grained set of inconsistent changes and classifying each as benign or buggy.
On a 75+ million line pre-production commercial code base, DejaVu executed in under five hours and produced a report of over 8,000 potential bugs. Our analysis of a sizable random sample suggests, with high likelihood, that this report contains at least 2,000 true bugs and 1,000 code smells. These bugs draw from a diverse class of software defects and are often simple to correct: syntactic inconsistencies both indicate problems and suggest solutions.
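The two phases described above can be illustrated with a toy sketch. This is not DejaVu's implementation: the function names, the token-level similarity measure (Python's `difflib.SequenceMatcher`), the 0.8 threshold, and the rename-only "benign" heuristic are all assumptions chosen for illustration, and the quadratic pairwise comparison would not scale to the code bases discussed here.

```python
import difflib
import re
from itertools import combinations


def tokenize(fragment):
    # Crude lexer (assumption): identifiers, integers, single punctuation marks.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", fragment)


def inconsistent_clone_pairs(fragments, threshold=0.8):
    """Phase 1 (toy): index pairs of fragments that are similar but not identical."""
    tokens = [tokenize(f) for f in fragments]
    pairs = []
    for i, j in combinations(range(len(fragments)), 2):
        if tokens[i] == tokens[j]:
            continue  # exact clones are consistent, so they are not reported
        matcher = difflib.SequenceMatcher(None, tokens[i], tokens[j], autojunk=False)
        if matcher.ratio() >= threshold:
            pairs.append((i, j))
    return pairs


def inconsistent_changes(a, b):
    """Phase 2a (toy): split one inconsistent pair into token-level changes."""
    ta, tb = tokenize(a), tokenize(b)
    matcher = difflib.SequenceMatcher(None, ta, tb, autojunk=False)
    return [
        (tag, ta[i1:i2], tb[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def classify(change):
    """Phase 2b (hypothetical heuristic): a one-for-one identifier swap is
    treated as benign; any other change is flagged as a buggy candidate."""
    tag, left, right = change
    renamed = (
        tag == "replace"
        and len(left) == len(right)
        and all(re.fullmatch(r"[A-Za-z_]\w*", t) for t in left + right)
    )
    return "benign" if renamed else "buggy"


if __name__ == "__main__":
    code = [
        "total = total + price * qty;",
        "total = total + cost * qty;",
        "return 0;",
    ]
    for i, j in inconsistent_clone_pairs(code):
        for change in inconsistent_changes(code[i], code[j]):
            print(i, j, change, classify(change))
```

A real classifier would need far richer filters than the single rule shown; the sketch only makes the enumerate-then-refine structure concrete.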
Scalable and systematic detection of buggy inconsistencies in source code
OOPSLA '10: Proceedings of the ACM international conference on Object oriented programming systems languages and applications