Research article · Open Access · Artifacts Available · Artifacts Evaluated & Functional

Detecting argument selection defects

Published: 12 October 2017

Abstract

Identifier names are often used by developers to convey additional information about the meaning of a program over and above the semantics of the programming language itself. We present an algorithm that uses this information to detect argument selection defects, in which the programmer has chosen the wrong argument to a method call in Java programs. We evaluate our algorithm at Google on 200 million lines of internal code and 10 million lines of predominantly open-source external code and find defects even in large, mature projects such as OpenJDK, ASM, and the MySQL JDBC driver. The precision and recall of the algorithm vary depending on a sensitivity threshold. Higher thresholds increase precision, giving a true positive rate of 85%, reporting 459 true positives and 78 false positives. Lower thresholds increase recall but lower the true positive rate, reporting 2,060 true positives and 1,207 false positives. We show that this is an order of magnitude improvement on previous approaches. By analyzing the defects found, we are able to quantify best-practice advice for API design and show that the probability of an argument selection defect increases markedly when methods have more than five arguments.
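The core idea in the abstract can be illustrated with a minimal sketch: compare each argument's name to the name of the parameter it is passed to, and flag the call when the argument name is a better fit for a *different* parameter position. The method names and the crude similarity measure below are illustrative assumptions for this sketch, not the paper's actual metric or implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class ArgSelectionCheck {
    // Crude name similarity: 1.0 for a case-insensitive match, 0.5 if one
    // name contains the other, 0.0 otherwise. The paper uses a more refined
    // lexical comparison; this stands in for it.
    static double similarity(String a, String b) {
        a = a.toLowerCase();
        b = b.toLowerCase();
        if (a.equals(b)) return 1.0;
        if (a.contains(b) || b.contains(a)) return 0.5;
        return 0.0;
    }

    // Returns the indices of arguments that look misplaced: the argument's
    // name matches some other parameter's name better than its own.
    static List<Integer> suspiciousArgs(String[] params, String[] args) {
        List<Integer> flagged = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            double own = similarity(args[i], params[i]);
            for (int j = 0; j < params.length; j++) {
                if (j != i && similarity(args[i], params[j]) > own) {
                    flagged.add(i);
                    break;
                }
            }
        }
        return flagged;
    }

    public static void main(String[] unused) {
        // A call like setSize(height, width) against setSize(width, height):
        // both argument names match the opposite parameter exactly.
        String[] params = {"width", "height"};
        String[] args = {"height", "width"};
        System.out.println(suspiciousArgs(params, args)); // prints [0, 1]
    }
}
```

A real checker would also need to handle arguments that are not simple identifiers (literals, expressions) and apply a sensitivity threshold before reporting, which is where the precision/recall trade-off described above comes from.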


