Research article · FSE Conference Proceedings · DOI: 10.1145/2635868.2635922

A large scale study of programming languages and code quality in github

Published online: 11 November 2014

ABSTRACT

What is the effect of programming languages on software quality? This question has been a topic of much debate for a very long time. In this study, we gather a very large data set from GitHub (729 projects, 80 million SLOC, 29,000 authors, 1.5 million commits, in 17 languages) in an attempt to shed some empirical light on this question. This reasonably large sample size allows us to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static vs. dynamic typing and strong vs. weak typing on software quality. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, we report that language design does have a significant but modest effect on software quality. Most notably, it does appear that strong typing is modestly better than weak typing, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by process factors such as project size, team size, and commit size. However, we hasten to caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, e.g., the preference of certain personality types for functional, static, and strongly typed languages.
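The regression setup described above, where a language property is assessed while controlling for confounds such as project size, team size, and commit count, can be illustrated with a minimal sketch. This is synthetic data and a plain least-squares fit, not the paper's actual model or dataset; the covariates, coefficients, and the `strong_typing` indicator are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic project-level covariates, log-scaled as is common for count data.
log_size = rng.normal(10, 1, n)        # log SLOC
log_team = rng.normal(2, 0.5, n)       # log team size
log_commits = rng.normal(6, 1, n)      # log commit count
strong_typing = rng.integers(0, 2, n)  # 1 = strongly typed language (illustrative)

# Simulated outcome: defects driven mostly by size and activity, with a
# small negative effect of strong typing (the "modest" language effect).
log_bugs = (0.8 * log_size + 0.3 * log_team + 0.5 * log_commits
            - 0.15 * strong_typing + rng.normal(0, 0.3, n))

# Ordinary least squares with an intercept: the language indicator's
# coefficient is its effect after adjusting for the process covariates.
X = np.column_stack([np.ones(n), log_size, log_team, log_commits, strong_typing])
coef, *_ = np.linalg.lstsq(X, log_bugs, rcond=None)
names = ["intercept", "log_size", "log_team", "log_commits", "strong_typing"]
print(dict(zip(names, np.round(coef, 2))))
```

With enough projects, the fitted coefficients recover the simulated ones: the process covariates carry large coefficients, while the language indicator's is small and negative, mirroring the paper's qualitative finding that process factors dominate language effects.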

