DOI: 10.1145/2568225.2568271 · ICSE Conference Proceedings · Research article

Coverage is not strongly correlated with test suite effectiveness

Published: 31 May 2014

ABSTRACT

The coverage of a test suite is often used as a proxy for its ability to detect faults. However, previous studies that investigated the correlation between code coverage and test suite effectiveness have failed to reach a consensus about the nature and strength of the relationship between these test suite characteristics. Moreover, many of the studies were done with small or synthetic programs, making it unclear whether their results generalize to larger programs, and some of the studies did not account for the confounding influence of test suite size. In addition, most of the studies were done with adequate suites, which are rare in practice, so the results may not generalize to typical test suites.

We have extended these studies by evaluating the relationship between test suite size, coverage, and effectiveness for large Java programs. Our study is the largest to date in the literature: we generated 31,000 test suites for five systems consisting of up to 724,000 lines of source code. We measured the statement coverage, decision coverage, and modified condition coverage of these suites and used mutation testing to evaluate their fault detection effectiveness.

We found that there is a low to moderate correlation between coverage and effectiveness when the number of test cases in the suite is controlled for. In addition, we found that stronger forms of coverage do not provide greater insight into the effectiveness of the suite. Our results suggest that coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.
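The abstract's central measurement — correlating coverage with mutation-based effectiveness while holding suite size fixed — can be made concrete with a small simulation. The sketch below is not the authors' tooling; the per-test coverage and kill sets are synthetic, and Kendall's tau is computed in its simplest (tau-a) form rather than the tie-corrected variant the paper uses:

```python
import random
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau-a over paired observations (ties ignored for brevity)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

random.seed(0)

# Hypothetical per-test data: each test covers some statements and kills some mutants.
N_STATEMENTS, N_MUTANTS, N_TESTS = 200, 100, 60
tests = [
    {
        "covered": set(random.sample(range(N_STATEMENTS), random.randint(5, 40))),
        "kills": set(random.sample(range(N_MUTANTS), random.randint(0, 10))),
    }
    for _ in range(N_TESTS)
]

def evaluate(suite):
    """Statement coverage and mutant kill rate of a whole suite."""
    covered = set().union(*(t["covered"] for t in suite))
    killed = set().union(*(t["kills"] for t in suite))
    return len(covered) / N_STATEMENTS, len(killed) / N_MUTANTS

# Fix the suite size (here 10 tests) to remove its confounding influence,
# then correlate coverage with effectiveness across many sampled suites.
SUITE_SIZE, N_SUITES = 10, 300
coverages, effectivenesses = [], []
for _ in range(N_SUITES):
    cov, eff = evaluate(random.sample(tests, SUITE_SIZE))
    coverages.append(cov)
    effectivenesses.append(eff)

print(f"tau at fixed size {SUITE_SIZE}: {kendall_tau(coverages, effectivenesses):.2f}")
```

Repeating the loop for several values of `SUITE_SIZE` is what separates the confounded correlation (size ignored) from the within-size correlation the paper reports.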



      Reviews

      Andrew Brooks

      Should developers aim to write test suites with high coverage? To answer this question, the relationships between coverage, test suite size, and test suite effectiveness were systematically explored using five reasonably large Java programs that had existing master test suites. The tool CodeCover was used to measure statement, decision, and modified condition coverage. The tool PIT was used to generate mutants and to report mutation kills as the effectiveness measure. Test suite size was varied by randomly sampling from the existing master test suites.

      A moderate to high correlation was found between effectiveness and the number of test methods in a suite, and likewise between effectiveness and coverage when test suite size was ignored. When test suite size was controlled for, the correlation between effectiveness and coverage ranged from low to moderate. The authors suggest that coverage should not be used as a quality target. Evidence was also found suggesting that the use of complex coverage measures such as modified condition coverage is not justified.

      Since manually determining whether thousands of mutants are equivalent is a very costly exercise, the authors simply assumed that all mutants not detected by the existing master test suites were equivalent. A sampling strategy should have been adopted to at least gauge the degree to which this assumption holds. Overall, there is much in this study to commend over previous research, and this paper is very strongly recommended to the software engineering community.

      — Online Computing Reviews Service
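The equivalent-mutant assumption the reviewer questions can be stated in a few lines. The sketch below is illustrative only (the mutant IDs are invented): any mutant the master suite never kills is treated as equivalent and excluded from the denominator, so a sampled suite's effectiveness is its kill rate over the remaining mutants.

```python
# Hypothetical mutant IDs for one subject program.
all_mutants = set(range(10))
master_kills = {0, 1, 2, 3, 4, 5, 6}   # the master suite kills 7 of 10 mutants
suite_kills = {0, 1, 2, 9}             # mutant 9 is excluded as "equivalent"

def effectiveness(suite_kills, master_kills):
    """Kill rate over mutants deemed non-equivalent: under the paper's
    assumption, only mutants killable by the master suite count."""
    non_equivalent = master_kills
    return len(suite_kills & non_equivalent) / len(non_equivalent)

print(effectiveness(suite_kills, master_kills))  # 3 of 7 non-equivalent mutants killed
```

The reviewer's point is that this denominator is only as trustworthy as the assumption: a manual check of even a small random sample of the unkilled mutants would estimate how many are genuinely equivalent rather than merely undetected.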

      • Published in

        ICSE 2014: Proceedings of the 36th International Conference on Software Engineering
        May 2014, 1139 pages
        ISBN: 9781450327565
        DOI: 10.1145/2568225
        Copyright © 2014 ACM
        Publisher: Association for Computing Machinery, New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate 276 of 1,856 submissions, 15%
