ABSTRACT
The coverage of a test suite is often used as a proxy for its ability to detect faults. However, previous studies that investigated the correlation between code coverage and test suite effectiveness have failed to reach a consensus about the nature and strength of the relationship between these test suite characteristics. Moreover, many of the studies were done with small or synthetic programs, making it unclear whether their results generalize to larger programs, and some of the studies did not account for the confounding influence of test suite size. In addition, most of the studies were done with adequate suites, which are rare in practice, so the results may not generalize to typical test suites.
We have extended these studies by evaluating the relationship between test suite size, coverage, and effectiveness for large Java programs. Our study is the largest to date in the literature: we generated 31,000 test suites for five systems consisting of up to 724,000 lines of source code. We measured the statement coverage, decision coverage, and modified condition coverage of these suites and used mutation testing to evaluate their fault detection effectiveness.
We found that there is a low to moderate correlation between coverage and effectiveness when the number of test cases in the suite is controlled for. In addition, we found that stronger forms of coverage do not provide greater insight into the effectiveness of the suite. Our results suggest that coverage, while useful for identifying under-tested parts of a program, should not be used as a quality target because it is not a good indicator of test suite effectiveness.
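To make "controlled for" concrete, the following is a minimal sketch of one way such an analysis could be set up: group suites by their number of test cases and compute a rank correlation between coverage and mutation score within each group. The choice of Kendall's tau-b, the function names, and the sample numbers are illustrative assumptions, not details stated in the abstract.

```python
import math
from collections import defaultdict
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation (the variant that adjusts for ties)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both factors
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    # (pairs not tied in x) * (pairs not tied in y)
    denom = math.sqrt((untied + tied_y_only) * (untied + tied_x_only))
    if denom == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denom


def tau_by_suite_size(suites):
    """Correlate coverage with mutation score separately for each suite size.

    `suites` is an iterable of (size, coverage, mutation_score) tuples.
    Holding size fixed within each group removes it as a confounder.
    """
    groups = defaultdict(list)
    for size, coverage, score in suites:
        groups[size].append((coverage, score))
    return {
        size: kendall_tau_b([c for c, _ in pairs], [s for _, s in pairs])
        for size, pairs in sorted(groups.items())
    }


if __name__ == "__main__":
    # Hypothetical measurements, made up purely for illustration.
    sample = [
        (3, 0.20, 0.10), (3, 0.40, 0.30), (3, 0.60, 0.25),
        (10, 0.50, 0.45), (10, 0.70, 0.40), (10, 0.80, 0.60),
    ]
    for size, tau in tau_by_suite_size(sample).items():
        print(f"size={size}: tau_b={tau:.2f}")
```

Correlating across all suites at once would let suite size inflate the relationship, since larger suites tend to have both higher coverage and higher mutation scores; computing the statistic per size group is one simple way to separate the two effects.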
Index Terms
Coverage is not strongly correlated with test suite effectiveness





