DOI: 10.1145/2483760.2483774
Open access

Does automated white-box test generation really help software testers?

Published: 15 July 2013

Abstract

Automated test generation techniques can efficiently produce test data that systematically cover structural aspects of a program. In the absence of a specification, a common assumption is that these tests relieve a developer of most of the work, as the act of testing is reduced to checking the results of the tests. Although this assumption has persisted for decades, there has been no conclusive evidence to date confirming it. However, the fact that the approach has seen only limited uptake in industry suggests the contrary, and calls into question its practical usefulness. To investigate this issue, we performed a controlled experiment comparing a total of 49 subjects split between writing tests manually and writing tests with the aid of an automated unit test generation tool, EvoSuite. We found that, on the one hand, tool support leads to clear improvements in commonly applied quality metrics such as code coverage (up to a 300% increase). On the other hand, however, there was no measurable improvement in the number of bugs actually found by developers. Our results not only cast some doubt on how the research community evaluates test generation tools, but also point to improvements and future work necessary before automated test generation tools will be widely adopted by practitioners.
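
For readers unfamiliar with the setting, the sketch below illustrates the kind of test a white-box generator such as EvoSuite typically produces: inputs are chosen so that the branches of the class under test are covered, and assertions simply record the behaviour observed on the current implementation. The `Account` class and the generated-style test are hypothetical examples, not taken from the study or from actual EvoSuite output; they only show why structural coverage can rise sharply while fault detection still depends on a developer noticing that a recorded value is wrong, i.e. on the "checking the results of the tests" step the abstract refers to.

```java
import org.junit.Test;
import static org.junit.Assert.*;

// Hypothetical class under test (illustrative only, not from the paper);
// it contains a seeded off-by-one bug in deposit().
class Account {
    private int balance;

    int deposit(int amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        // BUG: should be `balance += amount`; the generated-style test below
        // records this faulty behaviour instead of exposing it.
        balance += amount - 1;
        return balance;
    }
}

// Sketch of an EvoSuite-style generated JUnit 4 test: inputs cover both
// branches of deposit(), assertions are derived from the observed output.
public class Account_ESTest {

    @Test
    public void depositPositiveAmountCoversHappyPath() {
        Account account = new Account();
        int result = account.deposit(100);
        // The generated assertion captures the current (buggy) output, so the
        // test passes; only a human reviewer can judge that 99 should be 100.
        assertEquals(99, result);
    }

    @Test(expected = IllegalArgumentException.class)
    public void depositNonPositiveAmountCoversErrorBranch() {
        Account account = new Account();
        account.deposit(0);
    }
}
```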

Published In

ISSTA 2013: Proceedings of the 2013 International Symposium on Software Testing and Analysis
July 2013
381 pages
ISBN: 9781450321594
DOI: 10.1145/2483760

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. Unit testing
  2. automated test generation
  3. branch coverage
  4. empirical software engineering

Qualifiers

  • Research-article

Conference

ISSTA '13

Acceptance Rates

Overall acceptance rate: 58 of 213 submissions (27%)

Article Metrics

  • Downloads (last 12 months): 202
  • Downloads (last 6 weeks): 24
Reflects downloads up to 10 Oct 2024

Cited By

  • (2024) Investigating the readability of test code. Empirical Software Engineering 29:2. DOI: 10.1007/s10664-023-10390-z. Online publication date: 26-Feb-2024.
  • (2022) Overlap between Automated Unit and Acceptance Testing – a Systematic Literature Review. Proceedings of the 26th International Conference on Evaluation and Assessment in Software Engineering, pages 80–89. DOI: 10.1145/3530019.3530028. Online publication date: 13-Jun-2022.
  • (2022) Human-based Test Design versus Automated Test Generation: A Literature Review and Meta-Analysis. Proceedings of the 15th Innovations in Software Engineering Conference, pages 1–11. DOI: 10.1145/3511430.3511433. Online publication date: 24-Feb-2022.
  • (2022) Debugging Effectiveness of LBT: An Empirical Study. 2022 17th International Conference on Emerging Technologies (ICET), pages 136–141. DOI: 10.1109/ICET56601.2022.10004661. Online publication date: 29-Nov-2022.
  • (2022) TestEvoViz: visualizing genetically-based test coverage evolution. Empirical Software Engineering 27:7. DOI: 10.1007/s10664-022-10220-8. Online publication date: 1-Dec-2022.
  • (2022) Learning how to search: generating effective test cases through adaptive fitness function selection. Empirical Software Engineering 27:2. DOI: 10.1007/s10664-021-10048-8. Online publication date: 11-Jan-2022.
  • (2022) Mining Precise Test Oracle Modelled by FSM. Testing Software and Systems, pages 20–36. DOI: 10.1007/978-3-031-04673-5_2. Online publication date: 10-May-2022.
  • (2021) Path-Sensitive Oracle Data Selection via Static Analysis. Electronics 10(2):110. DOI: 10.3390/electronics10020110. Online publication date: 7-Jan-2021.
  • (2021) Automatic Unit Test Generation for Machine Learning Libraries. Proceedings of the 43rd International Conference on Software Engineering, pages 1548–1560. DOI: 10.1109/ICSE43902.2021.00138. Online publication date: 22-May-2021.
  • (2020) Defect prediction guided search-based software testing. Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, pages 448–460. DOI: 10.1145/3324884.3416612. Online publication date: 21-Dec-2020.
