Structural Test Coverage Criteria for Deep Neural Networks

Published: 08 October 2019

Abstract

Deep neural networks (DNNs) have a wide range of applications, and software employing them must be thoroughly tested, especially in safety-critical domains. However, traditional software test coverage metrics cannot be applied directly to DNNs. In this paper, inspired by the MC/DC coverage criterion, we propose a family of four novel test coverage criteria tailored to the structural features of DNNs and their semantics. We validate the criteria by demonstrating that test inputs generated under the guidance of our proposed coverage criteria are able to expose undesired behaviours in a DNN. Test cases are generated using a symbolic approach and a gradient-based heuristic search. By comparing them with existing methods, we show that our criteria achieve a balance between their ability to find bugs (proxied using adversarial examples and correlation with functional coverage) and the computational cost of test input generation. Our experiments are conducted on state-of-the-art DNNs trained on popular open-source datasets, including MNIST, CIFAR-10 and ImageNet.
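To make the MC/DC analogy concrete, the following is a minimal, self-contained sketch of one way a sign-change coverage condition between two adjacent ReLU layers can be checked for a pair of test inputs. The toy weights, layer sizes, and the exact condition encoded in `ss_covered` are illustrative assumptions for exposition; they are not the paper's precise criterion definitions.

```python
import numpy as np

# Toy two-layer fully-connected ReLU network with random weights
# (illustrative only; real experiments use trained DNNs).
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # layer 1: 3 inputs -> 4 neurons
W2 = rng.standard_normal((2, 4))   # layer 2: 4 neurons -> 2 neurons

def activations(x):
    """Return pre-activation values of both layers for input x."""
    a1 = W1 @ x                    # layer-1 pre-activations
    a2 = W2 @ np.maximum(a1, 0)    # ReLU, then layer-2 pre-activations
    return a1, a2

def ss_covered(x1, x2, i, j):
    """Sign-sign style condition (a hypothetical simplification):
    the pair (x1, x2) covers (neuron i of layer 1, neuron j of layer 2)
    if neuron i flips its activation sign, every other layer-1 neuron
    keeps its sign, and neuron j flips its sign as a consequence."""
    a1_x1, a2_x1 = activations(x1)
    a1_x2, a2_x2 = activations(x2)
    s1_x1, s1_x2 = a1_x1 > 0, a1_x2 > 0
    others_stable = np.delete(s1_x1 == s1_x2, i).all()
    return bool((s1_x1[i] != s1_x2[i]) and others_stable
                and ((a2_x1[j] > 0) != (a2_x2[j] > 0)))
```

A test-generation procedure would then search (symbolically or by gradient-based heuristics) for input pairs that satisfy such conditions for as many neuron pairs as possible, raising the coverage percentage.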

