Metamorphic Testing of Deep Learning Compilers

Published: 28 February 2022
Abstract

The growing trend of deploying deep neural network (DNN) models on diverse hardware platforms has boosted the development of deep learning (DL) compilers. DL compilers take high-level DNN model specifications as input and generate optimized DNN executables for diverse hardware architectures such as CPUs, GPUs, and various hardware accelerators. Compiling DNN models into high-efficiency executables is not easy: the compilation procedure often involves converting high-level model specifications into several different intermediate representations (IRs), e.g., graph IR and operator IR, and performing rule-based or learning-based optimizations from both platform-independent and platform-dependent perspectives. Despite the wide adoption of DL compilers in real-world scenarios, a principled and systematic understanding of their correctness does not yet exist. To fill this critical gap, this paper introduces MT-DLComp, a metamorphic testing framework specifically designed to uncover erroneous compilations in DL compilers. Our approach leverages deliberately designed metamorphic relations (MRs) to apply semantics-preserving mutations to DNN models, generating variants of each model. DL compilers can then be automatically examined for compilation correctness using the models and their variants, without manual intervention. We also develop a set of practical techniques to realize an effective workflow and to localize identified error-revealing inputs. Real-world DL compilers exhibit a high level of engineering quality. Nevertheless, we detected over 435 inputs that result in erroneous compilations in four popular DL compilers, all of which are industry-strength products maintained by Amazon, Facebook, Microsoft, and Google. While the discovered error-triggering inputs do not crash the DL compilers directly, they lead to the generation of incorrect DNN executables.
With substantial manual effort and help from the DL compiler developers, we uncovered four bugs in these DL compilers by debugging them with the error-triggering inputs. Our testing framework and findings can guide developers in their efforts to improve DL compilers.
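To make the core idea concrete, here is a minimal, hedged sketch of a metamorphic relation of the kind the abstract describes. It is not the paper's implementation: the "model" is a toy NumPy function standing in for a DNN, and the mutation helpers (`mutate_add_zero`, `mutate_mul_one`, `check_mr`) are illustrative names of our own. In MT-DLComp's actual workflow, the two sides of the relation would be two compiled executables produced by a real DL compiler, and a numerical deviation between them would flag a potentially erroneous compilation.

```python
# Illustrative sketch of a metamorphic relation (MR) for DL-compiler testing.
# Semantics-preserving mutations (adding zero, multiplying by one) yield model
# variants that must produce the same outputs as the original; disagreement
# after compilation would indicate a miscompilation.
import numpy as np

def model(x):
    # Toy stand-in for a DNN: one dense layer followed by ReLU.
    w = np.arange(6, dtype=np.float32).reshape(2, 3)
    return np.maximum(x @ w, 0.0)

def mutate_add_zero(f):
    # MR: f(x) == f(x + 0); a compiler should optimize the added op away.
    return lambda x: f(x + np.zeros_like(x))

def mutate_mul_one(f):
    # MR: f(x) == f(x * 1).
    return lambda x: f(x * np.ones_like(x))

def check_mr(f, g, x, atol=1e-5):
    # Compare the original and the mutated variant on the same input.
    # With real compiled executables, a large deviation is an error report.
    return np.allclose(f(x), g(x), atol=atol)

x = np.random.rand(4, 2).astype(np.float32)
assert check_mr(model, mutate_add_zero(model), x)
assert check_mr(model, mutate_mul_one(model), x)
```

The key property is that each mutation is semantics-preserving by construction, so no hand-written test oracle is needed: the unmutated model's compiled output serves as the oracle for every variant.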

