Abstract
The growing trend of deploying deep neural network (DNN) models to diverse hardware platforms has boosted the development of deep learning (DL) compilers. DL compilers take high-level DNN model specifications as input and generate optimized DNN executables for diverse hardware architectures such as CPUs, GPUs, and various hardware accelerators. Compiling DNN models into efficient executables is not easy: the compilation procedure often involves converting high-level model specifications into several different intermediate representations (IRs), e.g., graph IR and operator IR, and performing rule-based or learning-based optimizations from both platform-independent and platform-dependent perspectives. Despite the widespread adoption of DL compilers in real-world scenarios, a principled and systematic understanding of their correctness does not yet exist. To fill this critical gap, this paper introduces MT-DLComp, a metamorphic testing framework specifically designed for DL compilers to effectively uncover erroneous compilations. Our approach leverages deliberately designed metamorphic relations (MRs) to apply semantics-preserving mutations to DNN models and thereby generate model variants. In this way, DL compilers can be automatically examined for compilation correctness, without manual intervention, by checking whether each DNN model and its variants behave consistently once compiled. We also develop a set of practical techniques to realize an effective workflow and to localize the identified error-revealing inputs. Real-world DL compilers exhibit a high level of engineering quality. Nevertheless, we detected over 435 inputs that trigger erroneous compilations in four popular DL compilers, all of which are industrial-strength products maintained by Amazon, Facebook, Microsoft, and Google. Although the discovered error-triggering inputs do not directly crash the DL compilers, they can lead to the generation of incorrect DNN executables.
With substantial manual effort and help from the DL compiler developers, we uncovered four bugs in these DL compilers by debugging them with the error-triggering inputs. Our proposed testing framework and findings can guide developers in their efforts to improve DL compilers.
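The metamorphic check described above can be illustrated with a minimal, self-contained Python sketch. It is not MT-DLComp's implementation: the toy "model" is a list of Python functions, `add_zero` is one assumed example of a semantics-preserving mutation, and `reference_compile` and `buggy_compile` are hypothetical stand-ins for a DL compiler (the bug in `buggy_compile` is invented for illustration). The harness compiles a seed model and its variant, runs both on the same inputs, and reports the inputs whose outputs disagree.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]
Model = List[Layer]
Executable = Callable[[Vector], Vector]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def scale(factor: float) -> Layer:
    return lambda x: [factor * v for v in x]


def add_zero(x: Vector) -> Vector:
    # Adding an all-zero tensor is numerically an identity, so inserting
    # this layer must not change the function the model computes.
    return [v + 0.0 for v in x]


def insert_identity(model: Model, position: int) -> Model:
    """Hypothetical MR: build a variant with `add_zero` inserted at `position`."""
    return model[:position] + [add_zero] + model[position:]


def reference_compile(model: Model) -> Executable:
    """Stand-in for a correct compiler: runs the layers in order."""
    def executable(x: Vector) -> Vector:
        for layer in model:
            x = layer(x)
        return x
    return executable


def buggy_compile(model: Model) -> Executable:
    """Stand-in for a faulty compiler with an invented bug: it silently
    drops the final layer of any model with more than three layers."""
    kept = model[:-1] if len(model) > 3 else model
    return reference_compile(kept)


def outputs_agree(a: Sequence[float], b: Sequence[float], tol: float = 1e-5) -> bool:
    """Compare outputs element-wise with a tolerance instead of exact equality."""
    return len(a) == len(b) and all(
        math.isclose(u, v, rel_tol=tol, abs_tol=tol) for u, v in zip(a, b)
    )


def metamorphic_test(
    compile_fn: Callable[[Model], Executable], model: Model, inputs: List[Vector]
) -> List[Vector]:
    """Compile a seed model and one variant; return the inputs on which
    their outputs disagree (the error-revealing inputs)."""
    seed_exe = compile_fn(model)
    variant_exe = compile_fn(insert_identity(model, len(model) // 2))
    return [x for x in inputs if not outputs_agree(seed_exe(x), variant_exe(x))]


if __name__ == "__main__":
    seed_model = [scale(2.0), relu, scale(-1.0)]
    inputs = [[1.0, -2.0, 0.5], [-1.0, -1.0, -1.0]]
    print(metamorphic_test(reference_compile, seed_model, inputs))  # → []
    print(metamorphic_test(buggy_compile, seed_model, inputs))  # → [[1.0, -2.0, 0.5]]
```

The comparison uses a numeric tolerance rather than exact equality because legitimate compiler optimizations can reorder floating-point operations and change results slightly; the value `1e-5` here is an arbitrary choice for the sketch. Note that the second input does not expose the invented bug, which shows why a tester runs each model pair on many inputs.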
Metamorphic Testing of Deep Learning Compilers. SIGMETRICS/PERFORMANCE '22: Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems.