Abstract
Despite much recent interest in randomised testing (fuzzing) of compilers, the practical impact of fuzzer-found compiler bugs on real-world applications has barely been assessed. We present the first quantitative and qualitative study of the tangible impact of miscompilation bugs in a mature compiler. We follow a rigorous methodology in which the impact of a bug on a compiled application is evaluated based on (1) whether the bug appears to trigger during compilation; (2) the extent to which the generated assembly code changes syntactically when the bug triggers; and (3) whether such changes cause regression test suite failures, or whether we can manually find application inputs that trigger execution divergence due to such changes. The study covers the compilation of more than 10 million lines of C/C++ code from 309 Debian packages, using 12% of the historical and now fixed miscompilation bugs found by four state-of-the-art fuzzers in the Clang/LLVM compiler, as well as 18 bugs found by human users compiling real code or as a by-product of formal verification efforts. The results show that almost half of the fuzzer-found bugs propagate to the generated binaries for at least one package. In these cases only a very small part of the binary is typically affected, yet two failures occur when running the test suites of all the impacted packages. User-reported and formal verification bugs do not exhibit a higher impact: they have a lower rate of triggered bugs and cause one test failure. Manual analysis of a selection of the syntactic changes caused by some of our bugs (both fuzzer-found and non-fuzzer-found) in package assembly code shows that these changes either have no semantic impact or would require very specific runtime circumstances to trigger execution divergence.
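Step (2) of the methodology, measuring how much the generated assembly changes syntactically, can be illustrated with a small sketch. This is not the authors' tooling. It assumes two assembly listings for the same source file, one from a compiler containing the bug and one from the fixed compiler, in AT&T-style syntax with `#` comments. The names `split_functions` and `changed_functions` are hypothetical.

```python
import difflib
import re

# Assumption: a function starts at a global label such as "foo:" in column 0.
# Local labels and directives start with "." and are ignored.
LABEL = re.compile(r"^([A-Za-z_][\w.$]*):")


def split_functions(asm: str) -> dict:
    """Split assembly text into {function label: [normalised instructions]}."""
    functions = {}
    current = None
    for raw in asm.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        match = LABEL.match(line)
        if match:
            current = match.group(1)
            functions[current] = []
        elif current is not None and not line.strip().startswith("."):
            # Collapse whitespace so formatting differences are not counted.
            functions[current].append(" ".join(line.split()))
    return functions


def changed_functions(asm_buggy: str, asm_fixed: str) -> dict:
    """Map each function whose code differs to its number of changed lines."""
    buggy = split_functions(asm_buggy)
    fixed = split_functions(asm_fixed)
    report = {}
    for name in sorted(buggy.keys() | fixed.keys()):
        a, b = buggy.get(name, []), fixed.get(name, [])
        if a != b:
            diff = difflib.ndiff(a, b)
            report[name] = sum(1 for d in diff if d.startswith(("+ ", "- ")))
    return report


if __name__ == "__main__":
    buggy = "f:\n\tmovl $1, %eax\n\tret\ng:\n\txorl %eax, %eax\n\tret\n"
    fixed = "f:\n\tmovl $2, %eax\n\tret\ng:\n\txorl %eax, %eax\n\tret\n"
    print(changed_functions(buggy, fixed))
```

A report with few entries and small counts would correspond to the observation that only a very small part of a binary is typically affected. A purely textual diff like this says nothing about semantic impact, which is why the study adds test suite runs and manual analysis as a third step.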
Index Terms
Compiler fuzzing: how much does it matter?