The Devil Is in the Command Line: Associating the Compiler Flags With the Binary and Build Metadata

Engineers build large software systems for multiple architectures, operating systems, and configurations. An inconsistent or incomplete set of compiler flags can generate code that catastrophically impacts the system's behavior. In the authors' industry experience, defects caused by an undesired combination of compiler flags are common in nontrivial software projects. We are unaware of any build or CI/CD system that tracks, in a structured manner, how the compiler produced a specific binary. We postulate that a queryable database of how the compiler compiled and linked the software system will help engineers detect defects earlier and reduce debugging time.


INTRODUCTION AND BACKGROUND
Compilers are software systems that translate programs "into a form in which it can be executed by a computer" [1]. A C or C++ compiler such as Clang, GCC, or MSVC supports hundreds of command-line arguments (flags, options, switches). Compiler flags instruct the compiler on different aspects of code generation, the types of errors it detects, compliance with a specific version of the programming language standard, and nuances specific to the target platform. An incorrect combination of compiler flags can have disastrous consequences for the resulting software system. For example, accidentally turning off the buffer security checks that detect stack-based buffer overruns (e.g., passing /GS- to MSVC, which disables the default /GS option) can expose a zero-day vulnerability [11,12].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. ICSE 2024, April 2024, Lisbon, Portugal © 2023 Association for Computing Machinery. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM. . .$15.00 https://doi.org/10.1145/nnnnnnn.nnnnnnn
Engineers can compile the same version of a software system for different platforms and purposes, such as debugging, profiling, or an official release. Typical implicit variables that influence the final set of compiler flags are the host operating system on which the compiler runs, the target operating system on which the code will run, the compiler version, the dependencies available on the host system, and the desired build type [15,21].
Both commercial and open-source software projects use a variety of build systems. The build system determines the conditions under which the compiler runs and which combination of flags it passes to the compiler. Some of the most popular build systems are Ant [22], Bazel [7], Buck2 [17], CMake [13], GNU Make [6], Ninja [16], and NMAKE [3]. Each build system has its own means of specifying the dependency graph, defining the rules for build actions, and initializing the default set of compiler flags.
Listing 1 shows a simple conditional statement that modifies the set of dependent libraries based on the host machine architecture (the output of uname -m).
Listing 1: Excerpt from the Redis Makefile.

```makefile
# Linux ARM32 needs -latomic at linking time
ifneq (,$(findstring armv,$(uname_M)))
    FINAL_LIBS += -latomic
endif
```
Listing 2 shows more complex conditional logic. The build system disables the use of a critical dependency (the jemalloc memory allocator) depending on the host and target platforms. To understand exactly how the compiler generates code, an engineer needs to either intercept the compiler invocation or inspect the build logs that contain the final command line the compiler receives.
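Intercepting the compiler invocation can be as simple as placing a wrapper in front of the real compiler. The following is a minimal sketch in Python, assuming a POSIX system; the REAL_CC and FLAG_LOG_PATH environment variables and the record_invocation function are hypothetical names chosen for illustration, not part of any existing tool.

```python
#!/usr/bin/env python3
import json
import os
import sys
import time

# Hypothetical environment variables used by this sketch (not a standard):
#   REAL_CC        path of the compiler to forward to, e.g. /usr/bin/gcc
#   FLAG_LOG_PATH  JSON Lines file that accumulates one record per invocation


def record_invocation(log_path: str, compiler: str, args: list[str], cwd: str) -> dict:
    """Append one structured record describing a single compiler invocation."""
    record = {
        "timestamp": time.time(),
        "cwd": cwd,
        "compiler": compiler,
        # The final argument vector, exactly as the build system passed it.
        "args": args,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return record


def main() -> None:
    compiler = os.environ["REAL_CC"]
    log_path = os.environ.get("FLAG_LOG_PATH", "compiler-invocations.jsonl")
    record_invocation(log_path, compiler, sys.argv[1:], os.getcwd())
    # Replace this process with the real compiler so that its output and
    # exit code reach the build system unchanged.
    os.execv(compiler, [compiler, *sys.argv[1:]])


if __name__ == "__main__":
    main()
```

To use it, an engineer would point the build system's compiler variable (for example, CC in many Makefiles) at the wrapper and set REAL_CC to the actual compiler. With parallel builds, many wrapper processes append to the same file; the sketch relies on small appends not interleaving, which a production tool should not assume.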

INDUSTRY CHALLENGES
Our primary motivation for this paper comes from observing, debugging, and fixing recurring patterns of defects. These defects are rooted in incorrect assumptions about how a compiler generated the binaries for a particular software system. Negative interactions between compiler flags are a known problem [19]. In our industry experience, defects caused by incorrect (extraneous, missing, or unsuitable) compiler flags have severe consequences, are stealthy and hard to detect, and are time-consuming to investigate and reproduce.
The primary categories of problems that we have encountered during the last two decades in the industry are as follows:
(1) The differences between engineers' development environments and the official build servers.
(5) Third-party software components rarely provide the build configuration used to generate their binaries. For example, a dependency can turn off support for exception handling (e.g., by compiling with the -fno-exceptions flag in GCC). If the consumer assumes that it can catch exceptions that propagate through that dependency, the application's error handling no longer works as intended.
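Once each component's flags are recorded, a mismatch like the one in problem (5) becomes mechanically checkable. The sketch below is a minimal illustration, assuming C++ code compiled with GCC or Clang, where exceptions are enabled by default and the last of -fexceptions / -fno-exceptions on the command line takes effect. The Component record and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical record of how one library or executable was compiled."""
    name: str
    flags: list[str]
    depends_on: list[str] = field(default_factory=list)


def exceptions_enabled(flags: list[str]) -> bool:
    """Assumes GCC/Clang C++ semantics: on by default, last flag wins."""
    enabled = True
    for flag in flags:
        if flag == "-fexceptions":
            enabled = True
        elif flag == "-fno-exceptions":
            enabled = False
    return enabled


def exception_mismatches(components: dict[str, Component]) -> list[tuple[str, str]]:
    """List (consumer, dependency) pairs where the consumer was built with
    exceptions enabled but the dependency was built without them."""
    mismatches = []
    for consumer in components.values():
        if not exceptions_enabled(consumer.flags):
            continue
        for dep_name in consumer.depends_on:
            if not exceptions_enabled(components[dep_name].flags):
                mismatches.append((consumer.name, dep_name))
    return mismatches
```

For example, an application built with -O2 that links a library built with -O2 -fno-exceptions would be reported as ("app", "libfoo"). The check is only as good as the recorded metadata, which is exactly what third-party components rarely provide today.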

Non-deterministic builds
Popular C++ compilers can generate code in a non-deterministic manner: recompiling a translation unit with the same build configuration can result in a different binary [9,10]. Build systems add another source of non-determinism. Modern build systems process the build graph efficiently by modeling build dependencies as a directed acyclic graph and executing independent actions in parallel [7,17]. As a result, the order of the object files listed during the linking stage can depend on which translation unit the compiler finished building first. That, in turn, can cause non-deterministic behavior [5].
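When two recorded link command lines differ, a useful first triage question is whether the linker received the same inputs in a different order or genuinely different inputs. A minimal sketch, assuming the link commands are available as shell-quoted strings; the function name is hypothetical.

```python
import shlex
from collections import Counter


def classify_link_difference(cmd_a: str, cmd_b: str) -> str:
    """Compare two recorded linker command lines token by token."""
    args_a, args_b = shlex.split(cmd_a), shlex.split(cmd_b)
    if args_a == args_b:
        return "identical"
    # The same multiset of arguments means only the order changed, e.g.,
    # object files listed in the order in which parallel compile jobs finished.
    if Counter(args_a) == Counter(args_b):
        return "same arguments, different order"
    return "different arguments"
```

An order-only difference is not harmless: the order of object files can change the layout of the resulting binary and, in C++, the initialization order of static objects across translation units.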
Tracking which compiler flags produced each binary will mitigate a subset of problems related to the reproducibility of the build environment [14]. It will also help engineers debug complex defects that influence the behavior of an entire software system and detect them earlier.

INDUSTRY NEEDS
While "[b]uild systems are awesome, terrifying – and unloved," they are something that every engineer uses daily [18]. Modern build tools such as Bazel [7] and Buck2 [17] have made significant progress toward build hermeticity [8] and reproducible builds. However, developing a fully self-contained and deterministic build system has proven to be a complex problem, even for a company like Microsoft [15,20,21]. Popular utilities such as GNU Autotools are not hermetic by design and rely on dependencies from the current execution environment [2].
Engineers need a means to understand how the final set of compiler flags for each binary and each configuration evolves throughout the project's history. Currently, the "state-of-the-art solution" involves engineers inspecting the build logs as text files and using tools such as diff to compare the results of two builds. If the build logs are not systematically archived, engineers must rebuild the entire system to recover the final set of compiler flags. For complex software systems, such as an operating system, building multiple product versions to isolate a problem can take days or longer.
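In contrast with a textual diff of two build logs, a structured comparison reports only the flags that changed between two builds of the same translation unit. A minimal sketch, assuming GCC/Clang-style command lines; OPTIONS_WITH_SEPARATE_VALUE is a hypothetical and deliberately incomplete list, since a real tool would need the compiler's full option table.

```python
import shlex

# Assumption: a small, incomplete set of GCC/Clang options whose value is
# passed as the next argument (e.g., "-o foo.o" rather than "-ofoo.o").
OPTIONS_WITH_SEPARATE_VALUE = {"-o", "-I", "-D", "-U", "-include", "-isystem", "-x"}


def extract_flags(command: str) -> set[str]:
    """Collect the options from a compiler command line, skipping inputs."""
    tokens = shlex.split(command)[1:]  # drop the compiler executable itself
    flags: set[str] = set()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in OPTIONS_WITH_SEPARATE_VALUE and i + 1 < len(tokens):
            flags.add(f"{token} {tokens[i + 1]}")  # keep option and value together
            i += 2
            continue
        if token.startswith("-"):
            flags.add(token)
        i += 1
    return flags


def flag_diff(cmd_a: str, cmd_b: str) -> tuple[list[str], list[str]]:
    """Return (flags only in A, flags only in B), each sorted."""
    a, b = extract_flags(cmd_a), extract_flags(cmd_b)
    return sorted(a - b), sorted(b - a)
```

For instance, comparing "gcc -O2 -DNDEBUG -fstack-protector-strong -c foo.c -o foo.o" with "gcc -O0 -g -c foo.c -o foo.o" yields ["-DNDEBUG", "-O2", "-fstack-protector-strong"] and ["-O0", "-g"]. Because the sketch compares sets, it ignores flag order, which matters for options where the last occurrence wins.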

POTENTIAL RESEARCH DIRECTIONS
An obvious solution is to parse the build logs, extract the necessary information, store it in the metadata associated with each resulting build, and provide a query interface for engineers to solve problems similar to the ones we enumerate in Section 2. Most industrial CI/CD systems enable associating an individual build with the test case results, the state of the source code repository when the compiler generated the binaries, and other related metadata.Adding the information about compiler flags is just another dimension of the metadata.
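One way to make this metadata queryable is sketched below under several assumptions. The flags come from a JSON compilation database (compile_commands.json, which CMake can emit when CMAKE_EXPORT_COMPILE_COMMANDS is enabled; each entry has directory and file fields plus either an arguments list or a command string), and they are stored in a single SQLite table keyed by a hypothetical CI build identifier. The table layout and function names are illustrative only.

```python
import json
import shlex
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS compile_flags (
    build_id TEXT NOT NULL,  -- hypothetical CI build identifier
    source   TEXT NOT NULL,  -- translation unit, as listed in the database
    flag     TEXT NOT NULL   -- one argument of the final compiler command line
);
"""


def ingest(conn: sqlite3.Connection, build_id: str, compile_commands_json: str) -> None:
    """Store every compiler argument of every entry, one row per argument."""
    rows = []
    for entry in json.loads(compile_commands_json):
        args = entry.get("arguments") or shlex.split(entry["command"])
        for arg in args[1:]:  # skip the compiler executable
            rows.append((build_id, entry["file"], arg))
    conn.executemany("INSERT INTO compile_flags VALUES (?, ?, ?)", rows)
    conn.commit()


def sources_missing_flag(conn: sqlite3.Connection, build_id: str, flag: str) -> list[str]:
    """Translation units of a build that were compiled without a given flag."""
    cur = conn.execute(
        """
        SELECT DISTINCT source FROM compile_flags
        WHERE build_id = ?
          AND source NOT IN (
              SELECT source FROM compile_flags WHERE build_id = ? AND flag = ?)
        ORDER BY source
        """,
        (build_id, build_id, flag),
    )
    return [row[0] for row in cur]
```

With such a table, a question like "which translation units in this build were compiled without -fstack-protector-strong?" becomes a single query instead of a manual search through archived logs.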
Another option is to store the compiler flags that the build system uses inside each binary. For example, the ELF file format supports .comment and .note sections in the final binary [23]. However, that approach would require changes to each compiler in use.
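To illustrate the consumer side of this option, the sketch below reads the NUL-separated strings stored in an ELF section such as .comment, where GCC and Clang typically record their version strings. It is a minimal sketch, assuming a 64-bit little-endian ELF file without extended section numbering; the function names are hypothetical, and readelf -p .comment prints the same information.

```python
import struct

# ELF64 file-header and section-header layouts (little-endian, no padding).
ELF64_HEADER = "<16sHHIQQQIHHHHHH"
ELF64_SECTION_HEADER = "<IIQQQQIIQQ"


def read_elf_sections(data: bytes) -> dict[str, bytes]:
    """Map section names to their raw bytes in a 64-bit little-endian ELF file.

    SHT_NOBITS sections such as .bss have no file contents; this sketch does
    not special-case them.
    """
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF files")
    fields = struct.unpack_from(ELF64_HEADER, data, 0)
    e_shoff, e_shentsize, e_shnum, e_shstrndx = fields[6], fields[11], fields[12], fields[13]
    headers = [
        struct.unpack_from(ELF64_SECTION_HEADER, data, e_shoff + i * e_shentsize)
        for i in range(e_shnum)
    ]
    # Section names live in the section-header string table (index e_shstrndx).
    _, _, _, _, str_off, str_size, *_ = headers[e_shstrndx]
    strtab = data[str_off:str_off + str_size]
    sections = {}
    for sh_name, _sh_type, _flags, _addr, sh_offset, sh_size, *_ in headers:
        name = strtab[sh_name:strtab.index(b"\0", sh_name)].decode()
        sections[name] = data[sh_offset:sh_offset + sh_size]
    return sections


def comment_strings(data: bytes, section: str = ".comment") -> list[str]:
    """Return the non-empty NUL-separated strings stored in the given section."""
    raw = read_elf_sections(data).get(section, b"")
    return [s.decode(errors="replace") for s in raw.split(b"\0") if s]
```

A caller would pass the bytes of a binary, for example comment_strings(open(path, "rb").read()). The same reader works for any section name, so it would also apply if compilers wrote their final flags into a dedicated section.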