A C Subset for Ergonomic Source-to-Source Analyses and Transformations

Modern compiled software, written in languages such as C, relies on complex compiler infrastructure. However, developing new transformations and improving existing ones can be challenging for researchers and engineers. Often, transformations must be implemented by modifying the compiler itself, which may not be feasible, for technical or legal reasons. Source-to-source compilers make it possible to directly analyse and transform the original source, making transformations portable across different compilers, and allowing rapid research and prototyping of code transformations. However, this approach has the drawback of exposing the researcher to the full breadth of the source language, which is often more extensive and complex than the IRs used in traditional compilers. In this work, we propose a solution to tame the complexity of the source language and make source-to-source compilers an ergonomic platform for program analysis and transformation. We define a simpler subset of the C language that can implement the same programs with fewer constructs, and implement a set of source-to-source transformations that automatically normalise the input source code into equivalent programs expressed in the proposed subset. Finally, as a case study, we implement a function inlining transformation that targets the subset. We show that, for this case study, the assumptions afforded by using a simpler language subset greatly increase the number of cases in which the transformation can be applied, raising the average success rate from 37%, before normalisation, to 97%, after normalisation. We also evaluate the performance of several benchmarks after applying a naive inlining algorithm, and obtained a 12% performance improvement in certain applications, after compiling at optimisation level O2 with both Clang and GCC, suggesting there is room for exploring source-level transformations as a complement to traditional compilers.


INTRODUCTION
Modern software written in languages such as C allows authors to abstract away from the implementation details of the machine or lower-level abstractions, and focus on encoding the functional requirements of their application. To do so, they rely on modern compilation toolchains, such as LLVM and GCC, to optimise the programs and ensure they satisfy non-functional requirements, such as performance.
Moreover, developers usually expect that these transformations will be applied automatically and choose a level of optimisation that balances performance with compilation time and the number of performed optimisations, or might manually enable or disable certain optimisations or insert directives in the source code to guide the compiler.
These transformations are usually applied to an intermediate representation (IR) that lowers the code away from language-specific constructs without introducing platform-dependent concepts [7]. We consider that they can usually be classified into two broad classes: optimisations, which directly contribute to improving the programs' performance, and enabling transformations, which, irrespective of their direct performance impact, might enable or facilitate further optimisations. An example of an optimisation is constant folding [1], which directly reduces the amount of work done at runtime, while an example of an enabling transformation is function inlining [27], which may or may not improve performance by itself, but has the effect of enabling further optimisations across function boundaries [11,15].
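To make the distinction concrete, consider the following fragment; this is our own illustrative example (the function square is hypothetical), not one taken from a particular compiler:

```c
/* Constant folding is an optimisation: the compiler evaluates
   60 * 60 * 24 at compile time, directly removing runtime work. */
static const int seconds_per_day = 60 * 60 * 24;   /* folded to 86400 */

/* Function inlining is an enabling transformation: substituting the
   body of square() at the call site exposes the expression 3 * 3,
   which constant folding can then reduce to 9. */
static int square(int x) { return x * x; }

int grid_cells(void) {
    return square(3);   /* after inlining and folding: return 9; */
}
```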

Motivation
To research, prototype, and implement novel compilation techniques, the traditional approach has been for researchers to extend existing compilers and implement transformation passes on their low-level IR. This process presents a number of challenges:
(1) Because IRs have a lower level of abstraction, some semantic details of the high-level language that might be useful in performing the transformation are frequently lost when translating to that IR, or are only recovered after performing complex analysis of the low-level code [31].
(2) The low level of abstraction of most IRs (e.g. LLVM-IR [19]) ties the effort to a specific computing model, e.g. the von Neumann machine, even when the semantics of the transformation are applicable to a larger range of computing models, if encoded in terms of the high-level language [20].
(3) Because each compiler tends to have its own intermediate representation, porting transformations that have been encoded in one IR to another compiler is non-trivial and results in duplication of effort. Essentially, compiler passes become tied to a specific compiler.
(4) Often it is not feasible to modify the compiler. This may be a practical issue (coordination with upstream developers) or a regulatory issue: some projects may need to use certified compilers that need to be kept simple, or the cost of certifying them becomes prohibitive.
(5) The low-level IR, together with the development environment of traditional compilers, can be a significant entry barrier with a steep learning curve for many people interested in exploring code transformations.
A complementary approach that has been proposed to address these issues is the use of source-to-source compilers as the first step in the compilation toolchain [4,17,26]. Source-to-source compilers (also commonly called transpilers) are compilers whose output, instead of being code at a lower level than the input code (e.g. assembly), is code in the same, or a different but still high-level, human-readable language. By encoding the semantics of the transformation in terms of the source language, challenge 1 is addressed, as the semantic model of the original source code is preserved. Considering that languages such as C have gained popularity as programming languages for different kinds of hardware (e.g. GPUs, FPGAs), the approach is not restricted to a specific target architecture, addressing challenge 2. By generating as output a program in the source language, challenges 3 and 4 are addressed, as the ability to reuse the underlying compilation infrastructure is preserved, and existing optimisation work can be re-purposed. By working at the source level, the programmer can use an already familiar model, the high-level language, and several source-to-source approaches allow the implementation of transformations without changes to the compiler itself, as interpreted scripts in popular languages (e.g. Python [30], JavaScript [6]), addressing challenge 5. However, this approach is not without caveats [8,12]. One of the most important is that a high-level language, by design, presents an extensive set of syntactical and semantic constructs, leading to increased complexity in the analyses [22].
To control this complexity, compiler developers have begun implementing multiple tiers of IRs, aiming to balance different levels of expressibility (higher-level abstractions) and parsimony (lower-level primitives). However, this approach of restricting the semantics of the language to a subset by construction eschews the ease-of-use benefits of source-to-source compilation, by requiring compiler developers to work with even more models and languages.
Our work presents an alternative approach that can be taken, specifically within the context of a source-to-source compiler. We present a technique to simplify the high-level language by a process of subtraction. This process exploits the wealth of primitives present in the high-level language, by rewriting specialised language constructs in terms of their more general counterparts. That way, we are left with a simpler program, still in the source language, that can be confidently targeted by simpler analyses and transformations.
We propose a simpler subset of the C language that can implement the same programs with fewer constructs, and a set of source-to-source transformations that automatically normalise the input source code into equivalent programs expressed in the proposed subset. These transformations have been implemented as scripts for the source-to-source compiler Clava [6]. Additionally, we have implemented a function inlining transformation that targets the subset as a case study, and show that the assumptions afforded by using a simpler language subset greatly increase the number of cases in which the transformation can be applied.

RELATED WORK
To meet the challenges presented in Section 1.1, several distinct approaches have been proposed and developed.

Developments in IR-based optimisation approaches
As high-level languages have expanded their breadth to capture more semantic details, several compiler infrastructures have introduced their own IR between the high-level language Abstract Syntax Tree (AST) and the underlying optimiser and its low-level IR. These representations generally work at an intermediate level of abstraction, in which the set of syntactic constructs is restricted, and the semantic details of the language are encoded, as opposed to being eliminated in the process of lowering to a low-level IR. Some examples of this approach include the Rust compiler, which lowers programs written in the language to its internal IR, MIR [18], eliminating several syntax constructs to facilitate analyses such as liveness, deadness and reachability checking, and borrow checking (memory safety); the Swift compiler, which lowers programs to the Swift Intermediate Language (SIL) [29] and provides dataflow analyses that enforce Swift language requirements and high-level optimisation passes; and the Julia compiler, which lowers programs to a Static Single-Assignment (SSA) form representation [5] after performing macro inlining and other syntax simplification tasks, in order to perform middle-end optimisation tasks.
While the intermediate level of abstraction might be appropriate for some optimisations, it is limited by several factors. In particular, it still ties the optimisation work to a single compiler implementation, and necessitates a lowering of the level of abstraction, which might impede optimisation efforts that depend on high-level details of the programming language. This effect is exacerbated when considering the use cases of finding optimisations for codes that make use of embedded DSLs or libraries [31].
Another recent contribution in this space is MLIR [20], part of the LLVM project. This novel approach to intermediate representation aims to solve several deficiencies of LLVM-IR's approach to compilation, namely shortcomings in dealing with heterogeneous targets and non-scalar data.
To that end, it formalises a system of intermediate representation with the following characteristics:
• Ability to capture multiple levels of abstraction through the use of nested blocks of operations, and the use of attributes to capture semantic details of tagged code regions.
• Separation of concerns between different translation passes through the definition of dialects, static traits and dynamic interfaces.
• Ability to define schemas for non-scalar data.
• Declarative approach to defining code transformations, including the ability to progressively and partially lower the abstraction level of code regions.
A practical example of this approach is the recently proposed Clang Intermediate Representation (CIR) [21]. This representation leverages MLIR's ability to define dialects and transformations within and between them to provide an intermediate language that does not necessarily rely on the syntactic constructs of languages such as C, but allows important semantic concepts from these languages to be abstractly represented and reasoned about in the analyses, before being morphed into other, lower-level IR dialects.
While this approach might prove valuable in its stated goal of generalising and driving a wide range of compiler projects, the generic nature of the representation may worsen the ergonomics of developing code transformations. In particular, the transformation implementer must learn the semantics of the IR representation, which is constructed to support the semantics of the language but not derived from it, as well as how to perform transformations on it.

C Intermediate Language: Source-to-Source normalisation of C code
Another work of particular relevance is the C Intermediate Language [23]. This project is similar to, and can in some ways be considered a precursor of, our work, as it similarly recognises that the C language has a wealth of complex constructs hampering a straightforward analysis process of source programs. To tackle this issue, it constructs a high-level representation that preserves most semantic aspects of the C source code, but simplifies several aspects of the language, including removing redundant constructs and syntactic sugar, making implicit casts explicit, and separating value evaluation, side-effect creation and control-flow changes. It also incorporates a Control Flow Graph into the representation, to simplify some analyses. After converting to this representation, it applies any transformations that the user has specified, using an embedded DSL in OCaml, and outputs the transformed program in C.
As close as this approach may be to our work, the two make some different trade-offs. For instance, CIL still works by constructing a new representation that, while mapping closely to the source language, is separate from it. While this allows for a significant transfer of domain knowledge compared to other IR-based approaches, it is still not as close to the source language as we desired. For instance, valid C constructions cannot be directly inserted if they are not supported by the representation. Additionally, when implementing further transformations, CIL requires them to be written using a limited interface based on OCaml and using only a visitor pattern, which we consider increases the learning curve and hinders adoption.

C Source-to-Source Compilers
Several C source-to-source compilers have been proposed and used for multiple purposes, such as performance tuning [14], instrumentation for checkpointing [10], or automatic parallelisation [24]. The ROSE [26] compiler supports C/C++ and FORTRAN, and supports generic AST-based transformations by directly extending the compilation framework, which is written in C++. Examples of already provided transformations are auto-parallelisation of loops, as well as optimisations such as loop fission and fusion. Cetus [3] is written in Java and also supports generic AST-based transformations. However, it only supports ANSI C, and needs to be extended in order to implement custom transformations. The Artisan approach [30] is focused on source-to-source compilation for heterogeneous platforms. It accepts analyses and transformations implemented as Python scripts, and its main use case is hardware/software partitioning targeting CPU + FPGA systems. Clava [6] uses Clang's frontend tooling to support a set of C-family programming languages (e.g. C, C++, OpenCL, CUDA), but uses its own AST, which is similar to Clang's. Analyses and transformations are implemented as JavaScript scripts, which are passed to the tool as arguments.
After evaluating several source-to-source compilers, we elected to use Clava. It is being actively developed, has good support for C, and being able to specify the transformations in JavaScript without modifying the tool itself is an important factor for the ergonomics we are striving for. Regarding the proposed approach, we consider that using source-to-source compilation allows us to provide a simpler development experience to users, by relying on the structure and semantics of a widely supported high-level language, and to assure users that if they are able to implement a transformation on high-level code, they will be able to automate it. By relying on source-code normalisation and simplification passes, the provided representation is derived from the high-level language by subtraction, and can immediately be understood as a "simpler C".

A NORMALIZED SUBSET OF THE C LANGUAGE
An important factor for an intermediate representation for code analysis and transformation is a balance between parsimony - making sure that only the language elements that are needed to represent the domain are exposed - and expressiveness - exposing a set of language elements that is rich enough to represent the domain in a natural and concise way. Considering the use case of an ergonomic approach for research and implementation of automatic yet inspectable program transformations and optimizations, we established a set of principles to navigate this trade-off, and arrived at a normalized or canonical subset of the C and C++ languages to use as our intermediate representation.

Simplified, but structured, control flow
Control flow in C is performed by using different kinds of statements: selection statements, iteration statements and unstructured control flow statements.For each of these categories, we considered whether the corresponding statements could be replaced with a set of patterns using fewer primitives, while avoiding replacing structured control flow statements with equivalent unstructured control flow statements, whenever possible.

Selection statements.
C uses the if statement as the main pattern for structured selection control flow. This statement takes an expression as a condition and, depending on the evaluation being a non-zero (truthy) or zero (falsy) value, executes the statement contained in the respective branch (termed the then or else branch of the statement). We consider this statement must be part of the subset, and normalization of other selection statements should result in if-statements. The ternary operator can be considered a selection statement. It takes an expression as a condition, and evaluates to different expressions depending on whether the condition evaluates to a truthy or falsy value. During normalisation, the ternary operator is rewritten in terms of the if-statement, as sketched below. This transformation furthermore requires that non-linear control flow in evaluated expressions is performed explicitly, as detailed in Section 3.2. The switch (case) statement evaluates a condition and jumps to an execution path based not on the truth value of a condition, but by comparing it to a set of values (the cases of the statement). Unfortunately, this statement in C does not exhibit structured control flow by default, since in the absence of a break statement, the flow of execution falls through to the code path of subsequent cases. Nonetheless, by analysing the existence or nonexistence of break statements and using code duplication, it is possible to guarantee structured control flow in most common cases. However, in the general case there is no guarantee of structured control flow (e.g., Duff's device [13]), and it was infeasible for the scope of this work to devise a general transformation for the switch statement. No transformation regarding switch statements was implemented, either for the general or the restricted case.
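The following is a minimal sketch of the ternary rewrite; the temporary name tmp is illustrative, not necessarily what the Clava scripts generate:

```c
/* Before normalisation: a ternary operator inside an initialiser. */
int select_before(int cond, int a, int b) {
    int y = cond ? a + 1 : b;
    return y;
}

/* After normalisation (sketch): the selection becomes an if-statement
   assigning to a fresh temporary; declaration and initialisation are
   also separated, per the rules in Section 3.3. */
int select_after(int cond, int a, int b) {
    int tmp;
    if (cond) {
        tmp = a + 1;
    } else {
        tmp = b;
    }
    int y;
    y = tmp;
    return y;
}
```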

Iteration statements.
Iteration statements are statements where a condition is repeatedly evaluated, and a set of statements is executed as long as the condition holds. The C language defines three types of statements to perform this control flow: the while statement, the do-while statement, and the traditional for statement. While these statements could be rewritten in terms of selection statements and unconditional jumps, the pattern of iteration is important enough to warrant explicit expression, and at least one of the iteration statements should be kept. Thus we chose to keep as a primitive the while statement, which is conceptually the simplest of the three. Do-while statements are then transformed into a while statement by peeling the first iteration into a separate scope before the transformed loop, and for statements are transformed into a while statement by moving the initialisation statement to before the loop, and the step statement to the end of the body of the loop, all inside a new scope.
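A minimal sketch of the for-to-while rewrite follows; other normalisation passes (e.g. removal of the ++ operator, Section 3.2) are applied here as well, for consistency with the subset:

```c
/* Before normalisation: */
int sum_before(const int *a, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum = sum + a[i];
    }
    return sum;
}

/* After normalisation (sketch): the initialisation moves before a
   while loop, the step moves to the end of the loop body, and the
   whole construct is wrapped in a new scope. */
int sum_after(const int *a, int n) {
    int sum;
    sum = 0;
    {
        int i;
        i = 0;
        while (i < n) {
            sum = sum + a[i];
            i = i + 1;
        }
    }
    return sum;
}
```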

Unstructured control flow.
The last category of statements that we considered were unstructured control flow statements, which include break statements, continue statements, early return statements, long jumps and go-to statements. These statements, in the general case, cannot be converted to equivalent structured statements without introducing meaningfully complex code changes, so for this work we decided against implementing transformations to remove them, except in two cases. First, when for statements are converted to while statements, it is necessary to transform continue statements into equivalent go-to statements targeting a label placed before the step expression at the end of the loop. Secondly, having multiple return statements in a function can greatly increase the complexity of certain analyses and transformations, so we developed a transformation to remove early returns from a function by introducing unconditional jumps to the end of the function and by using an auxiliary variable.
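A minimal sketch of the early-return removal; the names result and end are illustrative:

```c
/* Before normalisation: a function with an early return. */
int clamp_positive_before(int x) {
    if (x < 0) {
        return 0;
    }
    return x;
}

/* After normalisation (sketch): an auxiliary variable holds the
   result, and the early return becomes a jump to a label at the
   end of the function. */
int clamp_positive_after(int x) {
    int result;
    if (x < 0) {
        result = 0;
        goto end;
    }
    result = x;
end:
    return result;
}
```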

Simpler Expressions
C allows complex expressions which may contain elements that perform side-effects or transfer control flow in some way.Some examples of this complexity may include divergent control flow in evaluation (such as when using ternary operators), side-effects during expression evaluation (such as when evaluating increment and decrement operators), and interprocedural jumps (such as when evaluating a call).

R-value simplification.
In order to separate the expression's evaluation from its use, we decided that any use of an expression (such as in a call parameter or in a header of a selection or iteration statement) could only be made with an immediate or an existing variable, and that the evaluation of any complex expression ought to be simplified by using one or more temporary variables.
Following this idea, the rvalue of a variable assignment can be:
• An immediate or an existing variable.
• One or more unary operators without side-effects applied in succession to an immediate or an existing variable (e.g. the unary decrement and increment operators are not allowed).
• A binary operator, whose arguments are an immediate, an existing variable, or another binary operator, to which this restriction on allowed arguments applies recursively. Assignment operators are not allowed.
• An interprocedural call, whose arguments can only be an immediate or an existing variable.
In general, these restrictions should be understood to adhere to the principle that expressions should pertain only to combinations of pre-existing and primitive values. Other concerns should be dealt with externally to expression evaluation. Of course, these restrictions imply that a range of valid expressions must be rewritten using our restricted set of constructs. Other expressions need only be decomposed. In those cases, we insert temporary variables to contain the intermediate results of those expressions' evaluation. After this process we are left with a sequence of statements to be placed before the expression evaluation, a new value for the expression, and a sequence of statements to be placed after the evaluation.
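A minimal sketch of this decomposition, with illustrative temporary names t0 to t2 and hypothetical functions f and g:

```c
int f(int v);
int g(int v);

/* Before normalisation: a compound expression as a call argument. */
int compute_before(int a, int b, int c) {
    return f(a * b + g(c));
}

/* After normalisation (sketch): temporaries hold intermediate
   results, so every call argument is an immediate or a variable.
   The nested binary expression a * b + t0 remains legal, since the
   restriction on binary operators applies recursively. */
int compute_after(int a, int b, int c) {
    int t0;
    t0 = g(c);
    int t1;
    t1 = a * b + t0;
    int t2;
    t2 = f(t1);
    return t2;
}
```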

Explicit side-effect evaluation.
There are several expressions in C that can also effect changes on the program state.These expressions, while useful when writing expressive, high-level code, can complicate automated reasoning about the program.Therefore we rewrite these expressions to separate value computation and side-effect application.
The first kind of expressions to be rewritten are the ones that use operators that perform side-effects on the variables they operate on, i.e., compound assignment, increment and decrement operators. Since these operators simultaneously read and change a variable, we insert a separate statement performing the side-effect, and reference the result of the change explicitly, according to the semantics of each operator:
• Compound assignment: the assignment will be performed in a previous statement, and the lvalue of the assignment will be subsequently used in the computed expression.
• Pre-increment/decrement unary operator: the increment or decrement will be inserted before the expression evaluation, and the target will be used in the computed expression.
• Post-increment/decrement unary operator: the target will be used in the computed expression, and the operation will be inserted as a subsequent statement.
Afterwards, the isolated unary increment and decrement operators can safely be rewritten as simple assignments using the corresponding addition or subtraction.
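A minimal sketch of how these rules interact in one statement; the ordering follows each operator's semantics, and other normalisation passes are omitted for readability:

```c
/* Before normalisation: post-increment and pre-decrement in one
   expression. */
int combine_before(int x, int z) {
    int y;
    y = x++ + --z;
    return y + x + z;
}

/* After normalisation (sketch): side-effects become separate
   statements; the pre-decrement happens before the evaluation,
   the post-increment after it. */
int combine_after(int x, int z) {
    int y;
    z = z - 1;
    y = x + z;
    x = x + 1;
    return y + x + z;
}
```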

Overall simplification of the language and removal of superfluous elements
The remaining simplifications consist of rote substitutions of convenience operators for their more verbose primitive versions, and other simple transformations (a combined sketch follows this list):
• Breaking up variable declaration statements into several statements, to ensure that only one variable is declared in each statement. This simplifies the processing of declarations.
• Separating variable initialization from variable declaration, by placing the initial value in a separate assignment statement (excluding array variables). This guarantees that attributions appear only in assignments, avoiding the need to check declarations.
• Removing compound assignment operators, by replacing them with assignments whose rvalue is the primitive operator corresponding to the compound assignment. This reduces the number of operators that need to be taken into account.
• Removing variable shadowing. When variables are declared in inner scopes with the same name as an existing variable in an outer scope, they are renamed. This guarantees that variable names are unique, and can be used as keys.
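The combined sketch below applies all four rewrites to one fragment; the renamed identifier a_1 is illustrative:

```c
/* Before normalisation: */
int pack_before(int b) {
    int a = 1, c = 2;
    {
        int a = b + 1;   /* shadows the outer a */
        c += a;
    }
    return a + c;
}

/* After normalisation (sketch): one declaration per statement,
   initialisation separated from declaration, compound assignment
   expanded, and the shadowing variable renamed. */
int pack_after(int b) {
    int a;
    a = 1;
    int c;
    c = 2;
    {
        int a_1;
        a_1 = b + 1;
        c = c + a_1;
    }
    return a + c;
}
```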

CASE STUDY: FUNCTION INLINING
Function inlining is a code transformation that replaces a call to a procedure with the body of that same procedure [27].
It is a very important and versatile transformation used by most compilers, with many potential benefits. First, function calls have some associated overhead [9]. Also, the procedure might be located in a very disparate part of memory, hurting performance due to decreased spatial locality. More importantly, having full access to the code of the called procedure can enable further optimisations [11,15]. However, function inlining can also degrade performance. Larger functions usually contain a higher number of variables, which increases register pressure. Also, if the code size increases above a certain threshold, this can cause additional cache misses, or even page faults. Modern compilers use sophisticated heuristics to decide whether or not to inline a function call [16,25,28], and having function inlining available as a source-to-source transformation can help to more easily perform experiments regarding this transformation across several compilers.

Inlining in a source-to-source context
Conceptually speaking, inlining is a relatively simple operation. The compiler must take the text of the called function, replace the call with said text, adjust references to the arguments of the function to point to the parameters passed into the call, and store the resulting expression, if it gets used in the calling context. However, in the general case, high-level languages such as C do not exhibit a strict separation between assignment expressions, side-effects and control flow, and syntax for statements and expressions is not perfectly composable, so there are many factors that will impede this straightforward implementation of inlining:
• References to caller lvalues and function parameters: C function calls provide their parameters by value. This means that a function can modify the value of its parameters without modifying said values on the caller's end, and thus an inlining transformation must copy all call arguments and reference those copies in the inlined function code.
• Calls contained in compound expressions and statement headers: In C, function calls can be nested in contexts where the text of the function, which can be comprised of several statements, cannot be inlined in a way that is syntactically valid, such as in if or for statement headers, or in the middle of complex expressions.
• Early return statements: Early return statements allow a function to avoid executing the entirety of its code. When inlining functions that make use of that ability, the assumption that a return statement can simply be replaced with an assignment to a variable containing the result is no longer valid, which thus impedes a straightforward implementation of inlining.
Fortunately, these difficulties can be addressed with the normalisation process described in Section 3, enabling the application of a simplified form of inlining in an increased number of situations.

Implementation
Assuming we apply transformations to normalise the program to the proposed language subset, we are able to implement a transformation to inline function calls, if the call in question is either an expression statement within a scope, or the right side of an assignment expression statement. To perform the transformation at the source level, we use the following steps (a sketch of the end result follows this list):
(1) Replace the call or assignment in the caller with an empty scope.
(2) Insert variable declarations corresponding to the function's arguments, and assign the value of the call's arguments to those variables.
(3) Copy the AST nodes representing the function's body.
(4) If the function returns a value, and the caller assigned the value of the call to a variable, replace the return statement with an assignment, whose left-hand side is the caller's result variable and the right-hand side is the function's return expression.
(5) Rename any variable whose name conflicts with the caller's variables.
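The following sketch shows the intended effect of these steps on a trivial call; the copied-parameter names x_0 and y_0 are illustrative, not Clava's actual output:

```c
static int madd(int x, int y) {
    return x * y + 1;
}

/* Before inlining: a call in one of the two supported forms. */
int caller_before(int u, int v) {
    int r;
    r = madd(u, v);
    return r;
}

/* After inlining (sketch): the call becomes a scope holding copies
   of the arguments (step 2), the function body (step 3), and the
   return rewritten as an assignment to r (step 4). */
int caller_after(int u, int v) {
    int r;
    {
        int x_0;
        x_0 = u;
        int y_0;
        y_0 = v;
        r = x_0 * y_0 + 1;
    }
    return r;
}
```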

EXPERIMENTAL EVALUATION
We devised two experiments to validate our approach: (A) comparing a pre-existing inlining transformation's effectiveness, with and without normalization, against the new inlining transformation; and (B) determining whether the inlining optimization shows any performance effects. We prepared a replication package that contains all the scripts used for obtaining the results in this work, and instructions on how to reproduce them.

Experimental Setup
For the experimental evaluation, we considered two sets of benchmarks, CHStone and the NASA Advanced Supercomputing Parallel Benchmarks (NAS), each from a different application area, embedded systems and high-performance computing, respectively. Additionally, we created a set of three applications to stress-test the inliner. All experiments were done on a machine with 2× Intel Xeon CPU E5-2630 v3 @ 2.40GHz and 128GB of RAM, running Ubuntu 20.04.05 LTS. We used two C compilers, Clang v10.0.0 and GCC v9.4.0. The LoC values were obtained with the program cloc v1.94, and the source code transformation scripts were applied using the source-to-source compiler Clava, built from commit 11d7354.

Benchmark Characterization
A characterization of the benchmarks can be seen in Table 1, which contains the lines-of-code count (LoC) and number of AST nodes (Nodes) for each unmodified application (Original), and how many times the base value increased (LoC Inc. and Nodes Inc.) in two situations: after applying subset normalization (Normalized) and after applying inlining recursively to all calls found inside the main function (Inlined).
The original application sizes vary greatly, from a few dozen lines of code (e.g. vec_multi) to a couple thousand (e.g. NAS-BT-W), and from a couple hundred AST nodes to tens of thousands. While lines of code is a metric that is more easily understood, we consider that the number of nodes provides additional information about the complexity of the code. For instance, we can see that normalizing and inlining an application increases both LoC and Nodes, but the increase in LoC is considerably higher than in AST nodes. Usually this happens because of the way the transformations affect those metrics. A transformation that separates a variable declaration from its initialization doubles the lines-of-code count, but only slightly increases the node count, since most nodes of the original declaration are reused and only a small number of new nodes are introduced.
Taking this into account, we can see that applying the subset normalization transformations to this set of benchmarks increases the number of nodes by at most 2.5×, with a harmonic mean of 1.3×. After applying inlining, the increase is, as expected, significantly higher, going as high as a 23× node increase, with a harmonic mean of 4×. For very big applications this can be an issue, although in this experiment we are applying inlining very aggressively, in a way it would not usually be applied, so these values can be seen as an upper bound on the size after inlining.

Experiment A: Function inlining implementations with and without normalization
Besides the inlining transformation presented in this work (see Section 4), the Clava compiler already has a built-in inline transformation [2]. To compare both implementations of inlining, for each application and each call, we tried to inline just that call and registered whether the inlining was successful. We only considered inlinable calls, that is, we ignored calls to functions whose implementation was not available, or that were calls to functions in system headers (e.g. printf). We tested all calls, instead of only the first call of each function, because where and how the call is used can affect the success of inlining.
Table 2 shows the results of this experiment. It contains, for each application, the number of inlinable calls present in the code (#Calls), as well as the inlining success rate of the built-in inliner (Prev.) and the proposed inliner (New), before and after the application code was normalized. In some cases, normalization increases the number of calls (e.g. CHS-motion, k_means). This is an effect of some normalization transformations, such as the one that decomposes expressions in the headers of iteration and selection statements. We can see that, before normalization, the new inliner is already able to inline the same number of calls or more than the previous inliner in most cases (the exceptions being NAS-BT-W and k_means), albeit with a success rate that varies between 22% and 59% (harmonic mean of 37%). After applying normalization, the success rate of the new inliner dramatically increases, varying between 81% and 100% (harmonic mean of 97%). The reason for this improvement is that, after normalization, calls appear either as isolated expression statements (e.g. foo();) or as simple assignments from the value of a call to a variable (e.g. a = foo();). These are feasible cases for source-to-source inlining, and the only two cases supported by the new inliner. We consider this shows the enabling effect of the normalization process, in particular if the transformation is designed with the normalized subset in mind. Except for a single case (i.e. vec_multi), the previous inliner could not take advantage of the normalized code.

Experiment B: Performance Effects of Inlining
To evaluate the performance effects of inlining, we measured the runtime performance of the applications with and without inlining (Table 3). We excluded the CHStone applications, since their execution time was under 1 ms on the tested machine. We tested two compilers, GCC and Clang, with two optimization levels, O0 (no optimizations enabled) and O2 (standard optimizations). For the inlined version, we applied the strategy of recursively inlining all the calls present in the main function, after normalization. The execution times are an average of 8 executions. We also measured the execution times before and after applying normalization alone. Applying normalization before compiling with standard optimizations (i.e. O2) caused little to no impact on execution performance (differences within 2 percentage points, for benchmarks that run for more than 100 ms), while compiling without optimizations (i.e. O0) showed a small but noticeable performance degradation (up to 14 percentage points). This experiment also guarantees that any performance change observed after inlining when applying standard optimizations is due to the inlining itself, and not the normalization.
After applying the described inlining strategy under the O0 optimization level, we expected that inlining small functions that are called often would result in performance gains, and that inlining bigger functions with less frequent calls would have smaller gains or even negative performance effects. We observed this effect in the benchmark NAS-BT-W, in the form of a performance degradation of 0.74% and 0.67% of the original execution time when compiling with Clang and GCC, respectively, and in the benchmark NAS-FT-W, in the form of a performance improvement of around 3.8×, for both compilers.
Under the O2 optimization level, we expected little performance effect, as the compilers under test should already perform several code optimization transformations, including inlining, and so the source-level transformation would have less of an effect. This is the observed behaviour, with two surprising exceptions: the inlined version of NAS-BT-W showed a performance improvement of 12% when compiled with Clang, and NAS-FT-W showed a performance improvement of 13% when compiled with GCC. Keep in mind that for this experiment the objective was not to improve performance, since we are applying a very naive inlining algorithm, but to see whether applying inlining at the source code level would have any effect on performance after compilation. We consider these to be promising results, showing that there is room for exploring source-level transformations as a complement to traditional compilers, especially considering that the developed transformations are still prototypical in nature.

CONCLUSIONS
Our main contribution in this work was establishing a suitable theoretical framework for subtractively building a subset of C, especially considering the breadth and depth of constructs in the language. We consider that the principles we identified can be foundational and useful when applying this exercise to other languages and scenarios. Even if the subset we derived is by no means exhaustive or perfectly adequate, it can serve as a good first draft to keep iterating on a better C subset that source-to-source compiler users could target to perform analyses and transformations. Generally speaking, we think we were successful in meeting our objectives along this facet.
We also implemented a suite of automated transformations that can transform C programs into a normalised subset of the language that is more amenable to source code transformations. Our case study on function inlining produced interesting results: the normalisation pass was able to greatly enhance the effectiveness of our newly implemented inlining transformation, and we observed some promising performance results when benchmarking the effect of inlining on the execution time of a set of moderately sized applications.

Future Work
There are some limitations in our work that should be addressed in the future:
• Further work is needed to validate the compatibility of our normalisation and inlining with other programs' source code, compiler toolchains, computing platforms, etc.
• Additional work is required regarding automatic dependency resolution between transformation passes, and checking desirable properties of transformation passes, such as idempotency.
• Future work will evaluate the possibility of using the for loop as an additional primitive for loop representation. For loops are generally the focus of many tools (e.g. OpenMP parallelisation), as they are easier to analyse and to infer the behaviour of. However, the general conversion of while loops (the chosen primitive to represent all loops) to for loops requires complicated analysis and is a non-trivial transformation. Conversely, it is very simple to do the opposite.
We again express our hope that this work can be a foundation for another avenue of exploration within the compiler research world, and that other developers will be excited to work on successors to, and competitors of, this project.Compiler research deserves to be a topic that is approachable by all kinds of engineers, students and scientists.

Table 1: Characterization of the benchmarks used for the experimental evaluation.

Table 2: Effects of different inlining implementations with and without normalization.

Table 3: Runtime performance (ms) for each original program and the corresponding speedup after inlining, segmented by compiler.