When Function Inlining Meets WebAssembly: Counterintuitive Impacts on Runtime Performance

The WebAssembly standard defines a bytecode format serving as a compilation target for languages such as C, C++, and Rust. WebAssembly compilers are built on top of existing compiler infrastructures such as LLVM and newly developed compiler toolchains such as Binaryen, handling various new features of the WebAssembly language. However, we observe that both these new and existing infrastructures implicitly assume that the execution environments of native and WebAssembly applications are the same, ignoring the presence of browser compilers in the WebAssembly pipeline. This incorrect assumption often misguides function inlining optimizations, resulting in a slower WebAssembly module when function inlining is applied. This paper is the first to investigate the counterintuitive impacts of function inlining on WebAssembly runtime performance. We inspect the inlining optimization passes of the LLVM and Binaryen infrastructures used in the Emscripten C/C++-to-WebAssembly compiler. Our investigation of 127 C/C++ samples from the LLVM test suite shows that 66 samples exhibit counterintuitive behavior due to function inlining, particularly from inlining hot functions into long-running functions. We hope our findings motivate further work on revising existing optimizations with the unique characteristics of WebAssembly environments in mind.


INTRODUCTION
WebAssembly (abbreviated Wasm) [34] is a low-level, statically typed language that aims to serve as a universal compilation target for the Web. It is designed to be fast to compile and run; to be portable, i.e., language-, hardware-, and platform-independent; and to have formal type and memory safety guarantees. WebAssembly is supported by all four major browsers (i.e., Chrome, Firefox, Safari, and Edge) [51] and can be compiled from several programming languages, including C, C++, C#, Rust, and Go [26]. Recent studies have shown that one out of every 600 websites uses WebAssembly [35] for purposes such as games [40,69], cryptography [60,70], machine learning [66], and medical research [33,38].
WebAssembly compilers leverage the same compiler infrastructures as compilers of traditional languages. For example, the Emscripten C/C++-to-WebAssembly compiler [10], the Rustc compiler [17], and Intel's oneAPI compiler [6] all use the LLVM [7] compiler infrastructure. Unfortunately, we observe that WebAssembly compilers leverage existing infrastructures without considering the differences between WebAssembly and native applications.

Figure 1: Chromium tier-up process. In this example, function $main uses the Liftoff-generated code when first called, as it is the only code available. $main calls $f1, which only has Liftoff code ready. $f2 uses the TurboFan-generated code, as it is available at the first call. On the second call to $f1, its TurboFan-generated code is available and used for the call.
One of the substantial differences is that WebAssembly has an additional compilation layer at runtime, running within the browser, that generates the final machine code for WebAssembly instructions. Browsers such as Chromium [2] and Firefox [13] typically include at least two WebAssembly compilers: a fast compiler emitting unoptimized code and a slow compiler emitting highly optimized code. Browsers use both compilers so that machine code for WebAssembly functions is available early and runs faster once the optimized code is ready. When the optimized code is ready, the function is tiered up on its next invocation by replacing the unoptimized code with the optimized code. The tiering-up process only occurs on a function call because the function (e.g., a heuristic on the function's code size) to determine if it is beneficial to inline. The Middle-end component passes the optimized IR to the CodeGen component to create a WebAssembly module. Next, the module is passed to Binaryen's wasm-opt tool [18], which applies Binaryen's set of optimization passes to the module. In Binaryen, function inlining is performed by the inlining-optimizing pass. Similar to the inline pass in LLVM, the inlining-optimizing pass moves a function's instructions to the location of the original call site if the calculated inlining cost is less than a threshold value. The two passes differ in the IR structures they inline, as LLVM can also inline its block structures. In addition, Binaryen supports partial inlining of early-return conditional statements [18]. Figure 3 illustrates Binaryen's function inlining. Finally, the compilation pipeline outputs the optimized WebAssembly binary and JavaScript support code.
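To make the threshold rule concrete, here is a minimal sketch of cost-based inlining on a toy instruction list. The IR, the `Instr`/`Function`/`Module` types, and the cost model (the callee's instruction count) are hypothetical simplifications; the real LLVM inline pass and Binaryen's inlining-optimizing pass use far richer heuristics and must also rewrite parameters, locals, and return values, which this sketch ignores.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR: an instruction is a plain operation or a direct call.
struct Instr {
    std::string op;      // e.g. "i64.mul" or "call"
    std::string callee;  // only meaningful when op == "call"
};

using Function = std::vector<Instr>;
using Module = std::map<std::string, Function>;

// Illustrative cost model (an assumption): the callee's instruction count.
std::size_t inline_cost(const Function& callee) { return callee.size(); }

// Replace each direct call whose callee cost is below `threshold` with a copy
// of the callee's body. Only one level is inlined: spliced-in bodies are not
// re-scanned, so recursive callees cannot cause unbounded growth.
Function inline_calls(const Module& m, const Function& caller, std::size_t threshold) {
    Function out;
    for (const Instr& in : caller) {
        if (in.op == "call") {
            auto it = m.find(in.callee);
            if (it != m.end() && inline_cost(it->second) < threshold) {
                out.insert(out.end(), it->second.begin(), it->second.end());
                continue;  // the call itself disappears
            }
        }
        out.push_back(in);  // keep non-calls and calls that are too costly
    }
    return out;
}
```

For example, with a three-instruction callee, a threshold of 10 splices the body into the caller, while a threshold of 3 leaves the call in place.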

WebAssembly Execution Pipeline
The generated WebAssembly module and JavaScript files are run by a browser such as Chromium [2] or Firefox [54], each of which has its own internal compilers to generate machine code for the WebAssembly module. For example, Chromium is powered by the V8 JavaScript and WebAssembly engine [16], which includes two compilation engines to generate machine code for WebAssembly. The first compiler, Liftoff [25], is a single-pass compiler that emits machine instructions immediately after reading each WebAssembly instruction, at the cost of applying few optimizations. As a result, Liftoff code can perform sub-optimally when executed. The second compiler, TurboFan [14], is a multi-pass compiler that applies several optimization passes when generating machine code. While TurboFan generates faster code, it takes much longer to generate code than Liftoff. To balance start-up speed with execution performance, Chromium first generates code for WebAssembly functions with Liftoff and immediately starts the TurboFan compilation. When the TurboFan code for a function is ready, the function tiers up: its Liftoff code is replaced by the TurboFan code.
Firefox uses the SpiderMonkey JavaScript and WebAssembly engine [13] to handle WebAssembly execution. Similar to V8, SpiderMonkey contains two compilation engines for WebAssembly. The first compiler, Wasm-Baseline, performs a fast translation of WebAssembly instructions to machine code for quick startup. The second engine, Wasm-Ion, applies optimizations while emitting machine code. SpiderMonkey follows the same tiering-up scheme, using Wasm-Baseline to emit machine code quickly while Wasm-Ion generates better-performing machine code.
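The tier-selection rule described above can be modeled in a few lines. The sketch below is a hypothetical model, not V8 or SpiderMonkey code: each function starts with baseline code (Liftoff or Wasm-Baseline), its optimized code (TurboFan or Wasm-Ion) becomes ready at a given virtual time, and the tier is chosen once, at call entry. It deliberately omits hotness-based tiering and any on-stack replacement a real engine may implement.

```cpp
#include <map>
#include <string>

// Which code an invocation runs: baseline or optimized.
enum class Tier { Baseline, Optimized };

// Hypothetical tiered engine model. Every function has baseline code from
// time 0; its optimized code becomes ready at `optimized_ready_at`.
class TieredEngine {
public:
    void add_function(const std::string& name, double optimized_ready_at) {
        ready_at_[name] = optimized_ready_at;
    }

    // The tier is decided at call entry and applies to the whole invocation,
    // so a call that starts on baseline code never switches mid-execution.
    Tier call(const std::string& name, double now) const {
        auto it = ready_at_.find(name);
        if (it != ready_at_.end() && now >= it->second) return Tier::Optimized;
        return Tier::Baseline;  // unknown functions also fall back to baseline
    }

private:
    std::map<std::string, double> ready_at_;
};
```

Replaying Figure 1 with ready times of 5 for $main, 3 for $f1, and 1 for $f2, and calls at times 0 ($main), 1 ($f1), 2 ($f2), and 4 ($f1 again), yields Baseline for $main, Baseline then Optimized for $f1, and Optimized for $f2.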

COUNTERINTUITIVE INLINING EXAMPLE
We demonstrate how function inlining can counterintuitively impact runtime behavior using a sample benchmarking program, random.cpp, as an example. We present its source code and compiled WebAssembly code in Figure 4. We highlight the impact on two of the sample's functions when function inlining is enabled and disabled. Figure 4(a) shows the C++ source code implementation of the functions gen_random and main. gen_random uses the constants IM, IA, and IC to generate a pseudo-random number. The main function calls gen_random in a long-running while loop performing 400 million iterations, making gen_random a hot function. Figure 4(b) shows the WebAssembly code of wasm-function[13] and wasm-function[14] when function inlining is disabled. The export section on line 180 shows that wasm-function[13] implements main. Inspecting the loop code within wasm-function[13] shows that it calls wasm-function[14] with the value 100.0 passed in as an argument, meaning that wasm-function[14] implements gen_random. Figure 4(c) shows the WebAssembly code for the main function, wasm-function[13], produced when inlining is enabled. Comparing Figure 4(b) and Figure 4(c) reveals that wasm-function[14] from Figure 4(b) has been inlined into wasm-function[13].
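A C++ sketch of the hot-loop pattern in random.cpp follows. The values of IM, IA, and IC are assumed to be those of the classic Shootout random benchmark (139968, 3877, 29573); Figure 4(a) remains the authoritative source. Unlike the original, the generator state is passed explicitly so the sketch is testable, and the hypothetical `run` driver takes the iteration count as a parameter instead of hard-coding 400 million. The `gen_random_noinline` variant uses the GCC/Clang `noinline` attribute, one way to keep Clang/LLVM from inlining a function; we do not assume it also constrains Binaryen's wasm-opt, nor that it is how the study excluded functions.

```cpp
// Assumed constants (classic Shootout "random" benchmark values).
constexpr long IM = 139968;
constexpr long IA = 3877;
constexpr long IC = 29573;

// Linear congruential step, scaled to [0, max). The compiler may inline this.
double gen_random(double max, long& last) {
    last = (last * IA + IC) % IM;
    return max * last / IM;
}

// Same computation, but Clang/GCC are asked never to inline it, so it stays a
// separate function that an engine could tier up on its own.
__attribute__((noinline)) double gen_random_noinline(double max, long& last) {
    last = (last * IA + IC) % IM;
    return max * last / IM;
}

// Hypothetical driver mimicking main's loop: the hot function is called once
// per iteration with 100.0, and the last generated value is returned.
double run(long iterations, bool allow_inline) {
    long last = 42;  // seed, as in the Shootout benchmark (assumed)
    double result = 0.0;
    for (long i = 0; i < iterations; ++i) {
        result = allow_inline ? gen_random(100.0, last)
                              : gen_random_noinline(100.0, last);
    }
    return result;
}
```

Both variants compute identical values; only the shape of the emitted code differs, which is exactly what determines whether the generator's body can tier up independently of the loop that calls it.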
When the Chromium browser runs this WebAssembly module, it first generates machine code for each function with the Liftoff compiler. Once Liftoff finishes generating code for a function, that function can begin executing. In the background, the optimizing TurboFan compiler begins generating better-performing machine code for the same function. When TurboFan finishes, the browser swaps the Liftoff-generated code for the TurboFan-generated code on the next call to that function. However, since main in a C/C++ program is only invoked once, the browser never switches main to its TurboFan-generated code. Because the hot function gen_random has been inlined into main, its body also runs as slower Liftoff code for the entire loop, and the program's runtime performance suffers. This example shows how function inlining can cause counterintuitive runtime behavior in WebAssembly.
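The arithmetic behind this slowdown can be sketched with a back-of-the-envelope model. All quantities are hypothetical virtual-time costs, call overhead and background-compilation contention are ignored, and the caller is assumed to be invoked once (as main is), so its own code never tiers up.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical cost model of a caller whose loop calls one hot callee.
// Assumes positive costs and ready_at >= 0.
struct LoopModel {
    double iterations;   // N loop iterations, one callee call each
    double loop_cost;    // per-iteration cost of the caller's loop (always baseline)
    double callee_base;  // per-call cost of the callee on baseline code
    double callee_opt;   // per-call cost of the callee on optimized code
    double ready_at;     // time at which the callee's optimized code is ready
};

// Not inlined: each call picks its tier at entry, so the calls that start
// before ready_at run baseline code and every later call runs optimized code.
double runtime_not_inlined(const LoopModel& m) {
    double per_iter_base = m.loop_cost + m.callee_base;
    double k = std::min(m.iterations, std::ceil(m.ready_at / per_iter_base));
    return m.iterations * m.loop_cost + k * m.callee_base +
           (m.iterations - k) * m.callee_opt;
}

// Inlined into a caller invoked once: the callee's body is part of the
// caller's baseline code for the entire run.
double runtime_inlined(const LoopModel& m) {
    return m.iterations * (m.loop_cost + m.callee_base);
}
```

With 1,000 iterations, a loop cost of 1, a callee cost of 9 on baseline code and 1 on optimized code, and optimized code ready at time 100, the non-inlined estimate is 2,080 units and the inlined estimate is 10,000. Because the model ignores the call overhead that inlining removes, it overstates the penalty; it only illustrates why inlining a hot function into a once-invoked caller can forfeit the optimized tier.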

METHODOLOGY
We aim to understand the counterintuitive effects of function inlining on WebAssembly program runtime. We define a counterintuitive effect as an optimization producing a binary with slower runtime performance than when the optimization is disabled. Specifically, we focus on the following research questions:
• RQ1 - Significance: How often does function inlining counterintuitively impact WebAssembly modules, and are the effects unique to WebAssembly?
• RQ2 - Function Characteristics: Which characteristics of the inlined functions cause the counterintuitive behavior?
• RQ3 - Quantification: How does excluding certain functions from inlining impact the counterintuitive effects?
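The definition above amounts to comparing mean runtimes with and without the optimization. A minimal sketch, assuming each configuration is summarized by the mean of its repeated runs (the study averages 10 runs); the function names are hypothetical:

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Mean of repeated runtime measurements for one configuration.
double mean_runtime(const std::vector<double>& runs) {
    if (runs.empty()) throw std::invalid_argument("no runtime measurements");
    return std::accumulate(runs.begin(), runs.end(), 0.0) / runs.size();
}

// Counterintuitive effect: the build with the optimization enabled is slower
// on average than the build with the optimization disabled.
bool is_counterintuitive(const std::vector<double>& runs_enabled,
                         const std::vector<double>& runs_disabled) {
    return mean_runtime(runs_enabled) > mean_runtime(runs_disabled);
}
```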
To answer these questions, we use samples from the LLVM test suite to perform five sets of experiments.Next, we discuss the C/C++ source programs and the experiments in detail.

C/C++ Source Programs
To measure the runtime performance impacts of different optimization configurations, we select 143 C/C++ samples totaling over 34,000 lines of code (LOC) from the LLVM test suite [8]. The test suite contains benchmarking samples that measure LLVM compilation performance. We focus on the samples within the Single-Source/Benchmarks directory, listed in Table 2, as these samples are designed to trigger optimizations and can be compiled by Emscripten without code changes. We select this test suite because it includes samples used in prior work and is easy to compile. This test suite includes samples from the Polybench benchmark suite [59], which was used by Jangda et al. to compare WebAssembly

DISCUSSION
Limitations
Our investigation of WebAssembly performance has two main limitations. First, the precision of our custom-built JavaScript measurement tool limits the depth of our investigation. Most browsers limit JavaScript timers to millisecond resolution [52], which is too coarse to measure a typical WebAssembly function call. As a result, we focus on samples with long-running functions whose runtimes are on the order of seconds. We also only consider samples with a percent decrease greater than 5% to account for the lack of precision.
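The sketch below illustrates both mitigations. It assumes the 5% cut-off is applied to the runtime increase of the inlined build relative to the non-inlined build (our reading of "percent decrease"), and it models a millisecond-resolution timer by truncating readings to whole milliseconds. All names are hypothetical.

```cpp
#include <cmath>

// Assumed metric: relative runtime increase of the inlined build, in percent.
double percent_slowdown(double t_inlined, double t_not_inlined) {
    return (t_inlined - t_not_inlined) / t_not_inlined * 100.0;
}

// Keep a sample only if the slowdown exceeds the cut-off (5% by default).
bool exceeds_threshold(double t_inlined, double t_not_inlined,
                       double threshold_pct = 5.0) {
    return percent_slowdown(t_inlined, t_not_inlined) > threshold_pct;
}

// Model of a millisecond-resolution timer: readings are truncated to whole ms.
double quantize_ms(double elapsed_ms) { return std::floor(elapsed_ms); }
```

A single microsecond-scale call (e.g., 0.004 ms) reads as 0 ms, whereas a multi-second run such as 2500.7 ms still reads as 2500 ms, which is why long-running samples remain measurable.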
Our second limitation is that we only inspect two browsers, Chromium and Firefox. Each additional browser requires more manual inspection, and our budget for manual effort is limited. We accept this limitation because Chromium-based browsers and Firefox account for 74% of the browser market share [1].

Threats to Validity
Internal Validity.
Our study results are subject to possible errors in the manual inspection processes. We manually inspect the emitted code to ensure that function inlining is present or omitted as per the tested configuration. We use the average of 10 runs to reduce the chance that observed changes are caused by small runtime variations. Multiple factors, such as hardware, operating system, and system load, make it difficult to reproduce the exact runtime values we record. However, we describe the steps used to establish our Baseline experiment, and the counterintuitive behavior relative to the baseline should remain consistent across different experimental setups.

External Validity.
We use benchmarking samples from the LLVM test suite.As Emscripten is an LLVM-based compiler, we find that this collection of benchmarks curated by the LLVM development team is well-suited to assess the compilation effects caused by the inlining passes.The compiler benchmark samples also perform intensive computations, an intended use case of WebAssembly.

Construct Validity.
We identify the runtime impacts of function inlining optimizations by measuring the program runtime through browser execution timing, native execution timing, and event profiling tools.These measurement methods should highlight changes caused by different optimizations used in the samples.

Future Work
Our current work only investigates the counterintuitive behavior of two inlining optimization passes.Our measurements show cases where these two passes alone cannot explain the counterintuitive behavior, indicating that other optimization passes also cause the behavior.We plan to study the other LLVM and Binaryen passes for similar counterintuitive behavior.
Our current analysis only focuses on a single metric for counterintuitive behavior: runtime performance.We plan to investigate possible counterintuitive changes in other metrics, such as code size, memory usage, and energy consumption.

RELATED WORK
Compiler Optimizations.Existing work studies the impacts of different optimizations on specific processor architectures [28] and high-level synthesis [30].Some work proposes optimization frameworks improving SIMD performance [36].Other works leverage machine learning techniques on optimization selection [41,48].Theodoridis et al. describe LLVM inlining heuristics improvements in native applications [68].To our knowledge, our work is the first to study inlining performance in WebAssembly compilers.
Compiler Studies.Previous compiler studies investigate the prevalence of compiler bugs [62,67] and survey different compiler testing approaches [29].Other studies develop compiler testing techniques, such as equivalence modulo inputs [42,43] and skeletal program enumeration [74].
WebAssembly Performance Measurements. Yan et al. [72] find evidence of optimizations causing counterintuitive effects. Jangda et al. [37] compare the performance of C programs compiled to WebAssembly and native code. In contrast, our work focuses on the effects of function inlining on WebAssembly applications.

CONCLUSION
Function inlining optimizations in WebAssembly compilers fail to consider the presence of multiple browser compilers, leading to runtime performance issues. We provide the first in-depth investigation of the counterintuitive impact that function inlining can have on WebAssembly modules. Inlining can prevent hot code in a module from using optimized machine code when it is inlined into long-running or seldom-invoked functions, leading to noticeable performance degradation of the whole application. We find that this behavior affects 66 out of 127 samples in the LLVM test suite and is caused by the inlining passes in both the LLVM and Binaryen components of Emscripten. We hope our work highlights the need to revisit existing optimization techniques for optimal WebAssembly usage.

DATA AVAILABILITY
We make our experiment results and data collection scripts available on Zenodo at https://zenodo.org/record/7041455 [23]. This artifact contains the measured runtime results for all of our experiments and the scripts used to run the experiments.