Indexed Types for a Statically Safe WebAssembly

We present Wasm-prechk, a superset of WebAssembly (Wasm) that uses indexed types to express and check simple constraints over program values. This additional static reasoning enables safely removing dynamic safety checks from Wasm, such as memory bounds checks. We implement Wasm-prechk as an extension of the Wasmtime compiler and runtime, evaluate the run-time and compile-time performance of Wasm-prechk against WebAssembly configurations with explicit dynamic checks, and find an average run-time speed-up of 1.71x on the widely used PolyBenchC benchmark suite, at the cost of a small overhead in binary size (7.18% larger) and type-checking time (1.4% slower). We also prove type and memory safety of Wasm-prechk, prove that Wasm safely embeds into Wasm-prechk (ensuring backwards compatibility), prove that Wasm-prechk type-erases to Wasm, and discuss design and implementation trade-offs.


INTRODUCTION
WebAssembly (Wasm) is a low-level language designed to work well in the browser environment [Haas et al. 2017]. It has a small binary footprint and supports streaming execution (i.e., execution can safely begin before the entire program has been downloaded). Wasm is also designed to be fast, outperforming JavaScript, and can be used to speed up intensive computations within webpages. It is also safe: Wasm enforces a separation of code and data, uses a simple static type system, and is proven type and memory safe.
Although Wasm is type and memory safe, it relies on potentially costly dynamic checks for these safety guarantees in some important instructions. Errors raised by these checks are always fatal for Wasm, ensuring safety, although possibly complicating software development. The checks can be costly in terms of run-time performance, too. Jangda et al. [2019] measured that Wasm runs between 1.45x and 1.55x slower than corresponding native code. Their root cause analysis attributes part of this to the dynamic checks required by Wasm runtimes, particularly dynamic checks on indirect function calls. Our analysis, discussed in Section 6, finds that explicit dynamic memory bounds checks cause an average slowdown of 1.76x. This demonstrates the significance of finding a strategy to safely avoid performing these explicit dynamic checks.
To mitigate the costs of dynamic checks on memory operations, the runtime for a Wasm module can reserve sufficient virtual memory to represent an entire 32-bit address space (4GiB), and mark addresses outside the memory bounds as inaccessible. In many environments, such as in the browser, this works well, since browsers often use a lot of memory and run on end-user machines with a 64-bit address space and therefore plenty of virtual memory. In practice, this makes memory bounds checks essentially free.
While this approach may work in the browser, it relies on the following assumptions:
• Wasm modules can address only a 32-bit address space.
• Wasm modules are running on a 64-bit architecture and operating system.
• 4GiB of virtual memory is available.
• The system provides efficient virtual memory with permissions.
However, these assumptions do not hold in some contexts, and may not continue to hold in general. Wasm has become popular as an intermediate language and efficient virtual machine/sandbox for many purposes, and has continued to grow beyond its original design. The Memory64 proposal extends Wasm to include 64-bit addressable memory. Some embedded systems only provide 32-bit (or smaller) address spaces. Wasm is being experimented with as a replacement for high-overhead containers in serverless applications; in this context, limiting virtual memory is useful to provide a hard resource limit to a Wasm module. Wasm is used as an intermediate language for optimizing GPGPU computations [Ginzburg et al. 2023], and GPUs do not provide the necessary virtual memory abstractions for efficient bounds checks.
We claim that Wasm can be redesigned with a stronger type system to mitigate the performance overhead of dynamic safety checks without the above assumptions about the runtime environment. To test our hypothesis, we design, implement, and evaluate Wasm-precheck, an extension of Wasm with an indexed type system that can statically check the safety preconditions for each Wasm instruction that requires dynamic checks. Indexed types equip a type system with the ability to statically enforce constraints on run-time values, refining a type to a subset of values of that type [Zenger 1997]. This ability is key to static reasoning about the low-level patterns in Wasm.
Using Wasm-precheck's stronger type system, we can safely remove dynamic checks both in theory and in practice. We prove type safety of Wasm-precheck (Section 4.3), which implies that well-typed programs without (some or all) dynamic checks never get "stuck" (or access uninitialized memory). We implement Wasm-precheck in an extension of the Wasmtime compiler [Bytecode Alliance 2019] and evaluate run-time and compile-time performance on the PolyBenchC benchmark suite [Pouchet and Yuki 2016]. The type system enables safely removing most dynamic checks, in practice by moving checks out of loops. This yields an average performance speed-up of 1.71x over Wasm_dyn, a configuration of Wasm with explicit dynamic checks. This speed-up comes with a small overhead in binary size (7.18% larger) and time taken to type-check the program (1.4% slower) (Section 6).
We pay attention to design decisions that would affect adoption and implementation of Wasm-precheck. To ensure backwards compatibility, we show that Wasm programs can be automatically embedded into Wasm-precheck, and that all Wasm-precheck programs erase to well-typed Wasm programs (possibly with more dynamic checks). Wasm-precheck does not fix a particular constraint solving algorithm, although we provide a prototype implementation; developers may choose their own trade-offs between the compile-time cost of a more expressive type system and the benefits of additional static reasoning. We elaborate on this in Section 2 and Section 3.3.2. We also discuss how Wasm-precheck could be interpreted as a specification of a sound static analysis of Wasm, in case the addition of annotations to the surface language is undesirable or infeasible (Section 8).
In short, our contributions are:
• The formal model of Wasm-precheck, an extension of Wasm that, provided sufficient type annotations, requires no dynamic checks for type and memory safety (Section 3).
• An implementation of Wasm-precheck in an extension of the Wasmtime compiler, and a reference implementation of the Wasm-precheck formal model in Redex (Section 5).
• A performance analysis comparing Wasm-precheck to various configurations of Wasm (Section 6).
• A proof of type safety for Wasm-precheck (Section 4.3).
• A proof of backwards compatibility: all well-typed Wasm programs automatically embed into Wasm-precheck, possibly with more dynamic checks than necessary (Section 4.1).
• A proof that Wasm-precheck introduces no new dynamic behaviours: all well-typed Wasm-precheck programs erase to well-typed Wasm, potentially with extra dynamic checks (Section 4.2).

MAIN IDEAS
Wasm is unusual among low-level performance-oriented languages in that it provides a strong type safety guarantee. Wasm programs are guaranteed to be type and memory safe: if a well-typed program terminates, it either runs to a value of the expected type, or raises a well-defined dynamic error [Haas et al. 2017]. Importantly, this rules out undefined behaviour such as accessing out-of-bounds memory, or casting integers to labels.
Unfortunately, not all undefined behaviour can be caught statically by Wasm's type system. Consider the reduction rule for a binary operation.
$(t.\mathsf{const}\ c_1)\ (t.\mathsf{const}\ c_2)\ t.\mathit{binop} \hookrightarrow t.\mathsf{const}\ c$ where $c = \mathit{binop}(c_1, c_2)$
This small-step relation $e^* \hookrightarrow e^*$ reduces instructions in a stack machine. A sequence $e^*$ (0 or more instructions $e$) represents the stack of values and instructions to be executed. The instruction $t.\mathsf{const}\ c_2$ indicates a value $c_2$ of type $t$ on the stack. The $t.\mathit{binop}$ instruction expects 2 operands of type $t$ on the top of the stack, and reduces them to the value $c$ produced by the binary operation. Since all programs are well typed, this operation must succeed and produce a constant of type $t$, so our semantics need not perform any dynamic checks on the values $c_1$ and $c_2$.
The exception is when $\mathit{binop}$ is division, in which case there is a well-typed value for which $\mathit{binop}(c_1, c_2)$ is undefined, namely, when $c_2 = 0$. Since 0 is a valid value of type i32, the type system allows an undefined operation. We require a second reduction rule for $\mathit{binop}$ to make division well defined.
$(t.\mathsf{const}\ c_1)\ (t.\mathsf{const}\ c_2)\ t.\mathit{binop} \hookrightarrow \mathsf{trap}$ where $\mathit{binop} = \mathsf{div}$ and $c_2 = 0$
This is unfortunate; now every division operation must perform a dynamic check, possibly raising an error in production that we could have caught in development, and costing run-time performance (although this cost is irrelevant for division).
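The two reduction rules above can be illustrated with a minimal sketch of a stack machine step. The names `Trap`, `BINOPS`, and `step_binop` are hypothetical, and only 32-bit unsigned `add` and `div_u` are modeled; this is an illustration under those assumptions, not the formal semantics.

```python
# Illustrative sketch only: all names here are hypothetical, not from the paper.
MASK32 = 0xFFFFFFFF


class Trap(Exception):
    """Models Wasm's fatal run-time error."""


def binop_add(c1, c2):
    # Total operation: wraps around modulo 2^32.
    return (c1 + c2) & MASK32


def binop_div_u(c1, c2):
    # Partial operation: undefined (None) when the divisor is 0.
    if c2 == 0:
        return None
    return c1 // c2


BINOPS = {"add": binop_add, "div_u": binop_div_u}


def step_binop(stack, op):
    """Reduce `(i32.const c1) (i32.const c2) i32.<op>` on a value stack."""
    c2 = stack.pop()
    c1 = stack.pop()
    result = BINOPS[op](c1, c2)
    if result is None:
        # Second reduction rule: the operation is undefined, so trap.
        raise Trap(f"{op} is undefined for operands {c1}, {c2}")
    # First reduction rule: push the constant produced by the operation.
    stack.append(result)
    return stack
```

The `None` check in `step_binop` is the dynamic check that every division pays for, even when the divisor could be shown to be non-zero ahead of time.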
Idea: The safe dynamic semantics are a specification for a static reasoning system.
While Wasm has a strong static type system, it provides only coarse reasoning about types such as i32, and cannot express the fine-grained precondition for the well-definedness of division. The type judgement has the shape $C \vdash e^* : t_1^* \to t_2^*$, which expresses a precondition that values of type $t_1^*$ are on the stack prior to executing the instruction (sequence), and a postcondition that values of type $t_2^*$ are on the stack after execution. For example, this is the Wasm typing rule for binary operations:
$C \vdash t.\mathit{binop} : t\ t \to t$
A binary operation, such as division, expects two values of type $t$ (either 32-bit integer or float) on the stack; after executing the operation, there should be a single value of type $t$ on the stack. This is insufficient to reason about whether $c_2 = 0$.
We use division as a running example, but Wasm features several instructions with the same problem. These include memory accesses, which require dynamic memory bounds checks, and indirect function calls, which require dynamic type checks.
Wasm has a strong type system, but it is simple (in the sense of the simply-typed $\lambda$-calculus). It is capable only of expressing invariants such as "a binary operation takes two integers", but not all safety conditions required by the run-time system. In particular, it is insufficient to express the true type of division: a binary operation on two integers such that the divisor is non-zero. However, there are some kinds of type systems that are capable of expressing such invariants.
Idea: An indexed type system suffices to express the conditions under which dynamic checks can be removed from each Wasm instruction.
An indexed type system essentially changes the language of types from simple types to a (typically decidable) predicate logic with constraints between (representations of) values. Types are indexed by a name representing the run-time value for the term of that type. The type system collects and solves constraints between these values. For example, the following is (a simplification of) the typing rule for statically safe division.
A type $(t\ a)$ represents a value $a$ of type $t$ on the stack. The name $a$ is initially unconstrained, representing an unknown value. We modify the type system to collect a system of constraints $\phi$ between these names. In this typing rule, the safe division operation is well typed if the constraint set $\phi$ guarantees that the second operand, named $a_2$ in the type system, cannot be 0. This is easily decided by a solver for our logic. In the postcondition, we add a new constraint that $a_3$ is equal to $a_1$ divided by $a_2$. This does not perform the division at type-checking time, since $a_1$ and $a_2$ may not have known concrete values, but adds this constraint to the constraint set.
This typing rule prechecks all the safety criteria for the division instruction statically; once the type system is satisfied, this instruction can never be the source of a trap, and requires no dynamic checks. Formally, we see this in that div✓ needs only the one reduction rule, instead of the two for div.
We define similar rules for memory accesses without dynamic bounds checks and indirect function calls without dynamic type checks. We prove type safety for Wasm-precheck, guaranteeing that the new typing rules are sufficient to imply the well-definedness of these reduction rules.
Statically proving that a dynamic value is non-zero may be difficult in general, so we keep the original div instruction with its dynamic check and under-specified typing rule. While we could replace the original instruction with div✓ in all cases, and require that the check is inserted explicitly when necessary, it is useful for Wasm-precheck to remain a strict superset of Wasm.
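The simplified rule for statically safe division is not reproduced in this text. Based only on the prose description, it can be reconstructed roughly as follows; the exact notation (the symbols $C$, $\phi$, $a_i$, the arrow $\leadsto$ for the solver's implication check, and the prefix form of constraints) is an assumption rather than a quotation of the original figure.

```latex
\[
\frac{\phi \leadsto \neg(=\ a_2\ 0) \qquad a_3\ \text{fresh}}
     {C \vdash t.\mathsf{div}\checkmark \;:\;
        (t\ a_1)\ (t\ a_2);\ \phi
        \;\to\;
        (t\ a_3);\ \phi,\ (=\ a_3\ (\mathsf{div}\ a_1\ a_2))}
\]
```

Read bottom-up: the precondition names the two operands $a_1$ and $a_2$, the premise asks the solver to show that $a_2$ cannot be 0 under $\phi$, and the postcondition extends $\phi$ with the equation defining the result $a_3$.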
Idea: Wasm-precheck need not be implemented as a new language with a separate syntax, but could be read as a specification for a sound static analysis over Wasm.
While we formalize Wasm-precheck as a language, one could also view it as a specification for a proven-correct static analysis. Wasm-precheck is fully backwards compatible with Wasm, meaning that any well-typed Wasm program is also well typed in Wasm-precheck (although old code may not automatically inherit improved static reasoning). Thus, programmers are not required to fight with the new type system. We prove this formally: all Wasm programs embed trivially into the new type system and run to the same result (Section 4.1). However, one could implement a static analysis over Wasm, which provides Wasm-precheck annotations without developer intervention and without modifying the surface syntax of Wasm. Using Wasm-precheck in this way would be a conservative approximation of the type system. If a tool can infer these annotations, or the programmer is willing to add annotations or rewrite code to help the type system, then Wasm-precheck can remove some dynamic checks while type and memory safety are still guaranteed for all programs. We discuss this further in Section 8.
Idea: Wasm-precheck can improve performance by safely removing dynamic checks in practice, as well as in theory.
We implement Wasm-precheck and show it can improve performance by reducing the number of dynamic checks required while maintaining safety. Our evaluation shows that, using Wasm-precheck, we can remove 97% of the performance overhead of explicit dynamic checks on average, resulting in an average performance speed-up of 1.71x. The performance evaluation uses PolyBenchC, a suite of memory-intensive benchmarks, which are manually annotated with sufficient type information to check, as well as a few explicit dynamic checks when insufficient information is available statically. In effect, the type system enables moving dynamic checks out of a loop, replacing them with a single dynamic check before the loop. The programs used in the evaluation were the output of the Emscripten compiler from C to Wasm, showing that Wasm-precheck can support patterns in compiled output, which is important since Wasm is generally used as a compiler target.
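The effect of moving checks out of a loop can be sketched as follows. This is a hypothetical illustration in Python rather than Wasm, with made-up names (`sum_checked`, `sum_prechecked`, `Trap`); the counter only makes the number of dynamic checks visible.

```python
# Illustrative sketch only: names and structure are hypothetical.
class Trap(Exception):
    """Models Wasm's fatal run-time error."""


def sum_checked(memory, base, n):
    """Sum n bytes starting at base, checking bounds on every access."""
    total, checks = 0, 0
    for k in range(n):
        addr = base + k
        checks += 1
        if not (0 <= addr < len(memory)):
            raise Trap("out-of-bounds access")
        total += memory[addr]
    return total, checks


def sum_prechecked(memory, base, n):
    """Sum n bytes starting at base, with one check hoisted before the loop.

    The single check is conservative: it may trap for an out-of-range base
    even when n == 0 and the loop body would never run.
    """
    checks = 1
    if not (0 <= base and base + n <= len(memory)):
        raise Trap("range would be out of bounds")
    total = 0
    for k in range(n):
        # Models a prechecked load: the range check above covers this access.
        total += memory[base + k]
    return total, checks
```

For an in-bounds range both functions return the same sum, but `sum_checked` performs `n` checks while `sum_prechecked` performs one. In Wasm-precheck the type system, not the programmer's discipline, is what justifies dropping the per-iteration check.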

Syntax
Wasm-precheck is a superset of Wasm with a different representation of types and four statically safe versions of Wasm instructions added. Figure 1 shows the syntax of Wasm-precheck, with changes compared to Wasm highlighted. Four administrative instructions, which can only appear during evaluation, are omitted here, as we do not discuss them in detail. Like Wasm, Wasm-precheck is a stack-based language. Dynamic operands to instructions are passed on the stack and are not part of the instruction syntax. Since Wasm-precheck uses an indexed type system, some type annotations are enriched compared to Wasm type annotations; we discuss these differences later in Section 3.3.1.
The key changes to the syntax are four new instructions, referred to as prechecked instructions and denoted with a ✓ at the end of the operator. Prechecked instructions are equivalent to their Wasm counterparts, but they do not require dynamic checks. We discuss how their safety is statically checked later in Section 3.3. But first, we present the dynamic semantics of Wasm-precheck.

Dynamic Semantics
Wasm-precheck's reduction relation has the same structure as Wasm's. We briefly explain the dynamic semantics of all Wasm-precheck instructions; however, since most instructions are unchanged from Wasm, we only present the formal rules of new instructions and of some instructions that are helpful for understanding the indexed type system. For full definitions of Wasm evaluation rules, see Figure 2 of Haas et al. [2017].
The reduction relation $s; v^*; e^* \hookrightarrow_i s'; v'^*; e'^*$ is defined on configurations consisting of a run-time store $s$, which holds module instance information (the $i$ decorating the reduction arrow indicates which module instance is being reduced); a sequence of values $v^*$ representing local variables; and the instruction stack $e^*$. We ignore the module instance information, which is not critical for our work. A value $v$ is represented by the constant instruction $(t.\mathsf{const}\ c)$. As in Wasm, the stack is represented as a sequence of values at the head of the instruction sequence $e^*$. Following Wasm, the store $s$, local variables $v^*$, and the instance subscript $i$ are elided when they are unchanged and unused (hence, $s$ and $v^*$ do not appear in Figure 3).
Prechecked instructions require no dynamic checks, since their safety preconditions are enforced statically by the Wasm-precheck type system. This can be seen in the reduction rules for the prechecked instructions in Figure 2: unlike their non-prechecked counterparts, prechecked instructions do not have rules to trap (a trap is the Wasm run-time error).
Figure 3 shows instructive excerpts unchanged from Wasm. The simplest are nop, which removes itself from the stack, and unreachable, which unconditionally evaluates to trap. When trap appears as an operand or operator, all evaluation rules produce trap; trap is a fatal error.
Most instructions manipulate values on the stack. The constant instruction $t.\mathsf{const}\ c$ intuitively pushes a value onto the stack, but formally it is a value on the stack. Numeric operators, $\mathit{unop}$, $\mathit{binop}$, $\mathit{testop}$, and $\mathit{relop}$, consume either one or two values from the stack, and push one value as the result. We present the $\mathit{binop}$ instructions at the top of Figure 2. The division operator div traps when the second argument is 0, whereas in the div✓ instruction, the second operand $c_2$ is statically guaranteed to be non-zero. drop consumes a value from the top of the stack and does nothing with it. Finally, select is a ternary operator that consumes three values and pushes either the first or second value based on the truthiness of the third value: 0 is false, and other values are truthy.
Fig. 3. Wasm-precheck reduction rules (excerpts)
There are three control flow blocks that introduce a label: block, loop, and if. Their reduction rules, given in Figure 3, are unchanged from Wasm, but we explain them to clarify how labels are introduced, as indexed typing for labels is tricky. Each block instruction introduces a new evaluation context, which binds a label as a de Bruijn index to a sequence of instructions. Instructions $e^*$ in the body of the block are reduced in this evaluation context. Intuitively, labels point to where evaluation should continue when jumped to. The loop block binds the label to the loop itself, while block (and therefore if) binds the label to an empty instruction sequence. Jumping to a loop's label repeats the loop; control exits the loop by default. Jumping to a block's label exits the block early.
Branching (br $i$) takes some values $v^n$, jumps to the $i$th (zero-indexed) label in the evaluation context of the instruction, and continues executing with $v^n$ on the stack but the labels discarded. Execution continues with the code bound to the label of the $i$th outer block, inside the remaining evaluation context, as seen in the second-to-last rule of Figure 3. We elide formal rules for other branching instructions, but explain them briefly. Function calls, both direct and indirect, must first determine which closure represents the function being called, and then the body of the closure will be evaluated in an environment specified by the closure. A direct function call, call $i$, uses a statically provided function index to get the closure from the list of functions in the current module. An indirect function call, call_indirect, first dynamically looks up the function index; this process is explained more below.
Closures in Wasm are a combination of a function and a pointer to the module environment that the function should operate within. The module environment contains all of the global variables, table, memory, and functions that can be referred to within the closure. Evaluating a closure introduces a return label in the form of a local administrative block instruction, which also holds the local variables for the function, and the module environment pointer $i$. The local variables in the local block represent arguments consumed by the function call ($v^n$), and a number of additional local variables specified as part of the function, which are initialized to zero ($(t.\mathsf{const}\ 0)^k$).
The instructions for local and global variables are similar to each other, except for scope: local variables are local to functions, whereas global variables are global to all functions in a module instance. Both have instructions to push the value of the $i$th variable onto the stack (e.g., get_local $i$). Memory in Wasm-precheck is a linear sequence of bytes. There are the standard instructions for loading and storing values, load and store, respectively. The prechecked memory operations load✓ and store✓ rely on static bounds checks (see Figure 2). Memory operations also include static operands: the representation of the value being loaded or stored, $tp\_sx$ or $tp$, respectively; the offset, $o$; and alignment, $a$ (the alignment does not affect the semantics in any way, so we omit this from the formal model). The current_memory instruction returns the current memory size, while grow_memory can increase the size, returning either the new size of memory, or -1 if memory cannot be increased.
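The label and branching behaviour described above can be sketched with a small structured interpreter. This is a hypothetical illustration: instructions are encoded as Python tuples, a Python exception stands in for the jump, and the sketch deliberately does not model the discarding of extra stack values on a branch, function calls, or traps.

```python
# Illustrative sketch only: the tuple encoding and all names are hypothetical.
class Branch(Exception):
    """Signals a jump to the label `depth` levels out (a de Bruijn index)."""

    def __init__(self, depth):
        super().__init__(depth)
        self.depth = depth


def run(instrs, stack, local_vars):
    """Execute a list of instructions, mutating stack and local_vars."""
    for ins in instrs:
        op = ins[0]
        if op == "const":
            stack.append(ins[1])
        elif op == "sub":
            c2 = stack.pop()
            c1 = stack.pop()
            stack.append(c1 - c2)
        elif op == "get_local":
            stack.append(local_vars[ins[1]])
        elif op == "set_local":
            local_vars[ins[1]] = stack.pop()
        elif op == "br":
            raise Branch(ins[1])
        elif op == "br_if":
            if stack.pop() != 0:  # non-zero is truthy
                raise Branch(ins[1])
        elif op == "block":
            try:
                run(ins[1], stack, local_vars)
            except Branch as b:
                if b.depth > 0:
                    raise Branch(b.depth - 1)
                # depth 0: the label is bound to the empty sequence,
                # so the block simply exits early.
        elif op == "loop":
            while True:
                try:
                    run(ins[1], stack, local_vars)
                    break  # falling off the end exits the loop
                except Branch as b:
                    if b.depth > 0:
                        raise Branch(b.depth - 1)
                    # depth 0: the label is bound to the loop itself,
                    # so the body is executed again.
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack
```

A branch with index 0 targets the innermost enclosing block or loop, and each enclosing construct decrements the index as the jump passes through it, which mirrors how labels are counted in the evaluation context.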
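The difference between load and load✓ can be illustrated with a minimal sketch over a byte array. The function names and the `Trap` class are hypothetical, only a 4-byte little-endian i32 load is modeled, and addresses are assumed to be non-negative.

```python
# Illustrative sketch only: names are hypothetical; addresses are assumed >= 0.
class Trap(Exception):
    """Models Wasm's fatal run-time error."""


def load_i32(memory, addr, offset=0):
    """Models i32.load: a dynamic bounds check precedes the access."""
    ea = addr + offset  # effective address = dynamic operand + static offset
    if ea + 4 > len(memory):
        raise Trap("out-of-bounds memory access")
    return int.from_bytes(memory[ea:ea + 4], "little")


def load_i32_prechecked(memory, addr, offset=0):
    """Models a prechecked load: no dynamic check is performed.

    Correctness relies on the caller having already established that
    addr + offset + 4 <= len(memory), which in Wasm-precheck is the job
    of the type system.
    """
    ea = addr + offset
    return int.from_bytes(memory[ea:ea + 4], "little")
```

Both functions compute the same result whenever the access is in bounds; the prechecked variant simply omits the branch that could raise `Trap`.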

Type System
3.3.1 Index Language. In Wasm, instruction types $t^* \to t^*$ express the number and types of values expected on the stack before and after an instruction. In Wasm-precheck, the instruction type has the form $ti^*; l; \Gamma; \phi \to ti^*; l; \Gamma; \phi$, whose non-terminals are defined in Figure 4. The stack type is a sequence of indexed types $ti^*$, representing the number and types of values on the stack. The local variable environment $l$ tracks the indexed types of local variables, so constraints can refer to local variables. The locals environment has the same representation as a stack type: a sequence of indexed types. A constraint set $\phi$ is, well, a set of constraints between index terms. The index environment $\Gamma$ describes which index variables are in scope before or after the instruction executes; it is represented as a set of indexed types. This is a formal detail used to reason about the scope of index variables; we omit it from the presentation of typing rules in this section. The full typing details are available as part of the supplementary material [Geller et al. 2023].
Constraints in an instruction type are written in the index language, given in Figure 4:
• $P$ is a constraint about index terms: either an equality constraint, or a proposition combining constraints using a simple first-order logic;
• $x$ is an index term: either an index type variable, a constant with an explicit value type, or a model of a Wasm operation on values;
• $a$ is an index variable, representing a specific run-time value;
• $t$ is a value type, which coarsely classifies a run-time value.
Finally, the whole module instance is typed under a module environment $C$, with type information about the module and the execution context. $C$ is a partial record containing:
• $C_{\mathrm{func}}$, the types of functions in the module;
• $C_{\mathrm{global}}$, the types of global variables in the module;
• $C_{\mathrm{table}}$, the number and types of functions in the table if the module has one, and undefined otherwise;
• $C_{\mathrm{memory}}$, the (initial) size of memory if the module has one, and undefined otherwise;
• $C_{\mathrm{local}}$, the value types of the local variables, which is defined when typing a function body (this is redundant with the local variables environment in the instruction type, but retained for backwards compatibility);
• $C_{\mathrm{label}}$, the stack of label types, which is used for typing branching instructions. Label types are either the precondition (for loops) or the postcondition (for other blocks).
• $C_{\mathrm{return}}$, the return type used to type the return instruction. Return types are just the postcondition stack type and constraint set, since local variables leave scope after a return.
Proc. ACM Program. Lang., Vol. 8, No. POPL, Article 80. Publication date: January 2024.

Implication.
Unlike in a simple type system, we cannot simply syntactically compare a postcondition to a precondition to type check two instructions. For example, a function expecting a value greater than zero might be given a value that is greater than ten. That should be fine, as semantically a value greater than ten is greater than zero, but these types differ syntactically.
We use a notion of logical implication for the logic corresponding to our index language for checking agreement between constraint sets. We define logical implication in Wasm-precheck as follows: $\Gamma \vdash \phi_1 \Rightarrow \phi_2$ if every valid variable assignment for $\phi_1$ is also valid for $\phi_2$. This can be read as saying that a constraint set $\phi_1$ implies another constraint set $\phi_2$ under $\Gamma$ if, given the type declarations for index variables in $\Gamma$, the set of possible assignments to those variables under $\phi_1$ is a subset of the set of possible assignments under $\phi_2$.
The type system is parameterized by the implementation of $\Rightarrow$, denoted using $\leadsto$. We do not require $\leadsto$ to be complete (always returning true if one constraint set does in fact imply another), but we do require it to be sound (never returning true when one constraint set does not imply another), allowing any implementation $\leadsto$ to be an under-approximation of $\Rightarrow$. Formally:
$\forall \Gamma, \phi_1, \phi_2.\ \Gamma \vdash \phi_1 \leadsto \phi_2$ implies $\Gamma \vdash \phi_1 \Rightarrow \phi_2$
This allows flexibility, as an implementation can use a faster constraint solver that may not be as precise as the theoretical notion of implication. While this may reduce the static reasoning ability, safety is maintained.
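The subset reading of implication can be made concrete with a brute-force sketch over a small finite domain. Everything here is an assumption for illustration: constraints are modeled as Python predicates over an assignment dictionary, and the domain `range(16)` stands in for the much larger space of Wasm values, so this demonstrates the definition rather than a usable solver.

```python
# Illustrative sketch only: a brute-force model of implication on a tiny domain.
from itertools import product


def assignments(variables, domain):
    """Enumerate every assignment of domain values to the index variables."""
    for values in product(domain, repeat=len(variables)):
        yield dict(zip(variables, values))


def satisfies(env, constraints):
    """An assignment is valid for a constraint set if all constraints hold."""
    return all(p(env) for p in constraints)


def implies(variables, phi1, phi2, domain=range(16)):
    """phi1 implies phi2 if every assignment valid for phi1 is valid for phi2."""
    return all(
        satisfies(env, phi2)
        for env in assignments(variables, domain)
        if satisfies(env, phi1)
    )
```

With `phi_gt10 = [lambda e: e["a"] > 10]` and `phi_gt0 = [lambda e: e["a"] > 0]`, the call `implies(["a"], phi_gt10, phi_gt0)` holds while the reverse direction does not, matching the greater-than-ten example above.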
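As a hypothetical example of a sound but incomplete implementation, consider a purely syntactic check that accepts only when every constraint of the target set literally appears in the source set. The tuple encoding of constraints and the name `approx_implies` are assumptions made for this sketch; they are not the prototype solver.

```python
# Illustrative sketch only: a deliberately weak, purely syntactic checker.
def approx_implies(phi1, phi2):
    """Return True only if every constraint in phi2 occurs verbatim in phi1.

    Sound: if phi2 is a subset of phi1, any assignment satisfying all of phi1
    also satisfies all of phi2. Incomplete: semantically valid implications
    that are not syntactic subsets are rejected.
    """
    return set(phi2) <= set(phi1)
```

For instance, `approx_implies([("gt", "a", 10)], [("gt", "a", 0)])` returns `False` even though the implication holds semantically. Such a checker would force more explicit dynamic checks to remain in the program, but it would never accept an unsafe one.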

Typing Judgement.
The typing judgment $C \vdash e^* : ti_1^*; l_1; \phi_1 \to ti_2^*; l_2; \phi_2$ states that under the module environment $C$, the instruction sequence $e^*$ produces the configuration described by $ti_2^*; l_2; \phi_2$ if it is executed in a configuration described by $ti_1^*; l_1; \phi_1$. The stack must have type $ti_2^*$ after execution if it had type $ti_1^*$ before execution; the local variable types must be $l_2$ if they were $l_1$; the constraint set $\phi_2$ must hold if $\phi_1$ held. Viewing the stack type as a function, $\phi_1$ would be a refinement of the function inputs, and $\phi_2$ a refinement of the outputs.
We gradually present the (simplified to elide $\Gamma$) typing rules inline; the complete definitions are available as part of the supplementary material [Geller et al. 2023].
The typing rules are presented in a declarative form, so they describe what types different instructions can have, but are not always sufficient for constructing a type for an instruction. This causes a minor difference between our model and the implementation. In the model, we merely require the existence of a constraint set relating the label type to the pre- or postcondition of the block. In practice, we require this be a user-provided annotation, discussed in Section 5.
When implementing these rules, we add syntactic annotations on block instructions (see Section 5.1 and Section 6.4), so type checking is syntax directed. However, adding syntactic annotations is straightforward, so we omit them for simplicity. Further, by omitting them, we give our model more flexibility for different implementations. For example, instead of using annotations, it may also be possible to construct the types using an inference algorithm, discussed in Section 8.
We first discuss some simple rules that do not use indexed type information. Rule Unreachable accepts any precondition and guarantees any postcondition since it causes a trap. The instruction nop makes no changes from the pre- to the postcondition because the instruction does nothing. Rule Drop consumes the top value from the stack, represented by $ti$, and does not change anything else. Typing rules for binary, test, and relational operations are all similar except for the operators and number of operands; we explain binary operations in detail. Rule Binop adds constraints between new and old program values. In the postcondition, the fresh index variable $a_3$ is constrained to be equal to the result of applying the operator to the two index variables ($a_1$ and $a_2$) on the stack in the precondition: $(=\ a_3\ (\|\mathit{binop}\|\ a_1\ a_2))$. We use $\|\mathit{binop}\|$ to indicate that we are moving $\mathit{binop}$ (or $\mathit{testop}$ or $\mathit{relop}$) from Wasm-precheck to the index language, where it is modeled as a function rather than a stack machine instruction. Again, the locals environment $l$ is unchanged.
Rule Div-Prechk, for the prechecked division operator, requires that the second operand is non-zero. The premise $\phi \leadsto \neg(=\ a_2\ 0)$ requires that the index constraints satisfy the proposition $a_2 \neq 0$. Since divide-by-zero is proven absent statically, it is safe to use div✓ without dynamic checks.
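How Rule Binop and Rule Div-Prechk might drive a checker can be sketched as follows. This is a hypothetical illustration: indexed types are `(type, name)` pairs, constraints are nested tuples, the implication check is a stand-in that only looks for the needed proposition verbatim in the constraint set, and none of the function names come from the actual implementation.

```python
# Illustrative sketch only: encoding and names are hypothetical.
from itertools import count


def make_fresh():
    """Return a generator of fresh index variable names."""
    counter = count(1)
    return lambda: f"fresh{next(counter)}"


def syntactic_entails(phi, prop):
    """Stand-in for the solver: sound but very incomplete."""
    return prop in phi


def type_binop(op, stack_type, phi, fresh):
    """Rule Binop: pop two indexed types, push a fresh one, extend phi."""
    (t1, a1), (t2, a2) = stack_type[-2], stack_type[-1]
    if t1 != t2:
        raise TypeError("operand types differ")
    a3 = fresh()
    new_constraint = ("=", a3, (op, a1, a2))
    return stack_type[:-2] + [(t1, a3)], phi | {new_constraint}


def type_div_prechk(stack_type, phi, fresh, entails=syntactic_entails):
    """Rule Div-Prechk: like Binop, but phi must show the divisor is non-zero."""
    _, a2 = stack_type[-1]
    if not entails(phi, ("not", ("=", a2, 0))):
        raise TypeError(f"cannot show that {a2} is non-zero")
    return type_binop("div", stack_type, phi, fresh)
```

Passing a different `entails` function swaps in a stronger solver without touching the rules, which reflects the parameterization of the type system over the implication check.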
Rule Binop: if $a_3$ is fresh, then $C \vdash t.\mathit{binop} : (t\ a_1)\ (t\ a_2); l; \phi \to (t\ a_3); l; \phi, (=\ a_3\ (\|\mathit{binop}\|\ a_1\ a_2))$.
Recall that select is a ternary operator that consumes three values from the stack ($a_1$, $a_2$, and $a_3$) and returns the first value, $a_1$, if the third value, $a_3$, is truthy (non-0), and otherwise returns the second value, $a_2$. The third value must be an i32. Rule Select uses the type-level "if" to constrain the result variable $a$ to depend on the truthiness of $a_3$: $(\mathsf{if}\ (=\ a_3\ (\mathsf{i32}\ 0))\ (=\ a\ a_2)\ (=\ a\ a_1))$. Note that this "if" only introduces syntax, which is evaluated only when checking constraint satisfaction between constraint sets.
Control flow blocks. The three block instructions check their bodies with additional information added to the environment $C$ to handle branching instructions within the bodies. The body of a block instruction is not type checked with the same module type context $C$ as the block, but rather with a modified context with a label type pushed onto the stack of label types $C_{\mathrm{label}}$. Any branching instruction within the block is typed against the new $C_{\mathrm{label}}$ (this is described more below when discussing Rule Br).
The end of an block can be reached either through a branching instruction or by the body  * being evaluated to a sequence of values.Thus, the label type and postcondition of the body  * must agree, so that the postcondition of the block is guaranteed to hold no matter how the end of the block is reached.To ensure this, the label type and body's postcondition must have the same stack and index local store (( 2  2 )  and (   2 ) * ), and the constraint set  3 from the postcondition of the body  * must imply the label type's constraint set  2 , which is then the same constraint set in the postcondition of the block.Thus,  2 represents the point of agreement between executing the body,  * , and any branching instruction from within the body.The premise says that executing the body,  * , must be well typed at the current precondition, ( 1  1 ) * ; (   1 )  ;  1 , and then results in the aforementioned postcondition ( 2  2 )  ; (   2 ) * ;  3 .Rule If similarly checks both possible branches with an updated context, but with extra information based on whether the condition variable  is true or not, depending on the branch.In the "true" branch  * 1 , the index variable  consumed by the if is known to be truthy, i.e., non-zero, whereas in the "false" branch  * 2 ,  is zero.Thus, the two branches do start from the same precondition  1 , but with the added constraint ¬(=  (i32 0)) in the true branch, and (=  (i32 0)) in the false branch.For if, the two branches must agree as well, so they are required to have the same postcondition, except for the resulting constraint sets, which both must imply an agreed upon constraint set  2 from the label type.
Branching from within a loop re-executes the loop from the beginning, so the precondition is essentially the loop invariant. Thus, the label type and precondition must agree, in contrast to other blocks, where the label type and postcondition must agree. If the body is reduced to a sequence of values, these values are returned and the loop exits, as described by the postcondition. When type checking the loop body, the label type is the precondition, and the postcondition of the loop as a whole is the same as the postcondition of the body up to implication. Instead of being checked against the precondition of the loop, the body is checked with a precondition that differs only in its constraint set, and that is reachable either from branching or the first time the loop is executed.
Branching (br) consumes values and jumps to the label at the given index in the evaluation context, continuing execution with the consumed values on the stack. The instructions following a branch are not executed, so the postcondition is arbitrary. Similarly, in addition to the consumed values, the stack may contain arbitrary other values, which are discarded when branching. The precondition of br checks that the current program state satisfies the label type at that index (counting backwards from the top) on the stack of label types, ensuring that the condition for branching is met.
Rule Return is similar to Rule Br, except that return is checked against the current return type instead of against a label type. Return types do not include a locals environment, since local variables are only scoped within functions; after a return, they all go out of scope. Like br, code after a return is dead, so the postcondition of return is arbitrary.
The conditional branch instruction, br_if, consumes a value from the stack and branches if it is truthy. In contrast to Rule Br, execution can continue after br_if, specifically when the value is zero and branching does not occur. If the consumed value is constrained to be non-zero in the type system, then this causes a contradiction in the constraint set, indicating dead code. This conditional information is captured by Rule Br-If: the check against the label type can assume that the value is truthy, and the instructions following the br_if can assume that it is zero.
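The way Rule Br-If (and, analogously, Rule If) splits the constraint set can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the tuple encoding of constraints and the name `br_if_constraints` are hypothetical and chosen for this sketch.

```python
# Hypothetical encoding (an assumption of this sketch): a constraint is a
# nested tuple such as ("eq", "a", ("i32", 0)); a constraint set is a list.

def br_if_constraints(phi, cond):
    """Split the constraint set `phi` on the br_if condition variable `cond`.

    Returns (taken, fallthrough): the constraints that may be assumed when
    checking against the label type, and those that may be assumed by the
    instructions following the br_if.
    """
    is_zero = ("eq", cond, ("i32", 0))
    taken = phi + [("not", is_zero)]   # branch taken: cond is truthy
    fallthrough = phi + [is_zero]      # branch not taken: cond is zero
    return taken, fallthrough


phi = [("eq", "b", ("i32", 4))]
taken, fallthrough = br_if_constraints(phi, "a")
```

Both results extend the same incoming constraint set, which is left unmodified; a solver would then be asked whether `taken` implies the label type's constraint set.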
Finally, br_table is essentially a br whose target is determined by indexing into a statically provided list of branching indices using the operand from the stack. We must ensure that every possible label type that might be the target is implied by the precondition of the br_table instruction. Like br, br_table must branch, so the postcondition is arbitrary.
Direct function calls (call) look up the type annotation of the function in the environment. The current state must satisfy the precondition constraints of the function, which ensures that the assumptions made by the function hold. We omit the locals environment from the type annotation, since local variables are not preserved or accessible across function calls. The postcondition of call extends the constraint set from the precondition with the postcondition of the function being called, producing a union of the two constraint sets. Note that this union differs from most other rules, where a single constraint is added to a constraint set using the syntactic constructor for extending a constraint set. We extend the constraint set because the type annotation on the function can only contain constraints about the arguments, so simply copying the postcondition from the annotation would result in the loss of information about all other index variables.
Typing an indirect function call, call_indirect, is similar to typing a direct call, except that the expected type is based on the statically provided type annotation. This is because call_indirect does not know, statically, the type of the function being called, so it must be provided with the expected type to be able to check the type and compute the resulting state. Rule Call-Indirect also uses a side condition on the module type context to ensure that a table is defined for the module; this table contains a statically known number of functions and provides the associated function types.
Proving the safety of a prechecked indirect function call is more complex. This involves statically checking that the actual precondition satisfies the precondition of every function that could be called. Rule Call-Indirect-Prechk checks that, for every index within the table, either the constraint set implies that the operand is not equal to that index, or the function type found at that index in the table's sequence of function types is the expected type. Note that the universal quantification and the disjunction are at the meta level and not within the index language, and that the size of the table is statically known. The rule also checks that the operand is within the table bounds, i.e., that the constraint set implies that the operand is less than the table size.
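The meta-level check performed by Rule Call-Indirect-Prechk can be sketched as follows. This is a toy illustration, not the paper's implementation: instead of a constraint set and a solver, the sketch assumes we are simply handed the set of values the operand may take (`possible_indices`), and function types are compared as plain strings. The function name is hypothetical.

```python
def call_indirect_prechk_ok(possible_indices, table_types, expected):
    """Toy version of the Call-Indirect-Prechk side conditions.

    possible_indices: stand-in for the constraint set, i.e. the set of values
        the operand may take (a real checker would query a solver instead).
    table_types: the statically known function types stored in the table.
    expected: the statically provided type annotation.
    """
    n = len(table_types)
    # The operand must be within the table bounds.
    in_bounds = all(0 <= v < n for v in possible_indices)
    # For every table index: either the operand can never equal it,
    # or the function at that index has the expected type.
    types_ok = all(
        i not in possible_indices or table_types[i] == expected
        for i in range(n)
    )
    return in_bounds and types_ok


table = ["i32->i32", "f64->f64", "i32->i32"]
```

For instance, with this table an operand known to be 0 or 2 passes the check for the expected type `"i32->i32"`, while an operand that may be 1, or that may be out of bounds, does not.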
Typing instructions for local variables, Rule Get-Local, Rule Set-Local, and Rule Tee-Local, all look up the type from the locals environment at the static de Bruijn index of the local variable. Rule Get-Local puts a fresh index variable on the stack with the value type of the indexed local, and constrains it to be equal to that local variable. Rule Set-Local works in the reverse direction, replacing the index variable associated with the local variable being assigned. For set_local, we replace the indexed type of the local variable with the indexed type from the stack; since the index variables in these types identify the value from the stack, this reflects the value being moved from the stack to the store. Finally, Rule Tee-Local is a combination of the above two rules, as the instruction is a combination of get_local and set_local.
Global variables are shared between modules and can be mutable, so we do not track constraints on globals; we discuss this limitation further in Section 8. Rule Set-Global checks that the global being set is mutable and has the same type as the operand. Rule Get-Global introduces a fresh index variable with the type of the corresponding global variable from the context.
We do not reason about the contents of memory, so the non-prechecked memory instructions do not add constraints. All the memory instruction typing rules ensure the module has a declared memory using a side condition that looks up the initial size of memory in the module type context. Rule Mem-Load and Rule Mem-Store ensure that the alignment fits the type being loaded or stored. Rule Mem-Store simply checks that the second operand has the expected type. Rule Grow-Memory simply consumes a 32-bit integer (the additional amount of memory the user would like to allocate), and returns a 32-bit integer value, representing the updated amount of memory if the allocation was successful, and −1 otherwise.
Prechecked memory instructions are statically checked to take place within the static memory bounds. We currently do not reason about dynamically increasing memory size, which we discuss further in Section 8. The initial memory size is some number of 64 Ki pages (65,536 bytes each), so we check that the constraint set in the precondition implies that the memory index plus the static offset is less than the initial memory size in bytes minus the access width (memory indices are unsigned, so they cannot be less than 0). Here, the width is shorthand for the number of bytes being stored or loaded. It is equal to the length in bytes of the value type if no packed type is provided, and otherwise equal to the length in bytes of the packed type, which is used to load or store a slice of 8 bits, 16 bits, or 32 bits.
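The arithmetic behind the prechecked memory rule can be illustrated with a small sketch. The function names are hypothetical, and `max_index` is an assumption of this sketch: it stands in for the largest index value that the constraint set allows, which a real implementation would establish through the solver rather than receive as a number.

```python
PAGE_BYTES = 65_536  # one Wasm page (64 Ki)


def access_width(value_type_bytes, packed_bytes=None):
    """Bytes touched by a load/store: the packed width if one is given,
    otherwise the width of the value type."""
    return packed_bytes if packed_bytes is not None else value_type_bytes


def prechecked_access_ok(max_index, offset, width, initial_pages):
    """Check that index + offset stays below the initial memory size minus
    the access width, for the largest index the constraints allow."""
    return max_index + offset < initial_pages * PAGE_BYTES - width


# A 4-byte access near the start of a one-page memory is accepted...
ok = prechecked_access_ok(max_index=100, offset=8, width=4, initial_pages=1)
# ...but one that could run past the end of the page is rejected.
bad = prechecked_access_ok(max_index=65_530, offset=4, width=4, initial_pages=1)
```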

METATHEORY
First, we show how to automatically translate Wasm programs to Wasm-precheck programs and vice versa. Then, we prove the type safety of Wasm-precheck. We provide key insights and details here; complete definitions and proofs are provided in the supplementary material [Geller et al. 2023].

Embedding Wasm into Wasm-precheck
The embedding function takes a Wasm module and replaces all type annotations with indexed types that have no constraints on the index variables. Intuitively, this works because the type annotations are the only part of the surface syntax of Wasm that differs in Wasm-precheck, and the constraints are only necessary to type check prechecked instructions. While this embedding requires no additional developer effort, it provides no information to the indexed type system beyond what can be trivially inferred, so it may not automatically improve static reasoning, and does not automatically provide prechecked instructions. More sophisticated embeddings could attempt to insert prechecked instructions; we discuss this in Section 8.
First, we define embedding over modules: the top-level object of both the Wasm and Wasm-precheck surface syntax. Embedding a module means embedding all functions and globals in the module. The definition of embedding is not interesting; we recur over the syntax looking for type annotations, and enrich them to indexed types with fresh index variables and empty constraint sets. The definition can be found in the anonymous supplementary material. We then show that the embedding of a well-typed Wasm program is well typed in Wasm-precheck.
Proof. (Sketch) The proof follows by induction on the structure of the Wasm typing derivation. The full proof is available as part of the supplementary material [Geller et al. 2023]. □
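As a rough illustration of the embedding of a single function type, the following sketch pairs each primitive value type with a fresh index variable and attaches an empty constraint set. The representation (lists of `(type, variable)` pairs, variable names `a0`, `a1`, ...) and the name `embed_func_type` are assumptions of this sketch, not the definition from the supplementary material.

```python
from itertools import count


def embed_func_type(params, results, fresh=None):
    """Embed a plain Wasm function type as an indexed type with no constraints."""
    fresh = count() if fresh is None else fresh

    def index(value_type):
        # Pair each value type with a fresh index variable.
        return (value_type, f"a{next(fresh)}")

    pre = ([index(t) for t in params], [])     # (indexed stack, empty constraint set)
    post = ([index(t) for t in results], [])   # (indexed stack, empty constraint set)
    return pre, post


pre, post = embed_func_type(["i32", "i32"], ["i32"])
```

Because both constraint sets are empty, the embedded type carries no more information than the original Wasm type, which matches the observation above that the embedding does not by itself enable prechecked instructions.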

Erasing Wasm-precheck Annotations
We provide an erasure function from Wasm-precheck programs to Wasm programs by discarding the extra type information and replacing prechecked instructions with their non-prechecked counterparts. Erasure is defined not just for the surface syntax, but also for typing constructs (such as the module environment), administrative instructions, and run-time data structures (such as the store). This extended definition of erasure allows us to reason about the behavior of Wasm-precheck run-time programs in Wasm, which is useful in the type safety proof. Full erasure definitions and proofs can be found in the anonymous supplementary material, but their formal details are not insightful.
Erasure is best illustrated with an example: a function whose annotation constrains an input to be greater than 0. This lets a div✓ be used with that input as a divisor. The annotation also includes the primitive Wasm types, which are the only information needed for type checking under Wasm, so we discard all other information to produce a Wasm type annotation, and replace div✓ with the Wasm instruction div.

The key theorem is that erasing a well-typed Wasm-precheck (run-time) machine configuration produces a well-typed Wasm (run-time) machine configuration, so all Wasm-precheck programs erase to running, type-safe Wasm programs. This is useful in showing type safety since, intuitively, no reduction rule is type directed, so if erasing the types results in a type-safe Wasm program, then reduction in Wasm-precheck is also type safe. Note that erasing a sequence of values trivially yields the same sequence. The proofs are available in Section 4.2.
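Erasure of a single annotated function can be sketched as follows. The data representation, the instruction spelling `i32.div✓`, and the function names are hypothetical choices for this sketch; the actual definition also covers typing constructs, administrative instructions, and run-time structures, which are omitted here.

```python
def erase_types(indexed):
    """Keep only the primitive Wasm type of each (type, index variable) pair."""
    return [value_type for (value_type, _index_var) in indexed]


def erase_instr(instr):
    """Replace a prechecked (✓-tagged) instruction with its checked counterpart."""
    return instr[:-1] if instr.endswith("✓") else instr


def erase_func(pre, post, body):
    """Erase an annotated function: drop constraints, untag instructions."""
    (pre_stack, _pre_constraints) = pre
    (post_stack, _post_constraints) = post
    return erase_types(pre_stack), erase_types(post_stack), [erase_instr(i) for i in body]


# Hypothetical function in the spirit of the example above: the second input
# is constrained to be greater than 0, so it may be used as a divisor.
pre = ([("i32", "a1"), ("i32", "a2")], [("gt", "a2", ("i32", 0))])
post = ([("i32", "a3")], [])
body = ["get_local 0", "get_local 1", "i32.div✓"]
erased = erase_func(pre, post, body)
```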

Type Safety
Type safety is the property that a well-typed machine state either reduces to another well-typed state (perhaps infinitely), a sequence of values, or evaluates to the well-defined error trap. Type safety of Wasm-precheck guarantees a number of important properties, including memory safety.
In addition, since prechecked instructions cannot trap, as it is not part of their semantics, the type safety of Wasm-precheck ensures that they always successfully reduce to a value.
To reason about the run-time store, a run-time store type, the store context, is introduced. The store context contains the type information for everything in the store: module instances, tables, and memories. Every module instance in the store has an associated module type context in the store context, looked up by the index of the instance. This is the module type context already familiar from the typing rules.
Additional administrative typing judgments necessary for the proof are available as part of the supplementary material [Geller et al. 2023].
For the type safety proof, we define an evaluation function using the transitive, reflexive closure of the reduction relation on machine states (a store, local variables, and an instruction sequence). The evaluation function has three possible outcomes: the program may terminate, returning a sequence of values; the program may trap, returning the trap instruction, which represents a fatal run-time error; or the program may not terminate.
Proof. Follows from Lemma 4 (Progress) and Lemma 2 (Subject Reduction). □
Subject reduction, also known as type preservation, ensures that if a machine state has a given type, then the machine state after a reduction step will have an equivalent type. The main theorem for subject reduction allows the machine state after reduction to have the same type up to implication (the reduced expression may have a stronger postcondition).
Proof. (Sketch) We use our inversion lemmas to gain information about the type of the store and the local variables, then hand that information to Lemma 3, which does most of the work. The full proof is available as part of the supplementary material [Geller et al. 2023]. □
Lemma 3 is the main lemma for subject reduction, and is the body of the "loop" that is type safety. We show that if a machine state reduces to another machine state, then the type of the new machine state matches. Formally, this means that the reduced instruction sequence either has the same type, or has a different locals environment in the precondition that matches the types of the locals after reduction.
The full proof is available as part of the supplementary material [Geller et al. 2023]. □
The main difficulty is reasoning about stack values consumed as a program reduces. Intuitively, after reduction, the constraints will be more specific, and thus stronger, than before reduction. We can weaken the types to recover the original types.
Lemma 4 (Progress) ensures that if a machine state is well typed, then it either consists entirely of values, is a trap, or takes a step to another machine state. Lemma 4 is the key property that shows the static guarantees allow ✓-tagged instructions to reduce without dynamic checks. By proving that well-typed ✓-tagged instructions reduce, we are sure that leaving out a reduction rule (for division by zero, for example) introduces no undefined behaviour.
As with subject reduction, the main lemma is showing progress for individual instructions. In addition to the main typing premise, the lemma relies on some premises guaranteeing the well-formedness of program states. These express that there is some well-typed value prefix on the stack, that branches are statically well-bound, and conditions on the module instance's run-time memory.
Proof. (Sketch) By induction on the typing derivation of the instruction sequence under the module type context of the current module instance. Since most Wasm-precheck instructions have the same dynamic semantics as in Wasm, and every Wasm-precheck type includes all the information of a Wasm type, we conclude that the Wasm-precheck term takes a step by translating to Wasm and using Wasm's type safety proof. The intuition is that most Wasm-precheck instructions have the same reduction rules in Wasm, so we can erase to Wasm, where Wasm's type safety guarantees that the instructions satisfy progress. This does not work for prechecked instructions or inductive cases. The full proof is available as part of the supplementary material [Geller et al. 2023]. □

IMPLEMENTATION
We implement Wasm-precheck as an extension of Wasmtime [Bytecode Alliance 2019], a fast, secure, compliant runtime system for Wasm with JIT and AOT compilation. The implementation is straightforward, following the formal models. However, there are two details of interest: how we resolve join-points, and how we implement constraint solving. Our implementation is available for use as part of the supplementary material [Geller et al. 2023].

Type Annotations for Join-Points
Recall from Section 3.3 that block, if, and loop all introduce code points reachable from multiple paths due to branching. In these cases, we must find a set of constraints (the postcondition constraint set of blocks and ifs, and the precondition constraint set for loops) that is implied by every path.
In our declarative formal model, we require only that such a set exists; the set does not come from the program syntax or from a subderivation. In the implementation, we require user-provided type annotations specifying pre- or postconditions on blocks to resolve these join-points.
The syntax of the annotations largely follows from the theory, except for how variables are referred to within the constraints. Within the annotations, the user can refer to stack variables by name, and locals by name or de Bruijn index.
For example, the annotation below denotes a type that takes a parameter a on the stack, and asserts that, in the precondition, the parameter a is less than the local variable b. If the local variable b was known to be the 0th local variable, then it could alternatively be referenced using (local 0). Because the index language can only express constraints through equality, and the less-than operator i32.lt_u returns a 32-bit integer, the output of the operator is explicitly checked for equality against the number 1, effectively checking for truthiness.
(type (func (param a i32) (pre (eq (i32 1) (i32.lt_u a (local b))))))
Similar to the model, annotations are not checked against the current state syntactically, but up to implication. The actual (as calculated by the type system) precondition at the start of the block must imply the expected precondition given by the type annotation. Similarly, the actual postcondition guaranteed by the body of the block must imply the expected postcondition from the type annotation. The overall type of the block is the actual precondition (stronger than the annotation) and the expected postcondition (weaker than the actual postcondition).
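Checking an annotation "up to implication" can be illustrated with a deliberately naive stand-in for the constraint solver. The sketch below is not how the implementation works (it uses Z3 over bitvectors); it assumes constraints are Python predicates over an environment and decides implication by brute-force enumeration over a tiny domain. All names are hypothetical.

```python
from itertools import product


def implies(actual, expected, variables, domain=range(8)):
    """Toy implication check: every assignment over `domain` that satisfies
    all `actual` constraints must also satisfy all `expected` constraints."""
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(c(env) for c in actual) and not all(c(env) for c in expected):
            return False
    return True


# What the type system derived at the start of the block (stronger)...
actual_pre = [lambda env: env["a"] + 1 < env["b"]]
# ...and what the annotation states: a is less than local b (weaker).
expected_pre = [lambda env: env["a"] < env["b"]]

accepted = implies(actual_pre, expected_pre, ["a", "b"])
reverse = implies(expected_pre, actual_pre, ["a", "b"])
```

Here the derived precondition implies the annotated one, so the annotation is accepted, while the reverse direction fails (for example at a = 0, b = 1).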
Type annotations are required to be well formed: a type annotation can only constrain the values consumed and produced by a function or block (and, for blocks, the local variables). Formally, in the pre- and postcondition, all variables in the constraint set must appear in the stack type environment or local index store of that type annotation. Further, the precondition constraint set can only refer to parameters: variables that are part of the precondition stack or locals. However, the postcondition constraint set can refer to variables that are part of either the precondition or the postcondition, to express relationships between parameters and results.

Constraint Solving
Just as Wasm-precheck is parameterized by the implication relation ⇝, so is our implementation in Wasmtime. Any constraint solver can be used that implements the interface between the constraint solver and the index language. This interface is a Rust trait in our Wasmtime extension.
We choose Z3 for constraint solving for ease of use [De Moura and Bjørner 2008]. Our implementation uses Z3's bitvectors, resulting in a straightforward 1-to-1 relationship between Wasm-precheck operators and Z3 operators. In our Wasmtime implementation, we currently only support prechecked memory instructions (so call_indirect✓ and div✓ are disabled) for reasons discussed in Section 6 and Section 8. However, we have separately implemented typing rules for these instructions via an encoding to Z3 in our Redex model.

Redex Model
We provide a reference implementation of the formal model of Wasm-precheck in Redex [Felleisen et al. 2009], which also uses Z3 for constraint solving. Our implementation includes a model of the type system that checks whether a given typing derivation is valid in our model, and a syntax-directed algorithm for generating typing derivations from Wasm-precheck programs. The former can be used to validate type-inference algorithms for Wasm-precheck. The implementation also includes each of these for plain Wasm, which are reused in the implementation of Wasm-precheck.
The key challenge in the reference implementation was encoding constraints for the function table and indirect function calls. Recall that for call_indirect✓ with an expected type tfi, we have to encode constraints about which functions in a table can be called. To encode this, we construct a Z3 array that is the same size as the table. We chose Z3 arrays because they have a similar abstraction to tables. We fill the array with boolean values, which are true if the function at the table index has a suitable function type, i.e., a subtype of the expected type tfi, and false otherwise. Finally, we assert all of the translated constraints from the constraint set about the table index, and constrain the value in the array at the table index to be true.

EVALUATION
Our evaluation seeks to answer the following questions: (1) What is the best-case performance speed-up of removing dynamic checks from Wasm? (2) What speed-up can we realistically get using Wasm-precheck? (3) What is the added cost of the Wasm-precheck type system compared to Wasm's?
In addition, we provide a description of the type annotation process for the benchmarks, and what these annotations look like. This gives an idea of the amount of time and work required to use Wasm-precheck in practice for improved performance, but also suggests the potential of using a static analysis to generate or infer the annotations.

General Setup
We use the PolyBenchC benchmark suite [Pouchet and Yuki 2016] and the Wasmtime runtime and compiler [Bytecode Alliance 2019] to perform our evaluation.
We compare four versions of Wasmtime: a "Wasm_dyn" version with virtual memory guard pages disabled and dynamic bounds checks enabled (the baseline we want to improve); a "no-checks" version with all safety checks disabled (used for run-time performance comparison, providing a frame of reference for best-case improvements); a "Wasm-precheck" version, our extension of Wasmtime implementing Wasm-precheck; and a "Wasm_vm" version, the default configuration of Wasmtime with virtual memory guard pages. We emphasize that this evaluation is just as much an evaluation of the Wasmtime implementations as it is of the Wasm and Wasm-precheck languages, and other compilers may perform differently.

Name            Wasmtime Version
Wasm_dyn        Wasmtime with guard pages removed, dynamic bounds checks enabled
Wasm-no-checks  Wasmtime with dynamic memory checks disabled (unsafe)
Wasm-precheck   Wasmtime extended to implement Wasm-precheck
Wasm_vm         Unmodified Wasmtime, uses 8 GB of VM for checks (safe)

We use two versions of the PolyBenchC suite: one unmodified version compiled from C to Wasm with Emscripten using emcc -Os [Emscripten Contributors 2015], and one version manually ported from Wasm to Wasm-precheck with type annotations added and some instructions modified. The annotation process is described in Section 6.4. The unmodified version is used with the "Wasm_dyn", "Wasm_vm", and "no-checks" versions of Wasmtime.
The ported version is used in the Wasm-precheck version of Wasmtime. In the manually ported version, we leave the Emscripten runtime unchanged, but do modify the generated functions for each benchmark. For each benchmark, we annotate two of the functions: one that initializes the data and one that performs the benchmark computation. In addition to adding type annotations, we add an explicit dynamic check to the top of each benchmark function, which is necessary to type check the dynamically allocated data. The type system tracks constraints from this explicit dynamic check, and is able to use this one check to eliminate many checks.
For all benchmarks, we used Wasmtime in ahead-of-time (AOT) mode: first pre-compiling benchmarks to .cwasm files using wasmtime compile, then measuring the run time of executing the pre-compiled file using wasmtime --allow-precompiled.
Benchmarks. PolyBenchC focuses on the performance of arithmetic and memory instructions, and was used in the original Wasm work by Haas et al. [2017]. PolyBenchC benchmarks initialize vectors and matrices (represented using arrays), and then compute over these structures. These benchmarks perform many memory and arithmetic operations in tight, often nested, loops. They may benefit more from Wasm-precheck than the average program. However, they are not unrealistic, as we expect some computationally intensive Wasm programs to follow this pattern. For example, the demo image classifier microservice for Dapr/WasmEdge has a similar structure.
Only dynamic memory bounds checks. Our run-time performance evaluation studies only dynamic memory bounds checks. When surveying Wasm code, we found that memory accesses were abundant, while indirect calls and integer division-by-zero checks seldom occurred. Checked integer division was rare, in part, because the predominant datatype was floating-point numbers.
We also found that memory accesses were the most expensive. We prototyped with pathological microbenchmarks, which repeatedly execute instructions with dynamic checks in a loop. We found that memory bounds checks caused a much larger slowdown than dynamic type checks on indirect calls, and measured no overhead on the integer division-by-zero check.
We believe dynamic memory bounds checks are the most expensive because they require the most effort to check in comparison to the cost of the instruction they guard. They require a comparison per operation. The check involves loading the current size of memory and performing an integer check, which we found usually amounts to using an extra register; this agrees with Jangda et al. [2019], who cite increased register pressure due to dynamic checks as a cause of performance issues. While the run-time type check on call_indirect also requires an extra comparison per instruction, it is likely to be insignificant compared to all the computation involved in a function call.
Notably, Jangda et al. [2019] identify dynamic checks on indirect function calls as a significant cost, which disagrees with our findings. There are several possible reasons for this disagreement. First, and most importantly, there is a difference in how memory bounds checks are studied. Jangda et al. [2019] did not disable virtual memory guard pages to study dynamic memory bounds checks, and therefore would not have seen overhead on such checks. Second, the implementation of indirect function calls in the compiler we study could be more optimized than the version studied by Jangda et al. [2019]. Third, it could be a difference in workflow, where our "pathological" microbenchmark is not in fact the worst-case scenario. Therefore, we focus on eliminating dynamic memory bounds checks in our evaluation, and ignore the other dynamic checks. We emphasize that Wasm-precheck is theoretically capable of eliminating the cost of other dynamic checks, if those costs exist in practice.
Hyperfine. In general, for the benchmarks requiring timing data, we use Hyperfine (hyperfine) [Peter 2023], a benchmarking tool that helps to control for noise. Using hyperfine, we fix the number of warmups to 3, to ensure the benchmark program is warm in the disk cache, and the number of runs to 10, to account for variations in background processes, process randomization, etc.

Run-time Performance Analysis
Wasm-no-checks vs Wasm_dyn. Compared to Wasm_dyn, the unsafe removal of dynamic checks in Wasm-no-checks achieved an average speed-up of 1.76x, up to a maximum of 3.21x (Figure 5). This is a significant increase and demonstrates the value of safely removing dynamic bounds checks. However, how close can we get, safely, with Wasm-precheck?
Wasm_vm vs Wasm_dyn. Compared to Wasm_dyn, using virtual memory in Wasmtime to optimize memory bounds checks led to an average speed-up of 1.73x, up to a maximum of 3.26x (Figure 5). The run times generally correspond closely to Wasm-no-checks, with some outliers where Wasm-no-checks (and Wasm-precheck) outperform Wasm_vm. We conjecture that in the short-lived benchmarks, there may be a measurable cost associated with reserving the necessary virtual memory.
Wasm-precheck vs the rest. Compared to Wasm_dyn, the safe removal of dynamic checks in Wasm-precheck led to an average speed-up of 1.71x, up to a maximum of 3.18x (Figure 5). That is about 97% of the speed-up achieved by Wasm-no-checks on average, and about 99% of the speed-up achieved by Wasm_vm on average.
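The quoted percentages can be reproduced from the reported average speed-ups with a short calculation. This is only a sanity check on the arithmetic, assuming the percentages are ratios of the averages; the per-benchmark data in Figure 5 is not used here.

```python
# Average speed-ups over Wasm_dyn, as reported above.
speedup = {"no_checks": 1.76, "vm": 1.73, "precheck": 1.71}

# Fraction of each configuration's average speed-up that Wasm-precheck achieves.
vs_no_checks = speedup["precheck"] / speedup["no_checks"]   # about 0.97
vs_vm = speedup["precheck"] / speedup["vm"]                 # about 0.99

print(f"{vs_no_checks:.1%} of Wasm-no-checks, {vs_vm:.1%} of Wasm_vm")
```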
Discussion. Wasm-precheck can remove a large percentage of the overhead of dynamic memory bounds checks. This is achieved without attempting to remove every dynamic check. Instead, we focused on loops in computationally intensive functions. In practice, Wasm-precheck enables the type system to propagate information from dynamic checks outside a loop, so the loops can be free of dynamic checks. The speed-up is achieved in compiled code that includes a memory manager, modified only to include type annotations and an explicit dynamic check at the top of each function.
The PolyBenchC benchmarks are memory-intensive programs that access memory in nested loops. Thus, the results of our benchmarks are not generally applicable: it is not fair to expect an arbitrary program to benefit as much as our benchmarks. That said, memory-intensive programs are a prominent and useful class of programs, and we can see that Wasm-precheck can safely reap significant performance benefits for such programs. For memory-intensive programs, Wasm-precheck can accomplish nearly the same mitigation of the costs of dynamic checks as the default Wasm configuration, without the need for VM guard pages.

Type Checking Cost Analysis
For Wasm, fast compilation and a small binary footprint are key design points. Wasm-precheck can achieve significant speed-ups over Wasm, but at the cost of a more complex type checker, additional type annotations, and the possible addition of explicit dynamic checks in the code. To quantify these costs, we measure and compare the compilation time and binary size of our PolyBenchC suites. For this analysis, we only compare the baseline Wasm and the Wasmtime implementations.

6.3.1 Binary Footprint. We compare the size in bytes of the annotated Wasm-precheck program to the size of the unmodified Wasm version. These sizes are of the Wasm binary format, not the binary output of the Wasmtime AOT compiler.
Results. On average, the total binary size is about 7.18% larger for Wasm-precheck than for Wasm (column 4 of Table 1). We also compare the code and type sections separately (available in the anonymous supplementary material, Section D.1). The code sections are barely larger, on average about 0.84%. The type sections were significantly larger, with an average of 642%, as these required explicit pre- and postconditions. The smallest Wasm-precheck type section was about 2.4x as large as its Wasm counterpart (from 176 to 420 bytes), whereas the largest was a little over 18x as large (from 206 to 3761 bytes).
Discussion. The added footprint of type annotations is relatively small compared to the full file size. The overall increase in size of the final binary is small and is probably worth the improved performance. The overall file size includes runtime code added by emcc. Recall that we only add annotations for two functions. For programs where intensive computation is focused in a few functions, the addition of type annotations for those functions is minor compared to the overall size of the program. However, for a module with less runtime support code compared to code for intensive computation, the increase in binary size may be more significant.
In addition, we found that some programs benefited from reuse between annotations, significantly reducing annotation overhead. For example, in floyd-warshall we reuse the annotations between the two functions, as they had extremely similar structures. This reuse in type annotations led to a much smaller than average increase in overall code size of 1.83%.
Recall that the precondition on a loop is essentially a loop invariant; the postcondition is only taken into account after the loop exits. Local 0 represents an array with a (statically known) size of 32,000,000 bytes. The precondition checks that both the start and the end of the array are within the static size of memory, 67,108,864 bytes (1024 pages of 65,536 bytes each), and the postcondition ensures that the size is unchanged when the loop exits. Locals 2 and 6 are loop iteration variables with respective statically known loop bounds of [1, 1998] and [2, 1999] (inclusive). The loop bounds are reflected in the precondition. Because we are looking at an inner loop, the iteration variable from the outer loop, local 6, should be unchanged when this inner loop exits.
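The numbers in this description can be checked with a small calculation. The sketch assumes that "within the static size of memory" means the array's start address plus its size does not exceed the memory size; the function name and the sample base addresses are hypothetical.

```python
PAGE_BYTES = 65_536
MEMORY_BYTES = 1024 * PAGE_BYTES   # static memory size: 67,108,864 bytes
ARRAY_BYTES = 32_000_000           # statically known size of the array in local 0


def array_fits(base):
    """True if an array starting at address `base` (the value held in
    local 0) lies entirely within the static memory size."""
    return 0 <= base and base + ARRAY_BYTES <= MEMORY_BYTES


fits_low = array_fits(0)              # array at the start of memory
fits_high = array_fits(40_000_000)    # would end past the static memory size
```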
Finally, locals 12 and 3 store the result of computations performed prior to this loop; they are constrained using their respective computations. Essentially, they are the result of moving a computation that is invariant in the loop to be calculated once before the loop instead of on every iteration.
In some cases, these stored computations were constrained based on an upper bound rather than how they were assigned. Using a maximum bound results in more succinct but less precise annotations. Conversely, spelling out the constraints based on the computation, as on line 7 of the above example, is more verbose but also more precise. In the above example, the extra precision makes sense, as local 6 has a lower bound as well as an upper bound. However, there are other cases where this extra precision is necessary, usually when a stored computation is used to access multiple arrays of different sizes. If we did not have the lower bound on local 6, we could substitute the upper bounds for locals 0 and 6 in line 7 of the above example, resulting in the following more compact constraint on local 3:
(eq (i32 1) (i32.lt_u (local 3) (i32 63968000)))

RELATED WORK
Using types to improve static reasoning of low-level and compiler intermediate languages is not a new idea. Tarditi et al. [1996] used strongly typed intermediate languages (TIL) to enable optimization of SML code. Compiling SML involves many translations among intermediate languages, and by preserving type information across those translations Tarditi et al. [1996] were able to safely perform additional compiler optimizations. Using TIL led to up to 50% faster programs. Morrisett et al. [1999] demonstrated how to preserve types through five representative compilation passes to get from System F (a model of a high-level functional language) to a typed assembly language (TAL). The focus of TAL was on safety. Morrisett et al. [1999] demonstrated that untrusted code could be safely executed, so long as it was well typed and type checked first. Although Morrisett et al. [1999] argued that the type-preserving compilation passes would permit similar optimizations to TIL, they did not include further optimizations based on TAL.
The most closely related work is Xi and Harper [2001], which developed an indexed type system for an assembly language, DTAL, that enabled static guarantees and optimizations such as safely removing array bounds checks. The goal of DTAL, similar to TAL, was to support type-preserving compilation from a high-level language for both optimizations and safety. DTAL was intended to be a target for supporting type-preserving compilation from Dependent ML (an indexed typed SML) and SML. DTAL is a register machine language, and the type system focuses on the flow of constraints between registers and memory. By contrast, Wasm is a stack-based language, so Wasm-precheck focuses on stack-based reasoning. Wasm also includes more structured control flow operations, which pose some unique challenges. One of the major static reasoning hurdles in Wasm, the call_indirect instruction, is not present in DTAL, and the ability to reason about a call_indirect-like instruction statically is novel to Wasm-precheck.
An alternative to typed intermediate languages is proof-carrying code (PCC), which uses a logical framework over low-level code to statically prove safety properties [Necula 1997]. While typed intermediate languages require types as part of the language, PCC uses a separate logical framework, allowing more flexibility to use the approach with an existing language. A PCC approach to removing dynamic safety checks from Wasm would still require Wasm to be extended with instructions that lack dynamic checks, and would otherwise be quite similar.
Like the Wasm-precheck type system, Liquid Types are able to ensure safety properties and eliminate dynamic checks, but unlike Wasm-precheck they use sophisticated inference to reduce the annotation burden. Liquid Types were introduced as an indexed type system in OCaml, focused on combining strong static reasoning about program variables with low developer effort [Kawaguchi et al. 2009]. Like Wasm, OCaml already had a strong static type system, but Liquid Types allowed the efficient verification of a large set of libraries with few developer annotations. Since their introduction, Liquid Types have been applied to many languages, including Haskell, Ruby, C, and JavaScript [Chugh et al. 2012; Kazerounian et al. 2018; Rondon et al. 2010, 2012; Vazou et al. 2014a,b; Vekris et al. 2016]. It is possible that the Wasm-precheck type system could be converted to a Liquid Type system, removing the need for annotations in the implementation of Wasm-precheck.
An alternate approach to the safety/performance tradeoff by Popescu et al. [2021] starts with unsafe Rust code and attempts to make it safer while maintaining performance. They introduce a tool that identifies code that has explicitly omitted dynamic safety checks, and reintroduces such checks until a specified performance overhead threshold is met. In this way, more expensive dynamic checks are less likely to be added. This approach allows library users to have more fine-grained control over the safety/performance tradeoff that would normally be decided by the library developer. However, safety is not maintained using this approach.

DISCUSSION AND FUTURE WORK
We briefly discuss limitations in the current model and implementation of Wasm-precheck and how they may be addressed in future work.
Table Mutation. Wasm-precheck assumes that function tables are immutable by a Wasm module, as they were in the original specification [Haas et al. 2017]. However, the most recent Wasm specification now supports table mutation [Rossberg 2022]. Furthermore, the table could always be mutated by the host environment (e.g., using the JavaScript API in a browser). Table mutation violates static guarantees for prechecked indirect function calls (call_indirect✓), and functions with incompatible types may be called. This is a limitation in safety of the Wasm-precheck model, although not of our implementation, since call_indirect✓ is disabled.
The most straightforward solution is to introduce a separate immutable table, and only allow call_indirect✓ with the immutable table. Attempting to use mutating instructions with an immutable table could result in a static type error. Alternatively, instead of a static type error, every call_indirect✓ on the mutable table could be transparently downgraded to a call_indirect instead, invalidating the optimizations from Wasm-precheck, but allowing the safe execution of the code. This approach allows fine-grained control over the table, providing additional static guarantees. The host environment would be required to respect immutable tables.
Another approach is to modify the implementation of Wasm-precheck rather than the specification. JIT implementations could re-type-check programs when the table is mutated, either by instructions or by the host environment. If the check fails, affected call_indirect✓s on that table could be downgraded to a call_indirect. This idea is simpler for developers and requires no language redesign, but could be intractable if table mutation occurs often. If table mutation is infrequent (e.g., only at the beginning of execution for dynamic linking), then this strategy could produce good results.
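The re-check-and-downgrade idea can be sketched as follows. This is a minimal illustration under strong simplifying assumptions, not the Wasmtime implementation: the `CallSite` record, the string encoding of function types, and the assumption that each call site was proven to use only table indices below some bound are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CallSite:
    """A hypothetical record for one indirect call site."""
    max_index: int           # site was proven to use table indices < max_index
    expected_type: str       # illustrative string encoding of the expected type
    prechecked: bool = True  # True models call_indirect✓, False models call_indirect

def recheck_after_mutation(table, sites):
    """Downgrade prechecked sites whose typing assumption no longer holds."""
    for site in sites:
        if not site.prechecked:
            continue  # already uses the dynamically checked instruction
        reachable = table[: site.max_index]
        in_range = len(reachable) == site.max_index
        well_typed = all(t == site.expected_type for t in reachable)
        if not (in_range and well_typed):
            site.prechecked = False

table = ["i32->i32", "i32->i32", "f64->f64"]
sites = [CallSite(2, "i32->i32"), CallSite(3, "i32->i32")]
recheck_after_mutation(table, sites)
print([s.prechecked for s in sites])
```

The pass is linear in the table size per call site, which is one way to see why frequent table mutation could make this strategy expensive.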
Global Variables. We do not support constraints on global variables because we cannot compositionally track constraints across module boundaries. This is a limitation in expressivity, but not in safety. Before linking, a module has no information about globals from another module, which would be necessary for reasoning about the types of functions imported from the other module. Concretely, imagine that the i-th module calls a function f that was imported from the j-th module. The call instruction is reduced to call {inst j, func f}, where j is the index for the module instance where f is defined. The function f cannot modify the global variables in the i-th module directly. However, f may call a function imported from the i-th module that modifies the globals in the i-th module. We have to assume the worst and can make no assumptions about the global variables after f returns.
We might address this limitation with an effect system to track how functions modify global variables. However, this could be undesirable or difficult to accomplish if global variables should not be exposed as part of an interface.
Dynamic Resizing of Memory. Wasm-precheck only supports type checking ✓-tagged loads and stores based on the static size of memory, but memory can grow monotonically via grow_memory. This is a limitation in expressivity, but not in safety. It should be possible to statically reason about the dynamic size of memory by tracking a dependency on the result of the grow_memory instruction. If the result is −1, we know that the memory remains the same size. Otherwise, growth succeeded: the result is the previous size in pages, so the new size is the previous size plus the requested number of pages. For example, we could introduce an index variable to track the size of memory and, after a grow_memory, constrain the new size to equal the old size if the result is −1, and the old size plus the requested growth otherwise. Then, in load✓ and store✓, the bounds check would be performed against this index variable. This would likely require passing the size of memory in the instruction type rather than the module environment.
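The case split on the result of grow_memory can be sketched in Python. This is a dynamic model of the reasoning a type checker would perform symbolically, not part of Wasm-precheck; the function names and the sample sizes are hypothetical, and the new size is computed from the old size and the requested growth rather than read off the result.

```python
PAGE_SIZE = 65536  # bytes per Wasm page

def size_after_grow(old_pages, delta_pages, result):
    """Memory size (in pages) implied by the result of a grow_memory."""
    if result == -1:          # growth failed: the size is unchanged
        return old_pages
    return old_pages + delta_pages

def access_in_bounds(addr, width, pages):
    """The bounds check a prechecked load or store would discharge statically."""
    return 0 <= addr and addr + width <= pages * PAGE_SIZE

old_pages = 1024
addr = old_pages * PAGE_SIZE  # first byte past the original memory

failed = size_after_grow(old_pages, 1, -1)
grown = size_after_grow(old_pages, 1, old_pages)
print(access_in_bounds(addr, 4, failed), access_in_bounds(addr, 4, grown))
```

Because memory only grows, the static size remains a sound lower bound in both branches; tracking the dynamic size would only make more accesses provably in bounds.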
Support for Streaming Compilation. Although not an explicit goal, Wasm-precheck, like Wasm, should support streaming execution. All type annotations are declared before the code section, and this information is propagated forward during type checking (computing the strongest postcondition, rather than reasoning backwards to compute the weakest precondition). Each instruction is checked, and its constraints solved, before checking the next instruction.
This relies on sufficient type annotations before execution, so it may be difficult to combine with type inference. However, if streaming compilation is needed with type inference, one option is to use the Wasm type system as a fast first compiler, and then run the Wasm-precheck type checker in the background to eliminate dynamic checks where possible. This would mirror Firefox's current Wasm implementation, which provides a straightforward fast compiler to begin execution as soon as possible, and a slower, optimizing compiler whose result is used once it finishes compilation.
Type Annotations. There is significant room for improvement in reducing the size of the annotations in the implementation. Currently, each part of the index language is simply encoded into binary, without any optimization, compression, sharing, or eliding parts that could be trivially and locally inferred. Some space savings could be obtained by extending the binary format to explicitly encode common forms, saving some bytes. Additionally, there is significant constraint reuse, as nested blocks usually reuse the preconditions of the outer blocks with some additions. Most postconditions simply specify that locals remain unchanged, even though no instructions inside the annotated block can mutate them. This situation should be easily detectable by the type checker, which could allow omitting such constraints for local variables that are not modified.
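Detecting locals that a block cannot modify amounts to a simple syntactic scan. The sketch below assumes a hypothetical tuple encoding of instructions (an opcode followed by its immediates, with nested bodies as lists); it is not the encoding used by our implementation.

```python
def modified_locals(body):
    """Collect indices of locals written by set_local or tee_local, including nested blocks."""
    written = set()
    for instr in body:
        op = instr[0]
        if op in ("set_local", "tee_local"):
            written.add(instr[1])
        elif op in ("block", "loop"):
            written |= modified_locals(instr[1])
        elif op == "if":
            written |= modified_locals(instr[1]) | modified_locals(instr[2])
    return written

def elidable_postconditions(body, annotated_locals):
    """Locals whose 'unchanged' postcondition could be omitted from the annotation."""
    return set(annotated_locals) - modified_locals(body)

body = [
    ("get_local", 0),
    ("i32.const", 1),
    ("i32.add",),
    ("set_local", 2),
    ("loop", [("get_local", 6), ("tee_local", 3), ("drop",)]),
]
print(sorted(elidable_postconditions(body, [0, 2, 3, 6])))
```

Here locals 0 and 6 are only read, so a checker could infer that they are unchanged without an explicit constraint.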
Type Inference. Given the relatively straightforward constraints in Wasm-precheck, type inference may be sufficiently effective to improve the performance of Wasm programs without developer effort. For Wasm-precheck's type system, this amounts to performing static analysis over Wasm that approximates Wasm-precheck's type system and outputs relevant type annotations. Wasm-precheck's index language encodes a logic that corresponds to a straightforward data flow analysis, so implementing an optimizing embedding should not be difficult using standard analysis techniques [Flanagan and Leino 2001]. Such techniques have been widely adapted to indexed and refinement type systems, e.g., see Rondon et al. [2010] or Jhala and Vazou [2020] (Section 5). Wasm-precheck provides correctness guarantees about any such analysis via type safety.
Alternative Constraint Solvers. Wasm-precheck is parametric over the definition of implication, allowing us to use different constraint solvers with different tradeoffs between effectiveness and efficiency. In our prototype implementation, we used Z3, which works well in practice but offers no good worst-case guarantees. We conjecture that the octagonal abstract domain is sufficient for constraint satisfaction for most of our benchmarks [Miné 2001]. The octagonal abstract domain has polynomial worst-case complexity, compared to the exponential worst-case complexity of arbitrary Z3 queries.
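To give a flavor of how a polynomial-time domain can decide implication, the sketch below handles difference constraints of the form x_i - x_j <= c (the zone domain, a simpler restriction of octagons, used here only for illustration). Closure is computed with the Floyd-Warshall algorithm in cubic time. The variable numbering, the constant-zero variable at index 0, and the relation between the two loop variables are assumptions made for the example, not taken from our benchmarks.

```python
import math

def close(n, constraints):
    """Tightest bounds d[i][j] on x_i - x_j implied by the given constraints."""
    d = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, c in constraints:          # each triple means x_i - x_j <= c
        d[i][j] = min(d[i][j], c)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def entails(n, constraints, goal):
    """True if the constraints imply the goal triple (i, j, c)."""
    i, j, c = goal
    d = close(n, constraints)
    if any(d[k][k] < 0 for k in range(n)):
        return True                      # unsatisfiable premises imply anything
    return d[i][j] <= c

ZERO, I, J = 0, 1, 2                     # x_0 is fixed to 0 to encode plain bounds
premises = [
    (I, ZERO, 1998),                     # i <= 1998
    (ZERO, I, -1),                       # i >= 1
    (J, I, 1),                           # j - i <= 1 (assumed relation)
]
print(entails(3, premises, (J, ZERO, 1999)))   # does j <= 1999 follow?
print(entails(3, premises, (J, ZERO, 1998)))   # does j <= 1998 follow?
```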

CONCLUSION
We introduce Wasm-precheck, a low-level language that uses an indexed type system to improve static guarantees and therefore performance of Wasm code. To ensure the safety of Wasm-precheck, we have proven the type safety of Wasm-precheck and shown backwards compatibility with Wasm through a sound type erasure to Wasm and an automatic embedding from Wasm to Wasm-precheck. We implemented Wasm-precheck in an extension of Wasmtime, and achieved an average performance gain of 1.71x by safely removing explicit dynamic checks in the widely used PolyBenchC benchmark suite. This supports our hypothesis that Wasm can be equipped with a type system that, by improving static guarantees to remove unnecessary dynamic checks, can be used to improve performance while maintaining safety.

Fig. 5. Comparison of the average run time of PolyBenchC programs; Wasm vs Wasm-no-checks and Wasm vs Wasm-prechk. The error bars show the standard error of the mean.

The conditional branch, br_if, consumes a value from the stack and only branches if the value is truthy. Finally, br_table is essentially a br j where j is determined by indexing into a statically provided table (no relation to the function table) of branching indices i+ based on the instruction's dynamic operand (br_table resembles a switch statement). Returning (return) is similar to branching, but jumps to a separate class of label introduced by a function call.

Wasm-precheck Typing and Index Language Syntax

... and get_global i). Although there are mutation instructions for both kinds of variables (set_local i and set_global i), not all global variables are mutable, whereas all local variables are. The tee_local i instruction is a combined set_local and get_local that consumes and returns a value while also setting the i-th local variable to that value, like the Unix tool tee; this instruction only exists for local variables.

An indirect function call, call_indirect, consumes a value from the stack and attempts to call the function at that index in the table, a list of functions defined statically as part of the module. Since the target of call_indirect is not necessarily statically known, indirect calls use a run-time check against the statically provided expected type (ti1*; φ1 → ti2*; φ2). call_indirect✓ relies on the fact that the function from the table has the expected type (ti1*; φ1 → ti2*; φ2) (see Figure

The constant instruction is a simple example of indexed types. Intuitively, t.const c pushes the constant value c of type t onto the stack. The typing rule Rule Const reflects this: in the postcondition, the first value on the stack has indexed type (t a) for fresh index variable a. The postcondition includes a constraint that a is equal to the constant c, resulting in constraint set φ, (= a (t c)). The locals environment l is unchanged.
The last rules handle composing sequences of instructions. Rule Empty types the empty instruction sequence ε, which simply has the same pre and postcondition ti*; l; φ. Rule Stack-Poly allows a prefix of the stack to be ignored (or added, depending on your perspective); this adds polymorphism in "the rest" of the stack to all the other typing rules. Rule Composition composes a sequence of instructions e1* with another instruction e2, checking that pre and postconditions match up.
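The constant and sequencing rules can be mimicked by a toy forward checker. This is a sketch for intuition only: it covers just constant instructions, represents a typing state as a (stack, locals, constraints) triple, and every name in it (`Checker`, the tuple encodings, the `a0`, `a1` variable names) is hypothetical rather than taken from our implementation.

```python
import itertools

class Checker:
    """Toy forward checker for a fragment with only t.const instructions."""

    def __init__(self):
        self._ids = itertools.count()

    def fresh(self):
        """Return an index variable name not used before by this checker."""
        return f"a{next(self._ids)}"

    def const(self, state, t, c):
        """Push (t, a) for fresh a and record the constraint a = (t c)."""
        stack, locals_env, constraints = state
        a = self.fresh()
        return (stack + [(t, a)], locals_env, constraints + [("=", a, (t, c))])

    def sequence(self, state, instrs):
        """Thread the state through each instruction; the empty sequence returns it as-is."""
        for op, arg in instrs:
            t, _, name = op.partition(".")
            if name != "const":
                raise NotImplementedError(op)
            state = self.const(state, t, arg)
        return state

checker = Checker()
start = ([], ("l0", "l1"), [])
end = checker.sequence(start, [("i32.const", 7), ("i64.const", 2)])
print(end)
```

Each step only appends to the stack and never inspects what is already there, which loosely mirrors how the stack-polymorphism rule lets an instruction ignore the rest of the stack; the postcondition of one step is used directly as the precondition of the next, as in the composition rule.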