Bit-Stealing Made Legal: Compilation for Custom Memory Representations of Algebraic Data Types

Initially present only in functional languages such as OCaml and Haskell, Algebraic Data Types (ADTs) have now become pervasive in mainstream languages, providing nice data abstractions and an elegant way to express functions through pattern matching. Unfortunately, ADTs remain seldom used in low-level programming. One reason is that their increased convenience comes at the cost of abstracting away the exact memory layout of values. Even Rust, which tries to optimize data layout, severely limits control over memory representation. In this article, we present a new approach to specify the data layout of rich data types based on a dual view: a source type, providing a high-level description available in the rest of the code, along with a memory type, providing full control over the memory layout. This dual view allows for better reasoning about memory layout, both for correctness, with dedicated validity criteria linking the two views, and for optimizations that manipulate the memory view. We then provide algorithms to compile constructors and destructors, including pattern matching, to their low-level memory representation. We prove our compilation algorithms correct, implement them in a tool called ribbit that compiles to LLVM IR, and show some early experimental results.


INTRODUCTION
Algebraic Data Types (ADTs) are an essential tool to model data. They allow grouping together information in a consistent way through the use of records, also called product types, and organizing options through the use of variants, also called sum types. Proper ADT support enables:
• Modeling data in a way that is close to the programmer's intuition, abstracting away the details of the memory representation of said data.
• Safely handling data by ensuring via pattern matching that its manipulation is well-typed, exhaustive and non-redundant.
• Optimizing data manipulation thanks to rich constructs understood by the compiler.
Despite these promises, ADTs were initially only present in functional programming languages such as NPL [Rod M. Burstall 1977], HOPE [Rod M. Burstall et al. 1980], and later OCaml and Haskell. Recently, they have gained a foothold in more mainstream languages such as TypeScript, Scala, Rust and soon Java. They are, however, still lacking in high-performance lower-level languages. Indeed, low-level programmers find it essential to have fine-grained control over data layout: how structs are packed, where to insert indirections, or whether to shatter an array of records into several arrays [AoS and SoA 2023]. All these choices and optimizations are essential for performance. This is particularly unfortunate, as the descriptive nature of ADTs enables safe yet efficient representations. Classically, the Option type, whose values are either Some(value) or None, can simply be represented by boxing the value in the Some constructor behind a pointer. However, if that value is an integer ranging from 0 to 10, we can represent it unboxed and use 11 for None. This optimization is regularly done manually by C or C++ programmers, at the cost of error-prone manipulations. Complex optimizations on nested and rich types are even more error-prone.
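The niche trick described above can be hand-rolled in C along the following lines. This is a minimal sketch of ours, not part of any library; all names (`opt_some`, `OPT_NONE`, etc.) are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Manual "niche" encoding of an Option over integers 0..10: the payload
   only needs values 0..10, so the spare value 11 can encode None without
   any box or separate tag. All names here are illustrative. */
#define OPT_NONE 11u

uint8_t opt_some(uint8_t v) { assert(v <= 10); return v; }
int     opt_is_none(uint8_t o) { return o == OPT_NONE; }
uint8_t opt_get(uint8_t o) { assert(!opt_is_none(o)); return o; }
```

The danger is that nothing stops a caller from treating 11 as a payload; the point of a checked layout description is to obtain this representation while the compiler verifies the encoding.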
But how does one actually pick a representation for ADTs? The Rust Reference [2023] famously says of the representation of values: "The only data layout guarantees made by [the Rust] representation are those required for soundness". OCaml, Java and Scala make similar non-guarantees about the art and craft of representing structured types. Even the C standard gives very little information about concrete choices made on a given platform. Most languages similarly keep a prudish veil on memory representation to allow for optimization and language evolution, thus denying programmers the ability to fine-tune data layout. Alternatively, GHC Haskell provides a rich algebra of types with various unboxing annotations. This, however, exposes internal representation choices to the full program, polluting every value manipulation such as construction and pattern matching. All these alternatives come with specific pattern matching compilation algorithms [Augustsson 1985; Maranget 2008; Sestoft 1996; Wadler 1987] that, despite their efficiency, do not lend themselves to the tortuous data representations found in low-level programs.
In this article, we explore a different approach: a "source" specification relying on run-of-the-mill ADTs and used by the rest of the code, accompanied by a "memory" specification describing the nitty-gritty of data layout in full detail. This dual description enables reasoning about data layout optimizations and provides generic, yet efficient pattern matching compilation. We then leverage the link between source and memory specifications to emit efficient low-level code for value construction and pattern matching, based on high-level source code. We develop our approach in the context of immutable, monomorphic data types, for sequential code emission.
Our contributions are as follows:
• A DSL, dubbed ribbit and presented in Section 2, to describe the memory layout of monomorphic, immutable ADTs, with support for well-known memory tricks: struct packing, bit-stealing, etc.
• The formalization of a simplified language (Section 3) with validity criteria for types and their representation (Section 4), novel compilation algorithms for ADT value constructors and pattern matching (Sections 5 and 6), along with their soundness proofs.
• A sketch of extensions of our formalism, including irregular data layouts and arrays (Section 7).
• A full implementation of our DSL ribbit, with code generation targeting LLVM IR (Section 8).
• Some initial performance evaluation and comparison (Section 9).

MEMORY REPRESENTATION OF ADTS
In the following section, we peel back the veil hiding the details of data layouts of rich data types through case studies of increasing complexity. In these examples, we use ribbit to specify the source type, some client code that uses it, and some non-trivial memory representation showcasing the expressivity of our approach, along with some compilation results.

Gentle Representations for Record and Sum Types
A Rust-like version of red-black trees (RBTs) is depicted in Fig. 1. We first define Color, which is either the constant Red or Black. RBTs are defined through the mutually recursive types Node for a node of the tree, and RBT for the tree itself. A Node is a product type containing a color, a value, and its left and right children. A tree of type RBT is a sum type with two variants: Empty and Node. For instance, Node{c:Red, v:1515, l:Node{c:Black, v:0, l:Empty, r:Empty}, r:Empty} is a tree containing two integers. The cardinal function takes a tree and returns its cardinal using pattern matching. If its argument is of the "shape" of the left-hand side of a rule, then the value of its right-hand side (body) is evaluated. Pattern matching is introduced by the keyword match and alternatives are enumerated in a list of branches of the form p ⇒ e, where p is a pattern and e the body. Moreover, patterns can be nested, and the body can use named subterms. In our example, Empty yields a cardinal of 0 and Node{c:_, v:_, l, r} yields a cardinal of 1+cardinal(l)+cardinal(r).
Like most self-balancing trees, RBTs are a performance-intensive data structure. Indirections in the memory representation limit locality, resulting in slow memory loads, cache misses, and slowdowns of several orders of magnitude. To achieve the best possible performance, it is critical to pay attention to the memory representation of our types. We would nevertheless prefer to tweak the representation without mangling its source counterpart, which provides nice data constructors and access functions that are close to the intended type semantics. We propose to use detailed annotations to precisely describe the memory layout of data types. Unlike previous efforts [Chen et al. 2023], our annotations are fully explicit, handle sum types gracefully, and scale to a wide variety of popular representation techniques such as bit-stealing, unboxing, packing, etc.
As our first foray into representation tweaking, we chose an almost naive representation of RBTs where trees are either an aligned pointer to a Node, or Empty, represented as a 64-bit word encoding the value 0. This representation is described using our DSL ribbit in Fig. 1b. In lines 4-8, we define the representation of a single Node as a block, denoted by {{...}}. Each subterm of the original type is represented, along with the layout it should use. For instance, we decide to encode the value (field v of the source record) as an i64 placed in the first field of the memory block (as we assume it is the most accessed field) with (n.v as i64). We also decide to explicitly pad the structure after the color to keep the left and right children aligned (hence i63). The RBT type is an alternative: either Empty or a pointer to a Node. For this purpose, we use the notion of splits. split p {...} indicates a choice depending on the integer value stored at position p. In this case, the empty position "." inspects the bitword (word or pointer) at the root of the tree being described. The split then contains a list of branches, each containing an integer value in its left-hand side and a layout specification in its right-hand side. Concretely, line 12 indicates that if the RBT value t is 0, we are considering the Empty case and the tree is represented as an i64. Otherwise, on line 13, we are in the Node case, which is represented as a 64-bit pointer (denoted &<64>(...)) to a Node block.
This memory specification allows us to control the representation precisely: for instance, here, we do not add any indirection to the left and right children, so that empty children are represented directly. For pedagogical purposes, we picked a common representation for trees in C-like languages: a pointer that is null for empty trees. Unlike C, however, the language forbids dereferencing a null pointer, and users can use nice constructs such as pattern matching and field accesses. As a first example, we provide in Fig. 1d the output of our compilation procedure for the cardinal function of Fig. 1c. In the proposed low-level representation, the input representation of a tree in memory is only dereferenced when the tree is not empty.
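For comparison, a hypothetical hand-written C counterpart of this layout (type and field names ours, not the compiler's output) would read roughly as follows:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hand-written C counterpart of the layout of Fig. 1b: an empty tree is
   a null pointer; a non-empty tree is an aligned pointer to a node block
   whose value field comes first. All names are illustrative. */
typedef struct node {
    int64_t v;              /* field v, placed first as most accessed */
    uint64_t c;             /* color, padded to keep children aligned */
    struct node *l, *r;
} node;

int64_t cardinal(const node *t) {
    if (t == NULL) return 0;          /* Empty: never dereferenced */
    return 1 + cardinal(t->l) + cardinal(t->r);
}

/* Small helper to build nodes (illustrative, leaks memory on purpose). */
node *mk(int64_t v, uint64_t c, node *l, node *r) {
    node *n = malloc(sizeof *n);
    n->v = v; n->c = c; n->l = l; n->r = r;
    return n;
}
```

Note how the null check guards every dereference, which is exactly the property the compiled code of Fig. 1d enforces by construction.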

Pointer Tagging and Unboxing
Potentially-null pointers are a common trick that is easily attainable by a dedicated optimizing compiler. In fact, Rust automatically finds a similar representation. Let us explore a more delicate example taken from Zarith [Leroy and Miné 2010], an OCaml library for arbitrary-precision integers. To speed up computations, integers in Zarith are either "small", represented as 63-bit integers and using usual instructions, or "big", using GMP's BigInt. The choice of representation made in Zarith is optimized and not naturally expressible via OCaml datatypes. Instead, it uses unsafe operations and the C foreign function interface. We can readily implement this type with a bespoke representation in ribbit, shown in Fig. 2, which leverages two new notions: bitword specifications and pointer alignment. A Zarith integer is represented in the Small variant as a 64-bit word with its lowest bit set to 1 and its 63 higher bits encoding the actual integer. This is specified via the with construct, which takes a position inside of the current bitword (here, a range of bits) and a memory specification describing its contents. The Large variant is represented as a 64-bit word-aligned pointer (denoted &<64, 8>(...), where 8 is the number of unused bits). Using split, we distinguish between these two variants using the lowest bit (.[0, 1], 0 is in the range, 1 is excluded), which is 1 in the Small variant (by specification) and 0 in the Large variant (thanks to pointer alignment). As before, we take care to mention all subterms of the type, here i.Small and i.Large, along with their representations.
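In C, the same layout is typically maintained by hand, roughly as follows. This is our own sketch, assuming word-aligned allocations for the big-integer data; all names are illustrative.

```c
#include <stdint.h>

/* Manual Zarith-style tagging: a value is a 64-bit word that is either
   a small integer shifted left one bit with the low bit set to 1, or
   an aligned pointer to big-integer data, whose low bit is 0. */
typedef uint64_t zval;

zval    z_small(int64_t n)   { return ((uint64_t)n << 1) | 1u; }
int     z_is_small(zval z)   { return (z & 1u) != 0; }
int64_t z_small_val(zval z)  { return (int64_t)z >> 1; } /* arithmetic shift on mainstream compilers */
void   *z_big_ptr(zval z)    { return (void *)z; }       /* only legal when the low bit is 0 */
```

Every use site must re-check the tag before calling z_big_ptr; this is exactly the discipline that the ribbit specification lets the compiler enforce by construction.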
In this example, we exploit pointer alignment to embed the tag identifying the variant in the lowest bit. This optimization is generally known as bit-stealing and is commonly done manually in C programs. This involves implementing all case analysis by hand, using masks and manual dereferencing for every manipulation of the corresponding data. This is not only error-prone, but also obscures program semantics compared to pattern matching. Conversely, high-level languages have difficulty expressing such types cleanly (this example is taken from an OCaml library) and must drop down to lower-level manipulation, thus forgoing their nice high-level guarantees. Thanks to our specification, the programmer retains the full control over memory layout available in C, while enjoying the descriptive aspect of high-level constructs associated with ADTs. We already showcased the safety result of non-null pointer dereferences "by construction" in the previous section. Our approach also provides additional constraints for the memory layouts to be correct. Here, our "bit-stolen" type is valid and agrees with the user source type because:
• All source subtypes (Small and Large) are properly accounted for in the memory type: i.Small as ... and i.Large as ....
• Split branches in Fig. 2 are distinct: their left-hand sides all contain different values (0 and 1) that correspond to all possible variants (Small and Large).
• The bit-stealing process is "legal": the first bit is always unused in the Small case and fits within address alignment bits in the Large case.
The formalization of such validity criteria is one of our contributions, described in Section 4.

Real World: Red-Black Trees In Linux
As a final case study, let us look again at RBTs, but this time with a highly optimized representation originally found in the Linux kernel to model device trees [Torvalds 2023]. The representation is hand-tweaked to take as little space as possible by embedding both the tag and the color in pointer alignment bits. The original version keeps a pointer to the parent node, which we eschew for pedagogical purposes. The ribbit code is shown in Fig. 3a. The source type is the same as in our first representation in Fig. 1a. The main novelty is that we combine the with construct and subterm types in line 8 to store the Color subterm inside a second alignment bit. Since the source type does not change, the rest of the program, such as the cardinal function we showcased, will work unmodified. A similar change of representation in another context would often require rewriting a significant amount of code.
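The manual C version of this trick steals the low bits of a node pointer; a simplified sketch follows. The bit assignments and names are hypothetical, chosen for illustration rather than copied from the kernel.

```c
#include <stdint.h>

/* With nodes aligned to at least 4 bytes, the two low bits of a node
   pointer are always 0, so one can hold the Empty/Node distinction and
   the other the color, Linux-style. Bit choices here are illustrative. */
#define RB_COLOR_BIT 2ull

uint64_t rb_set_red(uint64_t p)  { return p | RB_COLOR_BIT; }
int      rb_is_red(uint64_t p)   { return (p & RB_COLOR_BIT) != 0; }
uint64_t rb_address(uint64_t p)  { return p & ~3ull; }  /* strip the stolen bits */
```

Forgetting a single rb_address before a dereference corrupts the traversal silently, which is why generating this code from a checked specification is attractive.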
RBTs famously rely on a fairly complex balancing step, which redistributes colors depending on the internal invariant of the data structure. Thanks to nested patterns and "or" patterns, this step can be expressed very compactly using pattern matching, as shown in Fig. 3b. This pattern matching simultaneously inspects four arguments: the current color c, the current value v and the subtrees t1 and t2. Writing the corresponding C code for an optimized memory representation is rather painful, requiring bitmasks before dereferencing and careful bit manipulations. Since this pattern matching is at the core of a performance-sensitive data structure, we naturally want it to be as efficient as possible. Optimized decision trees result in highly non-trivial code, as can be seen in Fig. 3c. Many pattern matching implementations come with clever techniques to output optimized decision trees [Kosarev et al. 2020; Maranget 2008; Sestoft 1996]. Unfortunately, these approaches are designed for values that directly reflect the structure of the underlying ADT. As we have seen, this is not necessarily the case for optimized memory representations: some part of the value could be used to discover the shape of some other part, such as whether a dereference is legal or not. One of our contributions is a novel algorithm that takes such fine-grained dependencies into account in order to compile source-level pattern matching to low-level code. This algorithm is parametrized by a memory specification, which is part of our input language. The next section focuses on the core features of this input language. In Section 7 and Appendix B, we describe extensions that allow our formalism to capture more intricate layouts, such as WebKit-like NaN-boxing, which we present in Appendix A.

SPECIFICATIONS FOR ALGEBRAIC DATA TYPES
We now formalize our input language. As before, we present a two-tiered view: source types, used for programming and following a common presentation of ADTs, and memory specifications, detailing how to represent them in memory. We also detail the shape of programs we consider for our compilation algorithms. For simplicity, we initially consider only "regular" memory specifications, which limit inlining of sum types. We show how to relax this restriction later on.

Source Types
Our source language is composed of simple (monomorphic, immutable) algebraic data types whose grammar is presented in Fig. 4. We denote types with τ and type variables with α. We denote all tuples with angle brackets, for instance ⟨Int64, Int64⟩ for the type of 64-bit integer pairs. Constructors of sums are marked with a capital letter, for instance Some(τ) + None is an option type. In examples, we use K as a shortcut for K(⟨⟩). All references are marked with an ampersand, for instance &Int8 is the type of references to 8-bit integers. In addition, we use Γ to denote type environments, i.e., maps from type variables α to types τ, which we use to model recursive types.
Paths and Focusing. Paths, denoted π in Fig. 4, indicate a position in a value, a type, or a pattern. For instance, in the pattern K(&x), the variable x is at the position π = .K.∗, where ∗ denotes dereferencing. We also write focus(π, ·) for the term at this position. For instance, focus(.K, K(&x)) = &x. The full definition of focus can be found in Appendix C.

Memory Specifications
The second ingredient of our language is a memory description for every source type τ. This memory type describes how the given type will be represented in memory. We consider an abstraction of memory similar to LLVM IR's types, shown in Fig. 5. At this representation level, we are bit-precise, yet abstract away some architecture-dependent details such as endianness and machine pointer size and alignment. Handling of these details is delegated to the underlying compiler backend (in our case, LLVM: see Section 8). For the purpose of this formalization, each pointer solely consists of an address and unused alignment bits at the lower end. Let us first ignore the "Split" alternation and focus on the rest of the grammar. As a convention, all memory elements are given a hat, for instance τ̂ for memory types.
Simple memory types construction. In Fig. 5, type variables, which we use to handle recursive types, are denoted by α. (π as τ̂) indicates that the subterm at the position π in the source type will be represented by the memory type τ̂. A singleton type, denoted (= c), is a memory type whose only possible value is the immediate c. Structure types are denoted by {{τ̂0, . . ., τ̂n−1}}. Memory words can be of two natures: Wordℓ is the type of words of size ℓ, and Ptrℓ,a(τ̂) is the type of pointers of size ℓ, with a unused bits due to address alignment, pointing to a value of type τ̂. In both cases, they can be accompanied by bitword contents, denoted ⋉ π̂ : τ̂, which indicates that the value at position π̂ follows the memory type τ̂ (observe the priority of ':' over '⋉').
Memory path operations include pointer dereferencing and field accesses in structs, as well as (constant) bitwise and arithmetic operations that manipulate memory words. The set of available bitwise and arithmetic operations shown in the grammar is arbitrary; it depends on the targeted instruction set. In the rest of this paper, we will mainly use the extraction operation [i, j], whose semantics is "extract bits from i inclusive to j exclusive, with 0 denoting the least significant bit".
Example 3.1 (Memory type description). Let us consider the source types τint = Int64 and τtup = ⟨Int32, &τ, Int8⟩. We have a straightforward memory encoding for τint: τ̂int = (. as Word64), which encodes a 64-bit integer as "itself". For τtup, we choose to represent the tuple with a "struct", and decide to order its components as follows: first, we store the Int32 (position .0 in the source type), then the Int8 (position .2 of the source type), then 24 bits of padding (with zeroes, hence the bitword content ⋉ . : (= 0)), then the &τ pointer on 64 bits. We group the Int32, the Int8 and the 24 padding bits together, thus leading to τ̂tup = {{(.0 as Word32), (.2 as Word8), Word24 ⋉ . : (= 0), (.1 as Ptr64(τ̂))}}.
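On a typical LP64 target, this memory type corresponds to a C struct along the following lines. This is our own rendering; the field names are illustrative, and the size and offset claims assume common 64-bit ABIs.

```c
#include <stdint.h>
#include <stddef.h>

/* C rendering of the layout chosen in Example 3.1: the Int32, the Int8
   and 24 bits of zero padding share the first 64 bits, followed by the
   64-bit pointer. Field names are illustrative. */
typedef struct {
    uint32_t fst;      /* source position .0: the Int32 */
    uint8_t  thd;      /* source position .2: the Int8 */
    uint8_t  pad[3];   /* 24 bits of padding, kept at zero */
    uint64_t *snd;     /* source position .1: the pointer */
} tup_mem;
```

On common 64-bit ABIs this struct occupies 16 bytes with the pointer at offset 8, matching the 64-bit grouping of the first three components.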

Split memory types.
Using a memory-level specification to describe type layouts and pattern matching is convenient, but it raises several new difficulties that are not obvious in more direct approaches. One such difficulty is that, notably in the case of sum types, memory accesses can have dependencies that do not follow source-level nested subterms. Let us show this on an example.
Example 3.2 (Non-nested dependencies). We consider the Zarith example from Section 2.2, using a large integer type to model GMP::BigInt. Let τz = Small(Int63) + Large(Int128). Once again, we choose a compact representation exploiting bit-stealing: values of the form Small(...) map to 64-bit words whose lowest bit is always 1; values of the form Large(...) map to pointers aligned on at least one byte. Now, consider the pattern Large(1) | Large(2). In the source version, it is clear we must establish whether the head constructor is Large before matching its contents. In the memory version, things are less clear: all values are "simply" words, and both patterns are supposed to dereference a pointer to access integers. For safety however, we must ensure that the word is indeed a pointer before dereferencing it to observe its contents!
Fig. 6. Grammar for the source language (x ∈ Var)
To enforce such constraints, we introduce splits, whose purpose is to model constraints of the form "if the numeric value at a given position in memory is c, then we follow the memory type associated with c". This lets us model choices in representation and clarify dependencies in non-nested cases. We denote Split(π̂) {ci : Ki ⇒ τ̂i} a split type with a discriminant position π̂ and a set of branches of the form ci : Ki ⇒ τ̂i. It expresses that if the value at position π̂ in memory is ci, then it corresponds to a source constructor in the set Ki, dubbed the branch provenance, and its memory type is τ̂i.
Example 3.3 (Splits in Zarith). Our choice of representation in Example 3.2 is expressed by:

τ̂z = Split(.[0, 1]) {
  1 : {Small} ⇒ Word64 ⋉ [0, 1] : (= 1) ⋉ [1, 64] : (.Small as Word63)
  0 : {Large} ⇒ Ptr64,1((.Large as Word128)) ⋉ [0, 1] : (= 0)
}

Since there are two possible representations depending on the head constructor, we use a "split" type with two branches. In the first branch, the provenance {Small} indicates that it represents values of the Small(Int63) case in the source type. Its associated memory type is Word64 ⋉ ([0, 1] : (= 1)) ⋉ ([1, 64] : (.Small as Word63)): a 64-bit word whose lowest bit is set to 1 and whose 63 highest bits encode the subterm at position .Small (that is, the 63-bit integer value from the source type). In the second branch, the provenance is {Large} and the memory type representing Large(Int128) is Ptr64,1((.Large as Word128)) ⋉ [0, 1] : (= 0): a byte-aligned pointer to a 128-bit word encoding the integer value at position .Large in the source type, with its lowest bit ("alignment bit") set to 0. Finally, the split discriminant [0, 1] indicates how to tell these two cases apart: by looking at the lowest bit, which is 1 for the Small variant and 0 for the Large variant.

Source Programs and Compilation Setup
As setup for our compilation algorithms, we formalize input programs as simplified expressions (shown in Fig. 6) where every expression is let-bound, akin to A-normal form [Sabry and Felleisen 1993]. Our main interest lies in two categories of expressions: constructors, i.e., value allocation; and destructors, i.e., pattern matching. Constructors of ADTs, denoted v, are simply values with a syntax reminiscent of types, along with constants c ∈ C and variables x. They syntactically cannot contain expressions (which should be let-bound). To showcase proper compilation of complex values, sub-values can be nested directly. For instance, Node(⟨Red, 42, x, y⟩) is a single constructor. For simplicity, we do not support recursive values (types might be recursive).
Destructors correspond to pattern matching, shown in Section 2: they filter out value shapes on the left-hand side of rules, bind the variables they contain, and return the expression on the right-hand side. Patterns, denoted p, consist of constructor patterns, along with variables, wildcards (_), and or-patterns ("|"). For instance, given the source type τ = K1(Int1) + K2(&Int32), the pattern K2(&_) denotes any value of the given "shape" with an ignored sub-value at the position of _.
For both values and patterns, we consider a focusing operation, similar to the one described for types. For instance, focus(.K.∗, K(&v)) = v. The full definition of focus can be found in Appendix C.
Expressions generally contain numerous other operations, notably function definition and application. We consider the compilation of functions to be orthogonal to the contents of this article, and will not detail it. In the rest of this article, we focus on the compilation of constructors in Section 5 and of destructors in Section 6. However, for these two compilation procedures to be correct, we first need to formalize constraints on user-provided memory layouts.

VALIDITY AND AGREEMENT OF TYPES
To ensure the correctness of a given memory specification, we consider two complementary notions: intrinsic validity of a memory type, and agreement between source and memory types. We first define utility operations on memory types.

Specialization
In many cases, we need to identify "the part of a memory type corresponding to the variant K". For instance, when compiling the pattern p = K(p′), we want to know which parts of the memory type are relevant to the pattern p′. For this purpose, τ̂/K filters a type τ̂ by K. It proceeds as a map over τ̂ and returns a specialized, split-free memory type. For splits, it returns the branch whose provenance set contains K. The full definition is in Appendix D; we present it here on an example.
Example 4.1 (Specialization of the Zarith memory type). Let us specialize τ̂z from Example 3.3 for the constructor Large. By looking at provenances, we establish that only the second branch of the split definition is relevant. We then explore that type recursively and obtain τ̂z/Large = Ptr64,1((.Large as Word128)) ⋉ [0, 1] : (= 0).

Focusing
The memory version of focusing, which extracts nested memory terms from a parent term at a position specified by a memory path, is defined in Appendix D. Unlike its source counterpart, memory focusing is not complete: indeed, it is unclear what the focus of a split should be when the discriminant itself is not under focus. We thus define it to be invalid on splits. For all other constructs, it follows the intuitions from the source version.

Validity of Memory Specifications
We first define a validity judgment denoted Γ ⊨ τ̂ : k. The kind k is either Spec, representing unsized numerical data and only used to specify bitword contents; Block, representing contiguous memory values and never used in bitwords; or Word, representing bitwords and usable anywhere. The full judgment is defined in Appendix D; we show the rules concerning words and splits.
In the WordValidity rule, we enforce that bitword contents should be specs or words. We also ensure that paths behave properly: they should adequately fit in the word size, and not overlap each other. This constraint is essential for focus to be defined on words. In the SplitValidity rule, we ensure that provenances in distinct branches are disjoint, and that focusing on the discriminant indeed yields the word on the left-hand side. These constraints ensure that specializing a type with a tag always returns a single result.
Example 4.2 (Invalid memory type: C unions). This validity judgment already rejects non-trivial user mistakes. Let us emulate a traditional C union layout for τz, which takes up as much space as the largest variant (Large): τ̂bad = Word128 ⋉ [0, 63] : (.Small as Word63) ⋉ . : (.Large as Word128). τ̂bad is not valid: the Word kinding rule does not apply because [0, 63] and . overlap. Since it lacks distinguishing data to use as a discriminant, this layout is not expressible as a split.
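Written as actual C, the rejected layout is the familiar tag-less union. The type and field names below are ours, chosen to mirror the example:

```c
#include <stdint.h>

/* The invalid layout of Example 4.2 as literal C: both variants share
   the same 128 bits, and no bit of the value records which variant is
   live. The missing discriminant is exactly what validity rejects. */
typedef union {
    int64_t small;        /* occupies bits [0, 63], overlapping large */
    uint64_t large[2];    /* the full 128 bits */
} zarith_bad;
```

In C, the programmer must track the live variant out of band; the validity judgment refuses layouts where that knowledge lives only in the programmer's head.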

Agreement Between Source and Memory Types
We now state the relationship between a given source type and its memory specification.
Definition 4.3 (Agreement). Let τ be a source type and τ̂ a memory type. We say that τ̂ represents τ, or agrees with τ, and we write τ̂ ⊩ τ, if the following conditions hold.
All subterms bind source subtypes to their valid representation:
(Subterm Coherence) Either τ and τ̂ are leaves (i.e., τ = Intℓ and τ̂ = Wordℓ), or for all π̂ such that focus(π̂, τ̂) = (π as τ̂′), we have τ̂′ ⊩ focus(π, τ).
Provenances and split branches are coherent with the source type:
(Provenance Coherence) For all π̂ s.t. focus(π̂, τ̂) = Split(π̂′) {ci : Ki ⇒ τ̂i}, for each i and each K ∈ Ki, K is a head constructor of τ and τ̂/K ⊩ K(τ′) where τ′ = focus(.K, τ).
All source subtypes are represented within the memory type:
(Coverage) For every π that leads to a leaf in τ (i.e., focus(π, τ) = Intℓ), τ̂ covers π: the specialized memory type τ̂′ = τ̂/K contains a subterm type for a source position π0 prefix of π. More precisely, there exist source and memory paths π0, π̂0 such that focus(π̂0, τ̂′) = (π0 as τ̂′′) and π0 ⪯ π.
Memory types provide a way to tell branches of sum types apart (Distinguishability; see Example 4.6).
As a first example, we show that our memory specification for Zarith agrees with its source type.
Example 4.4 (τ̂z agrees with τz). The memory type τ̂z from Example 3.3 meets all four criteria of Definition 4.3 for τz = Small(Int63) + Large(Int128). For instance, subterm coherence is immediate, since there are no subterms outside of the split (i.e., no subterms are reachable through focusing).
The rest of this section is dedicated to counter-examples.
Example 4.6 (Non-distinguishability). Let τ̂ = {{(.Large as Word128), (.Small as Word64)}}. It includes both variants' subterms in distinct struct fields but provides no way to distinguish the variants. It thus does not meet the distinguishability criterion for τz: we have τ̂/Large = τ̂/Small = τ̂; it lacks a discriminant that differs between Large and Small.
Note that these criteria allow for simple inlined structs and nested fields, but do not allow for complex mangling of nested values. Indeed, distinguishability mandates that every head constructor is identified in a split that occurs "before" its nested constructors' distinguishing splits. This introduces an unnecessary dependency between the distinction of nested constructors and head constructors. Furthermore, provenances consist solely of head constructors, preventing specialization of deeply nested source variants. In other words, it is impossible to rearrange deeply nested values or flatten inductive data structures. These simplifications are made for pedagogical purposes, and we sketch how to lift them in Section 7.1 with richer, nested provenances and more elaborate criteria. The full formalism used in ribbit and shown in the appendices uses these richer notions.

CONSTRUCTORS: BUILDING VALUES IN MEMORY
As an appetizer for things to come, we start with the compilation of values. Our goal is to compile a source value into the corresponding memory value according to a given memory type. As a reminder, source values are composed of tuples, variants, constants, and variables.

Memory Values
Memory values, defined below and denoted v̂, can be seen as pieces of code that build values in memory. Following memory types, bitword values are denoted Wordℓ(n) and Ptrℓ,a(v̂)(n), where n are concrete integers. For pointers, n fits in the alignment bits. Struct values are denoted {{v̂0, . . ., v̂n}}. For example, Ptr64,1(Word128(42))(0) is a 64-bit wide pointer to a 128-bit wide word encoding the integer value 42, with the pointer's lowest bit (declared as an alignment bit) set to 0.

From Source To Memory Values
Memory types provide a rich description. Splits not only have a discriminant and various subtypes, but also a provenance indicating which source variants could lead to a branch. Similarly, subterm types ( as ) indicate the position in the source type to which they correspond. This information is unnecessary to specify the memory representation, since it concerns the source type. However, it is exactly what is needed to guide the translation from source values to memory values. The key idea is to cross-examine the source value and the memory type: the value contains head constructors which let us select relevant parts of the memory type (using specialization, as defined in Fig. 18) and synthesize proper memory values, and so on until a leaf is found. For this purpose, we define two mutually recursive functions: ty2val, which synthesizes a memory value from a type, and val2mem, which explores a source value. Both take a source value  and a filtered memory type , and return a memory value.
Synthesized memory values from types. ty2val takes a filtered type, i.e., one without any splits. It proceeds by induction on the memory type, rewriting it as a memory value: singleton types are turned into constant values, and type variables are looked up in the environment. For subterm types ( as ), it first finds the subvalue at position  within , denoted   = focus (, ).  indicates the memory type guiding the representation at this point, so we can recursively call val2mem(  , ). In struct, pointer and word cases, we recursively call ty2val on each nested memory type with the same source value to populate the struct, pointer or word. For words and pointers, we want to synthesize a concrete bitword. For this, we must merge all bitword contents of the form   :   into a single concrete numeric value  such that for every , the evaluation of   on  yields   . This is done by fillbitword, which returns the sum of every   interpreted as an integer, truncated and shifted to fit into the bit range of   . This filling method is correct owing to memory type validity, which asserts that the memory paths   never overlap (i.e., their bit ranges are distinct).
where   = ty2val(,   ).

val2mem. val2mem explores the value to direct the usage of ty2val. Variables and constant types (such as Int 64) are immediately translated. For values headed by a constructor , we consider  ′ = /, that is, the specialization of  by  as defined in Section 4.1. Specialization returns a memory type without any splits which covers all potential memory types that apply to . For values without a head constructor, we keep the memory type as-is, as it cannot contain any splits thanks to (Provenance Coherence) from Definition 4.3. In both cases, ty2val(,  ′) returns the memory value which fits  ′ and contains subvalues of  at the position of each subterm type.

Remark on regularity. During translation to memory values, the contents of matched variables can be used directly, either by reference or by copy. Indeed, the "regularity" limitation from Section 4 asserts that direct subterms are all present in the memory type, forbidding mangling of the original subvalues. This still allows inlined structs, but disallows some complex schemes. Lifting this limitation requires a more complex procedure to synthesize values, which we sketch in Section 7.1.
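The fillbitword step described above can be sketched as follows. This is a hedged illustration, not ribbit's implementation: each bitword content is modeled as an (offset, width, value) triple, an encoding we choose for this example, and validity guarantees the bit ranges never overlap, so fragments can simply be truncated, shifted, and combined into one concrete word.

```rust
// Sketch of fillbitword: merge non-overlapping bit-range fragments into a
// single concrete 64-bit word. Each fragment is (offset, width, value);
// the value is truncated to `width` bits and shifted to `offset`.
// Since validity ensures disjoint ranges, summing and or-ing coincide.

fn fill_bitword(fragments: &[(u32, u32, u64)]) -> u64 {
    fragments.iter().fold(0u64, |acc, &(offset, width, value)| {
        let mask = if width >= 64 { u64::MAX } else { (1u64 << width) - 1 };
        acc | ((value & mask) << offset)
    })
}
```

For instance, a 1-bit tag at offset 0 and a 63-bit payload at offset 1 fill a single 64-bit word without interference.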

DESTRUCTORS: FROM PATTERNS TO DECISION TREES
Compilation of patterns is done in three phases: first, we translate source patterns into memory patterns. We then use a bespoke intermediate representation dubbed "memory trees" to compile memory patterns to decision trees. Finally, we enrich the resulting decision trees to bind variables to appropriate positions in memory. We assume patterns are exhaustive, non-redundant and contain no variable shadowing, which can be achieved with well-known techniques [Maranget 2007].

Memory Patterns
Memory patterns, defined below and generally denoted , are patterns that work directly on memory values. They mix elements from regular patterns with constructions found in memory types. Like source patterns, they can contain variables , wildcards _, and alternatives  |  ′. Following memory types, patterns can also be bitwords such as Word ℓ and Ptr ℓ,  , with associated specification patterns written ⋉  :  indicating that the pattern  will be applied at position . Finally, struct patterns are denoted  0 , . . .,   .
6.1.1 From Source To Memory Patterns. The translation from source to memory patterns is similar to the translation from source to memory values defined in Section 5.2. It uses two mutually recursive functions ty2pat and pat2mem, which both take a source pattern and a memory type, and return a memory pattern. We only give a quick rundown of the differences and demonstrate them on an example. The full definition is in Appendix D.3.
ty2pat. Just like its value counterpart, ty2pat proceeds by induction on the memory type to produce a memory pattern and calls pat2mem on subterm types. Unlike for values, we do not need to synthesize immediates using fillbitword: since memory patterns closely follow the shape of memory types, with bitword specifications for words and pointers, we simply proceed recursively. For instance, ty2pat ⟨, ⟩, Word 8 ⋉ .[0,4] :

pat2mem. Since the details of memory-specific constructs are handled by ty2pat, the main purpose of pat2mem itself is to handle pattern-specific constructs: variables, wildcards and disjunctions, which are immediately translated to their memory pattern equivalents. Other constructs are dealt with analogously to values, as shown below.

Memory Trees
As we have seen previously, splits are an essential part of our formalism, allowing us to handle case disjunction gracefully, even in cases where the discriminant between branches is found in unexpected places. This also encodes a notion of dependency: a discriminant must be examined before the appropriate branch is taken. The most seasoned approach to pattern matching compilation [Augustsson 1985; Fessant and Maranget 2001; Maranget 1992, 2008] uses pattern matrices where each column encodes a disjunction, and each line encodes a conjunction. This approach is effective in general, but encodes dependencies poorly. Instead, we propose a tree-like structure, dubbed memory trees, to compile our patterns.
Memory trees, defined below, are our main intermediate representation during pattern matching compilation. The key idea is to preserve dependencies, yet leave the compiler free to arrange independent operations in any order. Similar to decision trees, a memory tree can be a leaf node (I), where I is a set of output branch identifiers that may reach this point, or a "decision node" Switch( ) {. . .} which inspects the position  in memory and picks a branch accordingly. A tree can also be a "bud" (I on ): a leaf containing output branches, but one that could be developed further if needed using the memory type . Finally, trees can be assembled "in parallel": T ∥ T ′ is a tree where one of T and T ′ is executed first, and the other second, the order being not yet decided. As a shortcut, T [ ] substitutes over leaves and buds in a memory tree. For instance, T [(I) → (I ∪ {})] adds a new output branch identifier to each leaf.
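The memory tree grammar above admits a direct rendition as a Rust data type. This is a hedged sketch: the path and memory type components are reduced to strings, and the substitution shortcut is specialized to the "add an output identifier" case used in the text.

```rust
// A possible encoding of memory trees: leaves, buds, switch nodes and
// parallel composition. Paths and memory types are placeholder strings.
use std::collections::BTreeSet;

#[derive(Clone, Debug)]
enum MemTree {
    /// Leaf (I): output branch identifiers that may reach this point.
    Leaf(BTreeSet<u32>),
    /// Bud (I on t): a leaf that may still be expanded using a memory type.
    Bud(BTreeSet<u32>, String),
    /// Switch: inspect a memory position, pick a branch by concrete value.
    Switch { path: String, branches: Vec<(u64, MemTree)> },
    /// T || T': evaluation order of the two subtrees is not yet decided.
    Par(Box<MemTree>, Box<MemTree>),
}

impl MemTree {
    /// The substitution T[(I) -> (I ∪ {id})]: add an output identifier
    /// to every leaf and bud of the tree.
    fn add_output(&mut self, id: u32) {
        match self {
            MemTree::Leaf(is) | MemTree::Bud(is, _) => {
                is.insert(id);
            }
            MemTree::Switch { branches, .. } => {
                branches.iter_mut().for_each(|(_, t)| t.add_output(id))
            }
            MemTree::Par(a, b) => {
                a.add_output(id);
                b.add_output(id);
            }
        }
    }
}
```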

From Patterns To Memory Trees
We now consider a matching problem and describe how to compile it to a sequential decision tree via the use of memory trees. A matching problem is composed of a memory type  under scrutiny and of a list of branches {   →   | 0 ≤  < }, each branch consisting of a memory pattern   (obtained via pat2mem) and a branch identifier   . We proceed in four steps: (1) Scaffold (Section 6.3.1) a memory tree template from the memory type .
(2) Explode (Section 6.3.2) the disjunctions of the matching problem's patterns to obtain a disjunction-free problem.
(3) Weave (Section 6.3.3) each pattern from the exploded matching problem into the current tree.
(4) Finalize (Section 6.3.4) the memory tree by sequentializing and optimizing it.
Example 6.2 (Example 6.1, cont'd). The detailed compilation process will be illustrated on our running example. The successive memory trees are depicted graphically in Figs. 8 and 9. For instance, the memory tree of Fig. 8b has 5 parallel nodes (each having 2 children) and 2 switch nodes. For Switch( ) {  → T  , . . .}, the memory path is depicted at the top of the node and   is the concrete value labeling the branch to subtree T  . Buds are depicted in yellow, leaves in green.
6.3.1 Scaffolding: Memory Type-Specific Tree Templates. Scaffolding builds a memory tree "template" based on a memory type. This memory tree does not yet contain any actual output branch, since no pattern has been taken into account yet. More precisely, scaffold  (I, ) creates a memory tree based on the position  (initialized as ) in the type , with I placed at the leaves. The definition is given below. Singleton types are directly turned into leaves, as they do not involve any choice. Subterms ( as ) are turned into buds (I on ), keeping the type  available for later expansion by nested patterns. Struct types, along with word and pointer types' specifications, are all treated as parallel nodes: indeed, the order in which to explore subtypes is irrelevant, and will be determined later on. Finally, splits are turned into switch nodes. Note that, unlike splits, the discriminant of a switch is absolute, hence the use of  . ′. Additionally, provenances are no longer useful at this stage, and are thus not recorded in the memory tree.
6.3.2 Exploding Pattern Disjunctions. Variables might be deeply nested inside a succession of disjunctions and aggregates. Keeping track of which bindings are accessible via which paths in the memory tree would be cumbersome. Instead, we simply break up all disjunctions in the initial memory matching problem to produce a larger problem whose patterns are disjunction-free. While exponential in theory, the size of patterns in actual programs makes this a non-issue in practice. explode  yields a list of disjunction-free patterns {  0 , . . ., }. Its full definition is given in Appendix D.3. We then define the exploded pattern problem as explode {   →   | 0 ≤  < } =  , ↦ →   0 ≤  <  and  , ∈ explode   . For our running example, no pattern is exploded.
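The explode step can be sketched on a tiny pattern grammar. This is a hedged illustration under our own simplified encoding (wildcards, constants, disjunctions and tuples): disjunctions are flattened by taking the cartesian product of each tuple field's disjunction-free forms, which is where the theoretical exponential blow-up comes from.

```rust
// Sketch of explode: flatten all disjunctions in a pattern into a list
// of disjunction-free patterns. The grammar is a simplified stand-in
// for the paper's memory patterns.

#[derive(Clone, Debug, PartialEq)]
enum Pat {
    Wild,
    Const(u64),
    Or(Box<Pat>, Box<Pat>),
    Tuple(Vec<Pat>),
}

fn explode(p: &Pat) -> Vec<Pat> {
    match p {
        Pat::Wild | Pat::Const(_) => vec![p.clone()],
        // A disjunction contributes the exploded forms of both sides.
        Pat::Or(a, b) => {
            let mut v = explode(a);
            v.extend(explode(b));
            v
        }
        // A tuple contributes the cartesian product of its fields' forms.
        Pat::Tuple(ps) => ps
            .iter()
            .fold(vec![vec![]], |acc, q| {
                let qs = explode(q);
                acc.iter()
                    .flat_map(|prefix| {
                        qs.iter().map(move |q2| {
                            let mut p2 = prefix.clone();
                            p2.push(q2.clone());
                            p2
                        })
                    })
                    .collect()
            })
            .into_iter()
            .map(Pat::Tuple)
            .collect(),
    }
}
```

For instance, the pattern ⟨0 | 1, _⟩ explodes into the two disjunction-free patterns ⟨0, _⟩ and ⟨1, _⟩.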

Weaving Patterns Into A Memory Tree.
We can now weave patterns from the matching problem into the previously generated memory tree. For each branch (  id →  id ) in the exploded matching problem, we define T id+1 = weave ,id  id , , T id until exhaustion of all patterns. The initial tree T 0 is the output of the scaffolding phase. Each weaving adds the relevant choices and outputs from the pattern  id to the tree. The general form weave ,id , , T takes a memory pattern , a memory type , and a tree T , along with the current path  and an output identifier id, and returns a new memory tree. It inspects both pattern and tree, and integrates the former into the latter. We now present a selection of rules. The full definition is in Appendix E.1. Some key steps are illustrated on our running example at the end of this section.
Branch identifiers and leaves. The general goal of weaving is to add the branch identifier id to each leaf that is relevant to this pattern. By design of scaffolding, leaves always correspond to a singleton type, therefore only simple patterns (variables, wildcards, or constants) can occur there. The WeaveLeaf case simply adds the output identifier id to the leaf.
weave ,id , (= ), (I) = (I ∪ {id}) (WeaveLeaf)

Subterms, buds and tree expansion. Subterms and buds enable the memory tree to be expanded as needed to handle nested patterns. This expansion happens conditionally, using the following three rules. In WeaveBudWildcard, if a wildcard reaches a bud, there is no need to expand: we simply add the output id to the bud. If there is a proper pattern, then we need to expand using scaffold in WeaveBud. Subterms are handled transparently via WeaveSubTerm.
weave ,id _, , (I on  ′) = (I ∪ {id} on  ′) (WeaveBudWildcard)
weave ,id , , (I on  ′) = weave ,id , , scaffold  (I,  ′) (WeaveBud)
weave ,id , ( as ), T = weave ,id , , T (WeaveSubTerm)

Aggregates and parallel nodes. The purpose of parallel nodes is to model "aggregate" constructions, which include structs, words and pointers. In all these cases, the order in which sub-patterns should be explored is not set in stone, and will be decided during sequentialization (Section 6.3.4) based on heuristics. We showcase the rules for structs below. Each rule recursively explores every subtree, using appropriate subtypes and subpatterns. The first rule, WeaveStructAny, handles the case where the memory pattern  is a wildcard _ by propagating this wildcard to each subtree. The second rule, WeaveStruct, handles the case where  is a struct pattern  0 , . . .,  −1 by weaving each subpattern   into its associated subtree. All aggregate constructions are handled with two similar rules.
weave ,id _, {{  0 , . . . , −1 }} , ∥ 0≤< T  = ∥ 0≤< weave  .,id (_,   , T ) (WeaveStructAny)
weave ,id  0 , . . . , −1 , {{  0 , . . . , −1 }} , ∥ 0≤< T  = ∥ 0≤< weave  .,id ( ,   , T ) (WeaveStruct)

Splits and switches. Switches are decision nodes used to model sum-like constructs (i.e., splits). They are handled by the rule WeaveSplit shown below. The main idea is that  should only be woven into branches of the decision tree with which it is compatible. For instance, if the pattern  only accepts the value {{0, 42}}, it should not be propagated to branches that assume the value at position .0 is 1. For this purpose, we identify the relative path used in the switch discriminant (here,  ′), and gather all words that could be accepted by the pattern at this position with focus  ′ ,  . For each branch, if the discriminant value is accepted by the pattern at this position, we weave the pattern into the corresponding subtree.

Example 6.3 (Running example: scaffold and weave). We begin with scaffold  (∅,  2 ). The resulting memory tree is depicted in Fig. 8a. The "struct" rule generates a parallel node; its two  The split in   is mirrored by a new switch node, on which we can finally weave the pattern   . Following WeaveSplit, weaving only explores branch 1 of the switch, whose discriminant value matches that at the discriminant position in the memory pattern (in our case, 1). The branch identifier 0 is then transferred to the remaining buds and leaves, yielding Switch(.0.[0, 1]) {0 → (∅) ∥ (∅ on Word 128) ; 1 → ({0}) ∥ ({0} on Word 63)}.
Finally, this first weaving results in the memory tree depicted in Fig. 8b. After weaving each remaining memory pattern into the resulting memory trees, we finally obtain Fig. 8c.
6.3.4 Post-Treatment. At this stage, we have woven all patterns into the memory tree. Our memory tree is thus "complete", in that it contains all information from the matching problem. During weaving, the shape of the tree must not be changed: indeed, it must remain synchronized with the memory type for memory patterns to be woven in the right places. Now that weaving is done, however, we can reshape the tree in arbitrary ways as long as semantics are preserved. The goal of this phase is to simplify and optimize the decision tree to prepare it for sequential code generation. As a first step, we "trim" the tree by removing its remaining typing information. We then sequentialize it. At any point, we can apply various classic optimizations, some of which are sketched below.
Trimming. Since we have explored all patterns, the remaining buds will never be expanded, and can thus be turned into normal leaves. This is done by the following operation: trim (T ) ≜ T [(I on ) ↦ → (I)]. From now on, we assume that no buds remain in the tree.

Sequentialization. seq (T ) is the sequentialized version of T , i.e., a semantically equivalent tree that does not contain any parallel nodes. Its definition is given below. When we encounter a parallel node, we first pick a branch  0 (based on heuristics, for instance following Scott and Ramsey [2000]). We then graft the remaining branches onto each leaf of T  0. graft T parent , T child , defined below, places T child at the leaves of T parent and specializes the child tree's leaves by intersecting them with the initial parent leaf. This might result in empty leaves (indicating unreachable code), which can be removed later. Finally, we sequentialize the resulting tree. Note that we sequentialize after grafting. Sequentializing the remaining branches before grafting would yield a faster compilation algorithm, but would leave less freedom for heuristics to pick an appropriate branch at each grafting point.
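The graft operation can be sketched as follows. This is a hedged illustration on a reduced tree grammar of our own (leaves and switches only, after trimming): the child tree is copied onto each leaf of the parent, and every leaf of the copy is intersected with the parent leaf it replaces.

```rust
// Sketch of graft(parent, child): replace each parent leaf (I) by a copy
// of the child tree whose leaves are intersected with I. Empty leaves
// indicate unreachable code and can be pruned later.
use std::collections::BTreeSet;

#[derive(Clone, Debug, PartialEq)]
enum Tree {
    Leaf(BTreeSet<u32>),
    Switch(String, Vec<(u64, Tree)>),
}

fn graft(parent: &Tree, child: &Tree) -> Tree {
    match parent {
        Tree::Leaf(is) => restrict(child, is),
        Tree::Switch(p, bs) => Tree::Switch(
            p.clone(),
            bs.iter().map(|(v, t)| (*v, graft(t, child))).collect(),
        ),
    }
}

/// Specialize a copy of the child tree: intersect every leaf with `is`.
fn restrict(t: &Tree, is: &BTreeSet<u32>) -> Tree {
    match t {
        Tree::Leaf(js) => Tree::Leaf(js & is),
        Tree::Switch(p, bs) => Tree::Switch(
            p.clone(),
            bs.iter().map(|(v, t)| (*v, restrict(t, is))).collect(),
        ),
    }
}
```

With parent leaves {0, 2} and {1, 3} and child leaves {0, 1} and {2, 3}, grafting reproduces the intersections seen in Example 6.4, e.g. {1, 3} ∩ {0, 1} = {1}.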
Tree optimizations. Optimizations on decision trees from the literature can be applied at any point after trimming. Sequentialization, in particular, might introduce redundant tests or create unreachable branches. Two optimizations are particularly relevant. Constant folding propagates information from switches, such as "position  contains value ", and uses it to remove redundant switches. Dead branch elimination removes branches that lead to leaves with no outputs (i.e., (∅)).
Example 6.4 (Running example: trim and seq). Fig. 9a is immediately obtained from Fig. 8c by trimming type information from buds. After this step, some redundancy remains, so we apply an easy simplification to obtain Fig. 9b. We are now ready to remove parallel nodes. Let us pick the left branch (as T 0 ; T 1 is then the other switch node) and apply graft (T 0 , T 1). T 0 is thus promoted as the parent of T 1. All leaves of T 0 are replaced by (a copy of) T 1 , in which leaves' sets are intersected. For instance, the right child {1, 3} of T 0 is replaced by a Switch(.1 & 1) whose leaves are {1, 3} ∩ {0, 2, 3} = {3} and {1, 3} ∩ {1, 2} = {1}, hence Fig. 9c.

Bindings and Extraction
So far, we have built a decision tree that only recognizes patterns and orients execution towards the right match case. However, it does not bind the variables present in the matched pattern. We now show how to build a binding environment adapted to the memory representation. For simplicity, we give a declarative specification here.
We first gather all bindings of an exploded memory matching problem  as a set  of quadruplets (, id, , ) containing a variable, a branch identifier, the next program and a memory path. Concretely, bindings() ≜ (, id,  id , ) focus ,  =  and (  id →  id ) ∈ .
We then replace each leaf with its associated binding environment and output program.We only keep the smallest branch identifier, since the first match case has priority.

Soundness
We now state the soundness of pattern matching compilation, along with a sketch of the proof. Details are in Appendix F.
Typing and evaluation judgments. To state correctness, we require source typing and evaluation judgments. Given a source matching problem  =  0 →   | . . . |  −1 →  −1 , we define  ▷  → ,  for "m matches value v in the -th branch and binds the variables in ", where  is a map from variables  to source paths . We also write Γ ⊢  :  → Σ for "pattern  has type  and binds variables whose types are defined in environment Σ". Both are fully defined in Appendix C.
Analogously, we write  ▶  → ,  for "  matches the memory value  in the -th branch and binds the variables in ", where  is a map from variables  to memory paths . Note that there is no notion of memory typing for memory patterns and values. We also extend agreement to type environments: given Σ and Σ, we write Σ ⊩ Σ if dom(Σ) = dom(Σ) and for all  ∈ dom(Σ), we have Σ() ⊩ Σ(). Finally, let ( , ) = T ▶▶  denote the evaluation of memory tree T on input , and let Γ ⊢ T :  denote "tree T inspects memory values of type ". They are fully defined in Appendix D.

Theorem 6.6. Let , Γ (resp. , Γ) be a source (resp. memory) type and its environment such that  ⊩ . Let  be a source value and  =   →  0 ≤  <  a source matching problem such that Γ ⊢  :  and ∀, Γ ⊢   :  → Γ  . For each pattern type environment Σ  , let Σ  be such that Σ  ⊩ Σ  .
Proof sketch. We use the memory-level pattern matching judgment ▶ as a bridge between the "constructive" (from source to memory) and "destructive" (from memory objects to trees) parts of our compilation scheme. We first handle the source-to-memory part by proving the equivalence below. We use the agreement criteria from Definition 4.3 to ensure that each component of  is reflected in a component of , then proceed by induction on val2mem and pat2mem, whose definitions closely follow the memory type. For the memory-to-tree part, we restore the typing information discarded by our compilation scheme, using a separate tree typing judgment Γ ⊢ T :  which ensures that the structure of T reflects that of the considered memory type . Let us first focus on pattern recognition equivalence, putting aside variable bindings.
We first show that scaffolding yields a well-typed tree whose evaluation on a memory value yields the initial output set. In our case, we get Γ ⊢ T −1 :  and T −1 ▶▶  → ∅.
We prove weaving correctness by a "progress and preservation" approach, showing that weaving preserves the "shape" of the tree, encapsulated by tree typing. We use the following invariants: let T be a tree such that Γ ⊢ T :  and T ▶▶  → I for some set of identifiers I. Let (  → id) ∈  be an exploded memory pattern. Then weave ,id , , T is defined and yields a memory tree T ′ such that Γ ⊢ T ′ :  and T ′ ▶▶  → I ∪ ({id} if  id ▶  and ∅ otherwise).
For all subsequent operations, we only need to show that the result of tree evaluation is preserved at each step. For each step, this can be done by a direct induction. We conclude the following for our fully woven tree: At this stage, we have shown that we find the same code identifier in the source and compiled versions. We now need to deal with bindings. Since we have already proven that bindings are coherent between source- and memory-level pattern matching, we only have to show that memory bindings are preserved by Bindings and BindVars, which is immediate from their definitions. □

EXTENSIONS
So far and for simplicity, we have described our formalism for a language with "plain" Algebraic Data Types (only sums, products, and references), and comparatively simple split types, with only head constructors in provenances (which, in effect, limits inlining) and only one split discriminant. We have extended our formalism in several ways, including constants in patterns and multiple split discriminants, which are described in Appendix B. We now focus on two extensions we consider essential for the expressivity of our approach: irregular specifications and arrays. Irregular specifications, along with several other extensions, are included in the formalism detailed in the appendices and implemented in ribbit.

Irregular Specifications
So far, to ease our presentation, we have restricted specifications to memory types in which split branches' provenances solely consist of head constructors.This limits the extent to which one may provide different specifications for different sum type variants, and prevents inlining of deeply nested fields.
Example 7.1 (Arithmetic expressions). Consider the type of arithmetic expressions on 32-bit integers with variables. Let  exp =  (string) +  (Int 32) +  (op,  exp ,  exp) and op = Plus + Minus + . . . . For our application, we need a representation that supports a high degree of sharing between values. Given what we have seen so far, we can immediately choose a bit-stealing representation to distinguish between the ,  and  cases. The  case is represented immediately, with the 32-bit integer nested; the  case is a tagged pointer to a string (which can easily be shared); and the  case is a pointer to a structure containing the operator  op (represented by an 8-bit integer) and the two sub-expressions. To make manipulating pointers to expressions cheap, we avoid tagging the  case.
(..0 as  op ); (..1 as  exp ); (..2 as  exp ) While this representation is reasonable, we notice during benchmarking that many expressions have small branches, such as (3 × ) + 5. We now want to leverage the Word 56 padding in  node to store an Int 32 when one of the two subexpressions is directly an integer.This is difficult: we need to make a special representation for values whose shape is  (⟨_,  (_), _⟩ or  (⟨_, _,  (_)⟩.So far, provenances in split branches can only be head constructors of a type.To this end, we extend our setup so that provenances in splits can describe deeper constructors in types.We then use the split construct to provide a specialized representation for values of the aforementioned shapes.We obtain the following modified  node : This makes constructors and destructors more complex.For instance, let us take the expression  (Plus, , ).We need to determine whether  and  carry a  tag, and act accordingly.A "highlevel" version of such code would look as follows: match (, ) { (_), _ → . . .| _,  (_) → . . .| _, _ → . ..}. Naturally, we want to emit the corresponding low-level code.
We now lift the regularity restriction to allow for aggressive inlining, and review the changes to our memory specification formalism needed to accommodate such richer representations. Full-fledged provenances, denoted prov and defined below, are analogous to simplified source patterns that give the full tree of constructors, with variants, records, references, and the wildcard ⊤. The split construct now features a set of full provenances: Split ( )   : provs  ⇒   | 0 ≤  < . Finally, prov ⋒ prov ′ is the intersection of provenances (which might return ⊥).
Specialization. Because provenances now have an arbitrary depth, the specialization operation defined in Section 4.1 may return multiple possible types. Indeed, consider the types above: the tag  matches several branches, therefore the source pattern  (_) must accommodate memory values that follow either of these branches. To this end, specialization now takes a full provenance and returns a set of possible memory types. For provenances without ⊤s, it always returns a single memory type.
Validity and agreement. We adjust memory type validity to ensure that the enriched provenances from different branches never overlap. In particular, given a split type Split ( )   : provs  ⇒   , we enforce that all pairs prov ∈ provs  , prov ′ ∈ provs  for  ≠  are non-overlapping. In addition to memory type validity, we must also refine the agreement criteria presented in Definition 4.3. The subterm coherence, coverage and distinguishability criteria only receive minimal changes, following the richer specialization operation. Provenance coherence, on the other hand, is more complex. Indeed, it must now allow inlining: a memory type might include a subterm's nested constructors and distinguish between them early on, enabling it to rearrange arbitrarily nested elements as desired, without requiring the whole subterm to be represented as a standalone memory value. To this end, we define source type specialization, which filters nested constructors of a source type according to a provenance. For instance,  exp / (⊤,  , ⊤) = { (op,  (string),  exp)}. The full definition is available in Appendix C. Since source type specialization may yield multiple types, we must now define agreement criteria between a memory type and a set of source types. These updated agreement criteria are as follows: All subterms bind source subtypes to their valid representation.
(Coverage) For every  that leads to a leaf in at least one  (i.e., ∃ ∈ , focus (, ) = Int ℓ),  covers : every memory type  ′ ∈ / contains a subterm type for a source position  0 that is a prefix of . More precisely, there exist a source and a memory path  0 ,  0 such that focus ( 0 ,  ′) = ( 0 as  0) and  0 ⪯ . Memory types provide a way to tell branches of sum types apart.
Using richer provenances opens the way to unrolling recursive types, for instance, representing a list two elements at a time. Unfortunately, this might force constructors and destructors to walk the structure recursively, and it is unclear how to generate reasonable code in this case. For now, we forbid such unrolling altogether.
val2mem and pat2mem. Both val2mem and pat2mem rely on every variable corresponding to a subterm type, allowing the representation of a subvalue to be simply copied or passed by reference. This is no longer the case. The solution here is, when presented with a variable at a position whose representation has been mangled, to locally deconstruct and reconstruct the value until we reach the next available subterm type, allowing us to keep "standalone" and "nested" representations synchronized. Naturally, this requires more runtime computation (a traditional tradeoff of very compact representations). While reasonable on paper, this is quite delicate in practice, and even more so in the implementation. ribbit only implements simple cases.

Arrays
Our presentation has so far focused on usual algebraic data types. However, a central part of representation choices is the interaction between data types and arrays: whether to unbox elements inside an array, the size and alignment of these elements, or the choice between "struct of arrays" and "array of structs" representations. All these choices are essential to improve locality and performance or to unlock the use of specific instructions such as SIMD. We now give a quick sketch of how ribbit could be extended to support arrays. The key idea is to embed iteration variables in memory types and paths to "synchronize" accesses. We illustrate this on an example.

CODE GENERATION
In the previous sections, we have described compilation steps to lower constructors (i.e., value allocation) and destructors (i.e., pattern matching) to a low-level description operating on memory values. This section deals with the last step of our compilation scheme: producing executable LLVM IR code to build memory values, and to deconstruct such memory values using decision trees. The global code generation procedure is rather straightforward: constructors are turned into memory allocation and initialization code, and decision trees are turned into LLVM control flow graphs. Two difficulties remain: dealing with LLVM's explicitly typed IR, and generating code for memory paths.

Lowering Memory Types
Our compilation procedure discards type information, yielding a decision tree whose input is a single untyped root memory value. Backend targets such as LLVM IR require each memory value (including the root value, bound subterm values and intermediate switch discriminants) to be explicitly typed. This section describes the process of building an adequate LLVM IR type for each memory value, using information retrieved from the initial memory type.
Since every memory value in the decision tree is expressed as a position within the root memory value, we can deduce their types from the root memory type. However, our memory types contain splits, which are not expressible as LLVM types [LLVM Language Reference Manual 2023]. Indeed, LLVM types are unambiguous, in that they only describe fully concrete types (in terms of bit width, legal operations, etc.) and do not capture multiple branches. For our purposes, they consist of (1) fixed-width integer types, such as i32; (2) an opaque pointer type ptr; (3) structures that aggregate other LLVM types, such as {i64, ptr}. For simplicity, we assume LLVM pointer types are by default 64-bit wide with 8 unused bits due to address alignment. Different pointer types require using LLVM address spaces with different data layout specifications (in the LLVM sense).
We now describe, given a type , how to build the corresponding LLVM type. This is straightforward for non-split memory types: we map any ℓ-bit-wide word or pointer to the integer type iℓ, discarding any bitword content specification. We do not use the specific ptr type yet, so as to freely manipulate pointer alignment bits. We map structs to LLVM structs, and type variables and subterm types to their bound memory type's LLVM type.
Lowering split types to LLVM IR types is less immediate. We need an LLVM type that is able to store values of any branch, and in which the split discriminant is accessible. By definition, the exact shape of a memory value instantiated from a split type (and an unknown source value) is unknown until we inspect its discriminant. Type validity ensures this is possible: indeed, a split type has one kind, and each branch type must be of this kind. Furthermore, the discriminant location is always accessible in each branch type. Conservatively, we use the "largest" branch type to determine the common type shape. If the kind is Word, every branch maps to an LLVM integer type and we take the largest. If the kind is Block, every branch maps to an LLVM struct: we recursively find a common type for each field, and keep extra fields. After inspecting the discriminant, we refine the memory type and cast the value to a more precise LLVM type in order to perform operations specific to the identified variant. By validity of memory types, this cast is always valid.
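The "largest branch" rule can be sketched on a toy type grammar of our own (integers of a given width, and structs of fields), standing in for the LLVM types above. This is a hedged illustration, not ribbit's lowering code.

```rust
// Sketch of computing a common lowered type for split branches:
// take the widest integer, or merge structs field-wise and keep
// the extra fields of the longer struct.

#[derive(Clone, Debug, PartialEq)]
enum LTy {
    Int(u32),          // a fixed-width integer type, e.g. Int(32) ~ i32
    Struct(Vec<LTy>),  // an aggregate of lowered types
}

fn common(a: &LTy, b: &LTy) -> LTy {
    match (a, b) {
        (LTy::Int(m), LTy::Int(n)) => LTy::Int((*m).max(*n)),
        (LTy::Struct(xs), LTy::Struct(ys)) => {
            let (longer, shorter) =
                if xs.len() >= ys.len() { (xs, ys) } else { (ys, xs) };
            LTy::Struct(
                longer
                    .iter()
                    .enumerate()
                    .map(|(i, f)| match shorter.get(i) {
                        Some(g) => common(f, g),
                        None => f.clone(), // keep extra fields as-is
                    })
                    .collect(),
            )
        }
        // Mixed kinds are ruled out by memory type validity
        // (every branch of a split has the same kind).
        _ => panic!("split branches must share a kind"),
    }
}
```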

Code Generation For Memory Paths
Memory paths π are used to specify the discriminant of each switch node and to specify values in binding environments. Code generation transforms memory paths into sequences of instructions that extract the part of the root memory value specified by the path. Memory path operations consist of pointer dereferencing, field accesses and arbitrary operations on bitwords. Given a target providing instructions for dereferencing, field access and all operations used in bitword content specifications, as well as casts from pointers to words and back, mapping operations on structs and on words to target instructions is immediate. Dereferencing operations require additional care: we must reset all alignment bits and cast the value to a pointer type before dereferencing it.
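The alignment-bit handling before a dereference can be sketched as follows. The function names and the 3-bit tag width are illustrative assumptions, not ribbit's generated code.

```rust
// A sketch of the dereferencing step described above: before a load, all
// alignment bits are reset and the word is cast back to a pointer.
const ALIGN_BITS: u64 = 0b111; // 8-byte alignment leaves the 3 low bits unused

fn tag_ptr(p: *const u64, tag: u64) -> u64 {
    debug_assert!(tag <= ALIGN_BITS); // the tag must fit in the stolen bits
    (p as u64) | tag
}

fn read_through_tagged_ptr(tagged: u64) -> u64 {
    let clean = tagged & !ALIGN_BITS; // reset all alignment bits
    unsafe { *(clean as *const u64) } // cast back to a pointer and dereference
}
```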

EXPERIMENTAL EVALUATION
We implemented our DSL, full compilation procedure and final LLVM IR generation in a tool dubbed ribbit, which is publicly available. The tool supports ADT declarations with recursive types and user-defined memory layouts, including our extension to irregular layouts (Section 7.1), constant patterns and multi-discriminant splits (Appendix B). Our code generation outputs fully shared decision trees (following Maranget [2008]). Finally, on top of the numerous examples described in the rest of this article, we implemented "full representations", i.e., functions from source types to memory types mimicking existing languages, such as OCaml. In this section we demonstrate: • that ribbit is expressive enough to reproduce the behavior of native OCaml and Rust compilers, on some representative middle-sized examples, with similar performance; • that acting on low-level memory layouts impacts static characteristics of generated decision trees, as well as execution-time performance.
The latest version of our artifact, which includes ribbit along with instructions to reproduce these results, is available at https://doi.org/10.5281/zenodo.7994178. For this evaluation, we consider two examples: the red-black trees motivating example from Section 2, and a stack machine interpreter example used by Maranget [2008] to showcase various heuristics for OCaml pattern-matching compilation. The full programs are available in Appendix G. For these two examples, we have implemented native OCaml and Rust versions, two ribbit versions mimicking the internal memory representations of OCaml and Rust, along with the Linux-like red-black tree encoding from Section 2.3. Details of internal memory layouts were obtained both from non-official sources [Minsky and Madhavapeddy 2021; The Rustonomicon 2023] and by manual inspection of emitted intermediate representations. However, we do not attempt to match compilation heuristics. Experiments have been performed on a desktop machine running Gentoo GNU/Linux x86-64 on an Intel Core i5 CPU @ 1.60GHz ×8 (although ribbit is single-threaded), with 32 GiB of RAM. As for compiler versions, we used Rust 1.67.1, OCaml 4.14.0 and LLVM 14.
Fig. 11. Static metrics of decision trees generated by ribbit: code size is the total number of switches; we also compute the number of switches per path in the decision tree, and the number of dereferences along these paths.
In Fig. 10, we first compare the memory size and number of pointers for some concrete values (a small stack program and a red-black tree with 30 nodes), to cross-validate our memory layout specifications (the size of objects in OCaml/Rust versus the size obtained in ribbit with our encoding). As shown, the data match for all implementations. We then compare execution-time performance of the generated code. This comparison should be made while keeping in mind that ribbit emits code that does far less than native OCaml or Rust: our prototype implementation does not implement sophisticated memory management, or even proper function calls. Additionally, the measured times are very small, making measurement difficult. Our goal here is only to show that our technique can emit code of similar efficiency to seasoned industry-ready compilers using their memory layouts, which is indeed the case. The final lesson from these dynamic measurements is that improving the representation of values is very worthwhile: each step from the OCaml representation to the Rust layout, and then to the Linux layout, significantly reduces memory footprint and execution time. The Linux red-black tree implementation, in particular, is extremely efficient while having a tiny footprint, demonstrating the gain of such sophisticated bit-stealing.
Figure 11 depicts static metrics obtained via ribbit for different memory encodings. For both benchmarks, the better performance obtained with the Rust layout, and to a further extent with the Linux layout, seems to correlate with fewer dereferencing operations. Perhaps unsurprisingly, a good static predictor of performance seems to be the amount of indirection. These results suggest that we should complement existing heuristics for pattern matching compilation with new ones that take data layout-related metrics into account, such as the number of dereferencing operations and cache friendliness. We plan to explore this in the future.

Memory Representation and Algebraic Data Types
Memory representation in functional polymorphic garbage-collected languages was quickly identified as an important area for performance improvement (Jones and Launchbury [1991], Leroy [1992], and Peterson [1989]). Our work encourages new development in this area, as it readily supports such layouts and allows us to experiment easily with new representations. Unboxing and arrays have been the subject of numerous works and libraries (see [Keller et al. 2010] for a recent Haskell example). We believe many of the data layouts proposed in these works would enhance our approach, notably regarding mutability and concurrency, which we do not explore. Colin et al. [2018] refine the criterion for recursive yet unboxed types in the OCaml case. We believe a similar refined criterion could be used in the irregular case. Iannetta et al. [2021] and Koparkar et al. [2021] propose completely flattened representations for recursive types, providing excellent cache behavior and parallelism but requiring whole-program transformations. In contrast, our technique provides great manual control over memory representation and follows a more traditional compilation pipeline. Supporting such fully flattened layouts would be highly desirable.
Several approaches attempt to combine polymorphism and optimized data layouts. Leroy [1990] shows how to make polymorphic and monomorphic representations work conjointly, and Hall et al. [1994] show how to marry specialization and unboxing. Classically, C++ and Rust rely aggressively on specialization. All these approaches would be compatible with our work.
Rust uses the notion of "niche" to exploit unused bits in words and pointers. The Rust developers have been steadily adding new layout optimizations that exploit manual annotations, up to exploiting pointer alignment as we do [RFC: Alignment niches for references types 2021]. We believe our memory types offer a more formal definition, which allows reasoning about these data layout optimizations, and we hope to collaborate on the topic with Rust developers. We also believe the use of explicit memory types and agreement criteria offers a better user interface for such manual optimizations.
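Rust's niche optimization can be observed directly from safe code: for reference and `NonZero` payloads, the `None` case is encoded in a bit pattern forbidden for the payload, so the `Option` wrapper adds no space. The sizes below assume a 64-bit target.

```rust
use std::mem::size_of;
use std::num::NonZeroU64;

// `None` is stored in a forbidden bit pattern of the payload
// (null for references, zero for NonZeroU64), so the `Option`
// wrapper costs no extra space.
fn layout_sizes() -> (usize, usize, usize) {
    (
        size_of::<&u64>(),               // a plain reference
        size_of::<Option<&u64>>(),       // same size: None uses the null niche
        size_of::<Option<NonZeroU64>>(), // same size: None uses the zero niche
    )
}
```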

Pattern Languages
Pattern languages have been adopted in mainstream languages with rich static typing such as Haskell, OCaml, F#, Scala or Rust, but also more recently in more general languages such as Python and soon Java. This diversity of host languages, with their very varied compilation techniques and memory representations, makes our work all the more relevant. We focused on the core pattern matching language, with some minor extensions such as disjunctive patterns. Extensions such as ranges, guards, or polymorphic variants [Garrigue 1998] are orthogonal to our work.
Pattern matching is also used pervasively in dependently typed languages [Cockx et al. 2016; Tuerk et al. 2015] and for GADTs [Garrigue and Normand 2015; Karachalias et al. 2015]. In such settings, matching on a term can reveal the type of another term. Memory trees handle such dependencies precisely and very well, and we hope to adapt them to such typing disciplines in the future.
Active patterns [Syme et al. 2007] and pattern synonyms [Pickering et al. 2016] allow users to abstract over patterns by exposing "constructors" which do not directly reflect the underlying definition of the algebraic data type. This allows for both a "programmer" view and a "representation" view, similar to our approach. Combined with a rich type algebra with unboxing annotations, it enables some representation tricks. However, we are so far not aware of any typing algebra that leverages niches and bit-stealing, making such a combination far more limited than our approach.

Pattern Matching Optimization
Pattern matching optimization in strict languages has been a rich topic of study. Closest to us, Fessant and Maranget [2001] and later Maranget [2008] champion the "row and column" approach using a pattern matrix, which is currently used in OCaml. As we mentioned before, this approach deals poorly with dependencies between various parts of a given pattern. Sestoft [1996] emits a rough tree of if nodes, and relies on a global supercompilation pass to optimize the resulting tree. This could be an interesting alternative to our parallel nodes and sequentialization. Kosarev et al. [2020] explore a different optimization technique by encoding the choice of optimal decision tree as a relational synthesis problem, solved through miniKanren. Their idea is very promising, but fails to scale to large matches. Solodkyy et al. [2013] propose patterns-as-a-library for C++ based on objects and template meta-programming. In all these cases, DAGs with maximal sharing can be created from trees using hash-consing [Filliâtre and Conchon 2006].
Most approaches rely on heuristics for choosing a column to split. Maranget [2008] introduces "necessity"-based heuristics. A study of heuristics is done in [Scott and Ramsey 2000]. We believe our approach, which exposes much lower-level details than commonly considered in pattern matching algorithms, could benefit from new heuristics taking data layout into account.

CONCLUSION
We have presented ribbit, a language-based approach to describe bit-aware memory representations of ADTs through a dual view. From source and memory specifications, ribbit compiles constructors and destructors to low-level code. We demonstrate the expressivity and versatility of our approach by encoding classic memory layouts for ADTs (OCaml, Rust) along with numerous memory optimizations such as bit-stealing, unboxing, and inlining. As we provide end-users with an expressive language, we propose a notion of validity and agreement between source and memory types. Together with correctness proofs, these notions strongly increase the trust in the whole process (hence the main title of this article).
Our technique paves the way towards the formalization and description of optimization techniques for memory representation. Indeed, for the moment, end-users write their own memory layout specification. We plan to investigate generating new optimized memory layouts, for instance applying super-optimization or automatically searching for representations optimized for space and cache behavior. ribbit can serve as a substantial building block for these future works.
Naturally, to achieve this objective, we also want to extend our approach. Future versions of ribbit featuring arbitrary pointer layouts and even lower-level, architecture-dependent details (e.g., endianness) would enable memory layouts specifically tuned for various platforms. Mutability and concurrency in algebraic types have received significant attention from the Rust community, and would adapt well to our work. We also hope to capture more intricate representations for recursive types, including unrolling and flattening, notably following the work of Koparkar et al. [2021].

A EXAMPLE: WEBKIT-LIKE NAN-BOXING
In this section, we describe a representation that models the memory layout used in the JavaScriptCore engine (built into WebKit) to encode JavaScript values on 64-bit platforms, which uses an optimization dubbed NaN-boxing. Our description is based on the implementation available in [WebKit NaN-boxing 2023].
We use the following extensions to our memory description framework:
• irregular specifications with nested provenances, described in Section 7.1;
• "default" branches in splits that capture values which do not match any other branch, which we denote with a wildcard discriminant value _. For instance, Split(π) { c : provs1 ⇒ τ1; _ : provs2 ⇒ τ2 } describes a memory type that maps values whose provenance is in provs1 to τ1, which always contains c at the position π, and values whose provenance is in provs2 to τ2, which never contains c at the position π;
• pattern matching on integer and floating-point numeric constants, described in Appendix B.1;
• simple arithmetic operations (in our case, addition) on values of constant types: (π + n as Word ℓ) denotes a memory type that maps a source value v to an ℓ-bit word encoding the numeric value focus(v, π) + n.
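To make the scheme concrete, here is a simplified, self-contained sketch of NaN-boxing in the same spirit. The constants and helper names are illustrative assumptions; the actual JavaScriptCore encoding differs in its details.

```rust
// All values live in a single u64:
//   - pointers keep their natural form (top 16 bits zero);
//   - 32-bit integers carry the tag 0xFFFF in their top 16 bits;
//   - doubles are stored as their bit pattern plus 2^49, which moves every
//     ordinary double (including NaN) out of the pointer and integer ranges.
const INT_TAG: u64 = 0xFFFF_0000_0000_0000;
const DOUBLE_OFFSET: u64 = 1 << 49;

fn box_int(i: i32) -> u64 { INT_TAG | (i as u32 as u64) }
fn unbox_int(v: u64) -> i32 { v as u32 as i32 }

fn box_double(d: f64) -> u64 { d.to_bits().wrapping_add(DOUBLE_OFFSET) }
fn unbox_double(v: u64) -> f64 { f64::from_bits(v.wrapping_sub(DOUBLE_OFFSET)) }

fn is_int(v: u64) -> bool { v & INT_TAG == INT_TAG }
fn is_double(v: u64) -> bool { !is_int(v) && v >= DOUBLE_OFFSET }
```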
on the 32 highest bits). We define the corresponding memory type accordingly. We finally combine the three memory types previously defined for double, integer, and non-numeric values into a single memory type jsval, which is valid and agrees with the source type jsval, using the top 16 bits as tag bits (i.e., split discriminant).
We now want to allow pattern matching on constants. Let C be the set of constant types. We assume that constant types only have a single subterm, whose path is the empty path. As an example, we have Int ℓ ∈ C. Pattern matching on constants requires a few additions:
• Unlike for sum types, a switch on constants cannot cover "the full signature" containing all the cases. Indeed, some constant types have infinitely many values, such as strings. Even finite ones, such as 64-bit integers, are far too large to enumerate.
• The "interesting" branches are not known during scaffolding, and will need to be added during weaving of each pattern.
For this reason, we first extend memory trees with switches on words with a default case, in Fig. 12b. On first inspection of the memory type, when we encounter a subterm (π as τ) with τ a constant type, scaffold simply emits such a switch with an initial default case containing a leaf, as shown in Fig. 12c. Initially, there is only a default case, as no constant patterns have been encountered yet. weave then modifies the appropriate branches, as shown in Fig. 12d. There are three cases depending on the woven pattern p. If p is a constant pattern c for a branch that is already present in the switch, we simply extend that branch. If p is a wildcard, we propagate in all the branches. Finally, if p is a constant pattern c that is not present in the existing branches, we add a new branch initialized as the default case, and propagate in that new branch. In all cases, each branch can only contain a simple leaf, which we (potentially) extend with the new pattern.
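The three weaving cases can be sketched as follows. The names are hypothetical and leaves are reduced to the set of pattern identifiers woven into them, which is enough to exhibit the branching logic.

```rust
use std::collections::BTreeMap;

// A minimal sketch of the switch-with-default construct and its weaving.
enum Pat {
    Const(u64),
    Wildcard,
}

#[derive(Debug)]
struct ConstSwitch {
    branches: BTreeMap<u64, Vec<u32>>, // constant -> identifiers at that leaf
    default: Vec<u32>,                 // leaf reached by any other value
}

impl ConstSwitch {
    fn new() -> Self {
        ConstSwitch { branches: BTreeMap::new(), default: Vec::new() }
    }

    fn weave(&mut self, p: &Pat, id: u32) {
        match p {
            // Wildcard: propagate into every branch, including the default.
            Pat::Wildcard => {
                for leaf in self.branches.values_mut() {
                    leaf.push(id);
                }
                self.default.push(id);
            }
            Pat::Const(c) => {
                // A new constant branch starts as a copy of the default case;
                // an existing branch is simply extended.
                let default = self.default.clone();
                self.branches.entry(*c).or_insert(default).push(id);
            }
        }
    }
}
```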
In this presentation, we assume that the constant type is represented by a word. Some constants are more complex, for instance strings. In that case, we could introduce a dedicated switch construct that would be handled specifically. This would be essential for strings, for instance, as switches on strings already benefit from numerous optimizations in compilers.

B.2 Multiple split discriminants
Our split types may be made more flexible by the addition of multiple discriminant locations. So far, our memory types only allow for a single memory path to indicate the piece of data in memory that distinguishes between different branches of a split. However, one may wish to use a combination of multiple values at distinct positions in memory instead.
For instance, consider the following source type: with the provenance Cons, the subterm Cons.1 would be missing. We require the improved provenance Cons.1.Nil. The last case (a list with at least two elements) is represented by a pointer to a cell containing both elements, and the appropriate subterms. The requirements that we cover all branches and can distinguish between branches would not hold with "simple" provenances. This change of representation has numerous consequences, both semantically (the behavior of sharing, notably) and codegen-wise (how to extract values during pattern matching?), and should be evaluated carefully. Crucially, this change of representation leaves the source code unchanged, allowing for easier experiments.

C SOURCE LANGUAGE
Dynamic Semantics. Fig. 13 defines the dynamic operational semantics of patterns. p ▷ v → σ stands for "p matches value v and binds the variables in σ". Most of the rules are straightforward, with the following peculiarities:
• The bound value environment returned by the matching is populated by variables, through the Var rule.
• We enforce that there is no shadowing: variables must be bound only once, as asserted by the side-condition in the Tuple rule.
• Alternatives are left-leaning: we first try to match with the left branch (AltL rule) before trying the right branch (AltR rule).
We also define the matching judgement in rule Matching, which has the same behavior as the normal ▷ judgement, but additionally returns the index of the branch which was matched. In a full language, it would then trigger the evaluation of the body of the branch in question.
Typing. Fig. 14 defines the typing judgement Γ ⊢ p : τ → Γ′: "pattern p has type τ and binds variables whose types are defined in environment Γ′". Typing closely follows the dynamic semantics:
• As before, the bound typing environment Γ′ is populated by variables through the Var rule.

• We enforce that there is no shadowing in the Tuple rule.
• Bound environments must be identical in all branches, as enforced by the Alt rule.
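Rust's or-patterns obey the same constraints as the Alt rule above: both alternatives must bind the same variables at the same types, and matching is left-leaning. A minimal illustration (`pick_coord` is an illustrative name):

```rust
// `x` must be bound in both alternatives of the or-pattern, and the
// left alternative is tried first.
fn pick_coord(p: (i32, i32)) -> i32 {
    match p {
        (x, 0) | (0, x) => x,
        (x, _) => x,
    }
}
```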
Focus utility function to access subterms. The subterm of a given type, pattern or value at a position indicated by a path can be accessed using the focus function defined in Fig. 15.
Source type specialization. Specialization by a provenance recursively filters constructors and yields a set of source types. Memory type specialization, defined in Fig. 18, yields the set of possible split-free memory types, that is, those in which only split branches that match the provenance were kept.
We also define specialization for the special wildcard tag ⊤ by keeping all split branches in the last rule:

D.3 Memory patterns
A high-level presentation of pat2mem and ty2pat can be found in Section 6.1.1. These functions are similar to val2mem and ty2val from Section 5.2.
Memory types to memory patterns. ty2pat takes a source pattern and a filtered (i.e., split-free) memory type, and returns a memory pattern. Its definition is mutually recursive with pat2mem, but we present it alone first for clarity. It proceeds by rewriting the memory type as a memory pattern by changing its leaves: singleton types are turned into constant patterns, and type variables are looked up in the environment. Subterm types (π as τ) are more delicate: first, we find the subpattern pπ at position π in p. If there are no nested patterns, pπ is a variable or a wildcard. τ indicates the memory type guiding the representation at this point, so we can recursively call pat2mem(pπ, τ). Note that pπ might not be defined: indeed, a memory type could mention some deeper subterm that is not present in the pattern. In this case, ty2pat fails.
The following typing judgment characterizes trees produced by scaffolding, weaving or sequentializing a well-typed tree for a given memory type. In addition to the usual memory type environment Γ, it is parametrized by the current memory path π relative to the root memory type. Definition D.1 (Tree typing). Informally, Γ, π ⊢ T : τ means "the tree T is suitable for weaving memory patterns and evaluating memory values which, when focused on π, follow the shape of the memory type τ".
This section details the proof of soundness of our compilation scheme sketched in Section 6.5. We do not consider arrays, and do not guarantee that bindings are coherent for non-regular specifications.
We do include full-fledged provenances (and prove pattern recognition equivalence in the non-regular case), multiple split discriminants and pattern matching on constants. The first part of the proof, which establishes the equivalence between source- and memory-level pattern matching, relies on the agreement criteria from Definition 4.3. We use a nested induction technique to span the whole construction of memory patterns and values. Recall the definition of val2mem and pat2mem. These functions handle special cases of source values and patterns such as wildcards, variables, disjunctions and immediate values. The task of transforming the remaining value/pattern pieces into memory values/patterns is left to ty2val and ty2pat, respectively, which do so by closely following the structure of the memory type. Upon encountering a subterm type, we call back val2mem/pat2mem to delve deeper into the source value/pattern. Similarly, we will use an outer induction hypothesis that deconstructs the considered source value/pattern, then inspect the memory type in an inner induction hypothesis. Proof. By nested induction on the memory type within the source value. We perform an outer induction step upon encountering a subterm type within the memory type. We then use the appropriate agreement criterion to deconstruct the source value while ensuring that the new memory type τ′ is adequate for the subterm (for instance, if v = K(v′), we use the coverage criterion to ensure that τ′ agrees with the source type of v′). Other criteria are used in innermost induction steps: for instance, upon encountering a split type, we use the distinguishability criterion to decide which branch should be explored based on v. □ We can now focus entirely on memory-level constructs in order to prove that compiling a memory pattern matching yields a tree with the same evaluation semantics. We first define a tree evaluation judgment that corresponds to "execution" on some input memory value. Definition F.2 (Tree evaluation). T ▶▶ v → I means that executing T on the input v yields the set of
output identifiers I. T may be an actual decision tree, in which case I is replaced with a proper output. Informally, we show that the following invariant is true during the entire compilation procedure, up to and excluding the final step that replaces leaf contents with single outputs: "Let T be the current tree and I such that T ▶▶ v → I. If p has been woven onto T with the identifier id, then id ∈ I ⟺ p ▷ v." The following typing judgment characterizes every "current tree", that is, any tree produced by scaffolding, weaving or sequentializing an adequate input. In addition to the usual memory type environment Γ, it is parametrized by the current memory path π relative to the root memory type.
Definition F.3 (Tree typing). Informally, Γ, π ⊢ T : τ means "the tree T is suitable for weaving memory patterns and evaluating memory values which, when focused on π, follow the shape of the memory type τ".
Fig. 1. Red-Black Trees with a Rust-like layout

Fig. 3. Red-Black Trees in the Linux kernel

Fig. 4. Grammar for source types and paths

Fig. 5. Grammar of memory types τ and memory paths π.

WordValidity: Γ ⊨ τi : κi with κi ∈ {Spec, Word}; each τi is valid in a word of size ℓ; for all i ≠ j, τi and τj have distinct bit ranges.

Fig. 16. Definition of specialization on source types

Fig. 17. Path focusing on memory types