FP²: Fully in-Place Functional Programming

As functional programmers we always face a dilemma: should we write purely functional code, or sacrifice purity for efficiency and resort to in-place updates? This paper identifies precisely when we can have the best of both worlds: a wide class of purely functional programs can be executed safely using in-place updates without requiring allocation, provided their arguments are not shared elsewhere. We describe a linear _fully in-place_ (FIP) calculus where we prove that we can always execute such functions in a way that requires no (de)allocation and uses constant stack space. Of course, such a calculus is only relevant if we can express interesting algorithms; we provide numerous examples of in-place functions on data structures such as splay trees or finger trees, together with in-place versions of merge sort and quick sort. We also show how we can generically derive a map function over _any_ polynomial data type that is fully in-place. Finally, we have implemented the rules of the FIP calculus in the Koka language. Using Perceus reference counting, this implementation dynamically executes FIP functions in-place whenever possible.


INTRODUCTION AND OVERVIEW
The functional program for reversing a list in linear time using an accumulating parameter has been known for decades, dating back at least as far as Hughes's work on difference lists [1986]. As this definition is pure, we can calculate with it using equational reasoning in the style of Bird and Meertens [Backhouse 1988; Gibbons 1994]. Using simple induction, we can, for instance, prove that this linear-time list reversal produces the same results as its naive quadratic counterpart. Not all in the garden is rosy, though: what about the function's memory usage? The purely functional definition of reverse allocates a fresh Cons node in each iteration, and an automatic garbage collector needs to discard the unused memory. This generally induces a performance penalty relative to an imperative in-place implementation that destructively updates the pointers of a linked list. Reasoning about such imperative in-place algorithms, however, is much more difficult.
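As a concrete reference, the accumulator-based reversal described above can be sketched as follows. The paper's examples are written in Koka; this is a Python rendering in which a Cons node is a `(head, tail)` pair, `None` plays Nil, and the helper names (`from_list`, `to_list`) are ours:

```python
def from_list(items):
    """Build a Cons chain of (head, tail) pairs from a Python list; None plays Nil."""
    node = None
    for x in reversed(items):
        node = (x, node)
    return node

def to_list(node):
    """Flatten a Cons chain back into a Python list."""
    out = []
    while node is not None:
        out.append(node[0])
        node = node[1]
    return out

def reverse_acc(xs, acc):
    # The tail-recursive accumulator loop, written as a while loop.
    # Each iteration allocates a fresh Cons node for the accumulator --
    # exactly the allocation that fully in-place execution avoids by
    # reusing the matched node instead.
    while xs is not None:
        x, xx = xs
        acc = (x, acc)
        xs = xx
    return acc

def reverse(xs):
    return reverse_acc(xs, None)
```

The naive quadratic version would instead append the head to the reversed tail at every step; the accumulator version performs one constant-time step per element.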
As programmers we seem to face a dilemma: should we write purely functional code, or sacrifice purity for efficiency and resort to in-place updates? This paper identifies precisely when we can have the best of both worlds: a wide class of purely functional programs, including the reverse function above, can be executed safely using in-place updates without requiring allocation.
In particular, what if the compiler can assume that the function parameters are owned and unique at runtime, i.e. that there are no other references to the input list xs of reverse at runtime? In that case, the compiler can safely reuse any matched Cons node and update it in place with the result, effectively updating the list in place. In this paper we describe a novel fully in-place (FIP) calculus that guarantees that such a function can be compiled in a way that never (de)allocates memory or uses unbounded stack space: it can be executed fully in-place.
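When the input is known to be unique, the same function can reuse each matched Cons cell instead of allocating. A Python sketch of the code a FIP compiler effectively produces (the mutable `Cons` class is our own encoding):

```python
class Cons:
    __slots__ = ("head", "tail")
    def __init__(self, head, tail):
        self.head = head
        self.tail = tail

def reverse_in_place(xs):
    # Mirrors the compiled FIP reverse: the matched Cons node itself is
    # reused for the result, so no allocation happens -- only each node's
    # tail pointer is redirected to the accumulator.
    acc = None
    while xs is not None:
        nxt = xs.tail
        xs.tail = acc   # reuse the matched node in place
        acc = xs
        xs = nxt
    return acc

# demo: a unique three-element list; we record the node identities
demo = Cons(1, Cons(2, Cons(3, None)))
demo_nodes = {id(demo), id(demo.tail), id(demo.tail.tail)}
demo_rev = reverse_in_place(demo)
```

The reversed list is built entirely from the original nodes, which is precisely what "reusing the destructively matched Cons cell" means operationally.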
To illustrate the purely functional fully in-place paradigm, we consider splay trees as described by Sleator and Tarjan [1985]. These are self-balancing trees where every access to an element in the tree, including lookup, restructures the tree such that the element is "splayed" to the top of the tree. As a result, the lookup function not only returns a boolean representing whether or not the element was found, but also the newly splayed tree. Splay trees are generally not considered well-suited for functional languages, because every such restructuring of the tree copies the spine of the tree, leading to decreased performance relative to an imperative implementation that can rebalance the tree in place. Surprisingly, it turns out to be possible to write the splay algorithms in a purely functional style using fully in-place functions.

Zippers and Unboxed Tuples
Let us first define the type of splay trees containing integers:

```koka
type stree
  Node(left : stree, value : int, right : stree)
  Leaf
```

For the lookup function, once we find a given element in the tree, we need to somehow navigate back through the tree to splay the node to the top. The usual imperative approach uses parent pointers for this, but in a purely functional style we can use Huet's zipper [1997] instead. The central idea is a simple one: to navigate through a tree step by step, we store the current subtree in focus together with its context. Naively, one might try to represent the context as a path from the root of the tree to the current subtree. The zipper, however, reverses this path so that each step up or down the tree requires only constant time. For splay trees, the corresponding zipper is a type szipper with an empty context Root and constructors NodeL(up : szipper, value : int, right : stree) and NodeR(left : stree, value : int, up : szipper), recording whether we descended into the left or the right child. (Recall also the fip version of reverse, where the destructive match on xs allows the matched Cons cell to be reused (⋄2) for the Cons(x,acc) allocation in the branch.)
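The constant-time navigation that makes the zipper attractive can be sketched in Python (the tuple encodings of the context constructors mirror Root, NodeL, and NodeR, but the helper names are ours):

```python
Leaf = None
ROOT = ("Root",)

def node(l, x, r):
    return [l, x, r]                     # a Node as a small mutable record

def down_left(tree, ctx):
    l, x, r = tree
    return l, ("NodeL", ctx, x, r)       # NodeL(up, value, right)

def down_right(tree, ctx):
    l, x, r = tree
    return r, ("NodeR", l, x, ctx)       # NodeR(left, value, up)

def up(tree, ctx):
    # O(1): pop one context frame and rebuild exactly one node
    if ctx[0] == "NodeL":
        _, up_ctx, x, r = ctx
        return node(tree, x, r), up_ctx
    _, l, x, up_ctx = ctx
    return node(l, x, tree), up_ctx

# demo: walk down to the left child and back up again
demo_tree = node(node(Leaf, 1, Leaf), 2, node(Leaf, 3, Leaf))
focus, ctx = down_left(demo_tree, ROOT)
back, ctx0 = up(focus, ctx)
```

Each step moves exactly one node between the focused subtree and the context, which is why a FIP compilation can realize the whole walk by pointer reversal.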

Splay Tree Lookup and Atoms
Using the zipper definition for splay trees, we can now define the lookup function as follows:

```koka
fip fun lookup( t : stree, x : int ) : (bool, stree)
  zlookup(t, x, Root)

fip fun zlookup( t : stree, x : int, ctx : szipper ) : (bool, stree)
  match! t
    Leaf -> (False, splay-leaf(ctx))   // not found, but splay anyway
    Node(l,y,r) ->
      if x < y then zlookup(l, x, NodeL(ctx,y,r))    // go down the left (NodeL reuses Node)
      elif x > y then zlookup(r, x, NodeR(l,y,ctx))  // go down the right (NodeR reuses Node)
      else (True, splay(Top(l,y,r), ctx))            // found it, now splay it to the top
```

The lookup function calls zlookup with an initial empty context Root. It seems we need to allocate the Root constructor, but constructors without any fields do not require allocation. These are typically implemented using pointer tagging, where only the tag is used to represent them. We call such constructors atoms. Examples include the Nil of lists, but also the booleans (True and False) and, depending on the implementation, primitive types like integers (int) or floats.
The zlookup function traverses down the tree while extending the current zipper context in place with the path that is followed. Figure 1 shows a concrete example of how zlookup constructs the zipper in the transition from (A) to (B). Once the element is found, the corresponding node is splayed back up to the top of the tree as splay(Top(l,x,r), ctx). Here we use the Top constructor:

```koka
type top
  Top( left : stree, value : int, right : stree )
```

But why is this Top constructor needed? Can we not just call splay directly with explicit arguments, as splay(l,x,r,ctx)? This, however, is not possible in a fully in-place way. In particular, it would mean that the Node(l,y,r) on which we matched would need to be deallocated, as it cannot be reused immediately (and deallocation is not allowed in fip functions). Moreover, we would then need to allocate the final top node later on. If we define splay without using Top, we would get something like:

```koka
fun splay( l : stree, x : int, r : stree, ctx : szipper ) : stree   // not fip!
  match! ctx
    Root -> Node(l,x,r)
    ...
```
That is, once the zipper ctx is at the Root we need to return a fresh Node with x on top. As there is no constructor that can be reused (since Root is just an atom), this would require allocation. By using the intermediate Top constructor we avoid this: at the call site we can now reuse the Node that we just matched for Top, and later on, when we reach the Root, we can reuse Top again to create the final Node that becomes the top of the returned splay tree. This results in the final fully in-place definition of the splay function:

```koka
fip fun splay( top : top, ctx : szipper ) : stree
  match! top
    Top(l,x,r) -> match! ctx
      Root                       -> Node(l,x,r)
      NodeL(Root,y,ry)           -> Node(l,x,Node(r,y,ry))                         // zig
      NodeL(NodeR(lz,z,up),y,ry) -> splay( Top(Node(lz,z,l),x,Node(r,y,ry)), up )  // zig-zag
      NodeL(NodeL(up,z,rz),y,ry) -> splay( Top(l,x,Node(r,y,Node(ry,z,rz))), up )  // zig-zig
      NodeR(ly,y,Root)           -> Node(Node(ly,y,l),x,r)
      NodeR(ly,y,NodeL(up,z,rz)) -> splay( Top(Node(ly,y,l),x,Node(r,z,rz)), up )  // (B)->(C)
      NodeR(ly,y,NodeR(lz,z,up)) -> splay( Top(Node(Node(lz,z,ly),y,l),x,r), up )
```

The compiler statically checks that this function is fully in-place. As a result, we know that its execution uses no stack space and performs all its rebalancing operations without any (de)allocation: each Top and Node can reuse a destructively matched Top, NodeL, or NodeR in every branch. The matched cases correspond to the "zig", "zig-zig", and "zig-zag" rebalance operations as described by Sleator and Tarjan [1985]. Figure 1 shows a concrete example of the "zig-zag" in the transition from (B) to (C). For completeness, the auxiliary splay-leaf function, called in case the item is not a member of the tree, is included below:

```koka
fip fun splay-leaf( ctx : szipper ) : stree
  match! ctx
    Root          -> Leaf
    NodeL(up,x,r) -> splay( Top(Leaf,x,r), up )
    NodeR(l,x,up) -> splay( Top(l,x,Leaf), up )
```

The definition of splay may look somewhat involved, but compared to the imperative definition it is fairly concise. Moreover, the usual
imperative algorithm uses extra space for parent pointers in each node. We do not need this: if we study the generated code for the fip functions, we see that the reuse of the zipper nodes corresponds to the usual "pointer-reversal" technique [Schorr and Waite 1967] (illustrated nicely in Figure 1 (B)). Such pointer reversal is rarely used explicitly in practice, though, since it is difficult to get right by hand. In the code above, however, the fully in-place fip functions using zippers provide a statically typed, memory-safe, purely functional definition with the same runtime behaviour, without requiring explicit pointer manipulation.
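The whole zipper-based splay lookup above can be sketched in Python. This is a functional rendering (the tuple encodings and the helpers `insert` and `inorder` are ours); it does not itself run in place, but it follows the Koka code case by case, passing the (l, x, r) triple as separate arguments in place of the Top constructor, since allocation is not a concern in the sketch:

```python
def splay(l, x, r, ctx):
    # Rebuild the tree with x at the root; ctx is the zipper back to the root.
    while ctx[0] != "Root":
        if ctx[0] == "NodeL":                        # we came from a left child
            _, up, y, ry = ctx
            if up[0] == "Root":
                return ("N", l, x, ("N", r, y, ry))               # zig
            if up[0] == "NodeR":
                _, lz, z, ctx = up                                # zig-zag
                l, r = ("N", lz, z, l), ("N", r, y, ry)
            else:
                _, ctx, z, rz = up                                # zig-zig
                r = ("N", r, y, ("N", ry, z, rz))
        else:                                        # we came from a right child
            _, ly, y, up = ctx
            if up[0] == "Root":
                return ("N", ("N", ly, y, l), x, r)
            if up[0] == "NodeL":
                _, ctx, z, rz = up
                l, r = ("N", ly, y, l), ("N", r, z, rz)
            else:
                _, lz, z, ctx = up
                l = ("N", ("N", lz, z, ly), y, l)
    return ("N", l, x, r)

def zlookup(t, x, ctx=("Root",)):
    while t is not None:
        _, l, y, r = t
        if x < y:
            t, ctx = l, ("NodeL", ctx, y, r)
        elif x > y:
            t, ctx = r, ("NodeR", l, y, ctx)
        else:
            return True, splay(l, x, r, ctx)
    # not found: splay the last visited node to the top anyway (splay-leaf)
    if ctx[0] == "Root":
        return False, None
    if ctx[0] == "NodeL":
        _, up, y, r = ctx
        return False, splay(None, y, r, up)
    _, l, y, up = ctx
    return False, splay(l, y, None, up)

def insert(t, x):                        # plain BST insert, for the demo only
    if t is None:
        return ("N", None, x, None)
    _, l, y, r = t
    if x < y: return ("N", insert(l, x), y, r)
    if x > y: return ("N", l, y, insert(r, x))
    return t

def inorder(t):
    return [] if t is None else inorder(t[1]) + [t[2]] + inorder(t[3])

demo = None
for v in [5, 3, 8, 2, 4, 7, 9]:
    demo = insert(demo, v)
found4, tree4 = zlookup(demo, 4)         # found: 4 is splayed to the root
found6, tree6 = zlookup(tree4, 6)        # not found: last node splayed anyway
```

In the FIP version, every context frame consumed by splay immediately pays for the node it rebuilds, which is why no allocation is needed.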

Borrowing and Second-Order Functions
While our examples have so far been entirely first-order, we also allow functions to be passed as arguments. For example, we can map a borrowed function ^f over a splay tree with a function smap that applies f to the value at each Node and recurses into both children. In this function we seemingly violate our linearity constraint, since f is used three times in the first branch (twice in the recursive calls and once applied to the stored value). However, the function parameter is marked as borrowed using the hat notation (^f) [Ullrich and de Moura 2019]. This allows the parameter to be freely shared, but at the same time it cannot be used in a destructive match, passed as an owned parameter, or returned as a result. Such borrowing is often useful for functions that inspect a data structure: consider, for example, an is-node function that matches (non-destructively) on a borrowed tree ^t and returns True exactly when it is a Node. This function would not be fully in-place if we matched destructively. A subtle point about higher-order functions is that we consider an application f(e) as a borrowed use of f; as a consequence, f cannot modify any captured free variables in place. We enforce this in our calculus by only allowing top-level functions (rather than arbitrary closures) as arguments, effectively making the calculus second-order. Finally, note that the smap function is marked not as fip but as fbip. Unlike the earlier splay function, smap has recursive calls in non-tail positions. That makes it hard to claim that this function is "fully in-place": after all, its execution uses stack space linear in the depth of the splay tree. We use the fbip keyword to signify functions that still reuse in place but are allowed to use arbitrary stack space and deallocate memory. Nevertheless, for smap this is not really required: in Section 3 we show that any map function over a polynomial data type (including smap) can be transformed into a tail-recursive, zipper-based traversal that is fully in-place.
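To preview what that transformation produces, here is a Python sketch (encodings ours) of a tail-recursive map over binary trees, obtained by defunctionalizing the CPS version of the recursive map: the recursion stack becomes an explicit context k, so the traversal runs in a single loop with constant control state.

```python
def tmap(f, t):
    # k is the defunctionalized continuation:
    #   ("Id",)            -- nothing left to do
    #   ("K1", x, r, k)    -- mapping the left subtree; x and r still pending
    #   ("K2", l2, x2, k)  -- mapping the right subtree; l2, x2 already mapped
    k = ("Id",)
    mode, val = "map", t
    while True:
        if mode == "map":                 # descend into the subtree `val`
            if val is None:
                mode = "app"
            else:
                _, l, x, r = val
                k = ("K1", x, r, k)
                val = l
        else:                             # `val` is a fully mapped subtree
            if k[0] == "Id":
                return val
            if k[0] == "K1":
                _, x, r, k = k
                k = ("K2", val, f(x), k)  # remember mapped left and value
                mode, val = "map", r
            else:
                _, l2, x2, k = k
                val = ("N", l2, x2, val)  # rebuild the node

demo = ("N", ("N", None, 1, None), 2, ("N", None, 3, None))
mapped = tmap(lambda v: v * 10, demo)
```

In the fully in-place version, each K1/K2 frame reuses the matched Node, so the context plays exactly the role that the zipper plays in splay.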
The fbip keyword derives from the "functional but in-place" technique [Reinking, Xie et al. 2021], which is a more liberal notion than our strict fully in-place functions. Our implementation also supports fip(n) and fbip(n) for a constant n, which allow the function to allocate at most n constructors. This is sometimes useful for functions like splay-tree insertion, where a single Node may need to be allocated for the newly inserted element, making insertion fip(1).
Fully in-Place in a Functional World

One might argue that fully in-place programming is just imperative programming in functional clothing. Where are the closures, the non-linear values, the persistence? And who allocates the list to be reversed in the first place? To be useful, we need to be able to embed our fully in-place functions in a larger host language. The challenge is to do this safely while still guaranteeing in-place updates when possible. To illustrate this point, consider a palindrome function that checks whether a list reads the same backwards as forwards, by comparing xs against reverse(xs). Even though reverse is a fip function, it would not be safe for it to destructively update its input list, since the argument xs is used twice (as an owned argument) in the body of palindrome. The FIP calculus presented here statically checks a function's definition, yet deciding which calls to fip functions can safely be executed using destructive updates requires further information about how arguments are shared at call sites.
Uniqueness Typing

One way to check this information statically is using a uniqueness type system. For example, Clean [Brus et al. 1987] is a functional language where the type system tracks when arguments are unique or shared [Barendsen and Smetsers 1995; De Vries et al. 2008]. A fip function may safely use in-place mutation, provided all owned parameters have a unique type.
In that way, the type system guarantees that any argument passed at runtime will be a unique reference to that object, ruling out any possible sharing. As a result, it is always safe to reuse the argument in place. One possible drawback of linear type systems and uniqueness typing is that they can lead to code duplication, where a single function needs multiple implementations: one version taking a unique argument, and one taking a shared argument. For example, we may need to define two reverse functions with equivalent implementations that differ only in the fip annotation and the uniqueness type of the input list: one uses copying and can be used persistently with a shared list, while the FIP variant updates the list in place but requires the input list to be unique.
Precise Reference Counting

Checking call sites of fip functions need not happen statically. Instead, we can use a dynamic approach where we check at runtime whether a FIP function can be executed in place [Lorenzen and Leijen 2022; Schulte and Grieskamp 1992; Ullrich and de Moura 2019]. This is the approach taken in our implementation in the Koka language, which uses Perceus precise reference counting [Reinking, Xie et al. 2021; Ullrich and de Moura 2019]. When an object has a reference count of one, it is safe to update it in place. To illustrate how the compiled code looks in practice, consider the compilation of the fully in-place reverse-acc function, where the destructive match on the input list now dynamically checks whether or not the list can be mutated in place. In the generated code, the reuse credit ⋄2 is compiled into an explicitly named reuse token ru that holds the memory location for the resulting Cons cell. If the input list is unique, we reuse the address of the input, &xs; otherwise, we adjust the reference counts of the children accordingly and ru is freshly allocated memory of the right size. In the recursive call, we initialize the Cons cell in place at the ru memory location as Cons@ru(x,acc). Compared to the static analysis, we have lost the static guarantee that the owned parameters are unique at runtime, but we have gained expressiveness: in particular, we can now define a single purely functional but fully in-place reverse function that serves all usage scenarios. It efficiently updates the elements in place if the argument list is unique at runtime, but it also adapts dynamically when the list, or any sublist, happens to be shared, falling back gracefully to allocating fresh Cons nodes for the resulting list.

(Proc. ACM Program. Lang., Vol. 7, No. ICFP, Article 198. Publication date: August 2023.)
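The dynamic check can be sketched in Python by carrying an explicit reference count on each node (the `rc` field and the whole encoding are ours; a real implementation works on raw memory words):

```python
class Cons:
    __slots__ = ("head", "tail", "rc")
    def __init__(self, head, tail):
        self.head, self.tail, self.rc = head, tail, 1

def reverse_acc(xs, acc):
    while xs is not None:
        x, xx = xs.head, xs.tail
        if xs.rc == 1:
            ru = xs                     # unique: the reuse token is the old cell
        else:
            xs.rc -= 1                  # shared: give up our reference ...
            if xx is not None:
                xx.rc += 1              # ... while the tail gains one
            ru = Cons(None, None)       # and allocate fresh memory instead
        ru.head, ru.tail = x, acc       # Cons@ru(x, acc)
        acc = ru
        xs = xx
    return acc

# unique list: every node is reused, no allocation
unique = Cons(1, Cons(2, Cons(3, None)))
unique_ids = {id(unique), id(unique.tail), id(unique.tail.tail)}
urev = reverse_acc(unique, None)

# shared list: reversal falls back to fresh nodes, the original stays intact
shared = Cons(1, Cons(2, None))
shared.rc = 2                           # pretend another reference exists
srev = reverse_acc(shared, None)
```

The same single definition serves both scenarios, which is exactly the expressiveness gained over the static uniqueness-typing approach.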

Contributions
To support the motivating examples outlined so far, this paper makes the following contributions:

• Following the pioneering work on the LFPL calculus [Hofmann 2000a,b], we present a novel fully in-place (FIP) calculus (Section 2), precisely capturing those functions that can be executed fully in-place. We provide a standard functional operational semantics for our language, but also define an equivalent semantics for FIP functions in terms of a fixed store, where no (de)allocation can take place. As a result, we know that FIP functions never allocate memory and use bounded stack space. As shown for splay trees, atoms and unboxed tuples are needed to avoid allocations in many common scenarios, and the FIP calculus includes these features. Furthermore, the rules of the FIP calculus provide a static guarantee of linearity in a syntactic way, where parameters can be owned uniquely or borrowed.
• The FIP calculus is only useful if it can actually be used to describe interesting algorithms. To show the wide applicability of our approach, we present a variety of familiar functional programs and operations on data structures that are all fully in-place. We have already seen how Huet's zipper [1997] data structure, colloquially described as the functional equivalent of back-pointers, can be used fully in-place, and how we can use this to implement fully in-place splay trees [Sleator and Tarjan 1985]. In Section 3, we further show that we can use a defunctionalized CPS transformation [Danvy 2008] to derive a generic map function for any polynomial inductive data type.
The derived map uses a fully in-place Schorr-Waite traversal [Schorr and Waite 1967].
• We have a full implementation in the Koka compiler [Leijen 2021, v2.4.2; Lorenzen et al. 2023b], and detailed proofs can be found in the technical report [Lorenzen et al. 2023a].

A LANGUAGE FOR FULLY IN-PLACE UPDATE
Figure 2 presents the syntax of the fully in-place FIP calculus. The syntax has been carefully chosen to be expressive enough to cover many interesting functions, as shown in this paper, but at the same time restricted enough to be straightforward to analyze. Particular properties of our syntax are the inclusion of unboxed tuples and borrowed parameters, and the lack of general lambda expressions. The syntax distinguishes between expressions e, and values v that cannot be evaluated further. Values are either variables or fully applied constructors C^k taking k values as arguments. We often leave out the superscript k when not needed.
Unboxed tuples (v1, ..., vk) are considered expressions rather than values. In this way, we syntactically rule out that unboxed tuples are passed as an argument to a constructor, which would cause them to become "boxed" (and allocated). Instead of enforcing this property with a type system [Peyton Jones and Launchbury 1991], we use this syntactic restriction: the check is simpler and allows us to specify the FIP calculus independently of its static semantics.
We often write just v or e for a singleton unboxed tuple (v), and use an overline to denote an unboxed tuple (v1, ..., vk) as v̄, or an unboxed tuple of variables (x1, ..., xk) as x̄. Since expressions always eventually evaluate to an unboxed tuple, the let x̄ = e1 in e2 expression binds all components of the resulting unboxed tuple of e1 in x̄.
There are no general lambda expressions: closures generally need to be heap allocated if they contain free variables, and to keep the FIP rules as simple as possible we do not allow arbitrary lambda expressions. Instead, the global Σ environment holds all top-level functions f, which can be mutually recursive and passed as arguments. This makes our calculus essentially second-order. A top-level function is declared as f(ȳ; x̄) = e, where ȳ are the borrowed parameters and x̄ the owned (unique) parameters. Just as in a let binding, the ȳ and x̄ bind the components of the unboxed tuples that are passed. A function is called by writing f(e1; e2), with e1 for the borrowed arguments and e2 for the owned arguments.
If there are no borrowed arguments, we sometimes write just f(e2). The syntax e1 e2 is used for general application when the function to be called is not statically known. This happens when a function f is passed as a second-order argument itself, and in that case e2 is always passed as the owned parameter(s). We have two match expressions: the regular match, and the destructive match!. There is no difference in the functional semantics between the two, but in a heap semantics the destructive match can be used for reuse, and, as we see in the FIP rules in Figure 4, it can only be used on owned parameters.

Functional Operational Semantics
Figure 3 gives the functional operational semantics for our calculus. This semantics does not yet use a heap. An evaluation context E is a term with a single occurrence of a hole □ in place of a subterm. Together with the step rule, E determines the evaluation order, where the hole denotes the unique subterm that can be reduced. A small-step reduction e1 −→ e2 evaluates e1 to e2. The reduction steps are standard, except that we always substitute with unboxed tuples. We write e[x̄:=v̄] for the capture-avoiding substitution of the distinct variables x̄ = (x1, ..., xn) with the values v̄ = (v1, ..., vm), where n must equal m and x̄ ∉ fv(v̄). When we substitute in a function body, we write e[x̄:=v̄1, ȳ:=v̄2] to substitute all (distinct) variables x̄ and ȳ at once, where we again require a capture-avoiding substitution with x̄, ȳ ∉ (fv(v̄1) ∪ fv(v̄2)).
When an expression e cannot be reduced further using step, then either e reduced to an unboxed tuple v̄ and we are done, or we call the evaluation stuck. We have purposefully described the language and its dynamic semantics without a specific type system, but we can easily define standard Hindley-Milner typing rules [Hindley 1969; Milner 1978] to guarantee that well-typed programs never get stuck.

FIP: Fully In-Place
As defined, our functional semantics is very liberal and allows expressions that generally cause allocation, like C x y. Figure 4 specifies the FIP calculus rules that guarantee that the resulting programs can be evaluated without needing any (de)allocation. The statement Δ | Γ ⊢ e means that under a borrowed environment Δ and an owned environment Γ, the expression e is a well-formed FIP expression. The borrowed environment Δ is a set of borrowed variables, which generally come from the borrowed parameters of a function f or from borrowing in the let rule. We write Δ, Δ′ for the union of the sets Δ and Δ′. The owned environment Γ is a multiset of owned variables and reuse credits. Following Hofmann [2000a], we denote these credits as a diamond ⋄k, signifying a credit of size k. We can append two owned environments Γ and Γ′ as Γ, Γ′. Sometimes we also write Δ, Γ to join a borrowed set with a multiset Γ that does not contain reuse credits, where the result is the borrowed set Δ joined with the elements of Γ. Note that in the current rules all variables in the Γ environment occur only once, as we have no way to duplicate them; in Section 5 we generalize this to the full Perceus calculus. The FIP rules ensure that variables in the owned environment Γ are used linearly (with some borrowing allowed in let). However, this is a syntactic property and we do not use a linear type system. This is much simpler to specify and implement, and also makes FIP independent of any particular type system used by a host language.
The linearity of the FIP calculus is apparent in the var rule, Δ | x ⊢ x, where we can only consume x if it is the only element of the owned environment Γ. Similarly, the tuple rule splits the owned environment into n distinct parts Γi, and ensures well-formedness of each constituent value vi of the tuple under the corresponding environment Γi. With the atom rule, Δ | ∅ ⊢ C, we can return constructors without arguments, which we consider allocation-free. The owned environment must be empty here, since our calculus is not affine: we cannot discard owned variables, as that would imply freeing a potentially heap-allocated value (but in the next section we consider an extension of FIP that allows deallocation as well).
The only way to create a fresh constructor with k ⩾ 1 arguments is through the reuse rule, where we need to check well-formedness of each argument, but also need a reuse credit ⋄k to guarantee that the needed space is available at evaluation time. The dmatch! rule creates such reuse credits: we can destructively match on an owned variable x to obtain a reuse credit ⋄k in each branch. In each branch, x is then no longer owned (it became a reuse credit instead). For simplicity, the rules only allow matching on a variable, but we can always rewrite a user expression match! e { ... } into let x = e in match! x { ... } where x is a fresh owned variable. Again, we do not allow freeing at this point, so reuse credits can only be consumed by the reuse rule, or by the empty rule for zero-sized reuse credits (when an atom is matched).
In contrast, the borrowed match bmatch can only match on borrowed variables, and such a match can only be used to inspect values, without creating fresh reuse credits. Even though variables in the owned environment Γ cannot be discarded (i.e. freed) or duplicated (i.e. shared), we can temporarily borrow them. In the let rule, the owned environment is split into three parts Γ1, Γ2, Γ3. The Γ1 and Γ3 environments are passed to e1 and e2 respectively, but the Γ2 environment is passed to e2 as an owned environment and to e1 as a borrowed environment! Since we consume Γ2 in the derivation of e2, we can consider it borrowed in the derivation of e1. Note that we still need Γ3, since Γ2 is joined with the borrowed Δ environment and as such cannot contain any reuse credits (which can thus be included in Γ3 instead).
The call rule is used for a function call f(ȳ; ē), where we can pass borrowed variables ȳ and owned arguments ē. To allow for passing functions, we can also pass a top-level function as part of ȳ. With the bapp rule we can call such a function passed as a variable. Since we can only pass functions as borrowed, we also only allow borrowed calls of the form y ē. To prepare for an extension with full lambda expressions, we only allow owned arguments in such a call, as already apparent in the operational semantics. Finally, we can check all top-level functions for well-formedness using the ⊩ Σ rule. Any function f ∈ Σ with ⊩ Σ is considered fully in-place.
Implementing the check algorithmically is straightforward if the owned environment is synthesized. For let bindings we first check e2 and use the synthesized Γ2, Γ3 to check e1 (where Γ3 only contains reuse credits). When merging synthesized environments, we check that linearity is preserved. Since Δ ∩ Γ is always empty, we can also infer whether to use a borrowed or a destructive match, so no such distinction is needed in the user-facing syntax; we keep the distinction explicit in our calculus, though, since we need it in the store semantics.

Store Semantics
With the FIP calculus defined, we can now define another operational semantics. Figure 5 defines the store semantics, where we evaluate using a fixed-size store S. The rules of the store semantics all adhere to an important invariant: no step of the evaluation allocates or deallocates memory. In this section, we establish a key result relating this store semantics to the functional operational semantics defined previously: under certain conditions, satisfied by all well-formed FIP programs, the store semantics and the operational semantics coincide.
The store semantics uses an evaluation context E, but this time a full evaluation goes to an unboxed tuple of variables x̄ (instead of values v̄). Every constructor is bound in the store S, where each element is either a binding x ↦ Ck x1 ... xk of size k, or a reuse credit ⋄k of size k.
Using the eval rule, we can reduce using small steps in the store semantics. The reduction rules have the form S | e −→s S′ | e′, where an expression e in a store S reduces to e′ with a new store S′. The rules are similar to those of the earlier operational semantics, but we now always substitute with variables instead of values. There are two additional rules for evaluating constructor values, which are bound in the store: the (reuse_s) transition uses a reuse credit ⋄k in the store to apply the constructor, while (atom_s) allows atoms to be created freely. The (bmatch_s) and (dmatch_s) reductions differ, in that the latter replaces the original constructor binding with a reuse credit of the same size.
Since our store semantics is destructive (in the reuse and dmatch rules), it can fail for expressions where the standard evaluation semantics would succeed. Even for expressions that are well-formed, the store semantics can fail if the initial store has internal sharing: if a shared variable is mutated in place, this breaks referential transparency. Thus, we have to require that any variable is referred to just once in the store; we call such a store a linear store.
Definition 1. (Store Soundness and Linearity) For a store S we write dom(S) for the set of variables x bound in S, and rng(S) for the set of values C x̄ bound in S. A store is sound if all free variables in rng(S) are bound: fv(rng(S)) ⊆ dom(S). A store is linear if it is sound and any variable x in dom(S) occurs at most once in the free variables of rng(S). By roots(S) we denote all reuse credits of S together with the set of variables in dom(S) that do not occur in the free variables of rng(S).
On linear stores, mutation is safe; in a reference-counted setting, such a store corresponds to a heap where all values have a reference count of one. We can now state the main soundness theorem. We write [S]x for the substitution that recursively replaces variables by their bound values in S.
We assume that we are given stores corresponding to the owned and borrowed values, but only require that the store of the owned values is linear. We can then show that the store evaluation leaves the borrowed values unchanged and only modifies the owned values:

Theorem 1. (The store semantics is sound for well-formed FIP programs) If Δ | Γ ⊢ e, and given disjoint stores S1, S2 with Δ ⊆ dom(S1), S1 sound, Γ = roots(S2) and S2 linear, then S1, S2 | e −→s* S1, S3 | x̄ where [S1, S3]x̄ = v̄ for the functional evaluation e −→* v̄, and S3 is linear.

This is a strong theorem, and the proof is quite involved (see App. B of the technical report), both due to destructive update and due to the ability to match on variables temporarily borrowed in the let rule. As a corollary, any FIP expression can run on the store semantics if we use a store containing the necessary reuse credits, i.e. if we give it enough space to allocate upfront. We can define the size of a store by adding the sizes of all bindings within it. Since atoms and empty reuse credits have size zero, they do not contribute to the size of the store.

Definition 2. The size |S| of a store S is the sum of the sizes of its elements: a binding x ↦ Ck x1 ... xk contributes size k, as does a reuse credit ⋄k (so atoms and empty reuse credits contribute zero).

With this definition, we can immediately see that the size of the store does not change in any reduction of the store semantics. As such, FIP programs reduce in place, without any (de)allocation.
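A toy store in Python makes the invariant concrete (the encoding is ours): a destructive match swaps a constructor binding for an equal-sized reuse credit, reuse swaps a credit back for a same-sized constructor, and |S| never changes.

```python
def store_size(S):
    # |S|: constructor bindings count their number of fields, credits their size
    return sum(len(v[2]) if v[0] == "con" else v[1] for v in S.values())

def dmatch(S, x):
    # destructive match: the binding at x is replaced by an equal-size credit
    tag, name, fields = S[x]
    assert tag == "con"
    S[x] = ("credit", len(fields))
    return name, fields

def reuse(S, x, name, fields):
    # reuse rule: consume the credit at x to build a same-size constructor
    tag, k = S[x]
    assert tag == "credit" and k == len(fields)
    S[x] = ("con", name, tuple(fields))

# demo: a Cons cell is matched destructively and its space reused
S = {"a": ("con", "Cons", ("x", "b")), "b": ("con", "Nil", ())}
size0 = store_size(S)
name, fields = dmatch(S, "a")           # a ⋄2 credit now sits where the cell was
size1 = store_size(S)
reuse(S, "a", "Cons", ("x", "acc"))     # the credit is consumed again
size2 = store_size(S)
```

The three sizes are identical, mirroring the claim that no reduction of the store semantics changes |S|.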

FBIP: Allowing Deallocation
Our basic FIP calculus is quite strict and allows neither allocation nor deallocation. It is easy, though, to extend it to allow deallocation. Figure 6 extends the FIP calculus with deallocation, where the syntax is extended with drop x; e to drop an owned variable x, and free k; e to free a reuse credit of size k.
The drop rule consumes a variable x from the owned environment. Since the multiplicity of all elements in Γ is still one, this asserts that x is no longer an element of Γ. Similarly, the free rule allows discarding a reuse credit.
The operational semantics is also extended with two new reductions, for dropping a bound constructor and for freeing a reuse credit. With these new rules and reductions for deallocation, the soundness theorem 1 continues to hold (see App. B of the technical report). Again, we can immediately see that the store semantics now allows deallocation only: the size of the store never increases during reduction.

Stack Safe FIP
So far, our FIP calculus has allowed us to bound the heap space of a program, but what about the stack space? If we only seek to bound allocations, we could choose to leave it unbounded. In practice, however, the stack space matters: when compiling FIP programs to C we have to assume a relatively small stack, and even in a garbage-collected setting growing the stack is not free. To ensure that the stack is bounded, we require two modifications to the calculus. We assume that any function f is defined as part of a mutually recursive group f̄ (which might consist of just f or of more functions). In the call rule we then require two additional conditions. Firstly, Σ contains only functions defined before or as part of the current mutually recursive group. Secondly, we constrain the mutually recursive groups f̄ by requiring that any recursive calls within the group are tail-recursive. Formally, all function definitions f ∈ f̄ need to be of the form f(y; x) = T[f̄] where f̄ ∩ fv(e0) = ∅. In the call rule, one can pass functions from Σ to the called function. By the first constraint, these functions can only be defined before, or mutually recursive with, the current definition. In the tail-context we further require that any functions passed as arguments are also not in the mutually recursive group. Thus, we can only pass functions that were defined strictly before the current definition. We call the FIP calculus extended with these requirements FIPₛ, in which the stack usage is always bounded. The fip keyword in our implementation checks if a function is a well-formed FIPₛ function.
To show this formally, we use the size of the evaluation context as a proxy for the stack size. We write |e| for the depth of an expression e and |E| for the depth of an evaluation context. We fix a signature Σ and denote by |max| the maximum depth of an expression bound in Σ. Then we have:

Theorem 4. (A FIPₛ program uses constant stack space) Let Σ be fully in-place such that all mutually recursive groups in Σ are of the tail-recursive form required above. Then the depth of the evaluation context during any evaluation is bounded by |Σ|² · |max|.

This yields our stack size bound of |Σ|². The additional factor of |max| describes the maximum depth of the evaluation context within each function; in practice, we would not allocate a stack frame for these parts of the evaluation context. In a first-order setting we would expect a stack bound of |Σ| (where any function can call functions defined before it). However, in a second-order calculus, any function can call anonymous functions defined after it, which adds another factor of |Σ| (see App. C of the tech. report for a detailed proof).

FULLY IN-PLACE TRAVERSALS OVER POLYNOMIAL DATATYPES
A classic example of a fully in-place algorithm is the in-place traversal of a binary tree [Reinking, Xie et al. 2021]. Consider a binary tree with all the values at the tips:

  type tree<a>
    Tip( value : a )
    Bin( left : tree<a>, right : tree<a> )

Similar to our earlier splay tree in Section 1, we can again define a zipper to help traverse the tree in-order:

  type tzipper<a,b>
    Top
    BinL( up : tzipper<a,b>, right : tree<a> )
    BinR( left : tree<b>, up : tzipper<a,b> )

A tzipper<a,b> stores fragments of the input tree in-order: the subtrees we have not yet visited are stored using the BinL constructor; the subtrees we have already visited are stored in the BinR constructor. We can now map over the tree in-order without using heap or stack space by reusing the tzipper nodes. To define the tree map function, we begin by repeatedly stepping down through the input tree to the leftmost tip. Each subtree we have not yet visited is accumulated in a BinL constructor. Once we hit the leftmost leaf, we apply the argument function f and work our way back up, recursively processing any unvisited subtrees:

  fip fun down( t : tree<a>, ^f : a -> b, ctx : tzipper<a,b> ) : tree<b>
    match! t
      Bin(l,r) -> down( l, f, BinL(ctx,r) )  // go down the left spine, remember to visit r later
      Tip(x)   -> app( Tip(f(x)), f, ctx )   // start upwards along the zipper

  fip fun app( t : tree<b>, ^f : a -> b, ctx : tzipper<a,b> ) : tree<b>
    match! ctx
      Top        -> t
      BinR(l,up) -> app( Bin(l,t), f, up )   // keep going up rebuilding the tree
      BinL(up,r) -> down( r, f, BinR(t,up) ) // go down a right side

  fip fun tmap( t : tree<a>, ^f : a -> b ) : tree<b>
    down(t,f,Top)

The mutually tail-recursive app and down functions are fully in-place, since each matched Bin can be paired with a BinL, each BinL with a BinR, and finally each BinR with a Bin again. The definition of tmap may seem somewhat involved, yet consider writing this function in an imperative language, without using extra stack or heap space, mutating pointers throughout the tree.
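To make the control flow concrete, here is a Python sketch of the same traversal (an illustrative model only: Python has no reuse analysis, so this mirrors the constant-stack control flow of down/app, not the in-place memory behaviour):

```python
# Trees: Tip(x) and Bin(l, r); the zipper mirrors the Koka tzipper type.
class Tip:
    def __init__(self, x): self.x = x
class Bin:
    def __init__(self, l, r): self.l, self.r = l, r

class Top: pass
class BinL:  # parent zipper plus a right subtree still to be visited
    def __init__(self, up, right): self.up, self.right = up, right
class BinR:  # an already-mapped left subtree plus the parent zipper
    def __init__(self, left, up): self.left, self.up = left, up

def tmap(t, f):
    ctx = Top()
    while True:
        # 'down' phase: walk to the leftmost tip, recording right subtrees
        while isinstance(t, Bin):
            ctx, t = BinL(ctx, t.r), t.l
        t = Tip(f(t.x))
        # 'app' phase: rebuild upwards until an unvisited right subtree appears
        while isinstance(ctx, BinR):
            t, ctx = Bin(ctx.left, t), ctx.up
        if isinstance(ctx, Top):
            return t
        t, ctx = ctx.right, BinR(t, ctx.up)
```

The two nested loops correspond exactly to the mutual tail calls between down and app.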
Seeing how we can write a map over a binary tree as a FIP function, we may ask whether this is possible for any simple algebraic datatype that can be expressed as a sum of products. It turns out this is indeed the case, and we show it in two steps: first, in the next subsection, we present a general method for rewriting programs that are tail-recursive modulo reusable contexts (TRMReC) such that they are fully in-place. Then, we show how we can generically derive a map function for any polynomial inductive datatype to which our TRMReC translation applies.

Tail Recursion Modulo Reusable Defunctionalized CPS Contexts
While our FIP tmap function may seem very different from a standard map over trees, it turns out that it corresponds exactly to the defunctionalized CPS version [Danvy 2008; Reynolds 1972] of the standard map. Let us focus on the first branch, where a CPS-translation yields the following closures. Comparing with our tzipper type, we can identify Top with the identity function, BinR with the inner closure (fn(r') k(Bin(l', r'))), and BinL with the outer closure: the zipper is just the defunctionalization of the closures. The arguments r' and l' of the closures correspond to the tree t, the down function to the transformed tmap function, and app applies the defunctionalized continuation k to the new tree. As shown by Danvy [2022], this defunctionalized CPS-transformation applies widely and can transform many programs from direct style to tail-recursive form. But are these techniques also applicable when writing FIP programs? Sobel and Friedman [1998] show that it is always possible to reuse the zipper for the result in all anamorphisms. In fact, in the above translation, it is even possible to reuse the initial tree to construct the zipper.
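For comparison, the direct CPS rendering of the tree map can be sketched in Python as follows (an illustrative sketch; the two lambdas are exactly the closures that defunctionalize to BinL and BinR):

```python
class Tip:
    def __init__(self, x): self.x = x
class Bin:
    def __init__(self, l, r): self.l, self.r = l, r

def tmap_cps(t, f, k=lambda r: r):
    if isinstance(t, Tip):
        return k(Tip(f(t.x)))
    # The outer closure corresponds to BinL (it still has to visit t.r);
    # the inner closure corresponds to BinR (it already holds the mapped
    # left subtree l2); k=identity corresponds to Top.
    return tmap_cps(t.l, f,
                    lambda l2: tmap_cps(t.r, f,
                                        lambda r2: k(Bin(l2, r2))))
```

Defunctionalizing the continuations of this version, and passing f separately because it is borrowed, yields precisely the down/app pair above.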
Using this insight, we can give a general translation to tail-recursive programs that is guaranteed to be fully in-place. It is inspired by the defunctionalized CPS-translation, but we have to make several small adjustments to make it work. For example, notice how the borrowed function f is not included in the zipper, but instead passed directly to app. This is crucial, since we cannot store a borrowed value inside a data structure. Figure 7 shows the formal transformation, based on the general framework of tail recursion modulo context described recently by Leijen and Lorenzen [2023].
Starting with a function f(y; x) = e, we first define the zipper by creating a constructor H for the identity and one constructor Zi for each evaluation context Ei in e that contains a recursive call to f. Each constructor carries the free variables zi of its evaluation context and the link z′ to the parent zipper. We transform f into f′(y; x, z) and provide an app(y; z, x′) function, where we ensure that both receive the same borrowed variables y. Each call to f′ receives the current zipper as an extra argument z and is defined by the translation of e with zipper z shown below. The app function matches on the current zipper (as before) and resumes execution in the relevant (transformed) evaluation context.
The transformation follows the defunctionalized CPS contexts of the TRMC framework [Leijen and Lorenzen 2023, Sec. 4.3] with the (tctx), (base), (tail), and (ectx) rules. If we encounter a tail context T, we continue the transformation in every hole using the (tctx) rule. When we encounter a term e0 which has no recursive calls, the (base) rule inserts a call to app to apply the result of e0 to the continuation stored in z. If we encounter a tail-call, we simply leave it as is. Finally, if we find a call in an evaluation context, we turn it into a tail-call by storing the free variables of Ei and the current zipper. Notice that we cannot have a tail-context nested in an evaluation context, as the translation assumes that programs are in A-normal form [Flanagan et al. 1993].
Fig. 7. The TRMReC translation: translating f to a tail-recursive f′ function, applying the zipper, tail-recursive contexts with f ∉ fv(e0), and the tail recursion translation for a function f with zipper z.

So far, this transformation describes just how to enable tail recursion on general defunctionalized CPS contexts, but the result is not yet guaranteed to be FIP. To guarantee that reuse applies, we need to preserve the side-condition on (ectx), which ensures that Ei does not depend on borrowed variables other than y (which could not be stored in an accumulator), and that there is a space credit of the appropriate size for z and the free variables zi of Ei.
If this condition is met, the transformation yields a tail-recursive, fully in-place program:

Theorem 5. (The TRMReC transformation is sound) Let f be a function with y | x ⊢ f(y; x) and f(v1; v2) −→* w. If it can be transformed into f′, then y | x, z ⊢ f′(y; x, z) and y | z, x ⊢ app(y; z, x) and f′(v1; v2, H) −→* w.

See App. D of the tech. report for the proof. We also generalize this theorem to handle recursive calls with varying borrowed arguments, as well as the case where k ≥ |zi| + 1 (as is common in folds) or more than one reuse credit is available. Clearly, this translation applies to the tmap function introduced at the start of this section: we just have to check the side condition, which is fulfilled, and thus the translation succeeds.

Schorr-Waite Tree Traversals
Using the translation of the previous section, we can now generalize the tmap function to any polynomial inductive datatype. Following the approach of van Laarhoven [2007] for generically deriving functors in Haskell, we use a generic macro $map to define a (non tail-recursive) map function for any type T in a straightforward way, where we match on each constructor of T and call the $map macro on each of the fields. The $map macro dispatches on the type of its argument: if x has type T, it generates a recursive call map(f; x); if x has the parameter type, it generates a call f(x); otherwise it leaves the argument unchanged. In this definition, all recursive calls to map happen in $map, each application of which is in an evaluation context Ei where the side-condition holds. Thus, we can apply the TRMReC transformation of the previous section and automatically obtain a tail-recursive, fully in-place version of map.
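The effect of the $map macro can be sketched in Python over a first-order description of a polynomial type (a hypothetical encoding of our own: each constructor lists whether a field is recursive, of the parameter type, or something else):

```python
def gmap(spec, f, node):
    # spec maps constructor names to a list of field kinds:
    # 'rec' for a recursive field, 'par' for the type parameter,
    # 'other' for anything else. Nodes are (tag, fields) pairs.
    tag, fields = node
    out = []
    for kind, v in zip(spec[tag], fields):
        if kind == 'rec':
            out.append(gmap(spec, f, v))   # recursive call, as $map generates
        elif kind == 'par':
            out.append(f(v))               # apply f to parameter fields
        else:
            out.append(v)                  # leave other fields unchanged
    return (tag, tuple(out))
```

As in the paper, every recursive call sits in an evaluation context that just rebuilds the constructor, so the TRMReC transformation applies.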
Why does reuse work so naturally here? Part of the answer seems to be that the link to the parent is stored together with the other free variables. In contrast, McBride [2008] defines a generic fold function which is not fully in-place, since it stores the defunctionalized continuations on a stack. Nevertheless, as McBride [2001] shows, the defunctionalized continuations correspond to the (generalised) derivative of a regular type T. For every constructor C with k recursive subtrees, the derivative datatype has k constructors, one for each possible continuation. Reuse then arises naturally, as the constructors of the derivative and the original datatype can line up perfectly.
In the literature on imperative algorithms, these traversals, which use no extra stack space except for direction hints (as encoded in the constructors of the zipper datatype), are known as Schorr-Waite traversals [Schorr and Waite 1967]. Effectively, we can thus derive a Schorr-Waite traversal for any polynomial algebraic datatype and use the reuse analysis of the FIP calculus to compile it to the corresponding imperative code. This is remarkable, as imperative Schorr-Waite traversals are notoriously difficult to get right or to prove correct. In his famous work on separation logic, Reynolds [2002] writes: "The most ambitious application of separation logic has been Yang's proof of the Schorr-Waite algorithm for marking structures that contain sharing and cycles."
Of course, our construction cannot handle cycles, but rather shows the traversal of trees (or of any polynomial inductive datatype in general). The tree traversal by itself, however, is already quite complicated and has become a benchmark for verification frameworks [Loginov et al. 2006; Walker and Morrisett 2000]. Furthermore, our translation also shows that the Schorr-Waite tree traversal is equivalent to a stack-based depth-first traversal (like the standard tmap). This was already shown by Yang [2007] in the context of separation logic, but that required more advanced methods than straightforward induction.

FURTHER EXAMPLES OF FULLY IN-PLACE ALGORITHMS
Many common functions used in functional programming, like map or reverse, are already FIP in their standard definition. In this section we present some advanced examples that test the limits of what is possible. Along the way, we see several techniques that may be of general use when designing algorithms in a language with in-place reuse, including passing reuse credits to functions, padding constructors, and the partition datastructure. Full listings of the examples in this section can be found in App. A of the tech. report.

Imperative Red-Black Tree Insertion, Functionally
Can insertion into a red-black tree be FIP? The traditional implementation of red-black trees, due to Okasaki [1999], can indeed be written to use all its arguments in-place. However, it occasionally has to rebalance the result of the recursive call and thus uses stack space linear in the depth of the tree. We can avoid this by using a zipper (as in Section 3) and balancing while reconstructing the tree from the zipper. Surprisingly, this yields a functional implementation which is almost identical to the imperative red-black tree insertion algorithm described in the popular "Introduction to Algorithms" textbook [Cormen et al. 2022]. We first define the tree and its zipper:

  type accum<k,v>
    ...
    NodeL(c : color, l : accum<k,v>, k : k, v : v, r : tree<k,v>)
    NodeR(c : color, l : tree<k,v>, k : k, v : v, r : accum<k,v>)

We can insert a value into the tree by recursing into the left or right subtree until we either find the key with an existing value, or a leaf. In the latter case, we have to allocate a new node for the key and value. During the recursion, we turn the nodes of the tree into the accum zipper. At the end, we thus have a subtree with our new value, and a zipper. We create a rebalanced red-black tree by calling the fixup function below on them.
But how do we write fixup? Thankfully, we can translate it almost verbatim from the rb-insert-fixup procedure in Section 13.3 of Cormen et al. [2022]! We present the code here to illustrate that this function is FIP and closely follows its imperative counterpart; a detailed explanation can be found in Cormen et al. [2022]. The function distinguishes three cases (marked on the left of the function definition) that correspond to the three cases in the textbook implementation. Case one translates directly (even if we have it twice). The second case is the most complicated, rotating an inner part left before rotating the outer part right. In case three, it is possible to stop fixup (by calling rebuild, defined below) as the parent node was colored black.
  fip fun is-red(^t : tree) : bool
    match t { Node(Red) -> True; _ -> False }

  fip fun left-rotate(t : tree<k,v>) : tree<k,v>
    match! t { Node(c,l,k,v,Node(c1,l1,k1,v1,r1)) -> Node(c1,Node(c,l,k,v,l1),k1,v1,r1); t' -> t' }

The last remaining function is rebuild, which is called when balancing is finished. In the imperative implementation, this simply marks the root black and returns it. But in our version, the root is now hidden in the zipper, and we have to rebuild the tree from the zipper (without balancing) to access the root. In practice, the imperative version benefits from not having to rebuild the tree (and fixup considers some cases specifically in order to enter rebuild earlier). Thus we cannot quite achieve the same efficiency in a functional version. However, our version can also be used persistently (see Section 5) and might be easier to understand.
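The rotation itself is a pure pointer shuffle. As a sanity check, here is the same left rotation on a tuple encoding (color, left, key, value, right) in Python (illustrative only; the Koka version reuses the two matched Node cells for the two constructed ones):

```python
def left_rotate(t):
    # t = (c, l, k, v, (c1, l1, k1, v1, r1)): the right child becomes the
    # new root and the old root becomes its left child, keeping the
    # in-order sequence l, k, l1, k1, r1 intact.
    c, l, k, v, (c1, l1, k1, v1, r1) = t
    return (c1, (c, l, k, v, l1), k1, v1, r1)
```

Exactly two nodes are matched and exactly two are built, which is why the Koka version needs no allocation.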

Sorting Lists In-Place
Is it possible to run merge sort in-place? The traditional functional implementation first turns each element into a singleton list. A singleton list is obviously sorted, so we can then pairwise merge the sorted lists until we end up with just one sorted list, which we extract:

  [4,3,2,1]             // start with the unsorted list
  -> [[4],[3],[2],[1]]  // create sorted singleton lists
  -> [[3,4],[1,2]]      // merge pairs of sorted lists
  -> [[1,2,3,4]]        // ... until only one sorted list is left
  -> [1,2,3,4]          // extract the sorted list

Here, the output takes up just as much space as the input, so a fully in-place implementation might be possible. But in the first step, many singleton lists are created for which no reuse credits are available, so the traditional implementation is not fully in-place. However, we can make it FIP by using a tailor-made datastructure that can store the sorted sublists while taking up exactly the same amount of space as the original list. Our partition is a list partitioned into sublists, which are either Ones or Subs of at least two elements. We exploit that we have at least two elements by storing two elements in the last cell of a list2.

  type partition<a>
    Sub(list : list2<a>, tail : partition<a>)
    One(elem : a, tail : partition<a>)
    End

  type list2<a>
    Cons2(x : a, tail : list2<a>)
    Nil2(x : a, y : a)

Considering the memory usage, we see that a list2 can store n elements in n − 1 cells. As a result, a partition of n elements uses just as much space as a list of n elements, while also keeping track of the partitioning. With that in hand, we can implement an in-place list merge sort:

  Cons(4,Cons(3,Cons(2,Cons(1,Nil))))      // start with the unsorted list
  -> One(4,One(3,One(2,One(1,End))))       // create sorted singleton lists
  -> Sub(Nil2(3,4),Sub(Nil2(1,2),End))     // merge pairs of sorted lists
  -> Sub(Cons2(1,Cons2(2,Nil2(3,4))),End)  // ... until only one list2 is left
  -> Cons(1,Cons(2,Cons(3,Cons(4,Nil))))   // convert back to a list

This datastructure is also helpful for implementing a quicksort that cannot run out of stack. The typical functional but in-place implementation is [Baker 1994a; Hofmann 2000b; Hudak 1986]:

  fun quicksort(xs)
    match xs
      Nil -> Nil
      Cons(pivot, xx) ->
        val (lo, hi) = split(pivot, xx)
        val lo' = quicksort(lo)
        val hi' = quicksort(hi)        // sort the sublists
        append(lo', Cons(pivot, hi'))

This code is not FIP because the stack usage is not bounded and might grow linearly with the length of the input list. Of course, we could apply the defunctionalized CPS-transformation (see Section 3), but our side-condition fails:

  Cons(pivot, xx) ->                       // reuse credit of size 2 available
    val (lo, hi) = split(pivot, xx)
    quicksort'(lo, Z1(pivot, hi, zipper))  // reuse credit of size 3 needed

The problem is that the zipper needs to store pivot, hi, and the parent zipper, which requires more space than we have available. This is because, in a stack-safe quicksort, the zipper needs to keep track of all the pivots and hi lists that still need to be sorted. However, we can use a partition structure as the zipper, where we store the pivots as singletons and each hi either not at all (if hi is empty), as a singleton (if hi is a one-element list), or else as a list2. We can pass the parent zipper into the split function, which then returns a list lo and a partition hi which includes the zipper. Then we obtain a fully in-place solution:

  Cons(pivot, xx) ->                       // reuse credit of size 2 available
    val (lo, hi) = split(pivot, xx, zipper)
    quicksort'(lo, One(pivot, hi))         // reuse credit of size 2 needed
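The control flow of this zipper-based quicksort can be sketched in Python, where a list of (pivot, hi) pairs plays the role of the partition zipper (an illustrative model: Python lists hide the constant-space accounting that the partition datastructure provides):

```python
def quicksort(xs):
    out, pending = [], []   # pending stores each pivot with its unsorted
                            # hi part, like the partition zipper
    while True:
        if xs:
            pivot, rest = xs[0], xs[1:]
            xs = [y for y in rest if y < pivot]          # recurse into lo first
            pending.append((pivot, [y for y in rest if y >= pivot]))
        elif pending:
            pivot, hi = pending.pop()
            out.append(pivot)   # lo is fully sorted: emit pivot, then sort hi
            xs = hi
        else:
            return out
```

The loop is tail-recursive in the sense of the paper: all bookkeeping lives in pending rather than on the call stack.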

Finger Trees
Finally, as an advanced example, we consider finger trees [Claessen 2020; Hinze and Paterson 2006], an efficient functional implementation of sequences. At first glance, the cons function on finger trees does not appear to be FIP: only the More constructors can be reused, as the other datatypes do not match up for reuse. We can fix this, however, by padding all constructors with a dummy atom Pad so that they all have three slots.
  fun cons(x : a, s : seq<a>) : seq<a>
    match! s
      Empty         -> Unit(x, Pad, Pad)
      Unit(y, _, _) -> More(One(x, Pad, Pad), Empty, One(y, Pad, Pad))
      More(One(y, _, _), q, u)   -> More(Two(x, y, Pad), q, u)
      More(Two(y, z, _), q, u)   -> More(Three(x, y, z), q, u)
      More(Three(y, z, w), q, u) -> More(Two(x, y, Pad), cons(Pair(z, w, Pad), q), u)

We have gotten rid of all deallocations, since every constructor on the left of -> can be paired with one on the right. But we still have allocations in the Empty, Unit, and More(Three) cases. Even worse, cons can recurse up to O(log n) times in the More(Three) case and require a new memory cell each time, so this function is not fip(n) or fbip(n). However, this case is very unlikely: the amortized complexity analysis of finger trees shows that cons only recurses O(1) times on average and thus only uses a constant amount of memory. Therefore, we pair a finger tree seq<a> with a buffer which contains exactly the memory needed. Our buffer is just a padded list of size-3 cells, which makes it available for reuse with the rest of the finger tree:

  type buffer { BEmpty; BCons(next : buffer, b : pad, c : pad) }

We then pass the necessary reuse credits of size 3 to cons, which we either use to create a new cell in the finger tree or to fill up the buffer. If we recurse into cons, we draw the necessary memory back from the buffer. Then we only need to ensure that we pass in enough credits so that the buffer is never empty. Inserting two elements x,y into an empty finger tree yields More(One(x, Pad, Pad), Empty, One(y, Pad, Pad)), so it would seem that we need to pass at least two credits per element. But that would mean that we need 6n space to represent n elements in a finger tree! We can do better by specializing More(One) as More0, representing the two-element sequence as More0(x, Empty, One(y, Pad, Pad)). With this modification, it suffices to pass in a single reuse credit per element, for a space overhead of 3n, which is close to the 2n factor of singly-linked lists. Our cons function then takes a reuse credit unit3 and becomes:

  fip fun cons(x : a, u3 : unit3, s : seq<a>, b : buffer) : (seq<a>, buffer)
    match! s
      ...
      More(Pair(y, z, _), q, u)   -> (More(Triple(x, y, z), q, u), BCons(b, Pad, Pad))
      More(Triple(y, z, w), q, u) -> match! b
        BCons(b', _, _) ->
          val (q', b'') = cons(Pair(z, w, Pad), u3, q, b')
          (More(Pair(x, y, Pad), q', u), b'')

This function is now fully in-place. In the More(Pair) case we store an unneeded credit in the buffer. In the More(Triple) case we recurse and take a reuse credit from the buffer. The buffer maintains the invariant that, given n1 Triple, n2 Three, and n3 Two constructors in the finger tree, its size is n1 + 2·n2 + n3. Since this invariant is maintained by all operations, the buffer is never empty.

Fig. 8. The fip calculus extends the Perceus linear resource calculus with borrowing, reuse, and unboxed tuples. The calculus extends the syntax, rules, and functional semantics of the FBIP calculus as shown in Figures 4 and 6. The multiplicity of each variable in Γ is unconstrained.
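Abstracting away the Pad fields and the buffer, the case structure of cons can be sketched in Python over a tuple encoding of sequences (a hypothetical simplification of our own: front digits are plain tuples of at most three elements, and the back digit is never touched by cons):

```python
EMPTY = ('Empty',)

def cons(x, s):
    if s[0] == 'Empty':
        return ('Unit', x)
    if s[0] == 'Unit':
        # split into a front digit and a back digit around an empty middle
        return ('More', (x,), EMPTY, (s[1],))
    _, front, mid, back = s
    if len(front) < 3:
        return ('More', (x,) + front, mid, back)
    # front digit is full: keep two elements and push the other two,
    # packed as a Pair, into the middle sequence (the O(log n) recursion)
    y, z, w = front
    return ('More', (x, y), cons(('Pair', z, w), mid), back)
```

Only the full-digit case recurses, which is exactly where the Koka version draws a reuse credit from the buffer.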

REFERENCE COUNTING WITH BORROWING AND UNBOXING
In this section we formalize the connection between the FIP calculus and Perceus-style precise reference counting [Lorenzen and Leijen 2022; Reinking, Xie et al. 2021]. Our implementation of FIP in Koka uses this approach, where detection of the uniqueness of owned arguments happens dynamically at run-time. Figure 8 formalizes the fip calculus as an extension of the syntax and operational semantics of the FBIP calculus (in Figures 4 and 6). It now has full lambda expressions λᶻx.e, since arbitrary allocation is allowed. Here we write the free variables of the lambda expression explicitly as (the multiset) z. This is not needed for the functional operational semantics, but, as we see later, we require it for the heap-based operational semantics. Moreover, we have dup x; e and dropru x; e expressions that let us duplicate owned variables and explicitly reuse dropped variables. Finally, alloc k; e allows for arbitrary allocation of constructors by creating reuse credits ⋄k at runtime. The (beta) evaluation rule for lambda expressions is standard, and we can see that the (dup), (dropru), and (alloc) rules have no effect in the functional operational semantics (they are only used in the heap semantics).
We rephrase the original Perceus linear resource calculus λ1 [Reinking, Xie et al. 2021] in Figure 8 as a type system (instead of a typed translation). We call any expression with Δ | Γ ⊢ e well-formed, and such expressions always use correct reference counting when evaluated, i.e. they never drop a value from the heap that is still needed later, nor leave garbage in the heap at the end of an evaluation. Moreover, the new type rules also extend the original rules with borrowing and unboxed tuples, and give a characterization of reuse based on reuse credits.
Just like the FIP calculus, the rules are still based on linear logic with a linear owned environment Γ, but unlike a pure linear logic it now has an escape hatch: through rules like dup we can freely duplicate "linear" variables by maintaining reference counts dynamically at runtime. As we can see, there is a surprisingly close connection between fip and the FBIP and FIP calculi, where each is a strict subset of the next: FIP ⊂ FBIP ⊂ fip. As such, the FIP calculus is exactly the subset of fip that excludes the rules that require dynamic reference counting!
The dup rule either allows us to use a borrowed variable (x ∈ Δ) as owned, or duplicates an owned variable (x ∈ Γ). The alloc rule now allows arbitrary allocation of a constructor by adding a reuse credit ⋄k to the owned environment. With full lambda expressions, we also have an app rule to apply an argument to a lambda expression. Here, we split the owned environment in two parts, one for each subexpression. We could have been more elaborate and allowed borrowing of Γ2 in the e1 derivation, just like our earlier let rule in Figure 4. We refrain from doing that here for simplicity, as we can always use let if borrowing is required.
The lam rule requires that all free variables of the lambda expression are owned (as they are needed to create the initial closure). We check the body with the free variables (from the closure) and the passed-in parameters all owned. Borrow information is not part of a type, so only top-level functions can take borrowed arguments (using the call rule).
The match rule can match on any borrowed or owned variable. However, each branch must start by dupping the matched constructor fields (as dup(xi)). Indeed, since the match is non-destructive, each field is now reachable directly but also via the original x (and thus we need to increment the reference count at runtime). For simplicity, the match rule can only match on variables, but we can always rewrite an expression match e { ... } into let x = e in match x { ... } for a fresh x when required. Since match no longer creates reuse credits, we can now create them explicitly using the "drop reuse" rule dropru. This drops a variable x and immediately yields a reuse credit ⋄k, where k is the allocated size of x.
With the new match and dropru rules, we no longer require the destructive match and its corresponding rule of the FIP calculus, as we can always replace any destructive match. In particular, if the FIP expression match! x { Ci xi -> ei } is well-formed, then so is its translation into the fip calculus:

  match x { Ci xi -> dup xi; dropru x; ei }

Furthermore, unlike in the FIP or FBIP calculus, we can always elaborate a plain expression with dup, drop, free, alloc, and dropru instructions to make it a well-formed fip expression. The heap semantics can thus always be used to evaluate an expression. In particular, we can easily adapt the Perceus algorithm [Reinking, Xie et al. 2021] to elaborate plain expressions with correct reference-count instructions.
A heap H extends a store S with reference counts n ⩾ 1 and can also hold closures, with values v ::= C x | λᶻx.e. The ↦−→h relation extends the ↦−→s relation (using a heap H instead of a store S).

Heap Semantics
Figure 9 gives a heap-based operational semantics for our Perceus calculus. Here we generalize the store S from the FIP calculus (Figure 5) to contain a reference count n ⩾ 1 for each binding.
The heap now contains reuse credits ⋄k, constructor bindings x ↦→n C x, and closures x ↦→n λᶻx.e. We extend the original FBIP store semantics of Figures 5 and 6 with new rules, where the evaluation contexts and the eval rule stay the same (just replacing the store S with a heap H). The (bmatch) rule allows any reference count on the matched binding, while the (dmatch!) rule requires that the matched binding has a unique reference count. The extra transition rules are for general allocation and reference counting. The (allocₕ) rule allows allocating a constructor without a reuse credit, and similarly, the (lamₕ) rule allocates a closure. The application rule (appₕ) applies a closure: it starts by dupping the closure's environment z, and then drops the closure itself. This way the app rule can consider the free variables to be part of the owned environment. This is important in practice, as it allows a function to discard variables in the environment as soon as possible and be garbage-free. Here we can also see why we need to maintain z explicitly: even though the free variables of a lambda expression are initially distinct, during evaluation some may be substituted by the same variable, and we need to dup such a variable multiple times when applying the closure to maintain proper reference counts.
The other rules all deal with reference counting. The (dupₕ) transition increments a reference count, while (dropₕ) decrements a reference count n > 1. The (dlamₕ) rule drops a closure when the reference count is 1; this never creates a reuse credit, though, as the size of a closure cannot be accounted for statically. Note that the environment is dropped as well, just as in the (dconruₕ) rule and the (dconₛ) rule in Figure 6. The (dconruₕ) rule creates a reuse credit if the reference count is unique, while the (dropruₕ) rule applies for a non-unique reference count n > 1; this rule decrements the reference count but also allocates a fresh reuse credit (as required by rule dropru). This is where the runtime falls back to copying if the cell was not unique.
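These reference-counting transitions can be sketched on a toy heap in Python (an illustrative model of our own: cells map variables to a count and a tuple of child variables; atoms simply have no heap cell):

```python
def dup(heap, x):
    # (dup): increment the reference count
    n, fields = heap[x]
    heap[x] = (n + 1, fields)

def drop(heap, x):
    # (drop)/(dcon): decrement a shared count, or free a unique cell
    # and recursively drop its children
    n, fields = heap[x]
    if n > 1:
        heap[x] = (n - 1, fields)
    else:
        del heap[x]
        for y in fields:
            if y in heap:
                drop(heap, y)

def dropru(heap, x):
    # (dconru)/(dropru): yield a reuse credit of the cell's size; if the
    # cell is shared we only decrement, and the credit is freshly allocated
    n, fields = heap[x]
    size = len(fields)
    if n == 1:
        del heap[x]
        for y in fields:
            if y in heap:
                drop(heap, y)
    else:
        heap[x] = (n - 1, fields)
    return ('credit', size)
```

The unique branch of dropru is the in-place fast path; the shared branch models the runtime's fallback to copying.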
Of course, with our match! translation we no longer require the (dmatchₕ) rule: we can derive it from the translated expression when we assume the matched binding is unique. The translation is more general, however, and can also proceed if the matched binding is not unique but shared. In that case the final steps use (dropruₕ): the binding for x stays alive, but we still allocate a fresh reuse credit. This is exactly where we can generate the code shown in Section 1.4, where we essentially inline and specialize the definition of dropru and check upfront whether the matched binding is unique or not.

Soundness of the Heap Semantics
First we generalize the properties of the store semantics to reference-counted heaps:

Definition 3. (Heap soundness and linearity) For a heap H we write dom(H) to denote the set of variables x bound in H, and rng(H) to denote the set of values bound in H. Two heaps H1, H2 are compatible if they map equal names x ↦→n v ∈ H1, x ↦→m w ∈ H2 to equal values v = w. A heap is sound if all free variables in rng(H) are bound: fv(rng(H)) ⊆ dom(H). A heap is linear if it is sound and any variable x ↦→n v in dom(H) occurs at most n times in the free variables of rng(H). By roots(H) we denote the multiset of reuse credits of H and variables x ↦→n v of dom(H), which contains any such variable n − m times if it occurs m times in the free variables of rng(H).
The definition of linearity ensures that mutation is safe if the reference count is one.Exactly as in the store semantics, we write [H]x to denote a substitution that recursively replaces variables by their bound value in H.We assume that we are given heaps corresponding to the owned and borrowed values, but only require that the heap of the owned values is linear and do not assume that the heaps have a disjoint domain.Instead we use the join operator ⊗ to define a joined heap of the borrowed and owned part, even if they have common elements with the same name and value.We join common elements by summing their reference counts.Since this eliminates one reference to their children, we decrease their reference count accordingly: The rootset of H 1 ⊗ H 2 is exactly the disjoint union of the roots of H 1 and H 2 .When applied to linear heaps, ⊗ can be viewed as a partial commutative monoid [Jensen and Birkedal 2012] where every result is valid.However, we prefer a categorical view on linear heaps where a morphism H 1 → H 2 exists if dom(H 1 ) ⊆ dom(H 2 ) and roots(H 1 ) ⊆ roots(H 2 ).This forms a monoidal category with the join operator as tensor product.We also define a heap subtraction operation [H 1 , H 2 ] if H 1 → H 2 similar to an internal hom (and which corresponds to the magic wand of separation logic).We can This is again a strong theorem as it shows that the dynamic reference count is always correct and no variables will be discarded too early, while also having no garbage at the end of an evaluation (x = roots(H 3 )).Our proof (in App.E of the tech.report) is novel and may be well suited to possible mechanized formalization.As a corollary, any closed fip expression can evaluate starting from an empty heap: While it is outside the scope of this paper, we could also modify the let rule of our calculus with a (★)-condition to characterize garbage-free and frame-limited derivations [Lorenzen and Leijen 2022].However, borrowing makes it harder to achieve these 
properties, and further study is needed. In particular, a garbage-free derivation can only exist if all borrowed arguments are still used later on; similarly, a frame-limited derivation can only exist if all borrowed arguments are either used later on or have constant size.
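To make Definition 3 and the join operator concrete, the following sketch is one possible executable reading of them (Python, illustrative only: it omits reuse credits, and models a bound value simply by the list of its free-variable occurrences). It computes free variables, checks soundness and linearity, derives roots(H), and joins two compatible heaps by summing the reference counts of common entries while decrementing those of their children:

```python
from collections import Counter

# A heap maps a variable name to (refcount, fields), where `fields` lists
# the free-variable occurrences of the bound value.

def free_vars(heap):
    """Multiset of variable occurrences in rng(heap)."""
    c = Counter()
    for (_rc, fields) in heap.values():
        c.update(fields)
    return c

def is_sound(heap):
    return set(free_vars(heap)) <= set(heap)

def is_linear(heap):
    fv = free_vars(heap)
    return is_sound(heap) and all(fv[x] <= rc for x, (rc, _f) in heap.items())

def roots(heap):
    """Each x with refcount n that occurs m times in rng(heap)
    contributes n - m entries to the root multiset."""
    fv = free_vars(heap)
    return Counter({x: rc - fv[x] for x, (rc, _f) in heap.items() if rc - fv[x] > 0})

def join(h1, h2):
    """Join two compatible heaps: sum the reference counts of common entries
    (which must agree on their value) and decrement the counts of their
    children, since only one copy of the shared cell now references them."""
    h = {}
    for x in set(h1) | set(h2):
        if x in h1 and x in h2:
            (n, v), (m, w) = h1[x], h2[x]
            assert v == w, "incompatible heaps"
            h[x] = (n + m, v)
        else:
            h[x] = h1.get(x) or h2.get(x)
    for x in set(h1) & set(h2):
        (_rc, fields) = h[x]
        for y in fields:       # one duplicate reference per child was removed
            rc, v = h[y]
            h[y] = (rc - 1, v)
    return h
```

On linear heaps this sketch exhibits the stated property that the root set of H₁ ⊗ H₂ is the disjoint union of roots(H₁) and roots(H₂).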

BENCHMARKS
Figure 10 shows benchmark results of examples from this paper, relative to the fip variant. The results are the average over 5 runs on an AMD 7950X on Ubuntu 22.04.2 with Koka v2.4.2. Each benchmark uses 100 iterations over N (= 100000) element structures. We test each benchmark in the following variants:
• fip: the algorithm implemented as FIP in Koka.
• std: the standard functional implementation in Koka without reuse optimization. For general GC'd languages without precise reference counts, the relative performance between std and fip can be more indicative of potential performance gains.
• std-reuse: just as std but with reuse optimization enabled. This is standard Koka, which always applies dynamic reuse.
• c/c++: a standard in-place updating implementation in C or C++. Since our benchmarks are allocation heavy, we also include a variant linked with the mimalloc [Leijen et al. 2019] memory allocator, since that is usually faster than the standard C/C++ one.
The benchmarks consist of:
• rbtree: performs N balanced red-black tree insertions and folds the tree to compute the sum of the elements. The fip variant is adapted from Lorenzen and Leijen [2022], while std uses Okasaki-style insertion [Okasaki 1999]. The C++ versions use the standard in-place updating STL std::map, which is implemented internally using red-black trees.
• ftree: builds a finger tree of size N and performs 3*N uncons/snoc operations. The fip variant is shown in Section 4.3, while the std variant uses an implementation described by Claessen [2020].
• msort, qsort: sorts an N element random list. The fip variants use the implementations shown in Section 4.2, while std uses the standard recursive functional implementations (derived from the Haskell library implementations).
• tmap: maps an increment function over a shared (non-unique) N element tree, returning a fresh tree which is then folded to compute the sum of the elements. The fip variant uses the implementation of Section 3, while std and c/c++ use the standard (recursive) way to map over a tree.
It is hard to draw firm conclusions as the results depend on our particular implementation, but we make some general observations:
• The performance of fip versus std is generally much better, showing that in-place updating is indeed generally faster than allocation.
• Even without a fip annotation, the std-reuse variant shows that the reuse optimization in Koka can be very effective; but of course, unlike fip, reuse here is not guaranteed.
• In an absolute sense, the performance seems very good: in the rbtree benchmark the fip variant rivals the performance of the in-place updating std::map implementation in C++.
• The tmap benchmark is interesting as fip is generally slower here. The fip variant uses a zipper to visit the tree and uses constant stack space (unlike the others, which use stack space linear in the depth of the tree). Reversing the pointers Schorr-Waite style can, however, be slower than recursing with the stack. Also, the tree that is mapped is shared, and thus even the fip function cannot reuse the original tree. Nevertheless, the fip variant still reuses the zipper it uses to traverse the tree. This also shows why std and std-reuse perform similarly, since there is no reuse possible for the standard algorithms. That std-reuse is only about 1% slower shows that the dynamic reuse check has negligible impact on performance.
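To illustrate the zipper-based traversal used by the fip tmap variant, here is a Python analogue (not the Koka implementation of Section 3): it maps a function over a binary tree without recursion by pushing explicit context frames, so the call stack stays constant. In the actual FIP version these frames are not freshly allocated but obtained by reusing the zipper cells in place.

```python
LEAF = ('leaf',)

def node(l, v, r):
    return ('node', l, v, r)

def tmap(f, t):
    """Map f over a binary tree using an explicit zipper instead of the
    call stack. A context frame is ('L', mapped_value, right, parent)
    while the left subtree is being mapped, and
    ('R', mapped_left, mapped_value, parent) while the right one is."""
    ctx = None
    while True:
        # Go down: descend the left spine, pushing a frame for each node.
        while t != LEAF:
            _, l, v, r = t
            ctx = ('L', f(v), r, ctx)
            t = l
        # Go up: t (a leaf) marks a fully mapped subtree.
        done = t
        while ctx is not None:
            tag, a, b, parent = ctx
            if tag == 'L':
                # Left subtree finished; switch to mapping the right one.
                ctx = ('R', done, a, parent)
                t = b
                break
            # tag == 'R': both subtrees finished; rebuild the node.
            done = ('node', a, b, done)
            ctx = parent
        else:
            return done  # context exhausted: the whole tree is mapped
```

The frames reverse the path back to the root, in the same spirit as Schorr-Waite pointer reversal, except that here the path lives in fresh tuples rather than in reversed tree pointers.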

RELATED WORK
The FIP calculus is most closely related to Hofmann's type system for in-place update [Hofmann 2000a,b]. Just like Hofmann, we add reuse credits to a linear environment, model a destructive match, and collect top-level functions in the signature. However, Hofmann's unboxed tuples can escape into allocations, which makes it necessary to monomorphise the program (and to track types in order to do so). In contrast, our calculus needs neither monomorphisation nor any knowledge of types at all. Hofmann also uses a uniform size for all constructors of a datatype (including atoms such as Nil), but unboxes the first layer of each datatype. Many FIP programs can also be checked by that scheme, but it seems to increase memory usage substantially: in our calculus, a constructor with n fields filled with atoms takes n space, while it would take n * n space in Hofmann's calculus. While we only model unique and borrowed values in our FIP calculus, shared values are another interesting variant. Unlike borrowed values, shared values can be stored in datatypes. But unlike unique values, they can be used multiple times (and it is not possible to use a destructive match! on them). Shared values correspond to the usage aspect 2 introduced by Aspinall and Hofmann [2002] and Aspinall et al. [2008]. We believe it may be worthwhile to extend the FIP calculus with shared values to allow it to check a wider range of programs. However, shared values can only be supported in a garbage-collected setting, while our FIP programs can also easily be compiled to C.
Even without in-place reuse, FIP programs still use constant space, which allows us to reason about their space usage. Space credits [Hofmann 2003; Hofmann and Jost 2003] generalize reuse credits with the axiom ⋄n₁, ⋄n₂ = ⋄(n₁ + n₂). This axiom does not hold for reuse credits (which cannot be combined unless they are in adjacent slots in the heap), but it does hold if we view ⋄n just as the promise that n words of space are available. Based on space credits, an automated analysis [Hoffmann et al. 2011; Hofmann and Jost 2006] or manual proofs in separation logic [Madiot and Pottier 2022; Moine et al. 2023] can be used to reason about heap space. However, these systems usually do not model atoms or unboxing, which we identified as crucial for real-world FIP programs.
Reuse analysis can be implemented either statically, using uniqueness types [Barendsen and Smetsers 1995] or flow analysis, or dynamically, using reference counts. Compile-time Garbage Collection [Bruynooghe 1986] is the most developed flow-based analysis; it tracks the flow of unique values through the program to identify reuse opportunities statically. Reuse with reference counts has long been applied to arrays, where the update function can be designed to mutate the array in place if the reference count is one [Hudak and Bloss 1985; Scholz 1994]. Similarly, Stoye et al. [1984] used reuse with one-bit reference counts for combinator reduction, where reuse is encoded in the hand-written combinators. However, in this work we rely on a reuse analysis that can statically discover reuse opportunities between otherwise unconnected memory cells, which was pioneered by OPAL [Didrich et al. 1994; Schulte 1994; Schulte and Grieskamp 1992]. Their analysis was refined by Ullrich and de Moura [2019], who showed that such an analysis can be implemented efficiently without duplicating code. Reinking, Xie et al. [2021] present the linear resource calculus as a formalization of precise reference counting and give a garbage-free algorithm. Lorenzen and Leijen [2022] refine this calculus further with a declarative star condition that can guarantee either garbage-free or frame-limited space usage, which ensures that extra space usage due to reuse is bounded.
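To illustrate the dynamic reuse check discussed above, the following Python sketch (a simplified model, not Perceus itself) maps over a cons list and mutates a cell in place exactly when its reference count is one, falling back to allocation, with a dup of the tail, when the cell is shared:

```python
class Cons:
    __slots__ = ('rc', 'head', 'tail')
    def __init__(self, head, tail):
        self.rc = 1        # freshly allocated cells start with one reference
        self.head = head
        self.tail = tail

def map_reuse(f, xs, stats):
    """Map f over a cons list, reusing a cell in place when it is uniquely
    owned (rc == 1), and copying it when it is shared."""
    if xs is None:
        return None
    if xs.rc == 1:
        stats['reused'] += 1
        xs.head = f(xs.head)
        xs.tail = map_reuse(f, xs.tail, stats)
        return xs
    # Shared: leave the original intact and allocate a fresh cell.
    stats['alloc'] += 1
    xs.rc -= 1                 # drop our reference to the shared cell
    if xs.tail is not None:
        xs.tail.rc += 1        # dup: the recursive call takes its own reference
    return Cons(f(xs.head), map_reuse(f, xs.tail, stats))

def to_list(xs):
    out = []
    while xs is not None:
        out.append(xs.head)
        xs = xs.tail
    return out
```

On a uniquely owned list every cell is reused and nothing is allocated; once a cell is shared, the whole remaining spine is copied and the original list stays observable to its other owner.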
Borrowing is a long-standing technique to reduce the overhead of reference counting [Baker 1994b; Lemaitre et al. 1986]. If a lifetime analysis can prove that the lifetime of one reference dominates that of another, we can avoid counting the second reference; in our calculus, this is expressed by borrowing Γ₂ in let (first introduced by Reinking, Xie et al. [2021]). Ullrich and de Moura [2019] introduced borrowed parameters on top-level functions, which is especially important for recursive functional code. They also showed how to infer borrowing annotations, but, as pointed out by Lorenzen and Leijen [2022], this can increase memory usage by an unbounded amount.
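The effect of a borrowed calling convention can be illustrated with a simplified model (Python; the rc_ops counter merely tallies the dup/drop instructions that a reference-counted compilation would emit, and is not how any real compiler tracks them): a length function that owns its argument must pair a dup and a drop at every cell, while the borrowed version needs no reference-count operations at all.

```python
from collections import Counter

rc_ops = Counter()  # dup/drop instructions we would emit

def cons_list(*vals):
    """Build a linked list of (head, tail) pairs, ending in None."""
    xs = None
    for v in reversed(vals):
        xs = (v, xs)
    return xs

def length_owned(xs):
    # Owned convention: the callee consumes its reference, so at every cell
    # it must dup the tail it keeps walking and drop the current cell.
    n = 0
    while xs is not None:
        rc_ops['dup'] += 1    # dup(tail)
        rc_ops['drop'] += 1   # drop(cell)
        xs = xs[1]
        n += 1
    return n

def length_borrowed(xs):
    # Borrowed convention: the caller retains ownership for the duration
    # of the call, so no reference-count operations are needed.
    n = 0
    while xs is not None:
        xs = xs[1]
        n += 1
    return n
```

This is why borrowed parameters matter especially for recursive functional code: a traversal that only reads its argument would otherwise pay two reference-count operations per cell.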

Fig. 1.Looking up the number 3 in a splay tree (A), creating the zipper context to the node containing 3 (B), and splaying the node up to the top (C).

Fig. 4. Well-formed FIP expressions, where the multiplicity of each variable in Γ is 1.

Theorem 3. (A FBIP program can only deallocate.) For any S | e ↦→*ₛ S′ | e′ with the deallocation rules, we have |S| ⩾ |S′|. In our implementation the fbip keyword checks if a function is well-formed in the FBIP calculus.
In Section 6 we show a short performance evaluation of our particular implementation. It shows that fip algorithms are competitive with standard functional algorithms in Koka. This is somewhat expected, since the standard algorithms can already avoid many allocations through the existing dynamic reuse that is part of Perceus reference counting. If such dynamic reuse is disabled for the standard algorithms, fip functions tend to outperform them by a larger margin.