This Is Driving Me Loopy: Efficient Loops in Arrowized Functional Reactive Programs

Arrowized Functional Reactive Programming (AFRP) is one approach to writing reactive programs declaratively, based on the arrows abstraction in Haskell. While AFRP elegantly expresses the relationships between inputs and outputs of a reactive system, naïve implementations suffer from poor performance. In particular, the loop combinator depends on lazy semantics: this inflicts the overheads of lazy evaluation and simultaneously prevents existing optimisation techniques from being applied to it. We present a novel program transformation which utilises the Arrow and ArrowLoop laws to transform typical uses of loop into restricted forms that have an execution order that is known at compile-time and therefore can be executed strictly. We evaluate the performance gained from our transformations and prove that the transformations are correct.


Introduction
Arrowized Functional Reactive Programming (AFRP) [8] is a paradigm for writing reactive programs [24], which was popularised by the Haskell library Yampa. In AFRP, reactive programs are built using signal functions: functions which produce streams of outputs from streams of inputs. Execution of a program is broken up into time steps, in each of which signal functions get an input and produce a corresponding output. This means that the program effectively reacts to its inputs over time by producing outputs at the same rate. Signal functions can be combined with the arrow combinators [9] to form larger programs.
As an example of AFRP in action, consider a reactive summing program which, at every time step, retrieves an input and adds it to a running total, which is also the output. This is implemented in Fig. 1 as a Yampa program and visualised as a box-and-wire diagram.
The overall program, which is itself a signal function, is built up from smaller signal functions. We have arr sum, which sums two inputs and returns the sum as both outputs; and pre 0, which returns 0 at the first time step and then the previous input at future time steps. Their inputs and outputs are routed as shown in the diagram using the *** and ≫ combinators. We finally enclose all this in loop, which connects the second output of its internal signal function to its second input.
loop seems to introduce a dependency cycle here, in which the second input of arr sum needs the second output of arr sum to be computed. Fortunately, pre can generate its output at a given time step using only its previous input, meaning that we get the output of pre before needing to compute its input. Therefore the above program works by retrieving the previous output of the program "stored" in pre, applying arr sum to that and the current input of the program to compute the new cumulative total, returning that as the output of the program and "storing" it in the pre for use at the next time step.
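To make this concrete, here is a minimal, self-contained model of signal functions in Haskell. This is a simplified sketch of the ideas above, not Yampa's actual implementation (the names SF, stepSF, runSF and the *SF combinators are ours); note how loopSF ties the feedback knot with a lazy recursive binding:

```haskell
-- A simplified model of signal functions: given an input for the current
-- time step, produce an output and the signal function for the next step.
newtype SF a b = SF { stepSF :: a -> (b, SF a b) }

arrSF :: (a -> b) -> SF a b
arrSF f = SF (\a -> (f a, arrSF f))

compSF :: SF a b -> SF b c -> SF a c                 -- models (>>>)
compSF (SF f) (SF g) =
  SF $ \a -> let (b, f') = f a; (c, g') = g b in (c, compSF f' g')

parSF :: SF a b -> SF c d -> SF (a, c) (b, d)        -- models (***)
parSF (SF f) (SF g) =
  SF $ \(a, c) -> let (b, f') = f a; (d, g') = g c in ((b, d), parSF f' g')

preSF :: a -> SF a a                                 -- models pre
preSF v = SF (\a -> (v, preSF a))  -- output the old state, store the input

loopSF :: SF (a, c) (b, c) -> SF a b                 -- models loop
loopSF (SF f) =
  SF $ \a ->
    let ((b, c), f') = f (a, c)    -- lazy: c is fed back into the call that produces it
    in (b, loopSF f')

-- The summing program of Fig. 1: loop (arr sum >>> second (pre 0)).
sumSF :: SF Int Int
sumSF = loopSF (arrSF (\(x, acc) -> (x + acc, x + acc))
                  `compSF` parSF (arrSF id) (preSF 0))

runSF :: SF a b -> [a] -> [b]
runSF _ []          = []
runSF (SF f) (a:as) = let (b, sf') = f a in b : runSF sf' as
```

Running runSF sumSF [1, 2, 3] produces the running totals [1, 3, 6], with the pre storing each total for use at the next step.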
The above example shows how stateful programs are written in AFRP: loop is used in tandem with pre in order to use outputs from previous time steps as inputs at subsequent time steps. However, within this pattern lies a performance issue that has not yet been addressed by existing literature: in order to implement pre as shown, loop's definition depends on lazy semantics. In pseudo-Haskell, we can define a simplified semantics of loop as follows, where evalSF takes a term of the form loop f, f is some signal function, and a is an input at the current time step. The definition is recursive and we can see that the second input c depends on the result of the recursive call: evalSF (loop f) a = (b, loop f') where ((b, c), f') = evalSF f (a, c). When evaluating loop f, the order in which signal functions in f are evaluated to determine c is not always the same or obvious, as the signal functions are not always run from left to right: in the example in Fig. 1 we have to first run pre 0 to get its output, which corresponds to the c that is the second input to arr sum. The execution order is determined at runtime by lazily evaluating c, which means that evaluating loop f suffers from the overheads needed for lazy evaluation. This lazy evaluation does not happen just once at the start of the program, however. When Yampa evaluates a signal function for a single input, it also returns a possibly different signal function to run at the next time step. This is necessary for signal functions like pre v which need access to previous state: when we run pre v with input a, we get the output v and a new signal function pre a, which embeds the new state within the next signal function [17]. This approach, in which reactive programs are essentially rewriting themselves at runtime, might require us to re-evaluate the order of operations in f at each time step.
Another issue with dependency resolution through lazy evaluation is that some well-typed loop f contain dependency cycles and so cannot be run. Consider the program in Fig. 1 but with the pre term omitted: this contains a dependency cycle, as arr sum needs its inputs to compute its outputs, but the second output of arr sum depends on its own second input. The presence of a dependency cycle is only noticed when trying to evaluate the loop, causing a runtime error.
We address the problems caused by evaluating loop lazily by introducing a program transformation which transforms loop f with no dependency cycles into alternative forms with known execution order, which can be executed strictly. More concretely, our contributions are as follows:
• We provide a program transformation for a subset of Yampa (Section 3) which transforms loop f without dependency cycles to use variants of loop with known execution orders. We accomplish this by applying the arrow laws [9] as well as novel rules to rewrite loop f so that uses of pre happen before everything else.
• We prove that our transformation works on programs without dependency cycles in them, and that this transformation does not affect program meaning (Section 4).
• We present a Haskell implementation, Severn, of this transformation (Section 5), which runs the resulting program strictly. We compare the performance of programs written in Severn to their equivalent Yampa programs (Section 6).

Yampa and Arrow Laws
For the benefit of readers unfamiliar with arrows and Yampa, we briefly introduce the arrow constructors and laws that we use throughout this paper.
Our work builds on Yampa, an AFRP implementation built around signal functions (SFs). We consider a minimal set of arrow operators which are enough to define many useful Yampa programs: this minimal set is presented in Fig. 2. We also make use of first f and second f throughout as synonyms for f *** id and id *** f respectively.
We briefly describe each operator in turn: arr f allows pure functions to be turned into SFs, where f transforms an input into an output. SFs can then be composed sequentially with ≫ and in parallel with ***. These operators give us the ability to run SFs consisting of pure functions and compose them into larger programs.
pre v introduces the effect of state by mirroring its input stream as output, delayed by one time step. For example, the inputs 1, 2, 3 passed to pre v give us the outputs v, 1, 2.
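At the level of whole input streams, pre simply prepends its initial value and delays everything by one step. A throwaway list-based model (our own illustration, not part of Yampa's API) captures this:

```haskell
-- Stream-level model of pre v: the output at step n is the input at step n-1,
-- with v at the first step. (A sketch for intuition, not Yampa's actual pre.)
preStream :: a -> [a] -> [a]
preStream v xs = take (length xs) (v : xs)
```

For example, preStream v [1, 2, 3] gives [v, 1, 2], matching the description above.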
Finally, loop f provides a way to introduce feedback into our arrow programs by directly connecting the second output of f to its own second input. This is where Yampa requires lazy evaluation, as we cannot run f strictly without its second input, which is f's own second output. In this work, we focus on its interactions with the pre operator: since pre can generate an output at a given time step without its corresponding input, it can be used to generate the second output of f without needing the second input. We saw this in Fig. 1, where the second input of the loop is the second output of the loop from the previous time step due to pre.
These operators are enough to define common Yampa programs. We discuss additional operators, such as switch, in Section 7.3.
Since SFs are instances of Haskell's Arrow and ArrowLoop type classes, SFs must obey their laws. These laws define required equivalences between programs, which we use throughout our work to prove that each step of our program transformation preserves program meaning. Throughout this paper we introduce necessary laws as they are needed, beginning with two key ones below. Interested readers can consult Hughes [9, Section 7] for the full set.

Commutative Causal Arrows (CCAs)
Liu et al. define CCAs, which extend arrows with two additional laws that hold for Yampa [14, 15]: With these additional laws, whole AFRP programs can be transformed into one of two forms: a single arr, or a loop of the form LoopD f i = loop (f ≫ second (pre i)) [30]. This transformation is performed using the ArrowLoop laws, which merge composed and nested loops together into a single loop using routing functions.
At first glance, this seems to solve the problem we are addressing in this paper: LoopD f i can be executed strictly by first executing pre i and then executing f. However, the routing functions used by the transformation still rely on lazy evaluation, and thus the LoopD created by the CCA transformation cannot be executed strictly.

Transforming loop Into Strict Variants
We claim that the lazy semantics required to evaluate loop f is a cause of performance issues due to the overheads involved. Our goal is therefore to determine the execution order of f within loop f at compile time.
We achieve this by finding decoupled [27] parts of f: those which can produce outputs at a given time step without any of their inputs at that time step, like pre. We define restricted forms of loop where those decoupled parts are separated out from the rest of f, with the aim of running those parts first. An example of this, and the main restricted form of loop we consider, is Yallop and Liu's LoopD f' i construct (called loopPre in Yampa) [30]. We define this as follows alongside an interpreter runLoopD which maps LoopD f' i to a corresponding signal function. A loop f can be expressed as LoopD only if f contains a pre just before its second output. Since pre can produce an output at a given time step without the input at that time step, we know the execution order of LoopD f' i: evaluate the final pre i to produce the second output, use it as the second input, and run the rest of f' with both inputs. No lazy evaluation is required.
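The paper gives its definitions in a figure; the following is one plausible rendering over a simplified SF model (the encoding and names are ours). The point is that runLoopD needs no lazy knot: the state i is available before the body runs.

```haskell
newtype SF a b = SF { stepSF :: a -> (b, SF a b) }

arrSF :: (a -> b) -> SF a b
arrSF f = SF (\a -> (f a, arrSF f))

runSF :: SF a b -> [a] -> [b]
runSF _ []          = []
runSF (SF f) (a:as) = let (b, sf') = f a in b : runSF sf' as

-- LoopD f' i models loop (f' >>> second (pre i)): the fed-back wire ends
-- in a pre holding i, so i is to hand before f' is run.
data LoopD a b c = LoopD (SF (a, c) (b, c)) c

runLoopD :: LoopD a b c -> SF a b
runLoopD (LoopD f' i) = SF $ \a ->
  let ((b, c'), f'') = stepSF f' (a, i)  -- i is the pre's output: no laziness needed
  in (b, runLoopD (LoopD f'' c'))        -- c' becomes the pre's state for the next step

-- The Fig. 1 summing program in this form: LoopD (arr sum) 0.
sumD :: SF Int Int
sumD = runLoopD (LoopD (arrSF (\(x, acc) -> (x + acc, x + acc))) 0)
```

runSF sumD [1, 2, 3] again gives the running totals [1, 3, 6], now with a fixed, strict execution order.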
The question is then how to transform an arbitrary loop f into an equivalent that can be expressed as LoopD f' i. Informally, given a loop f, our aim is to move a single pre within f to appear just before f's second output while preserving the semantics of that loop.
In the rest of this section we present the necessary transformation for LoopD and other restricted forms of loop with known execution orders. We do this as follows:
1. In Section 3.1 we apply ArrowLoop's sliding law to transform some loop f into LoopD f' i by moving pre i within f to be just before the second output. We also discuss a variety of transformations that may need to be applied in order to allow sliding, and introduce CCA composition form to make sliding easier to apply.
2. Sometimes there are multiple looped values in a loop: a transformed loop f will be of the form LoopD f' (i, j). For this we slide a single pre (i, j) to be before the second output of f. However, pre (i, j) can be expressed in a few different ways, such as pre i *** pre j. To make sure that we are able to work with these equivalent statements of pre (i, j), we use CCA's product rule and a new split rule which finds nearby pre to combine them into a single pre (i, j). (Section 3.2)
3. There are some loops where the pre is "trapped" between two non-pre arrows that we cannot slide, and which therefore cannot be transformed by the above two steps. Fortunately, such loops also have a trivial execution order, for which we define another restricted form of loop called LoopM in Section 3.3.
4. We then look at the case where multiple loops are present in a program, e.g. loop (loop f), in Section 3.4. The inner loop is transformed using the above transformations, and then LoopD and LoopM are modified to allow nesting.
5. Finally, we combine these transformations into an algorithm in Section 3.5. We justify that these steps cover all possible cases of a loop with no dependency cycles in Section 4.

Sliding
We start by looking at how to move a single pre to the rightmost position of a loop body. Examples of loops which can be transformed into LoopD in this way can be seen on the left sides of Fig. 3 and Fig. 4, along with their transformed versions on the corresponding right sides.
In these cases, we can employ sliding from the ArrowLoop laws, which allows parts of our program to be moved around inside the loop. Sliding is defined as follows: if we have loop f, we can move a signal function g which appears just before the second output of f to be just after the second input of f, and vice versa. The equivalence holds because g still receives inputs from and gives outputs to the same signal functions as before. Figure 3 shows this at work with h, which is connected to the same signal functions before and after sliding. Note that, by design, this law does not permit the slid term to be an effectful computation: since the slid term id *** k is enclosed in arr, it is a signal function consisting of a pure function. In general, this is important, because changing the order of operations might lead to different results in the presence of implicit, computational effects.
However, we focus on a minimal subset of the arrow operators, which does not allow for effectful signal functions. Therefore we can generalise the sliding rule slightly. We discuss the consequences of this decision for AFRP systems where signal functions can be effectful in Section 7.1.
We refer to transforming a program of the form on the left to the form on the right as right sliding, since we move the body of the loop to the right, causing the last term to fall off the end and reappear at the start. We call the reverse direction left sliding. Sliding gives us the rules needed to justify the transformations in Fig. 3 and Fig. 4. The first example is solved through right sliding, moving h from the right of the loop to the left. In the second example we can left slide twice, moving g and then pre v from the left of the loop to the right.
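Sliding can be sanity-checked observationally with a toy SF model (our own encoding, not Yampa's internals): the Fig. 1 program loop (arr sum ≫ second (pre 0)) and its slid form loop (second (pre 0) ≫ arr sum) produce the same output stream.

```haskell
newtype SF a b = SF { stepSF :: a -> (b, SF a b) }

arrSF :: (a -> b) -> SF a b
arrSF f = SF (\a -> (f a, arrSF f))

compSF :: SF a b -> SF b c -> SF a c                 -- (>>>)
compSF (SF f) (SF g) =
  SF $ \a -> let (b, f') = f a; (c, g') = g b in (c, compSF f' g')

parSF :: SF a b -> SF c d -> SF (a, c) (b, d)        -- (***)
parSF (SF f) (SF g) =
  SF $ \(a, c) -> let (b, f') = f a; (d, g') = g c in ((b, d), parSF f' g')

preSF :: a -> SF a a
preSF v = SF (\a -> (v, preSF a))

loopSF :: SF (a, c) (b, c) -> SF a b                 -- lazy loop
loopSF (SF f) =
  SF $ \a -> let ((b, c), f') = f (a, c) in (b, loopSF f')

runSF :: SF a b -> [a] -> [b]
runSF _ []          = []
runSF (SF f) (a:as) = let (b, sf') = f a in b : runSF sf' as

secondSF :: SF a b -> SF (c, a) (c, b)
secondSF = parSF (arrSF id)

sum2 :: SF (Int, Int) (Int, Int)
sum2 = arrSF (\(x, y) -> (x + y, x + y))

original, slid :: SF Int Int
original = loopSF (sum2 `compSF` secondSF (preSF 0))  -- loop (arr sum >>> second (pre 0))
slid     = loopSF (secondSF (preSF 0) `compSF` sum2)  -- loop (second (pre 0) >>> arr sum)
```

Both yield the running totals [1, 3, 6] when fed [1, 2, 3].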
3.1.1 Distributivity of Composition. This presentation of sliding may not be applicable if programs are written in subtly different, but equivalent, ways. An equivalent way to write Fig. 3 is loop (f ≫ (id *** (g ≫ pre v ≫ h))). However, applying right sliding here moves all of g ≫ pre v ≫ h over to the left side, preventing us from getting pre v into the desired position. We solve this by noting that ≫ distributes over *** for CCAs, proved as follows: With this distributive law, we can rewrite id *** (g ≫ pre v ≫ h) to (id *** g) ≫ (id *** pre v) ≫ (id *** h). This is the same as the original definition of Fig. 3, allowing us to apply right sliding to get the pre v into the correct position.
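The distributive law can be checked the same way with a toy SF model (ours); here g and h are arbitrary pure functions chosen for the test:

```haskell
newtype SF a b = SF { stepSF :: a -> (b, SF a b) }

arrSF :: (a -> b) -> SF a b
arrSF f = SF (\a -> (f a, arrSF f))

compSF :: SF a b -> SF b c -> SF a c                 -- (>>>)
compSF (SF f) (SF g) =
  SF $ \a -> let (b, f') = f a; (c, g') = g b in (c, compSF f' g')

parSF :: SF a b -> SF c d -> SF (a, c) (b, d)        -- (***)
parSF (SF f) (SF g) =
  SF $ \(a, c) -> let (b, f') = f a; (d, g') = g c in ((b, d), parSF f' g')

preSF :: a -> SF a a
preSF v = SF (\a -> (v, preSF a))

runSF :: SF a b -> [a] -> [b]
runSF _ []          = []
runSF (SF f) (a:as) = let (b, sf') = f a in b : runSF sf' as

g, h :: SF Int Int
g = arrSF (+ 1)
h = arrSF (* 3)

-- id *** (g >>> pre 0 >>> h)  versus  (id *** g) >>> (id *** pre 0) >>> (id *** h)
fused, dist :: SF (Int, Int) (Int, Int)
fused = parSF (arrSF id) ((g `compSF` preSF 0) `compSF` h)
dist  = (parSF (arrSF id) g `compSF` parSF (arrSF id) (preSF 0))
          `compSF` parSF (arrSF id) h
```

Feeding both the same stream of pairs produces identical results.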
3.1.2 Sliding next to non-id. Another obstacle that can arise is when we have a term in parallel with the one we are trying to slide, as in the first diagram in Fig. 5. We are unable to apply left sliding here since it requires id *** g at the start of the loop, but we have f *** g instead. We require a more general pair of program equivalences. We prove the first of these below; the second is proved symmetrically. With these, we can apply our new left sliding rule to reach the program shown in the second diagram in Fig. 5, which makes progress towards getting pre v into the expected position. Unfortunately, we are now stuck: if we keep applying left sliding, all we do is keep sliding id, which does not help us move the pre v.

3.1.3 Pushing non-id Terms through id. To avoid the problem of having id block non-id terms which we want to slide, we need rules to remove the offending id. We note that since id terms do not change program meaning, we can move them around and remove them as needed. We therefore define some new rules to "push" a non-id term to take the place of an id, thus allowing it to be used by other rules. We start by defining the left fill operation, which takes a composition of two terms and fills in any gaps (id) in the left term with parts of the right term. This is defined using the three rules shown in Fig. 6.
LFill-Id says that if we have an id as the left term, we replace it with the right term in order to fill the gap within the left term. This does not change the meaning of the program since f ≫ id = f = id ≫ f. LFill-NonId says that if there is no id to fill, then do nothing.
LFill-*** considers parallel compositions. This transforms the input (f *** g) ≫ (h *** k) to (f ≫ h) *** (g ≫ k) via our distributive law, uses the subordinate calls to left fill to transform f ≫ h and g ≫ k individually, and then uses the distributive law again to combine the results of those subordinate calls into the result of the main one.
Rather than using left fill just once when we have an id to slide, we need to apply it multiple times. This is needed to make sure that terms are propagated through multiple id if needed: for example, (id *** f) ≫ (id *** g) ≫ (h *** k) requires a call to left fill on the last two terms and then on the first two terms if we want the h to be moved to the front of the program. We therefore define left push to be repeated application of left fill: given a composition f1 ≫ f2 ≫ ... ≫ fn, we first left fill fn-1 and fn, then fn-2 and fn-1, and so on until we left fill f1 and f2.
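The fill-and-push machinery can be sketched over a toy term representation (the datatype and function names are ours, and only the shapes needed for the three LFill rules are modelled):

```haskell
-- Toy AST for loop-body factors: named pure arrows, id, and parallel composition.
data T = A String        -- arr with a name
       | I               -- id
       | P T T           -- (***)
       deriving (Eq, Show)

-- leftFill f g: fill the id gaps in f using parts of g; returns the new pair.
leftFill :: T -> T -> (T, T)
leftFill I g = (g, I)                   -- LFill-Id:    id >>> g  =  g >>> id
leftFill (P f1 f2) (P g1 g2) =          -- LFill-***:   fill component-wise
  let (f1', g1') = leftFill f1 g1
      (f2', g2') = leftFill f2 g2
  in (P f1' f2', P g1' g2')
leftFill f g = (f, g)                   -- LFill-NonId: nothing to fill

-- leftPush: repeatedly left fill a composition f1 >>> ... >>> fn, right to left.
leftPush :: [T] -> [T]
leftPush []     = []
leftPush [f]    = [f]
leftPush (f:fs) = case leftPush fs of
  (g:gs) -> let (f', g') = leftFill f g in f' : g' : gs
  []     -> [f]
```

For the example above, leftPush [P I (A "a"), P I (A "b"), P (A "h") (A "c")] yields [P (A "h") (A "a"), P I (A "b"), P I (A "c")]: the h has been propagated through both ids to the front.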
We can now use this to finish transforming Fig. 5. With this set of new rules, we are now able to transform loops which previously could not have left sliding applied to them. We also utilise equivalent right fill and right push laws to move non-id terms to the right, for transforming loops so that right sliding can be applied to them. We omit these definitions as they are symmetric to those for left fill and left push.
3.1.4 CCA Composition Form. The issue of needing our program to be of a certain form in order to apply a rule is not unique to sliding. We also define CCA composition form, which forces loops to have ≫ at the top level only, in order to restrict the shape that a loop can take and thus make it easier to apply our rules.
We require that pre cannot contain a tuple value. This is because when we apply rules such as sliding, we need to have the *** to know that we can split the term in two: for example, if we had loop (pre (i, j) ≫ f), we could not apply left sliding. Any pre (i, j) can instead be written as pre i *** pre j by CCA's product rule.
We now formally state the definition of CCA composition form.
Definition 3.1. An AFRP program is in CCA composition form if it can be parsed by the following grammar, where L is the start symbol, F is any pure function, and V is any non-tuple value. For the rest of this paper, we present rules assuming that our programs are in CCA composition form. This does not affect the expressiveness of our system, as it is possible to transform any existing CCA into this form through application of the distributive law, as we did in Section 3.1.1, and application of CCA's product rule to avoid any pre (i, j).

Combining Smaller pre into Larger Ones
We have shown that, using a combination of our new generalised sliding rule, the distributive law and push, we can move a single pre within loop f to be before the second output of f. However, we sometimes work with multiple pre rather than a single one, e.g. in loop (f ≫ second (pre i *** pre j)). This loop still has a clear execution order: run the two pre to generate the second outputs of the loop body, which means we get the second inputs and can run f. We cannot currently transform it into LoopD, however, since LoopD relies on there being a single pre.
To represent this as a LoopD, we need to merge the two occurrences of pre into a single use of pre using the previously discussed CCA product rule: pre i *** pre j = pre (i, j). This means that whenever we encounter two uses of pre in parallel, we can merge them and treat them as one. With this, we can transform our example to loop (f ≫ second (pre (i, j))), which is equivalent to LoopD f (i, j).
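The product rule is easy to check observationally with a toy SF model (ours, not Yampa's implementation): pre i *** pre j and pre (i, j) agree on every input stream.

```haskell
newtype SF a b = SF { stepSF :: a -> (b, SF a b) }

parSF :: SF a b -> SF c d -> SF (a, c) (b, d)  -- (***)
parSF (SF f) (SF g) =
  SF $ \(a, c) -> let (b, f') = f a; (d, g') = g c in ((b, d), parSF f' g')

preSF :: a -> SF a a
preSF v = SF (\a -> (v, preSF a))

runSF :: SF a b -> [a] -> [b]
runSF _ []          = []
runSF (SF f) (a:as) = let (b, sf') = f a in b : runSF sf' as

merged, paired :: SF (Int, Int) (Int, Int)
merged = parSF (preSF 1) (preSF 2)   -- pre 1 *** pre 2
paired = preSF (1, 2)                -- pre (1, 2)
```

On the inputs [(10, 20), (30, 40)] both produce [(1, 2), (10, 20)].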

3.2.1 The split Rule. The CCA product rule lets us combine pre which are in parallel, but the uses of pre we need to merge may not always be in parallel. Figure 7 shows an example with two uses of pre which cannot be solved solely by the product rule and sliding: the two halves of the second output each have a pre on them, but those uses of pre are not parallel to each other and the product rule therefore cannot be applied.
We therefore need a way of rearranging expressions such as f ≫ (g *** pre i) ≫ (pre j *** h) to correctly group uses of pre together and merge them with the product rule. To do this, we define split, which attempts to split an input f into (f1, fd, f2), where f = f1 ≫ fd ≫ f2 and fd is a decoupled term containing no ≫. We define this operation through a collection of rules, shown in Fig. 8.
Split-Pre dictates that if we have a pre v at the end of the composition, then we already have a trivial split with fd = pre v. Split-***-R specifies that if we have two parallel paths given by some g *** h = (g1 *** h1) ≫ ... ≫ (gn *** hn), and we are able to split the two paths g and h, then we can split the two paths in parallel by aligning the gd and hd we get from the subordinate calls to split.
In any other case, we have found neither a pre nor a g *** h where we can find a pre in each of g and h and thus combine them with CCA's product rule. Split-NonPre dictates that in this case, we can skip over this term, as it will not lead to us finding the required pre. This covers arr f and id.
We present these rules in use with a derivation that correctly splits our earlier example of f ≫ (g *** pre i) ≫ (pre j *** h) in Fig. 9. Running split is easy for a given f = f1 ≫ ... ≫ fn: find the rule matching f' ≫ fn, or in the case of fn = g *** h, try each of the Split-*** rules in turn. We prove that running split always produces a valid split if one exists in Section 4.

3.2.2 Using split to Find LoopD. The split rule now lets us find a pre that can be slid into position. Given loop (f ≫ second g), we apply split to g to transform it into loop (f ≫ second (g1 ≫ pre i ≫ g2)). We can then right slide g2 to get LoopD (second g2 ≫ f ≫ second g1) i.
This only looks at the right side of the loop, however: we also need to slide anything from the left side over to the right side so that it is considered by split. This is necessary for programs in which the pre we are looking for is on the opposite side, such as loop (second (pre i) ≫ f). We therefore left slide as much as we can before applying split.

LoopM
While the sliding and split rules are enough to transform most loop f into LoopD f' i, there is one class of counterexamples for which this is not enough. In Fig. 10a, we are unable to slide a pre into position because neither of the terms surrounding it can be slid, meaning that we cannot transform the loop into LoopD. We need to be able to transform this example, however, as it can be executed by getting the outputs from the pre, then running the term after it, and finally running the term before it.
To transform loop f where f is split into two halves by a pre, we introduce a new restricted form of loop called LoopM. This is defined as follows alongside an interpreter runLoopM which maps LoopM f d g to a corresponding signal function.

Multiple Loops
We can now transform a single loop f into its equivalent LoopD or LoopM. We now consider programs with multiple composed loops or nested loops, with the aim of being able to transform programs consisting of any number of loops.
For composed loops, we note that the transformation of a given loop f relies on nothing except f. This means that we can transform compositions of loops such as loop f ≫ loop g by transforming each individual loop, giving us e.g. something of the form LoopD f' i ≫ LoopM g1 d g2.
Issues arise, however, when we introduce nested loops such as loop (loop f). The inner loop f could contain the pre that is needed for the outer loop to be transformed. An example of this is presented in Fig. 11, where the pre i in the inner loop is needed by the outer loop. We therefore need a way to extract such a pre from an inner loop.
In the rest of the section, we look at extracting pre from nested loops in two cases: one where the inner loop can be transformed into LoopD, and another where it can be transformed into LoopM.
Figure 11. Example where we need to extract a pre from a LoopD, and a version with the pre extracted.

3.4.1 Extracting pre from LoopD. When the inner loop is a LoopD, we need to get the pre out of that inner loop if we want to use it outside of that loop. We have already seen this in Fig. 11.
To achieve this, we turn again to the ArrowLoop laws, which, given a loop f, provide a way to extract unused pre from f. The two laws we need are stated below. Right tightening takes a loop with (h *** id) as its last term and moves h outside of the loop. This preserves program meaning as h still receives the same inputs and produces the same outputs; it does not need to be within the loop to still be connected to the first output of the loop body. This is shown in use in Fig. 11, where we use right tightening to move the pre i outside of the loop while keeping it connected to the first output of the loop body, meaning that we can use it to transform the outer loop. Left tightening is similar, but works with the front of the loop instead.
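Right tightening can also be checked observationally with a toy SF model (ours): moving a first-component term out of the running-sum loop leaves the output stream unchanged.

```haskell
newtype SF a b = SF { stepSF :: a -> (b, SF a b) }

arrSF :: (a -> b) -> SF a b
arrSF f = SF (\a -> (f a, arrSF f))

compSF :: SF a b -> SF b c -> SF a c                 -- (>>>)
compSF (SF f) (SF g) =
  SF $ \a -> let (b, f') = f a; (c, g') = g b in (c, compSF f' g')

parSF :: SF a b -> SF c d -> SF (a, c) (b, d)        -- (***)
parSF (SF f) (SF g) =
  SF $ \(a, c) -> let (b, f') = f a; (d, g') = g c in ((b, d), parSF f' g')

preSF :: a -> SF a a
preSF v = SF (\a -> (v, preSF a))

loopSF :: SF (a, c) (b, c) -> SF a b                 -- lazy loop, as in Yampa
loopSF (SF f) =
  SF $ \a -> let ((b, c), f') = f (a, c) in (b, loopSF f')

runSF :: SF a b -> [a] -> [b]
runSF _ []          = []
runSF (SF f) (a:as) = let (b, sf') = f a in b : runSF sf' as

body :: SF (Int, Int) (Int, Int)       -- arr sum >>> second (pre 0)
body = arrSF (\(x, y) -> (x + y, x + y)) `compSF` parSF (arrSF id) (preSF 0)

inside, outside :: SF Int Int
inside  = loopSF (body `compSF` parSF (arrSF (* 2)) (arrSF id))  -- loop (body >>> (h *** id))
outside = loopSF body `compSF` arrSF (* 2)                       -- loop body >>> h
```

Both produce the doubled running totals [2, 6, 12] when fed [1, 2, 3].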
In most cases there is only one direction in which part of a loop can be tightened. Consider an arbitrary loop ((f1 *** f2) ≫ f3 ≫ (f4 *** f5)) where any fi can be id. If f3 is not id, then only f1 can be moved outside of the loop via left tightening, as f3 "blocks" f4 from being moved this way, and only f4 can be moved outside of the loop via right tightening by a similar argument. In these cases, we apply left and right tightening to move f1 and f4 out of the loop once we have transformed it into LoopD.
In the case where f3 = id, we end up with loop ((f1 *** f2) ≫ (f4 *** f5)). f1 and f4 could be tightened out of the loop in either direction, but we do not know which way we need to tighten them to e.g. get the pre needed for an outer loop. The trick is to consider the more general case of loop (f *** g), in which we only need to run f to get the output of the loop: g will never be run as its result is never needed. Thus we avoid having to decide which way to tighten by removing the loop, f2 and f5, to get f1 ≫ f4.
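The loop (f *** g) observation can be demonstrated with the same toy SF model (ours): even under the lazy loop, g is never forced because its output wire is never consumed.

```haskell
newtype SF a b = SF { stepSF :: a -> (b, SF a b) }

arrSF :: (a -> b) -> SF a b
arrSF f = SF (\a -> (f a, arrSF f))

parSF :: SF a b -> SF c d -> SF (a, c) (b, d)  -- (***)
parSF (SF f) (SF g) =
  SF $ \(a, c) -> let (b, f') = f a; (d, g') = g c in ((b, d), parSF f' g')

loopSF :: SF (a, c) (b, c) -> SF a b           -- lazy loop
loopSF (SF f) =
  SF $ \a -> let ((b, c), f') = f (a, c) in (b, loopSF f')

runSF :: SF a b -> [a] -> [b]
runSF _ []          = []
runSF (SF f) (a:as) = let (b, sf') = f a in b : runSF sf' as

-- loop (f *** g): g's input is its own output, but nothing ever demands it,
-- so the program behaves exactly like f alone.
prog :: SF Int Int
prog = loopSF (parSF (arrSF (+ 1)) (arrSF (* 2)))
```

runSF prog [1, 2, 3] gives [2, 3, 4], exactly the behaviour of arr (+1) on its own.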
We can therefore extract a pre from an inner LoopD by either applying left and right tightening to move as much from inside the LoopD out as possible, or removing the LoopD entirely using the loop (f *** g) equivalence. In either case, this allows pre that were part of the inner loop to be used when transforming the outer loop.

3.4.2 Extracting pre from LoopM. When we have an inner LoopM, things are simpler than for LoopD. The trick is that LoopM itself is decoupled like pre, as it can produce all of its outputs without any of its inputs by using its internal pre.
To run a program like loop (f ≫ second (LoopM g (pre (i, j)) h)), we can first get the output of the inner LoopM by running its internal pre, then run h to get the second output of the outer loop, and then finally run f and g.
Since our aim with these restricted forms of loop is to fix the location of a decoupled part so that we know the execution order of the loop at compile time, we allow LoopM to take the place of a pre when transforming our loops. This is implemented via a minor update to the split operation: With this we can now use LoopM wherever we were aiming for a pre, allowing our earlier example to be expressed as LoopD f (LoopM g (pre (i, j)) h). Note that the definitions of LoopD and LoopM have changed slightly: they previously relied on containing a pre specifically, but can now contain arbitrary SFs consisting of pre, LoopM and ***. The updated definitions can be seen in the code for Decoupled in Section 5.1.

Transformation Algorithm
We now combine the rules we have described for transforming different cases of loops into an algorithm that we can run. The overall process that we present inspects loops from innermost to outermost, transforming each one to a LoopD or LoopM until every loop is transformed. After transforming the program into CCA composition form (Section 3.1.4), perform the following for each loop, from the innermost to the outermost:

Completeness
We now prove that our transformation works on all loop f without dependency cycles while preserving program meaning.
In Section 3, we proved that program meaning is preserved by each individual operation through existing laws for arrows and CCAs. Since our transformation solely uses these operations, it preserves program meaning. We now prove that it is complete for any loop with no dependency cycles. We first need to formalise what a dependency cycle is, in order to reason about them. Some output o depends on an input i if there is some path through the program from i to o, meaning that in order to get o, we need to know i. Decoupled functions like pre break this dependency as they are able to produce an output at a given time step without the input at that time step; the dependencies we therefore consider are represented by paths which do not go through a decoupled signal function. We define this as a direct dependency:
Definition 4.1. A direct dependency exists from input i to output o of some signal function f if there is a path from i to o through f that does not flow through a pre or LoopM (equivalently, a decoupled signal function).
A dependency cycle arises if there is a direct dependency from x to y and one from y to x. In loop f, dependency cycles are created via loop f's backedge from each part of its second output to each part of its second input. We define a direct dependency cycle within a loop as follows.
Definition 4.2. A direct dependency cycle within loop f exists if there is a direct dependency through f from a component of the second input of f to the same component of the second output of f.
Let us build an intuition for how paths, and thus direct dependencies, are built from each of the arrow constructors:
• arr f: Since we know nothing about f, we assume that there is a path from every input to every output, and therefore every output of arr f directly depends on every input. We show later that this also holds for LoopD generated by the transformation.
• pre v: If we have a pre, then there is a path from every input to every output which trivially goes through a decoupled signal function, so there is no direct dependency between pre's inputs and outputs. The same holds for LoopM.
• f ≫ g: This sequentially composes the paths through f and g: if there is a path through f from x to y, and a path through g from y to z, then there is a path through f ≫ g from x to z.
• f *** g: This composes two paths in parallel that do not interact. Therefore f *** g will have a pre on every path if f and g each have a pre on every path between their inputs and outputs.
We now present some auxiliary lemmas used within our main proof. We first define three forms that a loop can take, which allow us to perform case analysis in our other proofs. We then prove that our split operation (Section 3.2.1) will transform every f with no direct dependencies between its inputs and outputs into f1 ≫ fd ≫ f2 where fd is decoupled. We assume that every output of any LoopD within f directly depends on every input of it: this is necessary as it is possible to construct examples where a LoopD can have a decoupled signal function "hidden" inside it (discussed in Section 3.4.1). We show later that our transformation does not generate LoopD like that.
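The per-constructor rules above suggest a simple syntactic check for decoupledness. The following is a conservative sketch over a toy AST of our own (it can miss compositions whose pres are spread component-wise across several stages, so it under-approximates):

```haskell
-- Toy AST of the constructors discussed above.
data T = Arr | Id | Pre | T :>>: T | T :*: T

-- Conservative check: True only if every input-to-output path is forced
-- through a pre. (May return False for some decoupled programs.)
decoupled :: T -> Bool
decoupled Pre        = True
decoupled Arr        = False  -- arr: every output depends on every input
decoupled Id         = False
decoupled (f :>>: g) = decoupled f || decoupled g  -- a fully-decoupled stage covers all paths
decoupled (f :*: g)  = decoupled f && decoupled g  -- both parallel paths need a pre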
Lemma 4.4. Given f in CCA composition form for which: 1. f contains no loop, and 2. for every LoopD within f, all of its outputs directly depend on its inputs, there are no direct dependencies between inputs and outputs of f if and only if we can apply split to f.
Proof. We consider the two directions of the equivalence in turn. The ⇐= direction is simple: all paths from the inputs of f to the outputs of f must go through fd by definition of composition. fd is decoupled, so every path goes through a decoupled signal function, and thus there cannot be any direct dependencies. We now turn to the =⇒ direction, which we prove by induction on the size of f. We define the size of f as follows: f *** g and f ≫ g each have size equal to the sum of the sizes of f and g, and all other terms have a size of 1.
We start with our base case: a program of size 1, meaning that f is one of arr, pre, LoopD or LoopM. We cannot have an arr or a LoopD, since each has at least one direct dependency, by definition and by the second condition of our lemma respectively. Therefore, f must be pre or LoopM. In either of these cases, split applies Split-Pre or Split-LoopM and we are done.
We now prove our lemma for f of size n + 1, assuming that it holds for all f′ of size n and smaller. We consider cases of g in f = f′ ≫ g:

• g = arr h: All of arr's outputs directly depend on its inputs. Therefore, we need f′ to have no direct dependencies: if f′ had a direct dependency from its i-th input to its j-th output, then f would have a direct dependency from that input, through that output, to every output of g, contradicting the statement of the lemma. Therefore, we can apply split to f′. This is exactly what Split-NonPre does. The same applies for id, and for LoopD by the second condition of the lemma.
Case 2. [f = c₁ ≫ f′ ≫ c₂, where c₁ and c₂ are arr or pre.] Step 1 does not apply, so we move on to step 2. If step 2 succeeds in splitting f, we finish with a LoopM.
If we cannot split f, then both c₁ and c₂ must be arr. This is because if either is pre, we are done: f then has a decoupled signal function on every path from its inputs to its outputs, meaning that it has no direct dependencies and thus can be split according to Lemma 4.4.
We know for the same reason that f′ has at least one direct dependency, say from its i-th input to its j-th output, as otherwise we could split f: every path in f goes through f′, and if f′ has no direct dependencies then every path through f′ has a decoupled signal function on it.
It follows that f = arr g ≫ f′ ≫ arr h has a direct dependency from all of its inputs to all of its outputs: from each input of arr g to the i-th input of f′, from there to its j-th output, and finally to each output of arr h.
We require that it is possible to split d ≫ b in the absence of dependency cycles for step 3b to be applicable. Assume for contradiction that d ≫ b cannot be split, meaning that there exists a direct dependency within d ≫ b from its i-th input to its j-th output. We then have a dependency cycle as follows: from the j-th part of the second input of the loop, through f to the i-th input of d ≫ b, then to its j-th output, completing the cycle. Therefore, there must be no direct dependencies in d ≫ b, and thus it can be split by Lemma 4.4.
We therefore get a LoopD of the form LoopD ((a *** b) ≫ f ≫ (c *** d)) s. We apply tightening in step 3c to get a ≫ LoopD ((id *** b) ≫ f ≫ (id *** d)) s ≫ c. Note that there is a direct dependency from every input to every output of this LoopD, as each input only goes through f to get to the output, and we showed earlier that f has a direct dependency from every input to every output.
If we cannot split f, then the middle term must be arr: if it were pre, then f would have no direct dependencies, as all paths go through that pre, meaning that f in that case could be split by Lemma 4.4.
We are therefore working with f = (a *** b) ≫ arr g ≫ (c *** d). Apply step 3a of the transformation to get (a *** id) ≫ arr g ≫ (c *** (d ≫ b)). We know that every output of arr g directly depends on every input, meaning that every input of d ≫ b directly depends on every part of the second input of the loop, by the same logic as in Case 2b. We are therefore guaranteed to be able to apply split in step 3b by Lemma 4.4.
This concludes the base case. We also note that in every case where we create a LoopD, all of its outputs directly depend on all of its inputs. This means that we can use Lemma 4.4 in the inductive step, as all LoopD considered will have this condition hold.
We now turn to the inductive step: proving that the transformation works on loop f when all loop within f without dependency cycles can be transformed. We first apply our transformation to the inner loops. Then the proof above also proves the inductive case, with some minor modifications: a term may now also be LoopD or LoopM. Any LoopD can be treated identically to arr, as every LoopD created by this transformation has identical dependencies to an arr. Any LoopM can be treated identically to pre for the same reason. Therefore, the inductive step holds, and the proof is complete. □

Implementation
In this section we describe Severn, the Haskell implementation of our transformation. We start with a minimal AFRP implementation, then implement the transformation on top, and finally test that the implementation is correct.

Signal Descriptors and CCA Composition Form
We represent signals as in Chupin and Nilsson's SFRP [2] by using signal descriptors. These are defined by the Desc GADT, which is then lifted to the kind level via DataKinds. Our arrow constructors are parameterised by these descriptors. This again mirrors SFRP, but rather than defining a single GADT with all of the arrow constructors, we enforce CCA composition form (Section 3.1.4) through the definition of multiple GADTs. By having CFSF (read "composed form signal function") only introduce ≫, and NoComp introduce the remaining combinators, ≫ can only be added at the top level, so that programs must be written in composition form. We also separate out decoupled terms into their own GADT, allowing us to enforce through the type system that a term is decoupled, as shown in LoopD.
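The following is a simplified sketch of this two-GADT arrangement. It is our own reconstruction under stated assumptions: Severn additionally indexes everything by Desc descriptors via DataKinds, which we omit here, and the step functions are for illustration only:

```haskell
{-# LANGUAGE GADTs #-}

-- CFSF is the only type that introduces composition, so >>> can
-- only appear at the top level of a program.
data CFSF a b where
  Single :: NoComp a b -> CFSF a b
  Comp   :: NoComp a b -> CFSF b c -> CFSF a c

-- Every other combinator lives in NoComp.
data NoComp a b where
  Arr :: (a -> b) -> NoComp a b
  Pre :: a -> NoComp a a
  Par :: NoComp a b -> NoComp c d -> NoComp (a, c) (b, d)

-- One time step of execution.
stepNC :: NoComp a b -> a -> (b, NoComp a b)
stepNC (Arr f)   a      = (f a, Arr f)
stepNC (Pre v)   a      = (v, Pre a)        -- emit old value, store new one
stepNC (Par f g) (a, c) =
  let (b, f') = stepNC f a
      (d, g') = stepNC g c
  in ((b, d), Par f' g')

step :: CFSF a b -> a -> (b, CFSF a b)
step (Single n) a = let (b, n') = stepNC n a in (b, Single n')
step (Comp n r) a =
  let (b, n') = stepNC n a
      (c, r') = step r b
  in (c, Comp n' r')

-- Run over a list of inputs, one per time step.
runL :: CFSF a b -> [a] -> [b]
runL _  []     = []
runL sf (a:as) = let (b, sf') = step sf a in b : runL sf' as
```

Because Comp is the only way to compose, a programmer (or smart constructor) can only ever produce right-nested chains of ≫-free pieces.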
We provide smart constructors for each of the traditional arrow combinators that produce an equivalent CFSF using the laws discussed in Section 3, to avoid programmers having to use the above constructors directly to write their programs. This means that a programmer can write a CFSF in Severn in the same way that they would an SF in Yampa. We also implement a small optimisation pass which merges consecutive Arr together using the arr f ≫ arr g = arr (g • f) law, an optimisation also applied by Yampa [16].
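A sketch of such a fusion pass, over an assumed simplified representation (not Severn's actual one, which carries descriptor indices):

```haskell
{-# LANGUAGE GADTs #-}

data SF a b where
  Arr  :: (a -> b) -> SF a b
  Pre  :: a -> SF a a
  Comp :: SF a b -> SF b c -> SF a c

-- Merge adjacent Arr using the composition law: arr f >>> arr g = arr (g . f).
fuse :: SF a b -> SF a b
fuse (Comp f g) =
  case (fuse f, fuse g) of
    (Arr f', Arr g') -> Arr (g' . f')   -- the composition law
    (f', g')         -> Comp f' g'
fuse sf = sf
```

This is why enlarging a loop body with extra arr has little effect on running time: the pass collapses the chain into one larger pure function.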

The Transformation Algorithm
We now outline our implementation of the transformation on CFSFs. We focus on transformLoop, which transforms a given Loop using the steps outlined in Section 3.5; the transformation itself traverses the input CFSF, calling transformLoop on each Loop from innermost to outermost.
Each of the three cases outlined in our transformation is defined as a function of type CFSF a b -> Maybe (CFSF a b), since a given case is not applicable to every CFSF. transformLoop therefore tries each case in turn using the alternative operator <|>.
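The shape of transformLoop can be sketched on a toy untyped term type; the datatype and rule names here are our stand-ins (Severn's versions work on typed CFSFs), and the second and third cases are stubbed out:

```haskell
import Control.Applicative ((<|>))

data Term = Loop Term | Par Term Term | Seq Term Term | ArrT | PreT
  deriving (Eq, Show)

-- Step 1: loop (f *** g) = f removes the loop altogether.
removeLoop :: Term -> Maybe Term
removeLoop (Loop (Par f _)) = Just f
removeLoop _                = Nothing

-- Steps 2 and 3 stubbed out for this sketch.
splitToLoopM, slideToLoopD :: Term -> Maybe Term
splitToLoopM _ = Nothing
slideToLoopD _ = Nothing

-- Try each case in order; the first one that applies wins.
transformLoop :: Term -> Maybe Term
transformLoop t = removeLoop t <|> splitToLoopM t <|> slideToLoopD t
```

Because Maybe's <|> returns its first Just, the cases are attempted in exactly the order given in Section 3.5.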
The implementation of each case utilises the rules defined in Section 3, which are each implemented as Haskell functions. As an example, we present a slightly simplified implementation of leftSlide below. We apply a few tricks here which are common throughout the implementation. We use auxiliary GADTs when we cannot determine the exact type of the output from the type of the input CFSF: here we require LoopBox, since we cannot guarantee that sliding will lead to the same type c. We can also use this to guarantee that some part of the output is decoupled, which we do in our implementation of split.
Since CFSF allows arbitrary bracketing of ≫, we cannot use pattern matching to get the first or last element of a given composition. We therefore provide headTail, which returns Left x if x contains no ≫, and otherwise the head and tail of the chain of ≫. The rest of the implementation follows from the definition of the rule: use our auxiliary functions to match the form of the rule and, if they succeed, return the result of applying it.
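The idea behind headTail can be shown on an untyped toy chain type (our assumption; Severn's version is typed): rebracket on the fly so that the head is always a single, ≫-free element.

```haskell
data Chain = Leaf String | Chain :>>>: Chain
  deriving (Eq, Show)

-- Left x when there is no composition; otherwise the first element and
-- the rest, regardless of how the input was bracketed.
headTail :: Chain -> Either Chain (Chain, Chain)
headTail (Leaf s)    = Left (Leaf s)
headTail (l :>>>: r) =
  case headTail l of
    Left h       -> Right (h, r)             -- l was a single element
    Right (h, t) -> Right (h, t :>>>: r)     -- reassociate to the right
```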

Running CFSFs
Once the transformation has been applied, we are left with a CFSF containing no Loop. Severn provides runDec, which produces the output of a Decoupled without using any input, together with the continuation to run at the next time step once it receives an input. Thus running LoopD f d consists of getting the output from d, running f, and then using the resulting input to d to obtain the next CFSF.
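The essence of this strict execution order can be sketched as follows. The shapes are our assumptions, chosen to mirror the text's runDec :: Decoupled a b -> (Val b, Val a -> Decoupled a b); they are not Severn's actual definitions:

```haskell
newtype SF a b  = SF  { stepSF :: a -> (b, SF a b) }
newtype Dec a b = Dec { runDec :: (b, a -> Dec a b) }  -- output before input

arrSF :: (a -> b) -> SF a b
arrSF f = SF (\a -> (f a, arrSF f))

preDec :: a -> Dec a a
preDec v = Dec (v, preDec)

-- The decoupled part produces its output first, so the execution order
-- is known at this point and every step can be evaluated strictly.
loopD :: SF (a, c) (b, d) -> Dec d c -> SF a b
loopD f d = SF $ \a ->
  let (c, feed)     = runDec d           -- 1. output of d, no input needed
      ((b, d'), f') = stepSF f (a, c)    -- 2. run the body
  in (b, loopD f' (feed d'))             -- 3. feed d its next input

runSF :: SF a b -> [a] -> [b]
runSF _  []     = []
runSF sf (a:as) = let (b, sf') = stepSF sf a in b : runSF sf' as

-- The running-sum example from the introduction, with the loop decoupled.
sumSF :: SF Int Int
sumSF = loopD (arrSF (\(x, s) -> let t = x + s in (t, t))) (preDec 0)
```

Contrast this with loop, where the feedback value is only available via laziness: here steps 1-3 have a fixed order, so no thunks need to be built.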

Testing via Arbitrary Program Generation
As well as our proof that the transformation is correct (Section 4), we also test arbitrary Severn programs without dependency cycles against their Yampa equivalents to make sure that the implementation is correct. We build test programs with a pair of mutually recursive generators using the Hedgehog library, which generate a Yampa SF and its equivalent Severn CFSF. One generator produces decoupled programs, and one produces non-decoupled programs.
Programs are generated inductively: start with the smallest decoupled program pre v and the smallest non-decoupled program arr f, and build larger programs by combining them. We use rules similar to those used by Sculthorpe and Nilsson [27] for their arrow combinator types indexed by decoupledness, such as f ≫ g being decoupled if one of f or g is. To generate a program of a given size and decoupledness, we generate two smaller programs and combine them using one of those rules.
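A deterministic sketch of this inductive scheme (our own; Severn uses Hedgehog's random generators, and generates a Yampa SF alongside each term):

```haskell
data Term = ArrT | PreT | Term :>>>: Term deriving Show

-- Generate a term of (at least) the given size with the requested
-- decoupledness, using the combination rules from the text.
gen :: Int -> Bool -> Term        -- Bool: must the result be decoupled?
gen n dec
  | n <= 1    = if dec then PreT else ArrT
  | otherwise =
      let k = n `div` 2
          -- f >>> g is decoupled iff f or g is, so a decoupled composition
          -- needs one decoupled half, and a non-decoupled one needs neither.
          (dl, dr) = if dec then (True, False) else (False, False)
      in gen k dl :>>>: gen (n - k) dr

decoupled :: Term -> Bool
decoupled PreT        = True
decoupled ArrT        = False
decoupled (f :>>>: g) = decoupled f || decoupled g
```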
These arbitrary decoupled and non-decoupled programs are then used to build loops for testing. We use the same techniques as in Case 2b of Theorem 4.5 for building generic loops without dependency cycles: start with loop ((a *** b) ≫ f ≫ (c *** d)), generating a LoopM by making f decoupled, and generating a LoopD by making d ≫ b decoupled.
We then test that, given an SF and its equivalent CFSF, both programs produce the same results after the CFSF has been transformed by our transformation. Our implementation passes these tests for programs of arbitrary size.

Performance
To show the impact of our work on performance, we use two sets of benchmarks: fixed networks, to identify the improvements for specific constructs, and randomised networks. We use the Criterion library to benchmark each program 100,000 times.
We first benchmarked four programs in order to test individual uses of loop. Fig. 12 shows the benchmarks, their definitions and average speedups compared to Yampa. Severn gives a speedup of between 1.5x and 1.7x for the first three benchmarks. The nested program gave a lower speedup of 1.1x, which we expect is due to the CFSF that Severn allocates being larger in that case. We also varied the number of arr in arrs, in order to test whether the speedup varied based on the amount of work done within the loop, but found no clear change. We expect this is due to the optimisations implemented by Yampa and in our optimisation pass (Section 5.1): the composition law allows composed arr to be merged into existing arr, so the enlarged program is effectively the same but with larger pure functions to execute.
To avoid the effects of these optimisations, we constructed a benchmark based on LoopM with pres and arrs interleaved so that the composition law could not be applied. The results are shown in Fig. 13. We achieved speedups of between 1x and 2x for programs with 150 or fewer primitives, with performance improving significantly at 200 primitives.
For our randomised tests, we take a similar approach to that used for SFRP [2]. We varied two parameters: the size of the generated program, and the number of loops within that program as a proportion of its size. We do not include the time taken by our transformation itself. We found that our speedups were always greater than 1x, and averaged 2.5x.
Since we achieve speedups in all benchmarks, we conclude that our transformation provides an effective improvement for loops in AFRP. Further improvements may be possible in the future by using IORefs for pres rather than returning an entirely new CFSF, by using stream fusion [3], or by using the Pipes library to get more performance out of runCFSF.

Discussion

While our transformation works on a large subset of Yampa programs, we make a few assumptions that may not hold for all AFRP programs. We discuss their consequences here, along with how our work could be extended in future.

Effects and Monadic Signal Functions (MSFs)
We require that our arrows satisfy the laws needed for CCAs in order to apply the distributive law, which underpins most of the operations we have defined. We also assume that there are no side effects in the sliding law. These two points mean that our transformation is not applicable to effectful programs in general: it changes the execution order so that a program can be run strictly, which may not be the same execution order as before. For example, under lazy semantics the signal function consuming the output of a pre v may run first, since pre v produces its output immediately; our transformation may turn such a program into a LoopD which, when run strictly, runs the other signal function first. This does not pose an issue in Yampa, as pre is the only "effectful" operation and its effects are entirely local to itself. However, it is easy to add effects to such a system: Perez et al. embed monads into AFRP with Monadic Signal Functions (MSFs) [20]. Modifying our transformation to preserve execution order, and therefore support MSFs, is future work.
Note, however, that the reordering does not change the meaning of programs whose effects are commutative: we eventually run every part of the program. Piponi [23] shows a number of monads whose effects are commutative, meaning their computations can be reordered without issue. MSFs built with these commutative monads, such as Reader and Writer over a commutative monoid, could therefore be safely transformed by our technique.

arr is a Black Box
We know nothing about f within arr f and must assume that all outputs of f depend on all of its inputs. However, arr is the only constructor we have for routing data, and it permits programmers to write routing functions like swap = arr (\(x, y) -> (y, x)) for which that assumption is not required. Dealing with this would introduce complex dependencies between the inputs and outputs of an arr, but would allow additional ways to transform programs: for example, first (pre x) ≫ swap = swap ≫ second (pre x). This poses a particular problem when working with proc notation [19], which introduces many additional arr during desugaring.
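The routing identity above can be checked on a toy stream-function model (our own construction, not Severn's representation): both sides swap the pair while delaying what ends up in the second component.

```haskell
newtype SF a b = SF { stepSF :: a -> (b, SF a b) }

arrSF :: (a -> b) -> SF a b
arrSF f = SF (\a -> (f a, arrSF f))

preSF :: a -> SF a a
preSF v = SF (\a -> (v, preSF a))

compSF :: SF a b -> SF b c -> SF a c
compSF f g = SF $ \a ->
  let (b, f') = stepSF f a
      (c, g') = stepSF g b
  in (c, compSF f' g')

firstSF :: SF a b -> SF (a, c) (b, c)
firstSF f = SF $ \(a, c) -> let (b, f') = stepSF f a in ((b, c), firstSF f')

secondSF :: SF a b -> SF (c, a) (c, b)
secondSF f = SF $ \(c, a) -> let (b, f') = stepSF f a in ((c, b), secondSF f')

swapSF :: SF (a, b) (b, a)
swapSF = arrSF (\(x, y) -> (y, x))

runSF :: SF a b -> [a] -> [b]
runSF _  []     = []
runSF sf (a:as) = let (b, sf') = stepSF sf a in b : runSF sf' as

-- first (pre 0) >>> swap  vs  swap >>> second (pre 0)
lhs, rhs :: SF (Int, Int) (Int, Int)
lhs = firstSF (preSF 0) `compSF` swapSF
rhs = swapSF `compSF` secondSF (preSF 0)
```

A dependency analysis that understood swap as pure routing could commute the pre past it like this, rather than assuming swap's outputs depend on all inputs.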
If we differentiated between arr for applying pure functions and arr for routing, we could modify our transformation to take these dependencies into account. Joseph's generalised arrows [10] introduce a variety of additional combinators, such as ga_assoc and ga_swap, with which routing can be implemented without using arr explicitly. SFRP [2] uses routers for arbitrary rearrangements of inputs into outputs.

Switching and Choice
Members of the ArrowChoice class allow for conditional execution of arrows. The key operator is f +++ g, which runs f if given a Left value and g otherwise. Since f +++ g depends on its input to decide which of f and g to run, we can never decouple it. It can therefore be treated in the same way as arr and thus should be easy to add.
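A minimal illustration using the function instance of ArrowChoice: the branch taken is determined by the run-time tag of the input, which is exactly why f +++ g cannot be decoupled.

```haskell
import Control.Arrow ((+++))

-- Runs (+1) on Left values and (*2) on Right values.
route :: Either Int Int -> Either Int Int
route = (+ 1) +++ (* 2)
```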
Switching is harder to add: switch f c uses a continuation c to change the arrow being run (f) to a different one at runtime. This means that switch can change the structure of a program in a way that is unknown at compile time. SFRP implements switching by rerunning its transformation once a switch occurs, but this can temporarily slow the program down as the entire transformation procedure is rerun. Winograd-Cort and Hudak avoid these issues by transforming some uses of switch into +++ [29].

Well-typed loop
We proved that our transformation only fails if we would be unable to run the program anyway (Section 4). Therefore, if our transformation succeeds, a loop is well-formed. A type system which guarantees that a loop contains no dependency cycles would help avoid running the transformation on a loop with dependency cycles. It would also avoid the issue of a switch f c generating a loop that cannot be transformed, and thus a runtime error.
A type system checking for the absence of dependency cycles in loop could build on existing work by Sculthorpe and Nilsson [27], who label the decoupledness of signal functions at the type level, and Bahr [1], who introduces a modal type system that detects space leaks.
Related Work

FRP Applications. FRP sees significant use in a variety of domains where performance is important. It has been used in many embedded settings: the Juniper language for Arduino microcontrollers [7], the Hailstorm language for IoT [25] and the Emfrp language for embedded systems [26] are three examples which use a variant of FRP designed for restricted memory use, but could move to AFRP if it became performant enough. The original introduction to AFRP [8] used robots as its basis, which also tend to be programs run on embedded systems.
There has also been some research into making FRP safer in these contexts. Perez and Goodloe [22] incorporate fault tolerance into FRP, which could also be useful in domains like robotics. Copilot [21] allows users to write runtime verification systems in the style of FRP.

AFRP Optimisation. Beyond CCAs (Section 2.1), Scalable FRP (SFRP) [2] is another optimisation, which transforms AFRP programs into IO operations on mutable memory cells to reduce the cost of routing data between signal functions. Notably, SFRP does not currently support the loop combinator at all, since it needs to know the order in which the component signal functions of a loop are executed to order its IO operations. Our transformation would allow the loop combinator to be added.
Other projects have taken a similar approach to SFRP: Ultrametric FRP [12] implements FRP as an imperatively updated dataflow graph, and Patai's work on higher-order streams [18] translates FRP to an IO stepper action which runs with each new sample.
A common optimisation in the FRP world that may be applicable to AFRP is deciding whether a value needs to be computed at a given time step. Elliott [4] discusses how values that change only occasionally should be recomputed only when new values are pushed, but also that the results of FRP code should be recomputed only when they are pulled by whatever consumes them. Sculthorpe and Nilsson [28] define temporal logic properties of FRP networks that could also be used to reason about change, and thus whether a value needs recomputing.
All of the above optimisations could be combined with our work, and would likely produce improved speedups compared to those we presented in Section 6.
Synchronous Programming. Much of the work we have discussed aims to bring FRP's efficiency and safety closer to that of synchronous dataflow languages such as Lustre [6], which also permit writing reactive programs. While less expressive than FRP, they are simpler to implement efficiently [13]. They deal with dataflow cycles (loop in AFRP) via a syntactic check that a delay operator is present in every cycle, which is similar to what we have done here. Digital circuits are similar: Ghica et al. [5] introduce a theory for rewriting dataflow categories with a delay operator, used to reason about digital circuits, which could also be applied to AFRP.

Conclusions
We showed that loops in AFRP without dependency cycles can be transformed into the more restrictive LoopD and LoopM forms, which can be evaluated strictly, thus avoiding the overheads of lazy evaluation. This offers performance benefits and allows for easier compilation of bespoke AFRP-style languages in the future, since such a language no longer needs lazy evaluation.
We proved that our transformation preserves program meaning, both theoretically using the Arrow and ArrowLoop laws, and practically through a Haskell implementation whose tests ensure that programs behave equivalently before and after transformation. We also proved that the transformation works on every loop expressible in our subset of AFRP that does not contain a dependency cycle.
While our implementation covers a subset of Yampa, we believe it is large enough to support most useful programs, and we have laid out how our transformation could be extended to support more programs in future. Finally, our benchmarks show that our implementation, Severn, provides a modest speedup for a variety of AFRP programs, with potential for further improvement through implementation of Yampa's optimisations.

Figure 2. The minimal set of arrow operators used in this paper, presented as box-and-wire diagrams.

Figure 6. The three steps which define the left-fill rule.

data Desc x where
  V :: a -> Desc a
  P :: Desc a -> Desc b -> Desc (a, b)

This allows us to define signals which produce values of some type, and pairs of signals: P (V Int) (V Int) describes a pair of signals each containing Ints. Values produced by a signal with descriptor d are represented by the GADT Val d, used throughout the implementation.
runCFSF :: CFSF a b -> a -> (b, CFSF a b)

runs these transformed CFSF, taking an input value and producing the output at that time step along with the next CFSF to run. Since any CFSF passed to runCFSF no longer contains Loop, we define it strictly and thus avoid all of the overheads of lazy evaluation. The decoupled parts of LoopD and LoopM are run with

runDec :: Decoupled a b -> (Val b, Val a -> Decoupled a b)

Bench  | Definition                           | Speedup
noloop | arrs = arr f ≫ ... ≫ arr f           | 1.65x
LoopD  | LoopD arrs (pre v)                   | 1.53x
LoopM  | LoopM arrs (pre v) arrs              | 1.66x
Nested | LoopD arrs (LoopM arrs (pre v) arrs) | 1.10x
1. Attempt to apply loop (f *** g) = f (Section 3.4.1) to remove the loop altogether. In CCA composition form, this is equivalent to checking whether each term fᵢ in loop (f₁ ≫ f₂ ≫ ... ≫ fₙ) consists of gᵢ *** hᵢ.

2. If that does not work, attempt to transform loop f to LoopM f₁ d f₂ by using split to find f = f₁ ≫ d ≫ f₂ for decoupled d (Section 3.3).

3. If that does not work, attempt to transform loop f into LoopD f′ d in three stages:

a. Slide left as much as possible, using left fill as needed, to get a program of the form loop (g ≫ second h) (Section 3.1 and Section 3.2.2).

b. Apply split to get f = g ≫ second (h₁ ≫ d ≫ h₂). Right slide h₂ to move d into the right position, giving us LoopD (second h₂ ≫ g ≫ second h₁) d (Section 3.2).

c. If this works, apply left and right tightening to extract from that LoopD any further pre or LoopM that could be used in an outer loop (Section 3.4.1).
• g = pre v or LoopM: g matches the form needed by Split-Pre and Split-LoopM respectively.

• f = (g₁ *** h₁) ≫ ... ≫ (gₙ₊₁ *** hₙ₊₁), where f′ does not end with a term of the form g *** h: This means that f′ = f″ ≫ e, where e is one of arr, pre, LoopD or LoopM. If e is pre or LoopM, we are done: use Split-NonPre to skip past e and then one of Split-Pre or Split-LoopM. If e is arr or LoopD, then there is a path from every input of e to every output, so we consider cases of f″. This is because if we had a direct dependency from an input to an output of either side, we would have a direct dependency in f: from that input to that output, and then through e to every output of f. This means that each of g₁ ≫ ... ≫ gₙ₊₁ and h₁ ≫ ... ≫ hₙ₊₁ has no direct dependencies, by the definition of ***. We can therefore split each side separately by the induction hypothesis, and thus can apply Split-***-R.
First, if f″ has no direct dependencies, we can apply split to it by the induction hypothesis, and thus Split-NonPre can be applied to skip past e and the remaining terms. Otherwise, if f″ has a direct dependency from an input to an output, there must be no direct dependencies in e for there to be no direct dependencies in f.

Proof. By induction on loop nesting. Consider a loop f which contains no loop, LoopD or LoopM. By Lemma 4.3, we can express f in one of three forms, which we show can be transformed in turn.

Case 1. [f = g *** h.] Apply step 1 of the transformation to get loop f = g.