Asynchronous Probabilistic Couplings in Higher-Order Separation Logic

Probabilistic couplings are the foundation for many probabilistic relational program logics and arise when relating random sampling statements across two programs. In relational program logics, this manifests as dedicated coupling rules that, e.g., say we may reason as if two sampling statements return the same value. However, this approach fundamentally requires aligning or "synchronizing" the sampling statements of the two programs which is not always possible. In this paper, we develop Clutch, a higher-order probabilistic relational separation logic that addresses this issue by supporting asynchronous probabilistic couplings. We use Clutch to develop a logical step-indexed logical relation to reason about contextual refinement and equivalence of higher-order programs written in a rich language with a probabilistic choice operator, higher-order local state, and impredicative polymorphism. Finally, we demonstrate our approach on a number of case studies. All the results that appear in the paper have been formalized in the Coq proof assistant using the Coquelicot library and the Iris separation logic framework.


INTRODUCTION
Relational reasoning is a useful technique for proving properties of probabilistic programs. By relating a complex probabilistic program to a simpler one, we can often reduce a challenging verification task to an easier one. In addition, certain important properties of probabilistic programs are naturally expressed in a relational form, such as stability of machine learning algorithms [Bousquet and Elisseeff 2002], differential privacy [Dwork and Roth 2013], and provable security [Goldwasser and Micali 1984]. Consequently, a number of relational program logics and models have been developed for probabilistic programs, e.g., pRHL [Barthe et al. 2015], approximate pRHL [Barthe et al. 2012, 2016a,b], EpRHL [Barthe et al. 2018], HO-RPL [Aguirre et al. 2021], Polaris [Tassarotti and Harper 2019], logical relations [Bizjak and Birkedal 2015; Johann et al. 2010; Wand et al. 2018], and differential logical relations [Dal Lago and Gavazzo 2022].
Many probabilistic relational program logics make use of probabilistic couplings [Lindvall 2002; Thorisson 2000; Villani 2008], a mathematical tool for reasoning about pairs of probabilistic processes. Informally, couplings correlate outputs of two processes by specifying how corresponding sampling statements are correlated. To understand how couplings work in such logics, let us consider a pRHL-like logic. In pRHL and its variants, we prove Hoare quadruples of the form {P} c₁ ∼ c₂ {Q}, where c₁ and c₂ are two probabilistic programs, and P and Q are pre- and postrelations on the states of the two programs. Couplings arise when reasoning about random sampling statements in the two programs, as in the rule prhl-couple, where the two programs both sample from the same distribution d and store the result in variables x₁ and x₂, respectively. The rule says that we may reason as if the two sampling statements return the same value in both programs, and one says that the sampling statements have been "coupled". This is a powerful method that integrates well with existing reasoning principles from relational program logics. However, this kind of coupling rule requires aligning or "synchronizing" the sampling statements of the two programs: both programs have to be executing the sampling statements we want to couple as their next step when applying the rule. To enable this alignment, pRHL has various rules that allow taking steps on one side of the quadruple at a time or commuting statements in a (first-order) program. Nevertheless, with the rules from existing probabilistic relational logics, it is not always possible to synchronize sampling statements.
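The coupling rule alluded to above can be reconstructed as follows (a sketch in standard pRHL notation; ⟨1⟩ and ⟨2⟩ project the left and right memories, and the exact side conditions vary between presentations):

```latex
% prhl-couple (sketch): both programs sample from the same distribution d
\frac{}{\;\{\top\}\;\; x_1 \overset{\$}{\leftarrow} d \;\sim\; x_2 \overset{\$}{\leftarrow} d \;\;\{x_1\langle 1\rangle = x_2\langle 2\rangle\}\;}
```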
For example, consider the following program written in an ML-like language that eagerly performs a probabilistic coin flip and returns the result in a thunk: eager ≜ let b = flip() in λ_. b
An indistinguishable, but lazy, version of the program only does the coin flip when the thunk is invoked for the first time, but stores the result in a reference that is read from in future invocations:

lazy ≜ let r = ref(None) in
  λ_. match !r with
      | Some(b) ⇒ b
      | None ⇒ let b = flip() in r ← Some(b); b
      end

The usual symbolic execution rules of relational logics will allow us to progress the two sides independently according to the program execution, but they will not allow us to line up the flip() expression in eager with that in lazy. Consequently, the coupling rule prhl-couple cannot be applied.
Intuitively, the flip() expression in eager is evaluated immediately, but the flip() expression in lazy only gets evaluated when the thunk is invoked: to relate the two thunks one is forced to first evaluate the eager sampling, but this then makes it impossible to couple it with the lazy sampling.
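To make the two programs concrete, here is a minimal Python sketch of the eager and lazy thunks (Python closures and a one-element list stand in for the ML-like thunks and references of the paper; `flip` is an assumed fair-coin helper):

```python
import random

def flip():
    # Fair coin flip: the probabilistic primitive of the example.
    return random.random() < 0.5

def eager():
    # Sample immediately; the thunk just returns the stored result.
    b = flip()
    return lambda: b

def lazy():
    # Sample on first invocation; cache the result in a mutable cell.
    r = [None]
    def thunk():
        if r[0] is None:
            r[0] = flip()
        return r[0]
    return thunk

# Both kinds of thunk return the same Boolean on every invocation.
t = lazy()
assert t() == t() == t()
u = eager()
assert u() == u() == u()
```

The observable behavior of the two thunks is identical; the difference is only in *when* the sample happens, which is exactly what makes the coupling proof hard.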
While the example may seem contrived, these kinds of transformations between eager and lazy sampling are widely used, e.g., in proofs in the Random Oracle Model [Bellare and Rogaway 1993] and in game-playing proofs [Bellare and Rogaway 2004, 2006]. For this reason, systems like EasyCrypt [Barthe et al. 2013] and CertiCrypt [Barthe et al. 2009, 2010] support reasoning about lazy/eager sampling through special-purpose rules for swapping statements that allow alignment of samplings; the approach is shown to work for a first-order language with global state and relies on syntactic criteria and assertions about memory disjointness. However, in rich enough languages (e.g., with general references and closures) these kinds of swapping equivalences are themselves highly non-trivial, even in the non-probabilistic case [Dreyer et al. 2012; Pitts and Stark 1998].
In this paper we develop Clutch, a higher-order probabilistic relational separation logic that addresses this issue by enabling asynchronous probabilistic couplings. To do so, Clutch introduces a novel kind of ghost state, called presampling tapes. Presampling tapes let us reason about sampling statements as if they executed ahead of time and stored their results for later use. This converts the usual alignment problem of coupling rules into the task of reasoning about this special form of state. Fortunately, reasoning about state is well addressed by modern separation logics.
Clutch provides a "logical" step-indexed logical relation [Dreyer et al. 2011] to reason about contextual refinement and equivalence of probabilistic higher-order programs written in F^rand_μ,ref, a rich language with a probabilistic choice operator, higher-order local state, recursive types, and impredicative polymorphism. Intuitively, expressions e₁ and e₂ of type τ are contextually equivalent if no well-typed context C can distinguish them, i.e., if the expression C[e₁] has the same observable behaviors as C[e₂]. Contextual equivalence can be decomposed into contextual refinement: we say e₁ refines e₂ at type τ, written e₁ ≾_ctx e₂ : τ, if, for all contexts C expecting something of type τ, if C[e₁] has some observable behavior, then so does C[e₂]. As our language is probabilistic, "observable behavior" here means the probability of observing an outcome, such as termination. Using the logical approach [Timany et al. 2022], in Clutch, types are interpreted as relations expressed in separation logic. The resulting model allows us to prove, among other examples, that the eager program above is contextually equivalent to the lazy program.
The work presented in this paper is foundational [Appel 2001] in the sense that all results, including the semantics, the logic, the necessary mathematical analysis results, the relational model, and all the examples, are formalized in the Coq proof assistant [The Coq Development Team 2022] using the Coquelicot library [Boldo et al. 2015] and the Iris separation logic framework [Jung et al. 2015, 2016, 2018; Krebbers et al. 2017a].
In summary, we make the following contributions:
• A higher-order probabilistic relational separation logic, Clutch, for reasoning about probabilistic programs written in F^rand_μ,ref, an ML-like programming language with higher-order local state, recursive types, and impredicative polymorphism.
• A proof method for relating asynchronous probabilistic samplings in a program logic: a methodology that allows us to reason about sampling as if it were state and to exploit existing separation logic mechanisms such as ghost state and invariants to reason about probabilistic programs. We demonstrate the usefulness of the approach with a number of case studies.
• The first coupling-based relational program logic for reasoning about contextual refinement and equivalence of programs in a higher-order language with local state, recursive types, and impredicative polymorphism.
• Novel technical ideas, namely left-partial couplings, a coupling modality, and an erasure argument, that allow us to prove soundness of the relational logic.
• Full mechanization in Coq using Coquelicot and the Iris separation logic framework.

KEY IDEAS
The key conceptual novelties of the Clutch logic are twofold: a logical probabilistic refinement judgment and a novel kind of ghost resource, called presampling tapes.
Logical refinement. The refinement judgment Δ ⊨_E e₁ ≾ e₂ : τ should be read as "the expression e₁ refines the expression e₂ at type τ", and it satisfies a range of structural and symbolic execution rules, as showcased in Figure 2 and further explained in §4. Just like contextual refinement, the judgment is indexed by a type τ; the environment Δ assigns semantic interpretations to type variables in τ, and E is an invariant mask, as elaborated on in §4. Both can safely be ignored in this section. The meaning of the judgment is formally reflected by the soundness theorem (Theorem 1): if ⊨ e₁ ≾ e₂ : τ is derivable in Clutch, then e₁ ≾_ctx e₂ : τ.
The refinement judgment is internal to the ambient Clutch separation logic. This means that we can combine the judgment in arbitrary ways with other logical connectives, e.g., the separating conjunction ∗ and its adjoint, the separating implication (magic wand) −∗. All inference rules that we present can be internalized as propositions in the logic, and we will use an inference rule with premises P₁, . . . , Pₙ and conclusion Q as notation for (P₁ ∗ . . . ∗ Pₙ) ⊢ Q.
The language F^rand_μ,ref contains a single probabilistic primitive rand(N) that reduces uniformly at random to some n ∈ {0, 1, . . . , N}, where σ is the current program state and →_p ⊆ Cfg × [0, 1] × Cfg is a small-step transition relation annotated with the probability p that the transition occurs. By defining flip() ≜ if rand(1) = 0 then false else true we recover the Boolean fair coin flip operator used in the motivating example.
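As a quick illustration, the single-step behavior of rand(N) and the derived flip() can be mimicked in Python (a sketch; `rand` and `rand_dist` are our own illustrative helpers, not part of the paper's formal syntax):

```python
import random
from fractions import Fraction

def rand(N):
    # rand(N) reduces to each n in {0, ..., N} with probability 1/(N+1).
    return random.randint(0, N)

def flip():
    # flip() is sugar: if rand(1) = 0 then false else true.
    return rand(1) != 0

def rand_dist(N):
    # The single-step distribution of rand(N) is uniform on {0, ..., N}.
    return {n: Fraction(1, N + 1) for n in range(N + 1)}

d = rand_dist(3)
assert sum(d.values()) == 1
assert all(p == Fraction(1, 4) for p in d.values())
```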
To reason relationally about probabilistic choices that can be synchronized, Clutch admits a classical coupling rule that allows us to continue reasoning as if the two sampled values are related by a bijection f on the sampling space {0, . . . , N}; in the rule, K and K′ are arbitrary evaluation contexts.
Asynchronous couplings. To support asynchronous couplings we introduce presampling tapes. Reminiscent of how prophecy variables [Abadi and Lamport 1988, 1991; Jung et al. 2020] allow us to talk about the future, presampling tapes give us the means to talk about the outcome of probabilistic choices in the future. Tapes manifest both in the operational semantics and in the logic.
Operationally, a tape consists of an upper bound N ∈ ℕ and a finite sequence of natural numbers less than or equal to N, representing future outcomes of rand(N) commands. Each tape is labeled with an identifier ι ∈ Label, and a program's state is extended with a finite map from labels to tapes. Tapes can be dynamically allocated using a tape primitive: evaluating tape(N) extends the mapping with an empty tape with upper bound N and returns its fresh label ι. The rand primitive can then optionally be annotated with a tape label ι. If σ(ι) = (N, ε), i.e., the corresponding tape is empty, rand(N, ι) reduces to any n ≤ N with equal probability . . .
. . . but if the tape is not empty, the rand(N, ι) primitive reduces deterministically by taking the first element n off the tape and returning it. If the tape bound does not match, i.e., σ(ι) = (M, ì n) with M ≠ N, then rand(N, ι) reduces as if the tape were empty. However, no primitives in the language add values to the tapes! Instead, values are added to tapes as part of presampling steps that are ghost operations appearing only in the relational logic. That is, presampling is purely a proof device that has no operational effect: in the end, tapes can in fact be erased entirely through refinement, as will become clear by the end of this section. At the logical level, Clutch comes with an ι ↪ (N, ì n) assertion that denotes ownership of the label ι and its contents (N, ì n), analogously to how the traditional points-to connective ℓ ↦ v of separation logic denotes ownership of the location ℓ and its contents v on the heap. When a tape is allocated (rel-alloc-tape-l), ownership of the fresh empty tape is acquired. Asynchronous couplings between probabilistic choices can be established in the refinement logic by coupling ghost presamplings with program steps. For example, the rule rel-couple-tape-l allows us to couple an (unlabeled) probabilistic choice on the right with a presampling onto the tape ι on the left. Intuitively, as illustrated in Figure 1, the rule allows us to couple a logical ghost presampling step on the left (illustrated using a red dashed arrow) with a physical sampling on the right. A symmetric rule holds for the opposite direction, and two ghost presamplings can be coupled as well. When we, at some point in the future, reach a presampled rand(N, ι), we simply read off the presampled values from the tape deterministically in a first-in-first-out order. If we do not perform any presamplings, tapes and labels can be ignored, and we can couple labeled sampling commands as if they were unlabeled; here the assertion ι ↪s (N, ε) denotes ownership of an empty tape of the right-hand side program (the program on the "specification" side).
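The operational behavior of tapes can be sketched in Python (an illustrative model, not the paper's formal semantics; `presample` plays the role of the ghost presampling step, which in Clutch exists only in the logic):

```python
import random

class State:
    def __init__(self):
        self.tapes = {}   # label -> (bound N, list of pending samples)
        self.fresh = 0

    def tape(self, N):
        # tape(N): allocate an empty tape with bound N; return its fresh label.
        label = self.fresh
        self.fresh += 1
        self.tapes[label] = (N, [])
        return label

    def presample(self, label):
        # Ghost step: append a uniform sample to the end of the tape.
        N, ns = self.tapes[label]
        ns.append(random.randint(0, N))

    def rand(self, N, label=None):
        # Unlabeled rand(N), or labeled rand(N, label) with an empty or
        # bound-mismatched tape, samples fresh; a nonempty matching tape
        # pops its head deterministically (first-in-first-out).
        if label is not None:
            M, ns = self.tapes[label]
            if M == N and ns:
                return ns.pop(0)
        return random.randint(0, N)

s = State()
l = s.tape(3)
s.presample(l)
n = s.tapes[l][1][0]
assert s.rand(3, l) == n          # reads off the presampled value
assert s.tapes[l] == (3, [])      # tape is empty again
assert 0 <= s.rand(3, l) <= 3     # empty tape: fresh uniform sample
```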
Example. Using presampling tapes, we can show that lazy is a contextual refinement of eager from §1, that is, lazy ≾_ctx eager : unit → bool. We first define an intermediate labeled version lazy′ of lazy that allocates a tape ι up front and samples with flip(ι) ≜ if rand(1, ι) = 0 then false else true instead of flip(). By transitivity of contextual refinement and Theorem 1, it suffices to show ⊨ lazy ≾ lazy′ : unit → bool and ⊨ lazy′ ≾ eager : unit → bool. The former follows straightforwardly using symbolic execution rules and rel-rand-erase-r. To show the latter, we allocate a tape ι and a reference ℓ on the left by symbolic execution and couple the presampling of a bit b ∈ {0, 1} onto the tape with the flip() on the right using rel-couple-tape-l. This establishes an invariant expressing that either b is on the tape and the location ℓ is empty, or ℓ contains the value b.
Invariants are particular kinds of propositions in Clutch that, in this particular case, are guaranteed to always hold at the beginning and at the end of the function evaluation. Under this invariant, we show that the two thunks are related by symbolic execution and rules for accessing invariants that we detail in §4. Symmetric arguments allow us to show the refinement in the other direction and consequently the contextual equivalence. This example shows how presampling tapes are simple and powerful, yet merely a proof device: the final equivalence holds for programs without any mention of tapes. Intuitively, tapes allow us to separate the process of building a coupling from the operational semantics of the program. One might be tempted to believe, though, that as soon as the idea of presampling arises, the high-level proof rules as supported by Clutch are straightforward to state and prove. This is not the case. As we will show throughout the paper, a great deal of care goes into defining a system that supports presampling while being sound. In §7 we discuss two counterexamples that illustrate some of the subtleties involved in defining a sound system.

THE LANGUAGE F^rand_μ,ref
To account for non-terminating behavior, we define our operational semantics using probability sub-distributions, which we recall below.
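The standard notion being recalled can be stated as follows (a reconstruction of the usual definition of discrete probability sub-distribution):

```latex
% A (discrete) probability sub-distribution on a countable set A
\mu \in \mathcal{D}(A) \iff \mu : A \to [0,1] \text{ and } \sum_{a \in A} \mu(a) \le 1
% The missing mass 1 - \sum_{a} \mu(a) accounts for non-termination.
```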
The syntax of the language F^rand_μ,ref is defined by the grammar below. The term language is mostly standard, but note that there are no types in terms; we write Λ e for type abstraction and e _ for type application. fold e and unfold e are the special term constructs for iso-recursive types. ref(e) allocates a new reference, ! e dereferences the location that e evaluates to, and e₁ ← e₂ assigns the result of evaluating e₂ to the location that e₁ evaluates to. We introduce syntactic sugar for lambda abstractions λx. e, defined as rec _ x = e, let-bindings let x = e₁ in e₂, defined as (λx. e₂)(e₁), and sequencing e₁; e₂, defined as let _ = e₁ in e₂. We write rand(N) for rand(N, ()), i.e., an unlabeled probabilistic choice. We implicitly coerce from σ ∈ State to heaps and tapes, e.g., σ(ℓ) = π₁(σ)(ℓ) and σ(ι) = π₂(σ)(ι). Tapes are formally pairs (N, ì n) of N ∈ ℕ and a finite sequence ì n of natural numbers less than or equal to N. The language has a call-by-value single-step reduction relation →_p ⊆ Cfg × [0, 1] × Cfg defined using evaluation contexts K ∈ Ectx. The relation is mostly standard: all the non-probabilistic constructs reduce as usual with weight 1, and rand(e₁, e₂) reduces as discussed in §2.
To define full program execution, let step(ρ) ∈ D(Cfg) denote the distribution induced by the single-step reduction of a configuration ρ ∈ Cfg. First, we define a stratified execution probability exec_n : Cfg → D(Val) by induction on n, where 0 denotes the everywhere-zero distribution. That is, exec_n(e, σ)(v) denotes the probability of stepping from the configuration (e, σ) to the value v in less than n steps. The probability that a full execution, starting from a configuration ρ, reaches a value v is the limit of its stratified approximations, which exists by monotonicity and boundedness: exec(ρ)(v) ≜ lim_{n→∞} exec_n(ρ)(v). The probability that a full execution from a starting configuration ρ terminates then becomes exec⇓(ρ) ≜ Σ_{v∈Val} exec(ρ)(v). Typing judgments have the form Θ | Γ ⊢ e : τ, where Γ is a context assigning types to program variables, and Θ is a context of type variables that may occur in Γ and τ. The inference rules for the typing judgments are standard (see, e.g., Frumin et al. [2021b] or the Coq formalization) and omitted, except for the straightforward rules for typing tapes and samplings shown below. The notion of contextual refinement that we use is also mostly standard and uses the termination probability exec⇓ as observation predicate. Since we are in a typed setting, we consider only typed contexts; a program context is well-typed, written C : (Θ | Γ ⊢ τ) ⇒ (Θ′ | Γ′ ⊢ τ′), if it maps terms of type τ to terms of type τ′. Note that contextual refinement is a precongruence, and that the statement itself lives in the meta-logic (e.g., Coq) and makes no mention of Clutch or Iris. Contextual equivalence Θ | Γ ⊢ e₁ ≃_ctx e₂ : τ is defined as the symmetric interior of refinement.
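The stratified execution probability sketched above can be reconstructed as follows (the standard definition by induction on n; notation follows the surrounding text):

```latex
\operatorname{exec}_n(e, \sigma) \triangleq
  \begin{cases}
    \mathbf{0} & \text{if } e \notin \mathit{Val} \text{ and } n = 0 \\
    \operatorname{ret}(e) & \text{if } e \in \mathit{Val} \\
    \operatorname{step}(e, \sigma) \mathbin{\gg\!=} \operatorname{exec}_{n-1} & \text{otherwise}
  \end{cases}
\qquad
\operatorname{exec}(\rho)(v) \triangleq \lim_{n \to \infty} \operatorname{exec}_n(\rho)(v)
```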

THE CLUTCH REFINEMENT LOGIC
In the style of ReLoC [Frumin et al. 2021b], we define a logical refinement judgment Δ ⊨_E e₁ ≾ e₂ : τ as an internal notion in the Clutch separation logic by structural recursion over the type τ. The fundamental theorem of logical relations then shows that logical refinement implies contextual refinement. This means that proving contextual refinement can be reduced to proving logical refinement, which is generally much easier. When defining and proving logical refinement, we can leverage the features of modern separation logic, e.g., (impredicative) invariants and (higher-order) ghost state, as inherited from Iris, to model and reason about complex programs and language features.
Clutch is based on higher-order intuitionistic separation logic and the most important propositions are shown below.
As Clutch is built upon the base logic of Iris [Jung et al. 2018], it includes all of its connectives, such as the persistence modality □, the later modality ▷, guarded fixpoints μx. P, invariants, and non-atomic invariants [The Iris Development Team 2022], which we will introduce as needed.
The proposition ⌜φ⌝ embeds a meta-logic (e.g., Coq) proposition φ (e.g., an equality or a coupling) into Clutch, but we will omit the brackets whenever the type of φ is clear from the context. Like ordinary separation logic, Clutch has heap points-to assertions. Since the logic is relational, these come in two forms: ℓ ↦ v for the left-hand side program's state and ℓ ↦s v for the right-hand side's state (the "specification" side). For the same reason, tape assertions come in two forms as well, ι ↪ (N, ì n) and ι ↪s (N, ì n), respectively.

Refinement Judgments
The refinement judgment Δ ⊨_E e₁ ≾ e₂ : τ should be read as "in environment Δ, the expression e₁ refines the expression e₂ at type τ under the invariants in E". We refer to e₁ as the implementation and to e₂ as the specification. The environment Δ assigns interpretations to the type variables occurring in τ. These interpretations are Clutch relations of type Val × Val → iProp. One such relation is the binary interpretation ⟦τ⟧_Δ(−, −) of a syntactic type τ ∈ Type, which is used to define the refinement judgment, as discussed in §5.2. For example, for base types such as bool and int, the value interpretation asserts equality between the values.
Figure 2 showcases a selection of the type-directed structural and computational rules for proving logical refinement for deterministic reductions. Our computational rules resemble the typical forward-symbolic-execution-style rules from, e.g., the weakest precondition calculus in Iris [Jung et al. 2018], but come in forms for both the left-hand side and the right-hand side. For example, rel-pure-l and rel-pure-r symbolically execute "pure" reductions, i.e., reductions that do not depend on the state, such as β-reductions. rel-store-l and rel-store-r, on the other hand, depend on the heap and require ownership of a location in order to store values at it. We remark that all the rules for the deterministic fragment of the Clutch refinement judgment are identical to the rules for the sequential fragment of the non-probabilistic relational logic ReLoC [Frumin et al. 2021b], even though the underlying semantics and model are very different. This is one of the key reasons behind the support for modular reasoning.
The rules in Figure 3 showcase the computational rules for non-coupled probabilistic reductions and for interactions with presampling tapes. The rules rel-rand-tape-l and rel-rand-tape-r allow us to read off values from a tape, as explained in §2; if the tapes are empty, rel-rand-tape-empty-l and rel-rand-tape-empty-r continue with a fresh sampling, just like for unlabeled rands in rel-rand-l and rel-rand-r. Notice how the rules resemble the rules for interacting with the heap.
The main novelty of Clutch is the support for both synchronous and asynchronous couplings, for which rules are shown in Figure 4. rel-couple-rands is a classical coupling rule that relates two samplings that can be aligned, just like prhl-couple as we saw in §1. The rules rel-couple-tape-l and rel-couple-tape-r, on the other hand, are asynchronous coupling rules; they both couple a sampling reduction with an arbitrary expression on the opposite side by presampling a coupled value to a tape, as discussed in §2. Finally, rel-couple-tapes couples two ghost presamplings to two tapes, and hence offers full asynchrony.

Persistence and Invariants
As mentioned above, the environment Δ in Clutch's refinement judgement provides an interpretation of types as relations in the logic. However, Clutch is a substructural separation logic, while the type system of F^rand_μ,ref is not substructural. To account for the non-substructural nature of F^rand_μ,ref's types, we make use of the persistence modality □. We say P is persistent, written persistent(P), if P ⊢ □P; otherwise, we say that P is ephemeral. Persistent propositions can freely be duplicated (□P ⊣⊢ □P ∗ □P) and eliminated (□P ⊢ P). For example, invariants and non-atomic invariants are persistent: once established, they remain true forever. On the contrary, ephemeral propositions like the points-to connective ℓ ↦ v for the heap may be invalidated in the future when the location is updated. For exactly this reason, the rule rel-pack also requires the interpretation of the type variable to be persistent, to guarantee that it does not depend on ephemeral resources.
To reason about, e.g., functions that make use of ephemeral resources, a common pattern is to "put them in an invariant" to make them persistent, as sketched in §2 for the lazy/eager example. Since our language is sequential, when a function is invoked, no other code can execute before the function returns. This means that we can soundly keep invariants "open" and temporarily invalidate them for the entire duration of a function invocation, as long as the invariants are reestablished before returning. Non-atomic invariants allow us to capture exactly this intuition.
Invariants are annotated with invariant names N ∈ InvName, and the refinement judgment is annotated with invariant masks E ⊆ InvName that indicate which non-atomic invariants are currently closed. This bookkeeping of the invariant mechanism is needed to avoid reentrancy issues, where invariants are opened in a nested (and unsound) fashion.
Figure 5 shows structural rules for the refinement judgment's interaction with non-atomic invariants. An invariant containing P can be allocated (rel-na-inv-alloc) by giving up ownership of P. When opening an invariant (rel-na-inv-open), one obtains the resources P together with a resource closeNaInv^N(P) that allows one to close the invariant again (rel-na-inv-close) by reestablishing P.
We guarantee that all invariants are closed by the end of evaluation by requiring ⊤, the set of all invariant names, as the mask annotation on the judgment in all value cases (see, e.g., rel-rec, rel-pack, and rel-return in Figure 2). Clutch invariants are inherited from Iris and hence they are impredicative [Svendsen and Birkedal 2014], which means that the proposition P in an invariant is arbitrary and can, e.g., contain other invariant assertions. To ensure soundness of the logic and avoid self-referential paradoxes, invariant access is guarded by the later modality ▷. When invariants are not used impredicatively, the later modality can mostly be ignored, as we have done and will do throughout the paper. The later modality is essential for the soundness of the logical relation and for taking guarded fixpoints μx. P that require the recursive occurrence to appear under the later modality, but our use is entirely standard. We refer to Jung et al. [2018] for more details on the later modality and how it is generally used in Iris.

MODEL OF CLUTCH
In this section we show how the connectives of Clutch are modeled through a shallow embedding in the base logic of the Iris separation logic [Jung et al. 2018]. First, we describe how we define a relational coupling logic (§5.1) that is used to establish couplings between programs. Next, we show how the coupling logic, in combination with a binary interpretation of types, is used to define the refinement logic (§5.2). Finally, we summarize how the final soundness theorem is proven (§5.3).
The general structure and skeleton of our model mimic the construction of several non-probabilistic logical relations found in prior work [Frumin et al. 2021b; Krebbers et al. 2017b; Turon et al. 2013a,b]. A key contribution and benefit of Clutch is that the same structure can be adapted to handle probabilistic refinements through the right choice of intermediate definitions and abstractions, as we will highlight throughout this section. While some aspects of the model are Iris-specific, the key ideas are general and should apply to other frameworks as well.

Coupling Logic
We recall that probabilistic couplings are used to prove relations between distributions by constructing a joint distribution that relates the two given distributions in a particularly desirable way: a coupling of μ₁ ∈ D(A) and μ₂ ∈ D(B) is a joint distribution μ ∈ D(A × B) whose marginals are μ₁ and μ₂. Given a relation R ⊆ A × B, we say μ is an R-coupling if furthermore supp(μ) ⊆ R. We write μ₁ ∼ μ₂ : R if there exists an R-coupling of μ₁ and μ₂.
Couplings can be constructed and composed along the monadic structure of sub-distributions: (1) if R(a, b), then ret(a) ∼ ret(b) : R; (2) if μ₁ ∼ μ₂ : R, and for all (a, b) ∈ R it is the case that f₁(a) ∼ f₂(b) : S, then (μ₁ ≫= f₁) ∼ (μ₂ ≫= f₂) : S. Once a coupling has been established, we can often extract a concrete relation between the probability distributions from it. In particular, for (=)-couplings, we have the following result: if μ₁ ∼ μ₂ : (=), then μ₁ = μ₂.
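The notion of an (=)-coupling can be illustrated concretely in Python on finite distributions (a small sketch; `marginals` and `couple_eq` are our own illustrative helpers, not part of Clutch):

```python
from fractions import Fraction

def marginals(mu):
    # Project a joint distribution on pairs to its two marginals.
    m1, m2 = {}, {}
    for (a, b), p in mu.items():
        m1[a] = m1.get(a, Fraction(0)) + p
        m2[b] = m2.get(b, Fraction(0)) + p
    return m1, m2

def couple_eq(mu1, mu2):
    # Build the identity coupling of two equal distributions:
    # all probability mass sits on the diagonal {(a, a)}.
    assert mu1 == mu2
    return {(a, a): p for a, p in mu1.items()}

half = Fraction(1, 2)
coin = {0: half, 1: half}
mu = couple_eq(coin, coin)

m1, m2 = marginals(mu)
assert m1 == coin and m2 == coin        # both marginals are correct
assert all(a == b for (a, b) in mu)     # support contained in (=)
```

Conversely, the extraction result above says that whenever any (=)-coupling of μ₁ and μ₂ exists, the two distributions must already be equal.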
The Clutch coupling logic can be seen as a higher-order separation logic analogue of Barthe et al. [2015]'s pRHL logic. However, unlike pRHL, which uses the four-part Hoare quadruples that we saw in §1 to do relational reasoning, the coupling logic instead follows CaReSL [Turon et al. 2013a] and encodes one of the programs as a separation logic ghost resource. In particular, the coupling logic consists of two components: (1) a unary weakest precondition theory wp e {Φ}; and (2) a specification resource spec(e′) with a specification context specCtx. We think of the program e in the weakest precondition predicate as representing the program that occurs on the left side of a quadruple, while the specification program e′ represents the right side program. The specification context assertion specCtx is used to connect the weakest precondition to the specification resource. Ultimately, by showing such a weakest precondition in the logic, we will have established a coupling of the executions of the programs e and e′.
The weakest precondition. The weakest precondition connective wp e {v. Φ} is a new probabilistic weakest precondition that we formally define below. In isolation it simply means that the execution of e is safe (i.e., the probability of crashing is zero), and that for every possible return value v of e, the postcondition Φ(v) holds. Note, however, that it encodes partial correctness: it does not imply that the probability of termination is one, meaning the program may diverge.
In most Iris-style program logics, the weakest precondition wp e {Φ} is a predicate stating that either the program e is a value satisfying Φ, or it is reducible and, for any term e′ that it reduces to, wp e′ {Φ} must hold as well. This guarantees safety of the full execution of the program e. The weakest precondition that we define in this section has, in isolation, the same intuition, but it is fundamentally different. It is still a unary predicate, but in order to do relational reasoning, the weakest precondition pairs up the probability distribution of individual program steps of the left-hand side with the probability distribution of individual steps of some other program in such a way that there exists a probabilistic coupling among them. Through specCtx we guarantee that this "other" program is tied to the program tracked by the spec(e′) resource. The weakest precondition itself satisfies all the usual structural rules, such as wp-wand and wp-bind found in Figure 6, as well as language-level primitive rules such as wp-load; but in combination with the specCtx and spec(e′) resources, the coupling logic satisfies rules like wp-couple-rands and wp-couple-tape-l. Notice the resemblance between wp-couple-rands and prhl-couple from §1.
The weakest precondition connective is given by a guarded fixpoint of the equation below; the fixpoint exists because the recursive occurrence appears under the later modality. The base case says that if the expression e₁ is a value, then the postcondition Φ(e₁) must hold. On the other hand, if e₁ is not a value, we get to assume the two propositions S(σ₁) and G(ρ′₁) for any σ₁ ∈ State and ρ′₁ ∈ Cfg, and then we must prove execCoupl((e₁, σ₁), ρ′₁)(. . .). The predicate S : State → iProp is a state interpretation that interprets the state (the heap and the tapes) of the language as resources in Clutch and gives meaning to the ℓ ↦ v and ι ↪ (N, ì n) connectives. The predicate G : Cfg → iProp is a specification interpretation that allows us to interpret and track the "other" program that we are constructing a coupling with; we return to its instantiation momentarily.
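Putting the pieces of this paragraph together, the fixpoint equation can be sketched as follows (a reconstruction from the surrounding description, where S and G name the state and specification interpretations):

```latex
\operatorname{wp} e_1 \,\{\Phi\} \triangleq
  \bigl(e_1 \in \mathit{Val} \wedge \Phi(e_1)\bigr) \vee
  \Bigl(e_1 \notin \mathit{Val} \wedge
    \forall \sigma_1, \rho'_1.\;
      S(\sigma_1) \ast G(\rho'_1) \mathrel{-\!\ast}
      \operatorname{execCoupl}\bigl((e_1, \sigma_1), \rho'_1\bigr)
        \bigl(\lambda ((e_2, \sigma_2), \rho'_2).\;
          \mathord{\triangleright}\,\bigl(S(\sigma_2) \ast G(\rho'_2) \ast
          \operatorname{wp} e_2\,\{\Phi\}\bigr)\bigr)\Bigr)
```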
The key technical novelty and the essence of the weakest precondition is the coupling modality: execCoupl((e₁, σ₁), ρ′₁)(Ψ) says that there exists a series of (composable) couplings starting from the configurations (e₁, σ₁) and ρ′₁ that end up in some configurations (e₂, σ₂) and ρ′₂ such that the proposition Ψ holds. With this intuition in mind, the last clause of the weakest precondition says that the execution of (e₁, σ₁) can be coupled with the execution of ρ′₁ such that the state and specification interpretations still hold for the end configurations, and the weakest precondition holds recursively for the continuation e₂.
Coupling modality. The coupling modality is an inductively defined proposition in Clutch, formally defined as the least fixpoint of an equation with six different disjuncts found in the appendix [Gregersen et al. 2023b]. The modality supports both synchronous and asynchronous couplings on both sides while ensuring that the left program takes at least one step. As it is inductively defined, we can chain together multiple couplings, but the chain always ends in base cases that couple a single step of the left-hand side program; this aligns with the usual intuition that each unfolding of the recursively defined weakest precondition corresponds to one physical program step.
For instance, we can couple two physical program steps through the following constructor. Intuitively, it says that to show execCoupl(ρ1, ρ′1)(Z) we (1) have to show that the configuration ρ1 is reducible, which means that the program can take a step (this is to guarantee safety of the left-hand side program), (2) pick a relation R and show that there exists an R-coupling of the two program steps, and (3) for all configurations ρ2, ρ′2 in the support of the coupling, the logical predicate Z(ρ2, ρ′2) holds. This rule is used to justify the classical coupling rule wp-couple-rands that (synchronously) couples two program samplings.
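To make the notion of an R-coupling concrete, here is a small Python sketch of ours (not part of the Coq development) that checks the coupling conditions for finite distributions and verifies the diagonal coupling of two uniform samplings, the kind of witness used by rules in the style of wp-couple-rands:

```python
from fractions import Fraction

def uniform(n):
    # Uniform distribution over {0, ..., n}, with exact probabilities.
    return {k: Fraction(1, n + 1) for k in range(n + 1)}

def is_coupling(mu, mu1, mu2, R):
    # mu is a joint distribution over pairs; check that its support lies in R
    # and that its two marginals are exactly mu1 and mu2.
    left, right = {}, {}
    for (a, b), p in mu.items():
        if not R(a, b):
            return False
        left[a] = left.get(a, Fraction(0)) + p
        right[b] = right.get(b, Fraction(0)) + p
    return left == mu1 and right == mu2

# The "identity" coupling: place all mass on the diagonal.
mu1 = uniform(3)
mu2 = uniform(3)
diag = {(k, k): p for k, p in mu1.items()}
assert is_coupling(diag, mu1, mu2, lambda a, b: a == b)
```

Coupling the two samplings via the diagonal is exactly what lets us reason as if both sides returned the same value.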
The coupling modality also allows us to construct a coupling between a program step and a trivial (Dirac) distribution; this is used to validate proof rules that symbolically execute just one of the two sides. Indeed, the rule below allows us to progress the right-hand side independently from the left-hand side, but notice the occurrence of the coupling modality in the premise: this allows us to chain multiple couplings together in a single coupling modality.
To support asynchronous couplings, we introduce a state step reduction σ →ι σ′ that uniformly at random samples a natural number n and appends it to the end of the tape ι: if σ(ι) = (N, ⃗n) and n ≤ N, then σ →ι σ[ι := (N, ⃗n · n)]. Let stepι(σ) denote the induced distribution of a single state step reduction of σ. The coupling modality allows us to introduce couplings between stepι(σ) and a sampling step. Note that here the left-hand side program does not take a physical step, thus the coupling modality appears in the premise as well. This particular rule is key to the soundness of the asynchronous coupling rule wp-couple-tape-l that couples a sampling to a tape on the left with a program sampling on the right. We use similar constructors of execCoupl to prove, e.g., rel-couple-tape-r. The crux is, however, that the extra state steps that we inject in the coupling modality to prove the asynchronous coupling rules do not matter (!) in the sense that they can be entirely erased as part of the coupling logic's adequacy theorem (Theorem 11).
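The tape discipline can be illustrated with a toy model in Python (ours, not the formal semantics): a tape is a FIFO queue with an upper bound, a state step appends a fresh uniform sample, and a labeled sampling statement consumes the head of its tape if one is present, sampling fresh otherwise:

```python
import random

class State:
    # A minimal model of the program state: tapes are FIFO queues of
    # presampled numbers, each with a sampling bound.
    def __init__(self):
        self.tapes = {}

    def alloc_tape(self, label, bound):
        self.tapes[label] = (bound, [])

    def state_step(self, label, rng=random):
        # The state-step reduction: append a fresh uniform sample to the tape.
        bound, queue = self.tapes[label]
        queue.append(rng.randint(0, bound))

    def rand_labeled(self, label, rng=random):
        # A labeled sampling rand(N, label): consume the tape head if present;
        # reading an empty tape still reduces, by sampling fresh.
        bound, queue = self.tapes[label]
        return queue.pop(0) if queue else rng.randint(0, bound)

s = State()
s.alloc_tape("iota", 1)
# Presample two values onto the tape (two state steps, fixed for the demo)...
s.tapes["iota"][1].extend([1, 0])
# ...and observe that the labeled sampler consumes them in order.
assert s.rand_labeled("iota") == 1
assert s.rand_labeled("iota") == 0
```

The erasure property mentioned above corresponds to the observation that appending a uniform sample and later consuming it yields the same distribution as sampling fresh at consumption time.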
A specification resource and context with run ahead. We will encode a relational specification into a unary specification by proving a unary weakest precondition about e (the implementation), in which e′ (the specification) is tracked using a ghost resource spec(e′) that can be updated to reflect execution steps. The ghost specification connective spec(e′), together with the specCtx proposition, satisfies a number of symbolic execution rules following the operational semantics.
The specCtx proposition is an Iris invariant and its purpose is twofold: (1) it gives meaning to the ghost specification resource spec(e) and the heap and tape assertions ℓ ↦s v and ι ↪s (N, ⃗n), and (2) it connects the spec(e) resource to the program e′ that we are constructing a coupling with in the weakest precondition. We keep track of e′ through the specification interpretation. When constructing a final closed proof we will want e to be equal to e′; during proofs, however, they are not always going to be the same: we will allow e to run ahead of e′. As a consequence, it will be possible to reason independently about the right-hand side without consideration of the left-hand side, as exemplified by the rules below4, which allow us to progress the specification program without considering the weakest precondition or the left-hand side program.
Similar-looking rules exist for all the deterministic right-hand side reductions.
To connect the two parts we keep the fragment specInterp◦(ρ) in the specification interpretation (which "lives" in the weakest precondition) and the corresponding authoritative part specInterp●(ρ) in specCtx. This ensures that the configuration tracked in the weakest precondition is the same as the configuration tracked in specCtx. On top of this, specCtx contains resources spec●(e) and heaps(σ) while guaranteeing that the configuration (e, σ) can be reached in n deterministic program steps from ρ. The heaps(σ) resource gives meaning, using standard Iris ghost theory, to the heap and tape assertions ℓ ↦s v and ι ↪s (N, ⃗n), just like the state interpretation in the weakest precondition. execConf_n : Cfg → D(Cfg) denotes the distribution of n-step partial execution. By letting spec(e) ≜ spec◦(e), this construction permits the right-hand side program to progress (with deterministic reduction steps) without consideration of the left-hand side, as exemplified by spec-pure and spec-store. However, when applying coupling rules that actually need to relate the two sides, the proof first "catches up" with spec(e) using the execCoupl rule that progresses the right-hand side independently, before constructing the coupling of interest.

Refinement Logic
Contextual refinement is a typed relation and hence logical refinement must be typed as well. To define the refinement logic, we first define a binary value interpretation ⟦τ⟧Δ that characterizes the set of pairs of closed values (v1, v2) of type τ such that v1 contextually refines v2. The definition follows the usual structure of ("logical") logical relations, see, e.g., Frumin et al. [2021b]; Timany et al. [2022], by structural recursion on τ, and uses corresponding logical connectives. Functions are interpreted via (separating) implication, universal types are interpreted through universal quantification, etc., as found in the appendix [Gregersen et al. 2023b]. The only novelty is the interpretation of the new type of tapes shown below. The interpretation requires that the values are tape labels, i.e., references to tapes, and that the tapes are always empty, as captured by the invariant. Intuitively, this guarantees, through the coupling rules and the symbolic execution rules from Figure 3, that we can always couple samplings on these tapes as needed in the compatibility lemma for t-rand as discussed in §5.3. Point-wise equality of the two tapes would also have been sufficient for the compatibility lemma, but by requiring them to be empty we can prove general equivalences such as x : tape ⊢ rand(N) ≃ctx rand(N, x) : nat.
The refinement judgment is defined using the coupling logic in combination with the binary value interpretation. Recall that the intuitive reading of the refinement judgment Δ ⊨E e1 ≾ e2 : τ is that the expression e1 refines the expression e2 at type τ under the invariants in the mask E, with interpretations of the type variables in τ taken from Δ. Besides the coupling logic and the binary value interpretation, we will also make use of the resource naTok(E) that keeps track of the set of non-atomic invariants that are currently closed.
Putting everything together, the refinement judgment is formally defined as follows. The definition assumes that the right-hand side program is executing K[e2] for some evaluation context K and that the invariants in E are closed, and it concludes that the two executions can be aligned so that if e1 reduces to some value v1 then there exists a corresponding execution of K[e2] to K[v2] for some value v2, and all invariants have been closed. Moreover, the values v1 and v2 are related via the binary value interpretation ⟦τ⟧Δ(v1, v2). By quantifying over K, we close the definition under evaluation contexts on the right-hand side. For the left-hand side this is not needed, as the weakest precondition already satisfies wp-bind.

Proc. ACM Program. Lang., Vol. 8, No. POPL, Article 26. Publication date: January 2024.

Soundness
The soundness of the refinement judgment hinges on the soundness of the coupling logic. The goal of the coupling logic is to show a coupling of the executions of the two programs, but to establish a coupling of two distributions they must have the same mass. Intuitively, due to the approximate nature of step-indexed logics like Clutch, we need to show, at every logical step-index, that a coupling exists, even when the left-hand side program has not yet terminated. This means we might not have enough mass on the left-hand side to cover all of the mass on the right-hand side. For this reason we introduce a new notion of left-partial coupling.
Given μ1 ∈ D(A), μ2 ∈ D(B), and a relation R ⊆ A × B, we say that μ ∈ D(A × B) is an R-left-partial-coupling of μ1 and μ2 if the left marginal of μ is exactly μ1, the right marginal of μ is point-wise below μ2, and furthermore supp(μ) ⊆ R. We write μ1 ≲ μ2 : R if there exists an R-left-partial-coupling of μ1 and μ2. This means that, for any μ ∈ D(B) and any R ⊆ A × B, the zero distribution 0 trivially satisfies 0 ≲ μ : R. This reflects the asymmetry of both contextual refinement and our weakest precondition: it allows us to show that a diverging program refines any other program of appropriate type.
Left-partial couplings can also be constructed and composed along the monadic structure of the sub-distribution monad, and they are implied by regular couplings. Lemma 9. If μ1 ∼ μ2 : R then μ1 ≲ μ2 : R. Additionally, the existence of a (=)-left-partial-coupling coincides with point-wise inequality of the two distributions, which will allow us to reason about contextual refinement.
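For finite distributions the left-partial-coupling conditions are easy to check mechanically. The following Python sketch (our illustration, using exact rational arithmetic) verifies both the trivial zero witness and a diagonal witness for point-wise inequality:

```python
from fractions import Fraction

def marginals(mu):
    # Left and right marginals of a joint distribution over pairs.
    left, right = {}, {}
    for (a, b), p in mu.items():
        left[a] = left.get(a, Fraction(0)) + p
        right[b] = right.get(b, Fraction(0)) + p
    return left, right

def is_left_partial_coupling(mu, mu1, mu2, R):
    # Support must lie in R; left marginal equals mu1 exactly;
    # right marginal is point-wise below mu2.
    if any(p > 0 and not R(a, b) for (a, b), p in mu.items()):
        return False
    left, right = marginals(mu)
    keys_l = set(left) | set(mu1)
    keys_r = set(right) | set(mu2)
    return (all(left.get(a, Fraction(0)) == mu1.get(a, Fraction(0)) for a in keys_l)
            and all(right.get(b, Fraction(0)) <= mu2.get(b, Fraction(0)) for b in keys_r))

half = Fraction(1, 2)
coin = {0: half, 1: half}
# The zero sub-distribution trivially left-partially couples with any mu2.
assert is_left_partial_coupling({}, {}, coin, lambda a, b: a == b)
# A diagonal witness shows coin <= coin point-wise, i.e. a (=)-left-partial-coupling.
diag = {(0, 0): half, (1, 1): half}
assert is_left_partial_coupling(diag, coin, coin, lambda a, b: a == b)
```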
The adequacy theorem of the coupling logic is stated using left-partial couplings.
Theorem 11 (Adequacy). Let φ : Val × Val → Prop be a predicate on values in the meta-logic. If the coupling logic proves a weakest precondition for e1 whose postcondition relates the resulting values by φ, then, for every n, the n-step execution of the implementation is left-partially coupled with the execution of the specification configuration with respect to φ. As a simple corollary, contextual refinement follows from continuity of exec. The proof of the adequacy theorem goes by induction in both n and the execCoupl fixpoint, followed by a case distinction on the big disjunction in the definition of execCoupl. Most cases are simple coupling compositions along the monadic structure, except the cases where we introduce state step couplings, which rely on erasure in the following sense: intuitively, the erasure lemma tells us that we can prepend any program execution with a state step reduction and it will not have an effect on the final result. The idea behind the proof is that if we append a sampled value n to the end of a tape, and we eventually consume n, then we obtain the same distribution as if we had never appended n in the first place. This is a property that one should not take for granted: the operational semantics has been carefully defined such that reading from an empty tape reduces to a value as well, and none of the other program operations can alter or observe the contents of a tape. This ensures that presampled values are untouched until consumed and that the proof and the execution are independent.
To show the soundness theorem of the refinement logic, we extend the interpretation of types to typing contexts (⟦Γ⟧Δ(⃗v1, ⃗v2) holds iff, for every x : τ in Γ, ⟦τ⟧Δ holds of the corresponding pair of values) and the refinement judgment to open terms by closing substitutions as usual, where e1[⃗v/Γ] denotes the simultaneous substitution of every x from Γ in e1 by the corresponding value v.
We then show, using the structural and symbolic execution rules of the refinement judgment, that the typing rules are compatible with the relational interpretation: for every typing rule, if we have a pair of related terms for every premise, then we also have a pair of related terms for the conclusion. See for instance the compatibility rule rand-compat for t-rand below, in the case where the argument has type tape; it follows using rel-bind and rel-couple-tapes. As a consequence of the compatibility rules, we obtain the fundamental theorem of logical relations.
Theorem 13 (Fundamental theorem). Let Ξ | Γ ⊢ e : τ be a well-typed term, and let Δ assign a relational interpretation to every type variable in Ξ. Then Δ | Γ ⊨ e ≾ e : τ. The compatibility rules, moreover, yield that the refinement judgment is a congruence, and together with Theorem 11 we can then recover contextual refinement. Theorem 14 (Soundness). Let Ξ be a type variable context, and assume that, for all Δ assigning a relational interpretation to all type variables in Ξ, we can derive Δ | Γ ⊨ e1 ≾ e2 : τ. Then Ξ | Γ ⊢ e1 ≾ctx e2 : τ.

CASE STUDIES
In the coming sections, we give an overview of some of the example equivalences we have proven with Clutch. Further details are found in the appendix [Gregersen et al. 2023b] and our Coq development. In particular, in the appendix we discuss an example by Sangiorgi and Vignudelli [2016] that previous probabilistic logical relations without asynchronous couplings could not prove [Bizjak 2016, Sec. 1.5].

Lazy/Eager Coin
In this section we give a more detailed proof of the lazy-eager coin example from §1. We will go through the proof step by step but omit the uses of rel-pure-l and rel-pure-r, which should be interleaved with the application of most of the mentioned proof rules.
Recall the definitions of lazy and eager from §1. The goal is to show ⊢ lazy ≃ctx eager : unit → bool by first showing ⊢ lazy ≾ctx eager : unit → bool and then ⊢ eager ≾ctx lazy : unit → bool.
To show lazy ≾ctx eager : unit → bool, we first define an intermediate labeled version lazy′ of lazy (found in §2). By transitivity of contextual refinement and Theorem 1, it is sufficient to show ⊨ lazy ≾ lazy′ : unit → bool and ⊨ lazy′ ≾ eager : unit → bool.
The first refinement ⊨ lazy ≾ lazy′ : unit → bool is mostly straightforward. By applying rel-alloc-l followed by rel-alloc-tape-r and rel-alloc-r, we are left with the goal of proving that the two thunks are related, given ι ↪s (1, ε), ℓ ↦ None, and ℓ′ ↦s None for some fresh label ι and fresh locations ℓ and ℓ′ on the heap. Using rel-na-inv-alloc we allocate an invariant with some name N that expresses that the tape ι is always empty and that either both ℓ and ℓ′ contain None or both contain Some(b) for the same Boolean b. We continue by rel-rec, after which we open the invariant and do a case distinction on the disjunction in the invariant. If ℓ and ℓ′ are empty, this is the first time we invoke the function. We continue using rel-load-l and rel-load-r, after which we are left with the goal of relating the two sampling statements. We continue using rel-rand-erase-r to couple the two flips, we follow by rel-store-l and rel-store-r to store the fresh bit on the heaps, we close the invariant (now showing the right disjunct, as the locations have been updated) using rel-na-inv-close, and we finish the case using rel-return, as the program returns the same Boolean on both sides. If ℓ and ℓ′ were not empty, this is not the first time the function is invoked, and we straightforwardly load the same Boolean on both sides using rel-load-l and rel-load-r and finish the proof using rel-na-inv-close and rel-return.
For the second refinement ⊨ lazy′ ≾ eager : unit → bool, we start by allocating the tape on the left using rel-alloc-tape-l, which gives us ownership of a fresh tape ι ↪ (1, ε). We now couple the tape with the unlabeled flip() on the right using rel-couple-tape-l. This gives us, for some Boolean b, that ι ↪ (1, b) and that the flip() on the right returned b as well. We continue by allocating the reference on the left using rel-alloc-l, which gives us some location ℓ and ℓ ↦ None. Now, we allocate the invariant which expresses that either the location ℓ is empty but b is on the tape ι, or b has been stored at ℓ. We are now left with proving that the two thunks are related under this invariant. We continue using rel-rec, after which we open the invariant using rel-na-inv-open, do a case distinction on the disjunction, and continue using rel-load-l. If the location ℓ is empty, we have to show that reading from the tape and storing the result refines the Boolean b. As we own ι ↪ (1, b), we continue using rel-rand-tape-l, rel-store-l, rel-na-inv-close (now establishing the right disjunct, as ℓ has been updated), and rel-return, as the return value is the same on both sides. If the location ℓ was not empty, we know ℓ ↦ Some(b), which means rel-load-l reads b from ℓ, and we finish the proof using rel-na-inv-close and rel-return.
The proof of eager ≾ctx lazy : unit → bool is analogous, and thus we have shown the contextual equivalence of the programs eager and lazy.
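The essence of the example, stripped of the formal apparatus, can be mimicked in Python (our sketch, not the paper's code): an eager thunk samples at creation time, a lazy thunk samples on first call and caches, and driving both with the same presampled bit (the role the tape plays in the proof) makes them behave identically:

```python
def eager(flip):
    # Sample once, up front; the returned thunk is constant.
    b = flip()
    return lambda: b

def lazy(flip):
    # Sample on first call, cache the result in local state.
    cell = {"v": None}
    def thunk():
        if cell["v"] is None:
            cell["v"] = flip()
        return cell["v"]
    return thunk

# "Couple" the two samplings: drive both with the same presampled bit.
for bit in (False, True):
    e = eager(lambda: bit)
    l = lazy(lambda: bit)
    assert [e(), e(), e()] == [l(), l(), l()] == [bit, bit, bit]
```

The point of the asynchronous coupling is precisely that the two samplings happen at different times (creation vs. first call), yet can still be matched up.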

ElGamal Public Key Encryption
An encryption scheme is seen as secure if no probabilistic polynomial-time (PPT) adversary A can break it with non-negligible probability. A common pattern in cryptographic security proofs is the security reduction. To perform a reduction, one assumes that such an adversary A exists, and constructs another PPT adversary B that, using A, solves a computational problem that is believed to be hard. By contradiction, this means the construction is secure under the assumption
that the problem is hard. A crucial proof step is showing that B together with A corresponds to the original construction, which can be thought of as the "soundness" of the security reduction. In this section, we use Clutch to show the soundness of a security reduction of the ElGamal public key encryption scheme [Elgamal 1985] to the decisional Diffie-Hellman (DDH) computational assumption.
The ElGamal construction is a public key encryption scheme consisting of a triple of algorithms (keygen, enc, dec) whose implementation in F^{rand}_{μ,ref} is shown in Figure 7. The implementation is parameterized by a group G which serves to represent messages, ciphertexts, and keys. We write G = {1, g, g², ..., g^{|G|−1}} for a finite cyclic group of order |G|, generated by g, and let n = |G| − 1. Intuitively, to show that ElGamal encryption is secure it suffices to show that, given that the DDH assumption holds for the group G, an adversary A cannot distinguish an encrypted message from a random ciphertext (see, e.g., [Rosulek 2020, §15.3]). The DDH assumption for a group G says that the two games DH_real and DH_rand in Figure 9 are PPT-indistinguishable, which intuitively means that the value g^{ab} looks random, even to someone who has seen g^a and g^b. The intuitive notion of encryption scheme security can be made precise5 as the indistinguishability of two security games, i.e., stylized interactions, PK_real and PK_rand shown in Figure 8, by a PPT6 adversary. Here we interpret the notion of an "adversary" as a program context. (5 Several formulations exist in the literature; we take inspiration from the textbook presentation of Rosulek [2020]. 6 Polynomial-time with respect to the security parameter, i.e., the logarithm of the size of the group for ElGamal.) Both security games are initialised by generating a secret/public-key pair (sk, pk), of which pk is returned to the adversary (the context). The adversary gets to examine the public key and an "encryption oracle" query, i.e., a partial application of the encryption function specialized to a particular key. The difference between PK_real and PK_rand lies in the query function. While PK_real encrypts the message provided as input, PK_rand instead returns a randomly sampled ciphertext. Both games use a counter to ensure that the query oracle can be called only once. One attempt at distinguishing the security games will thus correspond exactly to one attempt at distinguishing DH_real from DH_rand.
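For concreteness, the following Python sketch implements the textbook ElGamal algorithms over a deliberately tiny (and insecure) group; the parameters p, g, and order are illustrative assumptions of ours, not values from the paper:

```python
import random

# Illustrative parameters only: the subgroup of squares mod 23, of prime order 11.
p = 23       # modulus
g = 4        # generator of the order-11 subgroup
order = 11

def keygen(rng=random):
    # Secret key is a random exponent; public key is g^sk.
    sk = rng.randrange(order)
    return sk, pow(g, sk, p)

def enc(pk, m, rng=random):
    # Randomized encryption: a fresh ephemeral exponent b per message.
    b = rng.randrange(order)
    return pow(g, b, p), (pow(pk, b, p) * m) % p

def dec(sk, c):
    # Strip the mask pk^b = (g^b)^sk using the secret key.
    c1, c2 = c
    return (c2 * pow(pow(c1, sk, p), -1, p)) % p

sk, pk = keygen()
m = pow(g, 7, p)  # messages are group elements
assert dec(sk, enc(pk, m)) == m
```

The randomization in enc is exactly the sampling that the proof below has to move across the query function boundary.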
The idea now is to use Clutch as a step towards reducing indistinguishability of PK_real and PK_rand to the indistinguishability of DH_real and DH_rand. Specifically, we will exhibit a context C and show (1) ⊢ PK_real ≃ctx C[DH_real] and (2) ⊢ PK_rand ≃ctx C[DH_rand]. Then we can complete the reduction on paper (outside of Clutch) as follows.7 To prove that the DDH assumption implies public key security, we assume the contrapositive, i.e., that there exists an adversarial context A that can distinguish PK_real from PK_rand. Using (1) and (2), the composed context A[C] then distinguishes DH_real from DH_rand, contradicting the DDH assumption. The context C is given by Figure 8b (note that the hole is in the first line). The proof that C is PPT is outside of the scope of Clutch.
We will only focus on the first equation (1), since the proof of (2) is similar. The proof proceeds via an intermediate program, PK^tape_real, which differs from PK_real only in that the random sampling in query is labelled with a tape ι. By transitivity, it suffices to show that ⊢ PK_real ≃ctx PK^tape_real and ⊢ PK^tape_real ≃ctx C[DH_real], as displayed in Figure 10. The first equivalence is trivial. The essential difference between PK^tape_real and C[DH_real] is that the query function in PK^tape_real samples lazily, whereas in C[DH_real] the sampling occurs eagerly at the beginning. The proof now proceeds in a manner similar to the lazy-eager coin example; details can be found in the formalization.
Clutch is well-suited for proving the soundness of the reduction for two reasons. Firstly, any public key encryption scheme can only be secure if it employs randomized encryption [Goldwasser and Micali 1984]. Dealing with randomization is thus unavoidable. Secondly, reasoning about the encryption oracle involves moving the random sampling used in the encryption across a function boundary (the query oracle), as we saw. This part of the argument crucially relies on asynchronous couplings. Systems like EasyCrypt and CertiCrypt handle this part of the argument through special-purpose rules for swapping statements that allow moving the random sampling outside the function boundary. However, this approach crucially relies on the fact that these systems consider first-order languages with global state and use syntactic criteria and assertions on memory disjointness. Note moreover that our security formulation makes crucial use of the fact that F^{rand}_{μ,ref} is higher-order, randomized, and supports local state to return the query closure as a first-class value. This allows us to capture the textbook cryptographic notion of adversaries and of a (closed-box) "oracle" precisely, using standard notions such as higher-order functions and contextual equivalence, without introducing special linguistic and logical categories of adversaries parameterized by a set of oracles.

Hash Functions
When analyzing data structures that use hash functions, one commonly models the hash function under the uniform hash assumption or the random oracle model [Bellare and Rogaway 1993]. That is, a hash function h from a set of keys K to values V behaves as if, for each key k, the hash h(k) is randomly sampled from a uniform distribution over V, independently of all the other keys. Of course, hash functions are not known to satisfy this assumption perfectly, but it can nevertheless be a useful modeling assumption for analyzing programs that use hashes.
The function eager_hash in Figure 11 encodes such a model of hash functions in F^{rand}_{μ,ref}.
(We explain the reason for the "eager" name later.) Given a non-negative integer n, executing eager_hash returns a hash function with K = {0, ..., n} and V = B. To do so, it initializes a mutable map m and then calls sample_all, which samples a Boolean with flip for each key and stores the results in m. These Booleans serve as the hash values. On input k, the hash function returned by eager_hash looks up k in the map and returns the result, with a default value of false if k ∉ K.
However, this model of uniform hash functions can be inconvenient for proofs because all of the random hash values are sampled eagerly when the function is initialized.To overcome this, an important technique in pencil-and-paper proofs is to show that the hash values can be sampled lazily (see, e.g., Mittelbach and Fischlin [2021]).That is, we only sample a key 's hash value when it is hashed for the first time.This lets us more conveniently couple that sampling step with some step in another program.
Motivated by applications to proofs in cryptography, Almeida et al. [2019] formalized in EasyCrypt a proof of equivalence between an eager and a lazy random oracle. Although sufficient for their intended application, this proof was done in the context of a language that uses syntactic restrictions to model the hash function's private state. To the best of our knowledge, no such equivalence proof between lazy and eager sampling has previously been given for a language with higher-order state and general references.
As an application of Clutch, we prove such an equivalence in F^{rand}_{μ,ref}.
The function lazy_hash shown in Figure 11 encodes the lazy-sampling version of the random hash generator. For its internal state, the lazy hash uses two mutable maps: the tape map tm stores tapes to be used for random sampling, and the value map vm stores the previously sampled values for keys that have been hashed. After initializing these maps, it calls alloc_tapes, which allocates a tape for each key k ∈ K and stores the associated tape in tm, but does not yet sample hashes for any keys. The hash function returned by lazy_hash determines the hash for a key k in two stages. It first looks up k in vm to see if k already has a previously sampled hash value, and if so, returns the found value. Otherwise, it looks up k in the tape map tm. If no tape is found, then k must not be in K, so the function returns false. If a tape is found, then the code samples a Boolean b from this tape with flip, stores b for the key k in vm, and then returns b.
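The two generators can be mimicked in Python as follows (a sketch of ours, not the code of Figure 11); prefilling the lazy version's per-key tapes with the very bits the eager version consumes plays the role of the asynchronous coupling and makes the two hash functions agree on every query:

```python
def eager_hash(n, flip):
    # Sample all hash values up front.
    m = {k: flip() for k in range(n + 1)}
    return lambda k: m.get(k, False)

def lazy_hash(n, flip):
    tm = {k: [] for k in range(n + 1)}   # tape map: presampled bits per key
    vm = {}                              # value map: previously returned hashes
    def h(k):
        if k in vm:
            return vm[k]
        if k not in tm:
            return False                 # k is outside the key space
        vm[k] = tm[k].pop(0) if tm[k] else flip()
        return vm[k]
    return h, tm

# Couple the two by presampling: feed the eager version's flips onto the tapes.
bits = [True, False, True, True]
eager = eager_hash(3, iter(bits).__next__)
lazy, tm = lazy_hash(3, lambda: False)
for k, b in enumerate(bits):
    tm[k].append(b)                      # the asynchronous coupling step
assert all(eager(k) == lazy(k) for k in [2, 0, 3, 1, 2])
assert eager(17) == lazy(17) == False
```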
We prove that the eager and lazy versions are contextually equivalent, that is, ⊢ eager_hash ≃ctx lazy_hash : int → bool. The core idea behind this contextual equivalence proof is to maintain an invariant between the internal states of the two hash functions. Let m be the internal map used by the eager hash, and let tm and vm be the tape and value maps, respectively, of the lazy hash. Then, at a high level, the invariant maintains the following properties: (1) dom(m) = dom(tm) = {0, ..., n}.
(2) For all k ∈ {0, ..., n}, either the key k has been looked up before (case a), in which case the value stored for k in vm must match that of m, or it has not been looked up (case b), in which case the tape for the key must contain the same value as m[k] as its next value.
To establish this invariant when the hashes are initialized, we asynchronously couple the eager hash function's flip for each key k with a tape step for the tape associated with k in the lazy table. The invariant ensures that the values returned by the two hash functions will be the same when a key is queried. The cases of the invariant correspond to the branches of the lazy function's match statements: if the key k is in K and has been queried before, the maps will return the same values found in m and vm. If it has not been queried before, then flip in the lazy version will sample the value on the tape for the key, which matches m[k]. Moreover, the update that writes this sampled value to vm preserves the invariant, switching from case (b) to case (a) for the queried key. We have used this more convenient lazy encoding to verify examples that use hash functions. For instance, one scheme to implement random number generators is to use a cryptographic hash function [Barker and Kelsey 2015]. The program init_hash_rng in Figure 12a implements a simplified version of such a scheme.
When run, init_hash_rng generates a lazy hash function f for the key space K = {0, ..., MAX}, for some fixed constant MAX. It also allocates a counter c as a reference initialized to 0. It returns a sampling function, let us call it h, that uses f and c to generate random Booleans. Each time h is called, it loads the current value n from c and hashes n with f to get a Boolean b. It then increments c and returns the Boolean b. Repeated calls to h return independent, uniformly sampled Booleans, so long as we make no more than MAX calls.
We prove that init_hash_rng is contextually equivalent to a "bounded" random number generator init_bounded_rng in Figure 12b that directly calls flip. The proof works by showing that, so long as the counter value n satisfies n ≤ MAX, each time a sample is generated the value n will not have been hashed before. Thus, we may couple the random hash value with the flip call in init_bounded_rng. This argument relies on the fact that the counter is private, encapsulated state, which is easy to reason about using the relational judgment, since Clutch is a separation logic.
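The coupling argument can be mimicked concretely: feeding the same bit sequence to the hash-based generator (as hash values) and to the bounded generator (as flips) yields identical outputs. The sketch below is ours and elides the hash function's internals; the constant MAX and the function names mirror the figure informally:

```python
MAX = 7

def init_hash_rng(hash_fn):
    # hash_fn models a lazily sampled uniform hash {0, ..., MAX} -> bool.
    counter = [0]
    def sample():
        b = hash_fn(counter[0])  # each counter value is hashed for the first time
        counter[0] += 1
        return b
    return sample

def init_bounded_rng(flip):
    # The "bounded" generator just flips directly.
    return lambda: flip()

# Couple: the fresh hash value for counter n is matched with the n-th flip.
bits = [bool(i % 3) for i in range(MAX + 1)]
h_rng = init_hash_rng(lambda k: bits[k])
b_rng = init_bounded_rng(iter(bits).__next__)
assert [h_rng() for _ in range(MAX + 1)] == [b_rng() for _ in range(MAX + 1)]
```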

Lazily Sampled Big Integers
Certain randomized data structures, such as treaps [Seidel and Aragon 1996], need to generate random priorities as operations are performed. One can view these priorities as an abstract data type equipped with a total order supporting two operations: (1) a sample function that randomly generates a new priority according to some distribution, and (2) a comparison operation that takes a pair of priorities (p1, p2) and returns −1 (if p1 < p2), 0 (if p1 = p2), or 1 (if p2 < p1). The full details of how priorities are used in such data structures are not relevant here. Instead, what is important to know is that it is ideal to avoid collisions, that is, sampling the same priority multiple times. A simple way to implement priorities is to represent them as integers sampled from some fixed set {0, ..., N}. However, to minimize collisions, we may need to make N very large. But making N large has a cost, because priorities then require more random bits to generate and more space to store. An alternative is to lazily sample the integer that represents the priority. Because we only need to compare priorities, we can delay sampling bits of the integer until they are needed to resolve ties during comparisons. A lazily-sampled integer can be encoded as a pair of a tape label and a linked list of length at most k, where each node in the list represents a digit of the integer in base b, with the head of the list being the most significant digit.
In the appendix [Gregersen et al. 2023b], we describe such an implementation of lazily-sampled integers, with k = 8 and b = 2^32. Our Coq development contains a proof that this implementation is contextually equivalent to code that eagerly samples a 256-bit integer by bit-shifting and adding 8 32-bit integers. Crucially, this contextual equivalence is at an abstract existential type. Specifically, we define the type of abstract priorities as τ ≜ ∃α. (unit → α) × ((α × α) → int). Then we have the equivalence ⊢ (sample_lazy_int, cmp_lazy) ≃ctx (sample256, cmp) : τ, where cmp is just primitive integer comparison. The proof uses tapes to presample the bits of the lazy integer and couples these with the eager version. The cmp_lazy function traverses and mutates the linked lists representing the integers being compared, which separation logic is well-suited for reasoning about.
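A minimal Python model of the comparison (ours; the real implementation uses linked lists and tapes in the object language) shows how digits are drawn from the tape only on demand, so comparisons that differ early never sample the remaining digits:

```python
def cmp_lazy(x, y, k=8):
    # x and y are pairs (tape, digits): digits (base 2^32, most significant
    # first) sampled so far, plus a tape of presampled digits to draw from.
    (tx, dx), (ty, dy) = x, y
    for i in range(k):
        if i == len(dx):
            dx.append(tx.pop(0))  # sample a digit only when it is needed
        if i == len(dy):
            dy.append(ty.pop(0))
        if dx[i] != dy[i]:
            return -1 if dx[i] < dy[i] else 1
    return 0

# Two integers differing in the second digit: only two digits of each are
# ever sampled; the remaining tape entries stay untouched.
x = ([5, 1, 9, 9, 9, 9, 9, 9], [])
y = ([5, 3, 0, 0, 0, 0, 0, 0], [])
assert cmp_lazy(x, y) == -1
assert len(x[1]) == len(y[1]) == 2
```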

COUNTEREXAMPLES
This section justifies some design choices in Clutch by presenting counterexamples showing the unsoundness of two variants of the logic. In the first counterexample, we show that annotating sampling statements with tape labels is needed in our current formulation of the logic, since their omission leads to unsoundness. In the second, we show that combining prophecy variables [Jung et al. 2020] with the usual coupling rules of pRHL (without presampling) is unsound, implying that presampling cannot simply be implemented in terms of prophecy variables.

Syntactic Restriction on Presampling
One may wonder whether it is necessary for tapes and labels to appear in the program and the program state, but they do in fact play a subtle yet crucial role. Consider the following program flip_or that applies a logical disjunction to two fresh samples:

flip_or ≜ let x = flip() in let y = flip() in x || y
and compare it to the program flip ≜ flip() that just samples a single bit. These two programs are obviously not contextually equivalent: with probability 3/4 the program flip_or returns true, whereas the program flip only does so with probability 1/2. Yet, if we were to introduce a rule for flip that could draw from any presampling tape (i.e., without requiring sampling statements to be annotated with the tape they will draw from), the logic would allow one to "prove" that they are equivalent. Assume the following (unsound!) rule rel-tape-unsound that says that when sampling on the left-hand side, we may instead draw a bit from some prover-chosen presampling tape. To see why this rule cannot be sound, we will show ⊨ flip ≾ flip_or : bool.
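The probability gap between the two programs can be confirmed by exhaustive enumeration of the two fair flips (a quick Python check of ours, using exact rationals):

```python
from fractions import Fraction
from itertools import product

def flip_or_dist():
    # Enumerate the two independent fair flips of flip_or and
    # accumulate the probability of each disjunction outcome.
    dist = {True: Fraction(0), False: Fraction(0)}
    for x, y in product([False, True], repeat=2):
        dist[x or y] += Fraction(1, 4)
    return dist

assert flip_or_dist()[True] == Fraction(3, 4)
# A single flip returns true with probability 1/2, so the two
# programs cannot be contextually equivalent.
```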
First, we introduce two tapes with resources ι1 ↪ (1, ε) and ι2 ↪ (1, ε) on the left-hand side, either explicitly allocated in code as in Clutch, or as pure ghost resources if that is possible in our hypothetical logic. Second, we couple the tape ι1 with the x-sampling and ι2 with the y-sampling using rel-couple-tape-l, such that we end up with ι1 ↪ (1, b1) and ι2 ↪ (1, b2) and the goal ⊨ flip() ≾ b1 || b2 : bool. Finally, we do a case distinction on both b1 and b2: if both are true, or both are false, it does not matter which tape we use when applying rel-tape-unsound. If, on the other hand, only bi is true, we choose the tape ιi and apply rel-tape-unsound, which finishes the proof.
The crucial observation is that, by labeling tapes in the program syntax, we prevent the prover from doing case analysis on presampled values to decide which tape to read: the syntax dictates which tape to use and hence which value to read. Concretely, in F^rand_{μ,ref}, unlabeled flips always reduce uniformly at random, and only labeled sampling statements read from presampling tapes, which prevents us from proving the unsound rel-tape-unsound.
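The distinction between labeled and unlabeled sampling can be made concrete with a toy model (ours, purely illustrative; it is not the paper's formal operational semantics): tapes are FIFO queues of presampled bits, a labeled flip must read from the tape its label names when that tape is nonempty, and an unlabeled flip always produces a fresh uniform sample, so the prover never gets to choose which tape a given flip reads.

```python
import random

class Tapes:
    """Toy model of presampling tapes: each label maps to a FIFO of bits."""
    def __init__(self):
        self.tapes = {}

    def alloc(self, label):
        # allocate a fresh, empty tape for `label`
        self.tapes[label] = []

    def presample(self, label, bit):
        # logical (ghost) step: append a presampled bit to tape `label`
        self.tapes[label].append(bit)

    def flip(self, label=None):
        # an unlabeled flip is always a fresh uniform sample; a labeled flip
        # reads from the tape fixed by its syntax whenever that tape is nonempty
        if label is not None and self.tapes.get(label):
            return self.tapes[label].pop(0)
        return random.random() < 0.5

t = Tapes()
t.alloc("i1"); t.alloc("i2")
t.presample("i1", True); t.presample("i2", False)
assert t.flip("i1") is True    # the syntax dictates tape i1 is read here
assert t.flip("i2") is False   # and tape i2 here; no prover choice remains
```

In the unsound rule, `flip()` could pop from whichever tape the prover preferred after inspecting the presampled bits, which is exactly the freedom the labels remove.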
Besides motivating why soundly allowing presampling is subtle, this counterexample also emphasizes why it is important that labels appear in the program syntax and state. We do not claim that these annotations are absolutely necessary for some kind of presampling to be sound, as a very different formulation of the logic might be able to avoid them, but, as for prophecy variables [Jung et al. 2020], where similar "ghost information" is needed in the actual program code, it is not obvious how to do without them. We remind the reader that presampling tapes nevertheless remain a proof device, as tapes can be erased through refinement, as discussed in §2.

Incompatibility with Prophecy Variables
Presampling tapes bear some resemblance to prophecy variables in that they give us the means to talk about the future. However, prophecy variables, as previously developed in the context of Iris [Jung et al. 2020], are unsound for the (synchronous) coupling logic, as illustrated below.
Assume the existence of two operators NewProph and Resolve p to v in our programming language, together with the (unsound for Clutch!) Hoare-triple specifications found below. The specifications give us access to Boolean one-shot prophecies [Jung et al. 2020]. NewProph allocates a fresh prophecy variable p and a resource Proph(p, v) that tracks its future resolution v. Given ownership of Proph(p, v), then Resolve p to v′ resolves the prophecy variable p to the value v′ and yields the knowledge that v = v′ was the case all along. To see why these operations and rules cannot be sound in the coupling logic, we will show ⊨ flip_proph ≾ flip : bool where

flip_proph ≜ let p = NewProph in let b₁ = flip() in let b₂ = flip() in Resolve p to b₂ ; b₁ && b₂

which cannot be the case, as flip_proph returns true only with probability 1/4. We unfold the relational judgment and apply wp-newproph-unsound, which gives us a prophecy variable p and its future resolution v. If v is true, the evaluation on the left is predetermined to be b₁ && true = b₁. By coupling the sampling of b₁ with the flip() on the right using rel-couple-rands, we finish using rel-rand-l and wp-resolve-unsound. If, on the other hand, v is false, the evaluation on the left is predetermined to be b₁ && false = false. We apply rel-rand-l first, then couple the sampling of b₂ with the flip() on the right using rel-couple-rands, and finish using wp-resolve-unsound.
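The claimed probability can again be checked by exact enumeration, assuming (as the surrounding proof sketch indicates) that flip_proph conjoins two fresh samples and that prophecy resolution does not change the returned value; the function below is our illustration, not the paper's code:

```python
from itertools import product
from fractions import Fraction

def flip_proph(b1, b2):
    # models: let p = NewProph in let b1 = flip() in let b2 = flip() in
    #         Resolve p to b2 ; b1 && b2
    # prophecy resolution is a ghost-like step with no effect on the result
    return b1 and b2

outcomes = list(product([False, True], repeat=2))
p_true = Fraction(sum(1 for b1, b2 in outcomes if flip_proph(b1, b2)),
                  len(outcomes))

assert p_true == Fraction(1, 4)  # whereas flip returns true with probability 1/2
```

So any proof of ⊨ flip_proph ≾ flip : bool must be exploiting an unsound rule: here, the prophecy lets the prover pick which of the two samplings to couple after peeking at the future.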
The counterexample shows that prophecy variables are unsound for the coupling logic, for the same reason that presampling is unsound without syntactic tape labels: If the prover can predict the outcomes of random samples ahead of time, it gives them too much power to choose which sampling they couple with.

COQ FORMALIZATION
All the results presented in the paper, including the background on probability theory, the formalization of the logic, and the case studies, have been formalized in the Coq proof assistant [The Coq Development Team 2022]. The results about probability theory are built on top of the Coquelicot library [Boldo et al. 2015], extending its results to real series indexed by countable types.
Although we build our logic on top of Iris [Jung et al. 2018], significant work is involved in formalizing the operational semantics of probabilistic languages, our new notion of weakest precondition that internalizes coupling-based reasoning, and the erasure theorem that allows us to conclude the existence of a coupling. Our development integrates smoothly with the Iris Proof Mode [Krebbers et al. 2017b], and we have adapted much of the tactical support from ReLoC [Frumin et al. 2021b] to reason about the relational judgment.

RELATED WORK
Separation logic. Relational separation logics have been developed on top of Iris for a range of properties, such as contextual refinement [Frumin et al. 2021b; Krebbers et al. 2017b; Timany and Birkedal 2019; Timany et al. 2018], simulation [Chajed et al. 2019; Gäher et al. 2022; Timany et al. 2021], and security [Frumin et al. 2021a; Georges et al. 2022; Gregersen et al. 2021]. The representation of the right-hand side program as a resource is a recurring idea, but our technical construction with run ahead is novel. With the exception of Tassarotti and Harper [2019], probabilistic languages have not been considered in Iris. Tassarotti and Harper develop a logic to show refinement between a probabilistic program and a semantic model, not another program. Their logic relies on couplings, but it requires synchronization of sampling statements.
Batz et al. [2019] present a framework in which logical assertions are functions ranging over the non-negative reals. The connectives of separation logic are given an interpretation as maps from pairs of non-negative reals to the positive reals. This work focuses on proving quantitative properties of a single program, e.g., bounding the probability that certain events happen. A variety of works have developed separation logics in which the separating conjunction models various forms of probabilistic independence [Bao et al. 2021, 2022; Barthe et al. 2020]. For example, a statement P ∗ Q is taken to mean "the distribution of P is independent from the distribution of Q".
Prophecy variables [Abadi and Lamport 1988, 1991] have been integrated into separation logic in both unary [Jung et al. 2020] and relational settings [Frumin et al. 2021b]. The technical solution uses program annotations and physical state reminiscent of our construction with presampling tapes, but prophecy resolution is a physical program step, whereas presampling in our work is a logical operation. Prophecies can also be erased through refinement [Frumin et al. 2021b].
Probabilistic couplings. Probabilistic couplings are a technique from probability theory that can be used to prove equivalences between distributions or mixing times of Markov chains [Aldous 1983]. In computer science, they have been used to reason about relational properties of programs, such as equivalences [Barthe et al. 2015] and differential privacy [Barthe et al. 2016a]. However, these logics require the sampling points of both programs to be synchronized in order to construct couplings. In a higher-order setting, the logic of Aguirre et al. [2018] establishes so-called "shift couplings" between probabilistic streams that evolve at different rates, but these rules are ad hoc and limited to the stream type. Also in the higher-order setting, Aguirre et al. [2021] use couplings to reason about adversarially-defined properties; however, they only support synchronous couplings and first-order global state, and they use a graded state monad to enforce separation of adversary memories.

Logical relations.
Step-indexed logical relations have been applied to reason about contextual equivalence of probabilistic programs in a variety of settings. Bizjak and Birkedal [2015] develop logical relations for a language similar to ours, although only with first-order state. This work has since been extended to a language with continuous probabilistic choice (but without state and impredicative polymorphism) [Wand et al. 2018], for which equivalence is shown by establishing a measure-preserving transformation between the sources of randomness of the two programs. Recently, this was further extended to support nested inference queries [Zhang and Amin 2022].
Another line of work [Dal Lago and Gavazzo 2021, 2022] uses so-called differential logical relations to reason about contextual distance rather than equivalence. Programs are related using metrics rather than equivalence relations, which makes it possible to quantify how similar two programs are.
Cryptographic frameworks. CertiCrypt [Barthe et al. 2009, 2010] is a framework for cryptographic game-playing proofs written in a simple probabilistic first-order while-language ("pWhile"). CertiCrypt formalizes a denotational semantics for pWhile in Coq, supports reasoning about the induced notion of program equivalence via a pRHL, and provides dedicated tactics for lazy/eager sampling transformations. These kinds of transformations are non-trivial for expressive languages like ours. CertiCrypt also provides a quantitative unary logic.
EasyCrypt [Barthe et al. 2013] is a standalone prover for higher-order logic building on CertiCrypt's ideas. It leverages the first-order nature of pWhile for proof automation via SMT solvers. EasyCrypt extends pWhile with a module system [Barbosa et al. 2021] to support reasoning about abstract code as module parameters. It integrates a quantitative unary logic with pRHL, and it supports reasoning about complexity in terms of oracle calls [Barbosa et al. 2021]. Both automation and these kinds of properties are out of scope for our work but would be interesting future directions.
In FCF [Petcher and Morrisett 2015], programs are written as Coq expressions in the free subdistribution monad. Proofs are conducted in a pRHL-like logic, where successive sampling statements can be swapped thanks to the commutativity of the monad.
SSProve [Abate et al. 2021; Haselwarter et al. 2021] supports modular crypto proofs by composing "packages" of programs written in the free monad for state and probabilities. The swap rule in SSProve allows exchanging commands that maintain a state invariant. Reasoning about dynamically allocated local state is not supported.
IPDL [Gancher et al. 2023] is a process calculus for stating and proving cryptographic observational equivalences. IPDL is mechanized in Coq and targeted at equational reasoning about interactive message-passing in high-level cryptographic protocol models, and hence considers a different set of language features.

CONCLUSION
We have presented Clutch, a novel higher-order probabilistic relational separation logic with support for asynchronous probabilistic coupling-based proofs of contextual refinement and equivalence of probabilistic higher-order programs with local state and impredicative polymorphism. We have proved the soundness of Clutch formally in Coq using a range of new technical concepts and ideas, such as left-partial couplings, presampling tapes, and a coupling modality. We have demonstrated the usefulness of our approach through several example program equivalences that, to the best of our knowledge, were not possible to establish with previous methods.

DATA AVAILABILITY STATEMENT
The Coq formalization accompanying this work is available on Zenodo [Gregersen et al. 2023a] and on GitHub at https://github.com/logsem/clutch.

Fig. 1. Illustration of an asynchronous coupling established through the rule rel-couple-tape-l.

Fig. 2. Selected structural and symbolic execution rules for the Clutch refinement judgment.

Fig. 6. Selected structural rules of the weakest precondition.
Fig. 11. Eager and lazy models of hash functions.