A Deductive Verification Infrastructure for Probabilistic Programs

This paper presents a quantitative program verification infrastructure for discrete probabilistic programs. Our infrastructure can be viewed as the probabilistic analogue of Boogie: its central components are an intermediate verification language (IVL) together with a real-valued logic. Our IVL provides a programming-language-style notation for expressing verification conditions whose validity implies the correctness of the program under investigation. As our focus is on verifying quantitative properties such as bounds on expected outcomes, expected run-times, or termination probabilities, off-the-shelf IVLs based on Boolean first-order logic do not suffice. Instead, a paradigm shift from the standard Boolean to a real-valued domain is required. Our IVL features quantitative generalizations of standard verification constructs such as assume- and assert-statements. Verification conditions are generated by a weakest-precondition-style semantics based on our real-valued logic. We show that our verification infrastructure supports natural encodings of numerous verification techniques from the literature. With our SMT-based implementation, we automatically verify a variety of benchmarks. To the best of our knowledge, this establishes the first deductive verification infrastructure for expectation-based reasoning about probabilistic programs.


INTRODUCTION AND OVERVIEW
Probabilistic programs differ from ordinary programs in their ability to base decisions on samples from probability distributions. They are found in randomized algorithms, communication protocols, models of physical and biological processes, and, more recently, statistical models used in machine learning and artificial intelligence (cf. [Barthe et al. 2020; Gordon et al. 2014]). Typical questions in the design and analysis of probabilistic programs are concerned with quantifying aspects of their expected (or average) behavior, e.g. the expected runtime of a randomized algorithm, the expected number of retransmissions in a protocol, or the probability that a particle reaches its destination.
Writing correct probabilistic programs is notoriously hard. They may contain subtle bugs occurring with low probability or undesirably favor certain results in the long run. In fact, reasoning about the expected behavior of probabilistic programs is known to be strictly harder than for ordinary programs [Kaminski et al. 2019]. There exists a plethora of research on verification techniques for probabilistic programs, ranging from program logics (cf. [Kaminski et al. 2018; McIver and Morgan 2005]) to highly specialized proof rules [Hark et al. 2019; McIver et al. 2018], often with little (if any) automation. These techniques are based on different branches of mathematics, e.g. domain theory or martingale analysis, and their relationships are non-trivial (cf. Takisaka et al. [2021]). This poses major challenges for comparing, let alone combining, such different approaches.
In this paper, we build a verification infrastructure for reasoning about the expected behavior of (discrete) probabilistic programs; Figure 1 gives an overview. Modern program verifiers for non-probabilistic programs often have a front-end that translates a given program and its specification into an intermediate language, such as Boogie [Leino 2008], Why3 [Filliâtre and Paskevich 2013], or Viper [Müller et al. 2016b]. Such intermediate languages enable the encoding of complex verification techniques, while allowing for the separate development of efficient back-ends, e.g. verification condition generators. In this very spirit, we introduce a novel quantitative intermediate verification language that enables researchers to (i) prototype and automate new verification techniques, (ii) combine proof rules, and (iii) benefit from back-end improvements. Before we dive into details, we discuss five examples of probabilistic programs from the literature that have been verified with five different techniques; all of them have been encoded in our language and verified with our tool.
Example 1.1 (Rabin's Mutual Exclusion Protocol [Kushilevitz and Rabin 1992]). This protocol controls processes competing for access to a critical section. To determine which process gets access, every process repeatedly tosses a fair coin until it sees heads; the process that needed the largest number of tosses is then granted access. Figure 2 shows a probabilistic program modeling Rabin's protocol: a counter variable tracks the number of remaining processes competing for access. While more than 1 competitor remains, each competitor tosses one coin (inner loop). If the coin shows heads (i.e. if flip(0.5) samples a 1), that competitor is removed from the pool of remaining competitors (by decrementing the counter). One can verify with the weakest liberal preexpectation calculus by McIver and Morgan [2005] that the probability of selecting exactly one process (plus the probability of nontermination) is at least 2/3 if there are initially at least 2 processes.
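The elimination process described above can be checked empirically. The following Python sketch simulates the abstraction from Figure 2 (competitors repeatedly flip fair coins; those seeing heads leave the pool) and estimates the probability that exactly one process remains. Function names, the initial competitor count, and the number of trials are illustrative and not part of the paper's tool.

```python
# Monte Carlo sanity check of the 2/3 bound for Rabin's protocol abstraction.
import random

def rabin_round(n, rng):
    """Run the competition starting with n competitors; return the final count."""
    while n > 1:
        heads = sum(rng.random() < 0.5 for _ in range(n))  # inner loop: one toss each
        n -= heads  # every competitor that saw heads leaves the pool
    return n  # 1 means exactly one process was selected; 0 means none

def estimate_exactly_one(n, trials, seed=42):
    rng = random.Random(seed)
    return sum(rabin_round(n, rng) == 1 for _ in range(trials)) / trials

p = estimate_exactly_one(n=10, trials=20_000)
print(p)  # the McIver-Morgan-style bound predicts at least 2/3
```

For n = 2, the bound 2/3 is tight (conditioned on leaving the loop, exactly one of the three non-repeating outcomes removes both competitors); for larger n the estimate stays above it.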
Example 1.2 (The Coupon Collector [Wikipedia 2023a]). Figure 3 models the coupon collector problem, a well-known problem in probability theory: Suppose any box of cereals contains one of n different coupons. What is the average number of boxes one needs to buy to collect at least one of all n different coupons, assuming that each coupon type occurs with the same probability? Our formulation is taken from [Kaminski et al. 2018]; the authors develop an expected runtime calculus and use invariant-based arguments to show that the expected number of loop iterations, which coincides with the average number of boxes one needs to buy, is bounded from above by n · H_n, where H_n is the n-th harmonic number.
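As a sanity check of the n · H_n bound, the following Python sketch (parameters are illustrative; this is not part of the paper's infrastructure) estimates the expected number of boxes by simulation and compares it against n · H_n:

```python
# Monte Carlo illustration of the coupon collector bound n * H_n.
import random

def boxes_needed(n, rng):
    """Buy boxes until all n coupon types have been seen; return the count."""
    seen, boxes = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # each coupon type is equally likely
        boxes += 1
    return boxes

def harmonic(n):
    return sum(1.0 / k for k in range(1, n + 1))

rng = random.Random(0)
n, trials = 10, 20_000
mean = sum(boxes_needed(n, rng) for _ in range(trials)) / trials
bound = n * harmonic(n)  # 10 * H_10, roughly 29.29
print(mean, bound)  # the empirical mean should be close to the bound
```

Here the bound is in fact the exact expectation, so the empirical mean approaches it rather than staying strictly below.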
Example 1.3 (Lossy List Traversal [Batz et al. 2019]). Figure 4 depicts a recursive function implementing a lossy list traversal; it flips a fair coin (using the probabilistic choice { . . . } [0.5] { . . . }) and, depending on the outcome, either calls itself with the list's tail or diverges, i.e. enters an infinite loop. Using the weakest preexpectation calculus [Kozen 1983; McIver and Morgan 2005], one can prove that this program terminates with probability at most 0.5^len(l), where l is the input list. Analyzing the lossy list traversal is intuitive: for every non-empty list, there is exactly one execution that does not diverge, and its probability is 0.5^len(l). What is noteworthy, however, is that even for such a simple program, we need to reason about an exponential function. This is common when verifying probabilistic programs: proving non-trivial bounds often requires non-linear arithmetic.
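The claimed probability can be illustrated with a small Python sketch: the termination probability 0.5^len(l) follows exactly from the single non-diverging execution, and a truncated simulation cross-checks it (names and parameters are illustrative):

```python
# Termination probability of the lossy traversal, exact and by simulation.
import random

def termination_probability(length):
    # exactly one execution survives: "continue" at each of the `length` calls
    return 0.5 ** length

def traverse_terminates(length, rng):
    while length > 0:
        if rng.random() < 0.5:
            length -= 1   # recursive call on the list's tail
        else:
            return False  # the other branch diverges (truncated here)
    return True

rng = random.Random(1)
trials = 50_000
est = sum(traverse_terminates(3, rng) for _ in range(trials)) / trials
print(est, termination_probability(3))  # both should be close to 0.125
```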
Example 1.4 (Fair Random Walk [Wikipedia 2023b]). Figure 5 depicts a variant of a one-dimensional random walk of a particle, a well-studied model in physics. Analyzing the program's termination behavior is hard because the probability of moving to the left or right changes in every loop iteration depending on the particle's previous position. McIver et al. [2018] propose a proof rule based on quasi-variants that allows proving that this program terminates almost-surely, i.e. with probability one. Fair random walks, where the probability of moving in either direction is 1/2, are well-known to terminate almost-surely but still have infinite expected runtime.
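A Python sketch can illustrate the fair special case. Since the expected runtime is infinite, any simulation must truncate; the cap and run counts below are illustrative artifacts of the simulation, not of the program in Figure 5:

```python
# Fair random walk from position 1: almost-sure termination, heavy-tailed runtime.
import random

def steps_until_zero(x, cap, rng):
    """Steps until the particle reaches 0, or None if not within `cap` steps."""
    steps = 0
    while x > 0 and steps < cap:
        x += 1 if rng.random() < 0.5 else -1
        steps += 1
    return steps if x == 0 else None

rng = random.Random(7)
runs = [steps_until_zero(1, 50_000, rng) for _ in range(1_000)]
terminated = [s for s in runs if s is not None]
frac = len(terminated) / len(runs)
print(frac)            # close to 1: almost-sure termination
print(max(terminated)) # occasional very long runs hint at the infinite expectation
```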
Example 1.5 (Lower Bounds on Expected Values [Hark et al. 2019]). Figure 6 shows another loop whose control flow depends on the outcome of coin flips. Hark et al. [2019] studied this example to demonstrate that induction-based proof rules for lower bounds, which are sound for classical verification, may become unsound when reasoning about probabilistic programs. The authors used martingale analysis and the optional stopping theorem to develop a sound proof rule capable of proving a non-trivial lower bound on the expected value of a program variable after the program's termination, provided that variable is initially nonzero.
Challenges. We summarize the challenges of developing an infrastructure for automated verification of probabilistic programs unveiled by the examples in Figures 2 to 6: First, there are many different verification techniques for probabilistic programs that are based on different concepts, e.g. quantitative invariants, quasi-variants, different notions of martingales, or stopping times of stochastic processes. Developing a language that is sufficiently expressive to encode these techniques while keeping it amenable to automation is a major challenge.
Second, verification of probabilistic programs involves reasoning about both lower and upper bounds on expected values. This is different from classical program verification, which can be understood as proving that a given precondition implies a program's weakest precondition, i.e. pre ⇒ wp⟦C⟧(post). In other words, pre is a lower bound (in the Boolean lattice) on wp⟦C⟧(post). Proving upper bounds, i.e. wp⟦C⟧(post) ⇒ pre, has received scarce attention. Third, in Figures 3 to 5, we noticed that verification of probabilistic programs often involves reasoning about unbounded random variables and non-linear arithmetic involving exponentials, harmonic numbers, limits, and possibly infinite sums.
Our approach. We address the first challenge by developing a quantitative IVL and a real-valued logic tailored to the verification of probabilistic programs. The IVL features quantitative generalizations of standard verification constructs such as assume- and assert-statements. Our quantitative constructs are inspired by Gödel logics [Baaz 1996; Preining 2010]. In particular, they have dual co-constructs for verifying upper instead of lower bounds, thereby addressing the second challenge. These dual constructs are not only interesting for quantitative reasoning, but indeed also for Boolean reasoning à la wp⟦C⟧(post) ⇒ pre. To address the third challenge, we rely on modern SMT solvers' abilities to deal with custom theories, standard techniques for limiting the number of user-defined function applications, and custom optimizations.
Figure 7 shows a program written in our quantitative IVL; it encodes the verification of Example 1.3. We use a coprocedure to prove that the quantitative precondition exp(0.5, len(l)) = 0.5^len(l) is an upper bound on the procedure's termination probability, given by the quantitative postcondition 1. We establish the above bound for the procedure body while assuming that it holds for recursive calls (cf. [Olmedo et al. 2016]). Our dual quantitative assert- and assume-statements encode the call in the usual way: we assert the procedure's pre and assume its post.
Contributions. The main contributions of our work are:
(1) A novel intermediate verification language (→ Section 3) for automating probabilistic program verification techniques, featuring quantitative generalizations of standard verification constructs, e.g. assert and assume, and a formalization of its semantics based on a real-valued logic (→ Section 2) with constructs inspired by Gödel logics.
(2) Encodings of verification techniques and proof rules with different theoretical underpinnings (e.g. domain theory, martingales, and the optional stopping theorem), taken from the probabilistic program verification literature, into our intermediate language (→ Section 4).
(3) An SMT-backed verification infrastructure that enables researchers to prototype and automate verification techniques for probabilistic programs by encoding them into our intermediate language, an experimental evaluation of its feasibility, and a prototypical frontend for verifying programs written in the probabilistic guarded command language (→ Section 5).

HEYLO: A QUANTITATIVE ASSERTION LANGUAGE
When analyzing quantitative program properties such as runtimes, failure probabilities, or space usage, it is often more direct, more intuitive, and more practical to reason directly about values like the runtime n², the probability 1/2^n, or a list's length, instead of predicates like rt = n², prob ≤ 1/2^n, or len(l) > 0 (cf. [Kaminski et al. 2018; Ngo et al. 2018]). This section introduces HeyLo, a real-valued logic for quantitative verification of probabilistic programs, which aims to take the role that predicate logic has for classical verification. By syntactifying real-valued functions, HeyLo serves as (1) a language for specifying quantitative properties, in particular those that McIver and Morgan [2005] (and many other authors) call expectations, and (2) a foundation for automation, by reducing many verification problems to a decision problem for HeyLo, e.g. validity or entailment checking. To ensure that HeyLo is expressive enough for (1), we design it reminiscently of the language by Batz et al. [2021b], which is relatively complete for the verification of probabilistic programs.
To ensure that HeyLo is suitable for (2), HeyLo is first-order, so as to simplify automation. Moreover, verification problems can often be stated as inequalities between two functions. To ensure that such inequalities can, in principle, be encoded into a single decision problem for HeyLo, we introduce quantitative (co)implications, which provide a syntax for comparing HeyLo formulae, and prove an analogue of the classical deduction theorem for predicate logic [Kleene 1952]. Supporting comparisons between expectations via (co)implications is essential for encoding proof rules for probabilistic programs. The (co)implications are inspired by intuitionistic Gödel logics [Baaz 1996; Preining 2010] and form Heyting algebras (cf. Theorem 2.1), hence the name HeyLo.

Program States and Expectations
Let Vars = {x, y, . . .} be a countably infinite set of typed variables. We write x : T to indicate that x is of type T, i.e. T is the set of values x can take. We assume the built-in types B = {true, false}, N, Z, Q, Q≥0, R, R≥0, and R∞≥0 = R≥0 ∪ {∞}; our verification infrastructure also supports user-defined mathematical types (cf. Section 5.1). We collect all types in Types and all values in Vals = ⋃_{T ∈ Types} T. A (program) state σ maps every variable x : T to a value in T; the set of all states is denoted States. Expectations are the quantitative analogue of logical predicates: they map program states to R∞≥0 instead of truth values. The complete lattice (E, ⪯) of expectations is given by

    E = { f | f : States → R∞≥0 }    with    f ⪯ g  iff  f(σ) ≤ g(σ) for all σ ∈ States.

Syntax of HeyLo
We start with the construction of HeyLo's atoms. The set of terms is given by the grammar

    t ::= c | x | f(t, . . ., t)

where c is a constant in Q ∪ B, x is a variable in Vars, and f is either one of the built-in function symbols +, ·, −, ∸, <, =, ∧, ∨, ¬ (∸ is subtraction truncated at 0) or a typed user-defined function symbol f : T1 × . . . × Tn → T for some n ≥ 0 and types T1, . . ., Tn, T (cf. Section 5.1). Function symbols include, for example, the length of lists len : Lists → N and the exponential function exp : R × Z → R mapping (r, n) to r^n. We write t : T to indicate that term t is of type T. Typing and subtyping of terms is standard. In particular, if t : T1 and T1 ⊆ T2, then t : T2. We only consider well-typed terms.
We denote terms of type Q≥0 (resp. B) by a (resp. b) and call them arithmetic expressions (resp. Boolean expressions). The set of HeyLo formulae is given by the following grammar:

    φ ::= a | ?(b) | φ + φ | φ · φ | φ ⊓ φ | φ ⊔ φ | φ → φ | φ ⤙ φ | Jx : T. φ | Sx : T. φ
We explain the meaning of HeyLo formulae in the next subsection. Free and bound (by S or J quantifiers) variables of a HeyLo formula φ are defined as usual. The order of precedence for arithmetic and Boolean expressions is standard. For HeyLo formulae, J and S are least binding and · is most binding. We use parentheses to resolve ambiguities.

Semantics and Properties of HeyLo
A term t : T evaluates to a value ⟦t⟧(σ) ∈ T on state σ. We assume the standard semantics for constants and built-in functions, and that ⟦f⟧ is given for all user-defined function symbols f.
The semantics of a HeyLo formula φ is an expectation ⟦φ⟧ : States → R∞≥0, defined by induction on the structure of φ in Figure 8, where we define 0 · ∞ = ∞ · 0 = 0, as is common in measure theory. These notions are central since we will encode verification problems as inequalities between HeyLo formulae. In contrast to classical IVLs, HeyLo contains constructs both for reasoning about lower bounds and for reasoning about upper bounds. We briefly go over each construct in Figure 8.
Arithmetic and Boolean Expressions. These expressions form the atoms of HeyLo. Consider, e.g., the arithmetic expressions x + 1 for some numeric variable x and 2 · len(l) for a variable l : Lists. On state σ, x + 1 evaluates to σ(x) + 1, and 2 · len(l) evaluates to 2 times the length of list σ(l).
Quantifiers. The infimum quantifier J and the supremum quantifier S from [Batz et al. 2021b] are the quantitative analogues of the universal quantifier ∀ and the existential quantifier ∃ from predicate logic. Intuitively, the J quantifier minimizes a quantity, just like the ∀ quantifier minimizes a predicate's truth value. Dually, the S quantifier maximizes a quantity just like ∃ maximizes a predicate's truth value. The quantitative quantifiers embed ∀ and ∃ in HeyLo: for b : B and σ ∈ States, the formula Jx : T. ?(b) evaluates to ∞ on σ iff b holds for all values of x, and to 0 otherwise; Sx : T. ?(b) behaves analogously for ∃. For a quantitative example, consider the formula φ = Sy : Q≥0. ?(y · y < 2) ⊓ y. On state σ, the subformula ?(y · y < 2) ⊓ y evaluates to σ(y) if σ(y) · σ(y) < 2, and to 0 otherwise. Consequently, ⟦φ⟧(σ) = sup { r ∈ Q≥0 | r · r < 2 } = √2. Notice that ⟦φ⟧(σ) is irrational even though all constituents of φ are rational-valued. It has been shown in [Batz et al. 2021b] that, similar to our above construction of √2, the quantitative quantifiers combined with arithmetic and (embedded) Boolean expressions over Q≥0 enable the construction of all expected values emerging from discrete probabilistic programs.
(Co)implication. The implication → and the coimplication ⤙ generalize Boolean implication and converse nonimplication. For a state σ, the implication φ → ψ evaluates to ∞ if ⟦φ⟧(σ) ≤ ⟦ψ⟧(σ), and to ⟦ψ⟧(σ) otherwise. Dually, the coimplication φ ⤙ ψ evaluates to 0 if ⟦φ⟧(σ) ≥ ⟦ψ⟧(σ), and to ⟦ψ⟧(σ) otherwise. To gain some intuition, we first note that the top element ∞ of our quantitative domain R∞≥0 can be viewed as "entirely true" (i.e. as true as it can possibly get) and 0 can be viewed as "entirely false" (i.e. as false as it can possibly get). The implication φ → ψ makes ψ more true by lowering the threshold above which ψ is considered entirely true, and thus ∞, to φ. In other words: anything that is at least as true as φ is considered entirely true; anything less true than φ remains as true as ψ. Figure 9 illustrates this for the formula 5 → x.
As another example, x² → x evaluates to ∞ for states σ with σ(x) ∈ [0, 1]; otherwise, x is below the threshold x² above which x is considered entirely true, and thus the implication evaluates to x.
The intuition underlying the coimplication is dual: φ ⤙ ψ makes ψ less true by raising the threshold below which ψ is considered entirely false, and thus 0, to φ. In other words: anything that is not more true than φ is considered entirely false; anything that is more true than φ remains as true as ψ. Figure 10 illustrates this for the formula 5 ⤙ x. Chained implications can also be understood in terms of lowering thresholds: φ → (ψ → ρ) lowers the threshold at which ρ is considered entirely true to φ and ψ, whichever is lower. Formally, φ → (ψ → ρ) is equivalent to (φ ⊓ ψ) → ρ. More generally, (co)implications are the adjoints of the minimum ⊓ and maximum ⊔:

Theorem 2.1 (Adjointness Properties). For all HeyLo formulae φ, ψ, and ρ, we have

    φ ⊓ ψ ⊑ ρ  iff  ψ ⊑ φ → ρ        and        φ ⤙ ψ ⊑ ρ  iff  ψ ⊑ φ ⊔ ρ.

Both → and ⤙ are backward compatible with Boolean implication and converse nonimplication: ?(b1) → ?(b2) is equivalent to ?(b1 → b2), and ?(b1) ⤙ ?(b2) is equivalent to ?(¬b1 ∧ b2). We will primarily use (co)implications to (1) incorporate the capability of comparing expectations syntactically in HeyLo and to (2) express assumptions. Application (1) is justified by the following quantitative version of the well-known deduction theorem from first-order logic [Kleene 1952]:

Theorem 2.2 (HeyLo Deduction Theorem). For all HeyLo formulae φ and ψ, we have φ ⊑ ψ iff φ → ψ evaluates to ∞ on every state.

The proof is in Appendix A. For application (2), consider the implication ?(b) → ψ; it evaluates to ψ whenever b holds, and to ∞ otherwise. As in predicate logic, the implication can be read as assuming b holds before evaluating ψ. Formally, ⟦?(b) → ψ⟧(σ) = ⟦ψ⟧(σ) if σ |= b, and ∞ otherwise. Now, consider the inequality φ ⊑ ?(b) → ψ. For all states not satisfying b (i.e. the set of states that we do not assume), the inequality vacuously holds. For all other states (i.e. those states that we actually assume), φ must lower-bound ψ in order for the inequality to hold.
Example 2.3. Let φ, ψ ∈ HeyLo and b : B. We construct a HeyLo formula that, on state σ, evaluates to ⟦φ⟧(σ) if σ |= b, and to ⟦ψ⟧(σ) otherwise. For that, we use the Boolean embedding and the implication:

    (?(b) → φ) ⊓ (?(¬b) → ψ).

To encode assumptions using the coimplication ⤙, we first introduce Boolean co-embeddings: co?(b) evaluates to 0 on σ if σ |= b, and to ∞ otherwise. We then obtain a dual construction using ⤙ for encoding assumptions: co?(b) ⤙ ψ evaluates to ⟦ψ⟧(σ) if σ |= b, and to 0 otherwise. By Theorem 2.1, we have

    co?(b) ⤙ φ ⊑ ψ    iff    φ ⊑ ?(b) → ψ,

i.e. the coimplication co?(b) ⤙ φ ensures that it suffices to reason about states satisfying b.
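The pointwise semantics of the (co)implications and the adjointness properties of Theorem 2.1 can be checked on sample values with a few lines of Python, assuming the semantics described above (∞ where the threshold is met, and the right operand otherwise); this is only an illustrative sketch of the lattice operations, not the paper's implementation:

```python
# Pointwise sketch of HeyLo's (co)implication on the domain [0, inf].
from math import inf

def impl(f, g):    # Goedel implication: inf where f <= g, else g
    return inf if f <= g else g

def coimpl(f, g):  # dual coimplication: 0 where f >= g, else g
    return 0 if f >= g else g

vals = [0, 0.5, 1, 2, 5, inf]

# Theorem 2.1 (adjointness), checked pointwise on sample values:
#   min(f, g) <= h    iff  g <= impl(f, h)
#   coimpl(f, g) <= h iff  g <= max(f, h)
adjoint_ok = all(
    (min(f, g) <= h) == (g <= impl(f, h)) and
    (coimpl(f, g) <= h) == (g <= max(f, h))
    for f in vals for g in vals for h in vals
)
print(adjoint_ok)

# Assumptions via the Boolean embedding ?(b) = inf if b holds, else 0:
def embed(b):
    return inf if b else 0

# impl(embed(b), g) behaves like "assume b": g where b holds, inf elsewhere.
print(impl(embed(True), 3), impl(embed(False), 3))
```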

Qualitative Reasoning in HeyLo
The verification of probabilistic programs comprises both quantitative and qualitative reasoning. Whereas questions like "what is the expected value of a program variable upon termination?" are inherently quantitative, questions like "does this quantity increase in expectation after one loop iteration?" are qualitative. HeyLo marries quantitative and qualitative reasoning. To shift to a qualitative statement, we first consider the negation ¬φ and conegation ∼φ of φ, obtained from our (co)implications:

    ¬φ = φ → 0        and        ∼φ = φ ⤙ ∞.

The (co)negation always evaluates to either ∞, the top element of R∞≥0 (entirely true), or 0, the bottom element of R∞≥0 (entirely false). By applying a (co)negation twice, we turn an arbitrary expectation into a qualitative statement. Formally, we define the (pointwise) validation △(φ) and (pointwise) covalidation ▽(φ) by

    △(φ) = ¬∼φ        and        ▽(φ) = ∼¬φ.

In words, the validation △(φ) is (pointwise) entirely true whenever φ is entirely true, and entirely false otherwise. Dually, ▽(φ) is entirely false whenever φ is entirely false, and entirely true otherwise. Thus, both validations and covalidations "boolify" HeyLo formulae. The difference is that validations pull intermediate truth values down to entire falsehood whereas covalidations lift intermediate truth values up to entire truth.
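Following these definitions, the (co)negations and (co)validations can be sketched pointwise in Python, assuming the (co)implication semantics described above (an illustrative encoding, not the paper's implementation):

```python
# (Co)negation and (co)validation, built from the (co)implications.
from math import inf

def impl(f, g):
    return inf if f <= g else g

def coimpl(f, g):
    return 0 if f >= g else g

def neg(f):        # entirely true (inf) iff f is entirely false (0)
    return impl(f, 0)

def coneg(f):      # entirely false (0) iff f is entirely true (inf)
    return coimpl(f, inf)

def validate(f):   # pulls intermediate truth values down to 0
    return neg(coneg(f))

def covalidate(f): # lifts intermediate truth values up to inf
    return coneg(neg(f))

for v in [0, 0.5, 7, inf]:
    print(v, validate(v), covalidate(v))
```

On the sample values, `validate` maps only ∞ to ∞ (everything else to 0), while `covalidate` maps only 0 to 0 (everything else to ∞), matching the "boolify" intuition.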
Turning expectations into qualitative statements has an important application, which often arises when encoding verification problems: Suppose we are given two formulae φ, ψ with free variables x1, . . ., xn, and our goal is to construct a HeyLo formula that evaluates to a given a of type Q≥0 if φ ⊑ ψ, and to 0 otherwise. For that, we first construct the formula Jx1, . . ., xn. △(φ → ψ). Due to the infimum quantifier over all free variables, this formula is equivalent to ∞ if φ ⊑ ψ, and equivalent to 0 otherwise. Hence, we construct the desired formula as

    (Jx1, . . ., xn. △(φ → ψ)) ⊓ a.

Moreover, we obtain a dual construction using the coimplication, ▽, and the supremum quantifier. Many verification problems for probabilistic programs reduce naturally to checking inequalities between HeyLo formulae. Consider, for instance, the program

    { x := y } [0.5] { x := y + 1 }

which sets x either to y or to y + 1, depending on the outcome of a fair coin flip. Suppose we want to verify that y + 1/2 is a lower bound on the expected value of x after executing the above program. According to McIver and Morgan [2005], verifying this bound amounts to proving the inequality

    y + 1/2 ⊑ wp⟦C⟧(x),                          (ex)

where the weakest preexpectation wp⟦C⟧(x) is a function (which we can represent as a HeyLo formula) that maps every initial state σ to the expected value of x after executing the program C above on input σ. Our goal is to simplify writing, composing, and reasoning modularly about such expected values and similar quantities. To this end, we propose HeyVL, a novel intermediate verification language for modeling quantitative verification problems.

HEYVL: A QUANTITATIVE INTERMEDIATE VERIFICATION LANGUAGE
HeyVL programs are organized as a collection of procedures. Each procedure is equipped with a body and a specification. The body is a HeyVL statement and can for now be thought of as a more or less ordinary probabilistic program. The specification of a procedure comprises a pre φ and a post ψ, both HeyLo formulae. For instance:

    proc ex (y: UInt) -> (x: UInt)  // procedure that takes y as input and returns the value of x
        pre y + 1/2                 // lower bound on the expected value of x after termination of the body
        post x                      // quantity of interest, evaluated in final states

Intuitively, a procedure verifies if its body C adheres to its specification, meaning essentially that the inequality φ ⊑ wp⟦C⟧(ψ) holds, i.e. the expected value of ψ after executing C is lower-bounded by φ. This inequality is called the verification condition of the procedure. An entire HeyVL program verifies if all of its procedures verify.
How do we describe the verification problem (ex) in HeyVL? As shown in Figure 11, we write a single procedure ex with body x := 1/2 · ⟨y⟩ + 1/2 · ⟨y + 1⟩, pre y + 1/2, and post x. This gives rise to the verification condition y + 1/2 ⊑ wp⟦C⟧(x), which is precisely the inequality (ex) we aim to verify. The HeyVL program (i.e. the single procedure ex) verifies if and only if we have positively answered the verification problem (ex).
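Since this body contains no verification-specific statements, the verification condition can be checked directly: the weakest preexpectation of the post x is the weighted average over both branches. A Python sketch (the finite range of inputs and all names are illustrative):

```python
# Checking the verification condition (ex) for the fair-coin body.
def wp_body(y):
    # expected value of x after assigning y or y + 1, each with probability 1/2
    return 0.5 * y + 0.5 * (y + 1)

def pre(y):
    return y + 0.5

# pre <= wp holds in every initial state (sampled here over a finite range):
verification_condition = all(pre(y) <= wp_body(y) for y in range(100))
print(verification_condition)
```

In fact the two sides coincide, so y + 1/2 is the exact expected value, not merely a lower bound.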
To encode more complex verification problems or proof rules, one may need to write more than one HeyVL procedure. For example, in Section 4.1, we will encode a proof rule for conditional expected values that requires establishing a lower and a different upper bound. The latter can be described using a second HeyVL procedure, see Section 3.1. Furthermore, it is natural to break down large programs and/or complex proof rules into smaller (possibly mutually recursive) procedures, which can be verified modularly based on the truth of their verification conditions.

HeyVL Procedures
A HeyVL procedure consists of a name, a list of (typed) input and output variables, a body, and a quantitative specification. Syntactically, a HeyVL procedure is of the form

    proc P (in: τ) -> (out: τ')  // procedure named P with read-only inputs in and outputs out
        pre φ                    // pre: HeyLo formula over inputs
        post ψ                   // post: HeyLo formula over inputs or outputs

where P is the procedure's name, and in and out are (possibly empty and pairwise distinct) lists of typed program variables called the inputs and outputs of P. The specification is given by a pre φ, which is a HeyLo formula over the variables in in, and a post ψ, which is also a HeyLo formula but ranging over the variables in in or out. The procedure body C is a HeyVL statement, whose syntax and semantics will be formalized in Sections 3.2 and 3.3. As mentioned above, the procedure gives rise to a verification condition, namely φ ⊑ wp⟦C⟧(ψ). However, this is only accurate if C is an ordinary probabilistic program. As our statements may also contain non-executable verification-specific assume and assert commands, the verification condition generated by the procedure is actually

    φ ⊑ vp⟦C⟧(ψ),

where vp is the verification preexpectation transformer that extends the aforementioned weakest preexpectation transformer wp with semantics for the verification-specific statements, see Section 3.3. For procedure calls, we approximate the weakest preexpectation based on the callee's specification to enable modular verification, see Section 3.5.
Readers familiar with classical Boolean deductive verification may think of the verification condition φ ⊑ vp⟦C⟧(ψ) as a quantitative Hoare triple ⟨φ⟩ C ⟨ψ⟩, where ⊑ takes the quantitative role of the Boolean ⟹: the triple ⟨φ⟩ C ⟨ψ⟩ is valid iff φ ⊑ vp⟦C⟧(ψ). Indeed, if φ and ψ are ordinary Boolean predicates and C is a non-recursive non-probabilistic program, then ⟨φ⟩ C ⟨ψ⟩ is a standard Hoare triple: whenever state σ satisfies precondition φ, then procedure body C must successfully terminate on σ in a state satisfying postcondition ψ.
Phrased differently: for every initial state σ, the truth value φ(σ) lower-bounds the anticipated truth value (evaluated in σ) of postcondition ψ after termination of C on σ. For arbitrary HeyLo formulae φ, ψ and probabilistic procedure bodies C, this second view generalizes to quantitative reasoning à la McIver and Morgan [2005]: the quantitative triple ⟨φ⟩ C ⟨ψ⟩ is valid iff the pre φ lower-bounds the expected value (evaluated in initial states) of the post ψ after termination of C. In Section 3.5, we will describe how calling a (verified) procedure can be thought of as "invoking" the validity of the quantitative Hoare triple that is given by the procedure's specification.
Notice that the above inequality is our definition of the validity of a quantitative Hoare triple; we do not provide an operational definition of validity. This is due to the lack of an intuitive operational semantics for quantitative assume and assert statements (cf. also Section 7).
Examples. Besides Figure 11, Figures 12 and 13 further illustrate how HeyVL procedures specify quantitative program properties; we omit concrete procedure bodies to focus on the specification. The procedure in Figure 12 specifies that the expected value of the output must be at least 3.5 · n, a property satisfied by any statement that rolls n fair dice. The procedure in Figure 13 specifies that the expected value of the output being true after termination, i.e. the probability that the returned value will be true, is at least 2/3 whenever the input is greater than one, a key property of Rabin's randomized mutual exclusion algorithm [Kushilevitz and Rabin 1992] from Figure 2, discussed in the introduction. Since we aim to reason about probabilities, we ensure that the post is one-bounded by considering 1 ⊓ ?(b) instead of ?(b) for the Boolean output b.
Coprocedures: Duals to Procedures. Proving upper bounds is often relevant for quantitative verification, e.g. when analyzing expected runtimes of randomized algorithms (cf. [Kaminski et al. 2018]). HeyVL also supports coprocedures, which give rise to the dual verification condition φ ⊒ vp⟦C⟧(ψ) (notice ⊒ for coprocedures as opposed to ⊑ for procedures). The syntax of coprocedures is analogous to that of HeyVL procedures; the only difference is the keyword coproc instead of proc. For example, a coprocedure defined as in Figure 12 (except for replacing proc by coproc) would specify that the expected value of the output must be at most 3.5 · n. We demonstrate in Section 4 that intricate verification techniques for probabilistic programs may require lower and upper bound reasoning, i.e. HeyVL programs that are collections of both procedures and coprocedures.
HeyVL Programs. To summarize, a HeyVL program is a list of procedures and coprocedures that each give rise to a verification condition, i.e. a HeyLo inequality. We say that a HeyVL program verifies iff all verification conditions of its (co)procedures hold.
Design Decisions. Since HeyVL is an intermediate language, we favor simplicity over convenience. In particular, we require procedure inputs to be read-only, i.e. they evaluate to the same values in initial and final states. Moreover, HeyVL has no loops and no global variables. All variables that can possibly be modified by a procedure call are given by its outputs. All of the above restrictions can be lifted by high-level languages that encode to HeyVL.

Syntax of HeyVL Statements
HeyVL statements, which appear in procedure bodies, provide a programming-language-style notation to express and approximate expected values arising in the verification of probabilistic programs, including expected outcomes of program variables, reachability probabilities such as the probability of termination, and expected rewards. HeyVL statements consist of (a) standard constructs such as assignments, sampling from discrete probability distributions, sequencing, and nondeterministic branching, and (b) verification-specific constructs for modeling rewards such as runtime, for quantitative assertions and assumptions, and for forgetting the values of program variables in the current state.
The syntax of HeyVL statements S is given by the grammar

    S ::= var x : T := d | x1, . . ., xm := P(e1, . . ., en) | reward e | S ; S
        | if (•) { S } else { S } | assert φ | assume φ | coassert φ | coassume φ
        | havoc x | cohavoc x | validate | covalidate

where x ∈ Vars is of type T, e is an arithmetic expression, and φ is a HeyLo formula. Moreover, d is a distribution expression of type T of the form p1 · ⟨t1⟩ + . . . + pk · ⟨tk⟩ with k ≥ 1, where each pi is a term of type [0, 1], each ti is a term of type T, and Σ_{i=1}^{k} pi(σ) = 1 for every state σ. A distribution expression d represents a finite-support probability distribution, which assigns probability pi to each ti. We often write flip(p) for the distribution that yields 1 with probability p and 0 with probability 1 − p. We briefly go over the above constructs. var x : T := d is a probabilistic assignment which assigns to variable x a value sampled from the probability distribution described by d. The statement x1, . . ., xm := P(e1, . . ., en) is a (co)procedure call. We can think of it as passing the parameters e1, . . ., en to (co)procedure P, executing P's body, and assigning the return values to the variables x1, . . ., xm. The statement reward e collects (i.e. accumulates) a reward of e, modeling e.g. progression in (run)time or resource consumption. S1 ; S2 puts HeyVL statements in sequence. if (•) { S1 } else { S2 } is a nondeterministic choice between S1 and S2, where • determines whether the nondeterminism is resolved in a minimizing (⊓) or maximizing (⊔) manner. assert φ and assume φ are quantitative generalizations of assertions and assumptions from classical IVLs. coassert φ and coassume φ are novel statements that enable reasoning about upper bounds; there is yet no analogue in classical verification infrastructures. havoc x and cohavoc x forget the current value of x by branching nondeterministically over all possible values of x, either in a minimizing (havoc x) or maximizing (cohavoc x) manner. Finally, validate and covalidate turn quantitative expectations into qualitative expressions, much in the flavor of the validation and covalidation operators described earlier (see Section 2.4).

Declarations and Types.
We assume that all local variables (those that are neither inputs nor outputs) are initialized by an assignment before they are used; those assignments also declare the variables' types. If we assign to an already initialized variable, we often write x ≈ μ instead of var x: τ ≈ μ. Moreover, if μ is a Dirac distribution, i.e. if p₁ = 1, we often write x = t₁ instead of x ≈ μ. Finally, we assume that all programs and associated HeyLo formulae are well-typed.

Semantics of HeyVL Statements
Random Assignments. The expected value of f after executing var x: τ ≈ μ is the weighted sum Σⁿᵢ₌₁ pᵢ · f[x ↦ tᵢ], where each pᵢ is the probability that x is assigned tᵢ.
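The weighted-sum semantics of random assignments can be sketched in Python. This is a sketch only: the function name `vp_random_assignment` and the representation of states as dictionaries and expectations as functions are our own illustrative choices, not part of HeyVL or Caesar.

```python
from fractions import Fraction

def vp_random_assignment(dist, post, state, var):
    """vp[[var x: tau ~ mu]](f)(sigma) = sum_i p_i * f(sigma[x := t_i]).

    `dist` maps each candidate value t_i to its probability p_i;
    `post` is an expectation: a function from states to non-negative reals.
    """
    total = Fraction(0)
    for value, prob in dist.items():
        successor = dict(state, **{var: value})  # sigma[x := t_i]
        total += prob * post(successor)
    return total

# Expected value of x after sampling x from flip(1/2) (true = 1, false = 0):
post = lambda s: Fraction(s["x"])
dist = {1: Fraction(1, 2), 0: Fraction(1, 2)}
result = vp_random_assignment(dist, post, {"x": 7}, "x")
assert result == Fraction(1, 2)
```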
Rewards. Suppose that the post f captures the expected reward collected in an execution that follows after executing reward e. Then the entire expected reward is given by e + f.
Some verification-specific statements are not really executable but serve the purpose of manipulating expected values.
(Co)assertions. In classical intermediate verification languages, the statement assert φ for some predicate φ models a proof obligation: all states reaching assert φ on some execution must satisfy φ. In terms of classical weakest preconditions, assert φ transforms a postcondition G to wp⟦assert φ⟧(G) = φ ∧ G.
In words, assert φ caps the truth of postcondition G at φ: all lower bounds on the above weakest precondition (in terms of the Boolean lattice (States → B, ⇒)) must not exceed φ.
This perspective generalizes well to our quantitative assertions: given a HeyLo formula f, the statement assert f caps the post at f. Thus, analogously to classical assertions, all lower bounds on the verification preexpectation vp⟦assert f⟧(g) (in terms of ⊑) must not exceed f.
Coassertions are dual to assertions: coassert f raises the post g to at least f. Hence, all upper bounds on vp⟦coassert f⟧(g) must not subceed f.
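A minimal Python sketch of this lattice view, assuming expectations are modeled as functions from states to extended non-negative reals and that assert/coassert act as pointwise meet/join (all names are illustrative, not Caesar's API):

```python
def vp_assert(f, g):
    # assert f caps the post g at f: pointwise minimum (meet)
    return lambda s: min(f(s), g(s))

def vp_coassert(f, g):
    # coassert f raises the post g to at least f: pointwise maximum (join)
    return lambda s: max(f(s), g(s))

g = lambda s: s["x"]   # post-expectation: the value of x
f = lambda s: 5        # quantitative (co)assertion of the constant 5

assert vp_assert(f, g)({"x": 7}) == 5    # capped at 5
assert vp_assert(f, g)({"x": 3}) == 3
assert vp_coassert(f, g)({"x": 3}) == 5  # raised to 5
assert vp_coassert(f, g)({"x": 7}) == 7
```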
(Co)assumptions. In the classical setting, the statement assume φ for some predicate φ weakens the verification condition: verification succeeds vacuously for all states not satisfying φ. In terms of classical weakest preconditions, assume φ transforms a postcondition G to wp⟦assume φ⟧(G) = φ → G, i.e. assume φ lowers the threshold at which the post G is considered true (the top element of the Boolean lattice) to φ. Indeed, if we identify true = 1 and false = 0, then (φ → G)(σ) = 1 if φ(σ) ≤ G(σ), and G(σ) otherwise. The above perspective on classical assumptions generalizes to our quantitative assumptions. Given a HeyLo formula f, assume f lowers the threshold above which the post g is considered entirely true (i.e. ∞, the top element of the lattice of expectations) to f. Formally, vp⟦assume f⟧(g)(σ) = ∞ if f(σ) ≤ g(σ), and g(σ) otherwise.
Reconsider Figure 9 on page 8, which illustrates vp⟦assume 5⟧(g): assume 5 lowers the threshold at which the post g is considered entirely true to 5, i.e. whenever the post-expectation g evaluates at least to 5, then vp⟦assume 5⟧(g) evaluates to ∞. Notice furthermore that our quantitative assume is backward compatible with the classical one in the sense that vp⟦assume ?(φ)⟧(g) evaluates to g for every state satisfying φ, and to ∞ otherwise.
Coassumptions are dual to assumptions. coassume f raises the threshold at which the post g is considered entirely false (i.e. 0, the bottom element of the lattice of expectations) to f. Reconsider Figure 10 on page 8 illustrating vp⟦coassume 5⟧(g): coassume 5 raises the threshold below which the post g is considered entirely false to 5, i.e. if the post g evaluates at most to 5, then vp⟦coassume 5⟧(g) evaluates to 0.
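The threshold semantics of quantitative (co)assumptions can be sketched as follows, assuming the quantitative implication is the Gödel implication on [0, ∞] as described above (function names are illustrative):

```python
INF = float("inf")

def implies(a, b):
    # Goedel implication on [0, inf]: a -> b = inf if a <= b, else b
    return INF if a <= b else b

def vp_assume(f, g):
    # assume f lowers the threshold for "entirely true" (inf) to f
    return lambda s: implies(f(s), g(s))

def vp_coassume(f, g):
    # coassume f raises the threshold for "entirely false" (0) to f
    return lambda s: 0 if g(s) <= f(s) else g(s)

g = lambda s: s["x"]
f5 = lambda s: 5
assert vp_assume(f5, g)({"x": 7}) == INF   # post >= 5 counts as entirely true
assert vp_assume(f5, g)({"x": 3}) == 3
assert vp_coassume(f5, g)({"x": 3}) == 0   # post <= 5 counts as entirely false
assert vp_coassume(f5, g)({"x": 7}) == 7
```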
Example 3.1 (Modeling Conditionals). We did not include if (φ) { S₁ } else { S₂ } for conditional branching in HeyVL's grammar. We can encode it as follows (and will use it from now on):

if (⊓) { assume ?(φ); S₁ } else { assume ?(¬φ); S₂ }

The vp semantics of this statement is analogous to the formula described in Example 2.3 and complies with our above description of assumptions: depending on the satisfaction of φ by the current state σ, the vp of the statement evaluates to the vp of S₁ or of S₂, respectively.

(Co)validations. These statements convert quantitative statements into qualitative ones by casting expectations into the {0, ∞}-valued realm, thus eradicating intermediate truth values strictly between 0 and ∞. Their classical analogues would be effectless, as the Boolean setting features no intermediate truth values. We briefly explained in Section 2.4 how such a conversion to a qualitative statement works in HeyLo. An example will be discussed in Section 4.2.
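The {0, ∞}-valued casts can be sketched as follows, under the assumption (stated in Section 3.5) that validate reduces every value below ∞ to 0, and covalidate dually raises every value above 0 to ∞:

```python
INF = float("inf")

def validate(g):
    # validate: anything strictly below inf becomes 0
    return lambda s: INF if g(s) == INF else 0

def covalidate(g):
    # covalidate (dual): anything strictly above 0 becomes inf
    return lambda s: 0 if g(s) == 0 else INF

g = lambda s: s["x"]
assert validate(g)({"x": 3}) == 0        # intermediate value eradicated
assert validate(g)({"x": INF}) == INF
assert covalidate(g)({"x": 0}) == 0
assert covalidate(g)({"x": 3}) == INF
```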

Properties of HeyVL Statements
We study two properties of HeyVL. First, our vp semantics is monotonic, which is a crucial property for encoding proof rules (cf. Section 3.5).
Furthermore, HeyVL conservatively extends an existing IVL for non-probabilistic programs due to Müller [2019] in the following sense. Theorem 3.3 (Conservativity of HeyVL). Let C be a program in the programming language of Müller [2019] and let G be a postcondition. Moreover, let S be obtained by replacing every assert φ and every assume φ occurring in C by assert ?(φ) and assume ?(φ), respectively (cf. Boolean embeddings, Section 2.3). Then ?(vc_C(G)) ≡ vp⟦S⟧(?(G)), where vc_C(G) denotes the classical verification condition of C with respect to G.

Procedure Calls
We conclude this section with a treatment of (co)procedure calls. Consider a callee procedure P as shown in Figure 15. Intuitively, the effect of a call x₁, ..., xₙ = P(e₁, ..., eₘ) corresponds to inlining P's body at the call-site.
There are two main issues that would arise if we actually inlined P at every call-site: (1) For recursive procedure calls [Olmedo et al. 2016], we would need to define a (non-computable) fixed point semantics for the vp transformer. Our goal, however, is to render verification feasible in practice, so we would like to avoid fixed point computations.
(2) Even without recursive calls, we would have to re-verify P at every call-site, which would not scale. We thus do not inline the procedure body but use an encoding S_encoding which underapproximates the effect of calling P in the sense that vp⟦S_encoding⟧(f) is at most the vp of the inlined call, for all HeyLo formulae f. By monotonicity of vp, we can then verify lower bounds for calls. Concretely, S_encoding is the statement

assert f_pre; havoc y₁; ...; havoc yₖ; assume f_post

That is, we assert the procedure's pre f_pre before the call, forget the values of all outputs y₁, ..., yₖ, i.e. variables that are potentially modified by the call, and assume the procedure's post f_post after the call. Phrased in terms of underapproximations: we assert that we have at most f_pre before the call and, while minimizing over all possible outputs (using the havoc statements), lower the threshold at which the post is considered entirely true (i.e. ∞) to f_post, i.e. we obtain ∞ whenever f_post lower-bounds the post. The intuition underlying the above HeyVL statement works for encoding procedure calls of non-probabilistic programs. However, there is a subtle unsoundness that arises when reasoning about expected behaviors. Figure 16 shows two procedures, foo and bar. Intuitively, foo flips a fair coin and aborts execution if the result is heads (false). Read backwards, the expected value of the post will be at most x after executing foo, exactly as stated in foo's specification. Procedure bar encodes the call foo() in its body and requires in its specification that the expected value of x does not decrease, i.e. is at least x. Both procedures verify. However, when inlining foo, i.e.
using its body instead of the encoding assert x; assume 2 · x, bar does not verify. Hence, the above encoding does, in general, not model a sound underapproximation of a procedure's inlining. Taking a closer look, recall from above that assume 2 · x is used to encode a monotonicity check, which is an inherently qualitative property. However, verifying bar involves proving x ⊑ x ⊓ (2 · x → x), where the quantitative implication 2 · x → x evaluates to x for x > 0; the expectation x does not reflect the inherently qualitative nature of the monotonicity check. To fix this issue, we add a validate statement that turns quantitative results into qualitative ones: it reduces any value less than ∞, which indicates a failed monotonicity check, to 0. An encoding underapproximating the inlining of foo(), and thus correctly failing verification of bar, is assert x; validate; assume 2 · x. Similarly to Section 2.4, verifying bar for the fixed encoding involves proving x ⊑ x ⊓ △(2 · x → x), which does not hold for x > 0.
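To see the unsoundness numerically, the following sketch compares the vp of the broken encoding, the fixed encoding, and the inlined body of foo on the post-expectation x. It assumes the Gödel-implication semantics of assume and the {0, ∞}-cast semantics of validate described above; all function names are our own.

```python
INF = float("inf")

def implies(a, b):
    # Goedel implication: a -> b = inf if a <= b, else b
    return INF if a <= b else b

def tri(v):
    # validate on a single value: below inf collapses to 0
    return INF if v == INF else 0

for x in [1.0, 2.5, 10.0]:
    inlined = 0.5 * x                          # vp of foo's body: fair coin, abort on heads
    unsound = min(x, implies(2 * x, x))        # assert x; assume 2*x
    fixed = min(x, tri(implies(2 * x, x)))     # assert x; validate; assume 2*x
    assert unsound == x         # x <= unsound holds: bar (wrongly) verifies
    assert not (x <= fixed)     # fixed encoding correctly rejects bar
    assert not (x <= inlined)   # ground truth: inlining also rejects bar
```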
A proof is found in Appendix B. A HeyVL encoding that overapproximates calls of coprocedures is analogous: it suffices to use the dual costatements in S_encoding. The presented under- and overapproximations are useful when encoding proof rules in HeyVL. Whether they are meaningful does, however, depend on the verification technique at hand that is to be encoded.

ENCODING CASE STUDIES
To evaluate the expressiveness of our verification language, we encoded various existing calculi and proof rules targeting verification problems for probabilistic programs in HeyVL. We first focus on programs without while loops (Section 4.1) and then consider loops (Section 4.2). The practicality of our automated verification infrastructure will be evaluated separately in Section 5. A summary of all encodings is given at the end of this section. Further details are found in Appendix C.

Reasoning about While-Loop-Free pGCL Dialects
Pioneered by Kozen [1983, 1985], McIver and Morgan [2005] introduced the probabilistic Guarded Command Language (pGCL), which is convenient for modelling probabilistic systems. The syntax of while-loop-free pGCL programs C is

C ::= skip | diverge | x := t | x :≈ μ | C₁; C₂ | if (φ) { C₁ } else { C₂ } | { C₁ } [p] { C₂ }

where skip has no effect, diverge never terminates, x := t assigns the value of term t to x, x :≈ μ samples a value for x from μ, C₁; C₂ is sequential composition, if (φ) { C₁ } else { C₂ } is conditional branching, and { C₁ } [p] { C₂ } executes C₁ with probability p and C₂ with probability 1 − p. We now outline encodings of several reasoning techniques targeting pGCL and extensions thereof. We will only consider expectations that can be expressed as HeyLo formulae. To improve readability, we identify every HeyLo formula f with its expectation in E.
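For loop-free pGCL, the weakest preexpectation transformer can be sketched as a small Python interpreter over expectation functions. The tuple-based AST encoding and all names below are our own illustrative choices, not the paper's notation.

```python
from fractions import Fraction

def wp(program, post):
    """Weakest preexpectation of a loop-free pGCL AST w.r.t. `post`."""
    kind = program[0]
    if kind == "skip":
        return post
    if kind == "diverge":
        return lambda s: Fraction(0)        # never terminates: expected value 0
    if kind == "assign":                    # ("assign", x, term)
        _, x, term = program
        return lambda s: post({**s, x: term(s)})
    if kind == "seq":                       # ("seq", c1, c2)
        _, c1, c2 = program
        return wp(c1, wp(c2, post))
    if kind == "pchoice":                   # ("pchoice", p, c1, c2): {c1}[p]{c2}
        _, p, c1, c2 = program
        f1, f2 = wp(c1, post), wp(c2, post)
        return lambda s: p * f1(s) + (1 - p) * f2(s)
    if kind == "ite":                       # ("ite", guard, c1, c2)
        _, guard, c1, c2 = program
        f1, f2 = wp(c1, post), wp(c2, post)
        return lambda s: f1(s) if guard(s) else f2(s)
    raise ValueError(kind)

# x := 0; {x := x + 1}[1/2]{skip}  --  the expected value of x is 1/2
prog = ("seq",
        ("assign", "x", lambda s: 0),
        ("pchoice", Fraction(1, 2),
         ("assign", "x", lambda s: s["x"] + 1),
         ("skip",)))
assert wp(prog, lambda s: s["x"])({}) == Fraction(1, 2)
```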
Weakest Liberal Preexpectations (wlp). McIver and Morgan [2005] also proposed a weakest liberal preexpectation calculus, a partial correctness variant of weakest preexpectations. More precisely, if f ⊑ 1, then the weakest liberal preexpectation wlp(C, f) is the expected value of f after termination of C plus the probability of non-termination of C (on a given initial state). We denote by S_wlp⌊C⌋ the HeyVL encoding of the weakest liberal preexpectation calculus; it is defined analogously to Figure 17 except for diverge. Since diverge never terminates, the probability of non-termination is one, i.e. wlp(diverge, f) = 1. The updated encoding of diverge is S_wlp⌊diverge⌋ = assert 1; assume 0, where assert 1 ensures one-boundedness and assume 0 lowers the threshold at which the post is considered entirely true to 0. Put together, we have vp⟦assert 1; assume 0⟧(f) = 1 ⊓ (0 → f) = 1 ⊓ ∞ = 1 = wlp(diverge, f). (pGCL usually supports only one type, e.g. integers, rationals, or reals. We are more liberal and admit arbitrary terms t but assume a sufficiently strong type inference system and consider only well-typed programs.)

Conditional Preexpectations (cwp). Conditioning on observed events (in the sense of conditional probabilities) is a key feature of modern probabilistic programming languages [Gordon et al. 2014]. Intuitively, the statement observe φ discards an execution whenever the Boolean expression φ does not hold. Moreover, it re-normalizes such that the accumulated probability of all executions violating no observation equals one. Olmedo et al. [2018] showed that reasoning about observe φ requires a combination of wp and wlp reasoning. They extended both calculi such that violating an observation is interpreted as a failure resulting in pre-expectation zero; we can encode it with an assertion: w(l)p(observe φ, f) = ?(φ) ⊓ f = vp⟦assert ?(φ)⟧(f).
For every pGCL program C with observe statements, initial state σ and expectation f, the conditional expected value cwp(C, f)(σ) of f after termination of C is then given by the expected value wp(C, f)(σ) normalized by the probability wlp(C, 1)(σ) of violating no observation:

cwp(C, f)(σ) = wp(C, f)(σ) / wlp(C, 1)(σ).

We can re-use our existing HeyVL encodings to reason about conditional expected values. Notice that proving bounds on cwp requires establishing both lower and upper bounds. For example, the pGCL program c_die in Figure 19 assigns to d the result of a six-sided die roll, which is simulated using three fair coin flips and an observation. To show that the expected value of d is at most 3.5, the expected value of a six-sided die roll, we prove the upper bound wp(c_die, d) ⊑ 2.625 and the lower bound 0.75 ⊑ wlp(c_die, 1). Then, cwp(c_die, d) ⊑ 2.625 / 0.75 = 3.5. Figure 20 shows the HeyVL encoding of c_die (cleaned up for readability). As shown in Figure 21, the proof obligations wp(c_die, d) ⊑ 2.625 and 0.75 ⊑ wlp(c_die, 1) are then encoded using a coprocedure for the upper bound and a procedure for the lower bound, respectively.
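The two bounds for c_die can be checked by brute-force enumeration of the three coin flips. The particular die encoding below (d = 4·f₁ + 2·f₂ + f₃ + 1 with observe (d ≤ 6)) is one plausible reading of Figure 19, which is not reproduced in this excerpt.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
wp_num = Fraction(0)   # wp(c_die, d): observation violations contribute 0
wlp_den = Fraction(0)  # wlp(c_die, 1): probability of violating no observation

for f1, f2, f3 in product([0, 1], repeat=3):   # three fair coin flips
    prob = half ** 3
    d = 4 * f1 + 2 * f2 + f3 + 1               # candidate die value in {1, ..., 8}
    if d <= 6:                                  # observe (d <= 6)
        wp_num += prob * d
        wlp_den += prob

assert wp_num == Fraction(21, 8)           # = 2.625, the proved upper bound
assert wlp_den == Fraction(3, 4)           # = 0.75, the proved lower bound
assert wp_num / wlp_den == Fraction(7, 2)  # cwp = 3.5
```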
There exist alternative interpretations of conditioning. For instance, Nori et al. [2014] use wp(C, 1)(σ) in the denominator of the above fraction. A benefit of HeyVL is that such alternative interpretations can be realized by a straightforward adaptation of our encoding.
Lower bounds. Park induction yields the following proof rule for lower bounds on wlp: if I ⊑ [φ] · wlp(C, I) + [¬φ] · f, then I ⊑ wlp(while (φ) { C }, f). The rule can be viewed as a quantitative version of the loop rule from Hoare [1969] logic, where I is an inductive invariant underapproximating the expected value of any loop iteration. Figure 22 depicts an encoding S_wlp⌊while (φ) { C }⌋ that underapproximates wlp(while (φ) { C }, f), i.e.
Before we go into details, we remark for readers familiar with classical deductive verification that our encoding is almost identical to standard loop encodings (cf. [Müller 2019]). Apart from the quantitative interpretation of statements, the only exception is the validate in line 3.
It is instructive to go over the encoding in Figure 22 step by step for a given initial state σ. The expanded version of the above equation's right-hand side serves as a roadmap: reading the HeyVL code in Figure 22 top-down corresponds to reading the equation from left to right as indicated by the colors. We first assert that our underapproximation of the loop's wlp is at most I. The remaining code will ensure that said underapproximation is exactly I whenever I is an inductive loop invariant; it will be 0 otherwise. Proving that I is an inductive loop invariant requires checking an inequality ⊑, where g ⊑ h holds iff g(σ') ≤ h(σ') for all states σ'. We havoc the values of all program variables such that the invariant check encoded afterward is performed for every evaluation of the program variables, i.e. for every state σ'. Moreover, havoc picks the minimal result of all those invariant checks. The statement "I is an inductive loop invariant" is inherently qualitative. We thus validate that the invariant check encoded next is a qualitative statement that can only have two results: ∞ if I is an inductive invariant and 0 if it is not. To check if I is an inductive invariant for a fixed state σ', we need to prove an inequality, namely that I(σ') lower-bounds wlp(C, I)(σ') if the loop guard φ holds, and f(σ') if φ does not hold. We first use assume I to lower the threshold for the expected value of the remaining code to be considered ∞ to I(σ'). Hence, we obtain ∞ if the invariant check succeeds for σ'. The conditional choice is the invariant check's right-hand side. If state σ' satisfies φ, we use our existing wlp encoding to compute wlp(C, I)(σ'), where assert I; assume ?(false) ensures that wlp is computed with respect to postexpectation I. If state σ' satisfies ¬φ, we do nothing and just take the postexpectation f.
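The per-state invariant check that the encoding performs can be mimicked for a tiny finite-state loop. The loop, the candidate invariants, and all function names below are illustrative, not taken from the paper.

```python
from fractions import Fraction

# Loop: while (c = 1) { {c := 0} [1/2] {c := 1} },  post f = [c = 0].
half = Fraction(1, 2)
states = [0, 1]

def wlp_body(g, c):
    # the loop body is loop-free: a fair probabilistic choice between c := 0 and c := 1
    return half * g(0) + half * g(1)

def f(c):
    return Fraction(1) if c == 0 else Fraction(0)

def is_inductive(I):
    # Park induction side condition, checked for every state c:
    #   I(c) <= wlp(body, I)(c) if the guard c = 1 holds, and I(c) <= f(c) otherwise
    return all(I(c) <= (wlp_body(I, c) if c == 1 else f(c)) for c in states)

I_good = lambda c: Fraction(1) if c == 0 else Fraction(9, 10)  # certifies wlp >= 9/10 from c = 1
I_bad = lambda c: Fraction(1, 2) if c == 0 else Fraction(1)    # fails the check at c = 1

assert is_inductive(I_good)
assert not is_inductive(I_bad)
```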
Upper bounds. Consider an iterative version of the lossy list traversal from Figure 4 on page 3. The Park induction rule can also be used to overapproximate weakest preexpectations. The encoding is dual, i.e. it suffices to use the co-versions of the involved statements. For example, Figure 23 encodes the above loop with exp(0.5, len(l)) as inductive invariant overapproximating the loop's termination probability. The list type and the exponential function exp(0.5, len(l)) are represented in HeyLo by custom domain declarations (cf. Section 5.1).
Recursion. We can encode verification of wlp-lower bounds for recursive procedure calls of pGCL programs as discussed in Section 3.5 and justified by Olmedo et al. [2016] and Matheja [2020]; it is another application of Park induction. For wp-upper bounds, the encoding is dual. Hence, Figure 7 on page 5 encodes that the termination probability of the program in Figure 4 is at most 0.5^len(l).

Overview of Encodings
Table 1 summarizes all verification techniques (program logics and proof rules) that have been encoded in HeyVL. While a detailed discussion is beyond the scope of this paper, we briefly go over Table 1. The main takeaway is that HeyVL enables the encoding, and thus the automation, of advanced verification methods based on diverse theoretical foundations and targeting different verification problems. The practicality of our encodings will be evaluated in Section 5.
Expected Values. We encoded McIver and Morgan [2005]'s weakest (liberal) preexpectation calculus for analyzing expected values of probabilistic programs (cf. Section 4.1). To analyze conditional expected values, we combined the two calculi as suggested by Olmedo et al. [2018]. For loops, we encoded three proof rules based on domain theory: First, Park induction generalizes the standard loop rule from Hoare logic [Hoare 1969] to a quantitative setting; it can be applied to lower-bound weakest liberal preexpectations and to upper-bound weakest preexpectations (cf. Section 4.2). However, it is unsound for the converse directions.
Second, ω-invariants are sound and complete for proving lower and upper bounds. However, they are arguably more complex because users must provide a family of invariants and compute limits. We modeled families of invariants as HeyLo formulas with additional free variables and used havoc x and cohavoc x to represent limits.
Third, we encoded a quantitative version of k-induction (for proving upper bounds), an established verification technique (cf. [Sheeran et al. 2000]). The encodings are based on latticed k-induction [Batz et al. 2021a], a generalization of k-induction to arbitrary complete lattices. After encoding k-induction for upper bounds on wp, we benefited from the duality of HeyVL statements: we obtained a dual encoding for lower bounds on wlp that has, to our knowledge, not been implemented before. Furthermore, we encoded an advanced proof rule for lower bounds on expected values.

Table 1. Verification techniques encoded in HeyVL sorted by verification problem: lower and upper bounds on probability of events (LPROB and UPROB), upper and lower bounds on expected values (UEXP and LEXP), conditional expected values (CEXP), almost-sure termination (AST), positive almost-sure termination (PAST), upper bounds on expected runtimes (UERT), and lower bounds on expected runtimes (LERT).
Domain Declarations. HeyVL supports domain declarations as known from existing verification infrastructures [Müller et al. 2016b]. A domain declaration introduces a new type symbol alongside a set of typed function symbols and first-order formulae (called axioms) characterizing feasible interpretations of the type and function symbols.
Consider the harmonic numbers, often required for, e.g., expected runtime analysis, as an example. The n-th harmonic number is given by H_n = Σ_{k=1}^{n} 1/k. To enable reasoning about verification problems involving the harmonic numbers, we introduce a domain declaration that introduces a new function symbol H: N → R≥0 and two axioms h₀ and hₙ characterizing feasible interpretations of H recursively. Other non-linear functions such as exponential functions (e.g., exp(0.5, ·) from Section 4.2) as well as algebraic data types can be defined in a similar way (see, e.g., [Müller et al. 2016a]). In our implementation, validity of verification conditions, i.e. inequalities between HeyLo formulae, is defined modulo validity of all user-provided axioms.
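Since the HeyVL domain declaration itself is not reproduced in this excerpt, the following Python sketch only mirrors the two axioms h₀ and hₙ and checks that they pin down the closed form; it is not Caesar syntax.

```python
from fractions import Fraction

def harmonic(n):
    # feasible interpretation satisfying the two axioms:
    #   h_0: harmonic(0) = 0
    #   h_n: n > 0  ==>  harmonic(n) = harmonic(n - 1) + 1/n
    return Fraction(0) if n == 0 else harmonic(n - 1) + Fraction(1, n)

# The axioms agree with the closed form H_n = sum_{k=1}^{n} 1/k:
for n in range(8):
    assert harmonic(n) == sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))
```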

The Verifier Caesar
We have implemented HeyVL in our tool Caesar, which consists of approximately 10k lines of Rust code. Caesar takes as input a HeyVL program and a set of domain declarations (cf. Section 5.1). It then generates all verification conditions described by the program, i.e., inequalities between HeyLo formulae of the form f ⊑ vp⟦S⟧(g) or f ⊒ vp⟦S⟧(g), and translates these verification conditions to a Satisfiability Modulo Theories (SMT) query. Our SMT back end is Z3 [de Moura and Bjørner 2008]. Since the translation to SMT can involve undecidable theories, Caesar might return unknown. Otherwise, Caesar either returns verified or not verified. In the latter case, Z3 often reports a counterexample state witnessing the violation of one of the verification conditions, which helps, e.g., with debugging loop invariants. Moreover, we have implemented a prototypical front-end that translates (numeric) pGCL programs and their specifications to HeyVL, and invokes Caesar for automated verification. Currently, it supports all techniques from Table 1 targeting loops.
SMT Encodings and Optimizations. We translate validity of inequalities between HeyLo formulae to SMT following the semantics of formulae from Figure 8.
To encode the sort R∞≥0, we evaluated two options, which are both supported by our implementation. The first option represents every number of sort R∞≥0 as a pair (r, isInfty), where r is a real number and isInfty is a Boolean flag that is true if and only if the represented number is equal to ∞. We add constraints r ≥ 0 to ensure that r is non-negative. All operations on R∞≥0 are then defined over such pairs. For example, the addition (r₁, isInfty₁) + (r₂, isInfty₂) is defined as (r₁ + r₂, isInfty₁ ∨ isInfty₂). For multiplication, we ensure that 0 · ∞ = 0, a common assumption in probability theory. The second option leverages Z3-specific data type declarations to specify values that are either infinite or non-negative reals. We observed that the first option performs better overall and thus use it by default.
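The pair representation can be prototyped directly. This is a sketch, not Caesar's actual Rust implementation; class, method, and field names are our own.

```python
class ExtReal:
    """Pair representation (r, is_infty) of the sort of extended non-negative reals."""
    def __init__(self, r=0.0, is_infty=False):
        assert r >= 0  # side constraint r >= 0 from the SMT encoding
        self.r, self.is_infty = r, is_infty

    def __add__(self, other):
        # (r1, i1) + (r2, i2) = (r1 + r2, i1 or i2)
        return ExtReal(self.r + other.r, self.is_infty or other.is_infty)

    def __mul__(self, other):
        # 0 * inf = 0, the usual convention in wp-style reasoning
        if (self.r == 0 and not self.is_infty) or (other.r == 0 and not other.is_infty):
            return ExtReal(0.0)
        return ExtReal(self.r * other.r, self.is_infty or other.is_infty)

    def __eq__(self, other):
        # when is_infty is set, the real component is irrelevant
        return (self.is_infty, None if self.is_infty else self.r) == \
               (other.is_infty, None if other.is_infty else other.r)

INF = ExtReal(0.0, True)
assert ExtReal(1.5) + INF == INF
assert ExtReal(0.0) * INF == ExtReal(0.0)   # 0 * inf = 0
assert ExtReal(2.0) * ExtReal(3.0) == ExtReal(6.0)
```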
The infimum and supremum quantifiers are translated using the textbook definition of infima and suprema over R∞≥0, but are eliminated whenever possible using that for S ⊆ R∞≥0 and r ∈ R∞≥0, we have sup S ≤ r iff ∀s ∈ S: s ≤ r, and dually r ≤ inf S iff ∀s ∈ S: r ≤ s.
Benchmarks. To validate whether our implementation is capable of verifying interesting quantitative properties of probabilistic programs, we considered various verification problems taken from the literature. These benchmarks involve unbounded probabilistic loops or recursion and include quantitative correctness properties of communication protocols [D'Argenio et al. 1997; Helmink et al. 1993] and randomised algorithms [Hurd et al. 2005; Kushilevitz and Rabin 1992; Lumbroso 2013], bounds on expected runtimes of stochastic processes [Kaminski et al. 2020, 2018; Ngo et al. 2018], proofs of positive almost-sure termination [Chakarov and Sankaranarayanan 2013], and proofs of almost-sure termination for the case studies provided in [McIver et al. 2018]. For each of these benchmarks, we apply the HeyVL encodings provided in Section 4 and Appendix C, and cover all verification techniques from Table 1.
Table 2 summarizes the results of our benchmarks. For each benchmark, it provides the benchmark name, the verification problem, the encoded techniques (cf. Table 1), the lines of HeyVL code (without comments), notable features, and the running time. For the running time, we also provide the shares of pruning, i.e. simplification of sub-formulae, and of the final SAT check. Table 1 together with the column "Problem" provides pointers to each benchmark's source and encoding. For latticed k-induction, we indicate the value of k that was used for the encoding. Benchmarks that use exponential functions (e.g. rabin, zeroconf) or harmonic numbers (e.g. ast) are marked with F1. Benchmarks that use multiple, possibly mixed, (co)procedures are marked with F2. One example encodes verification of nested loops (feature F3).
The size of our benchmarks ranges from 19 to 224 lines of HeyVL code. 85% of our benchmarks (those shaded in gray) have been verified with our front-end; the remaining encodings are handcrafted. All benchmark files are available as part of our artifact.
Evaluation. On average, Caesar needs 0.2 seconds to verify a HeyVL program, with a maximum of 2.3 seconds. Most benchmarks verify within less than a second. The brp3 benchmark times out because of the large nested branching resulting from the exponential size of the k-induction encoding with k = 23.
We conclude that Caesar is capable of verifying interesting quantitative verification problems of probabilistic programs taken from the literature. Moreover, we conclude that modern SMT solvers are a suitable back end despite the fact that our benchmarks often require reasoning about highly non-linear functions. This is because it often suffices to (un)fold recursive definitions of, e.g., the harmonic numbers, finitely many times. Finally, our benchmarks demonstrate that our verification infrastructure provides a unifying interface for encoding and solving various kinds of probabilistic verification problems in an automated manner.

RELATED WORK
We focus on automated verification techniques for probabilistic programs and deductive verification infrastructures for non-probabilistic programs; encoded proof rules have been discussed in Section 4.
Probabilistic Program Verification. Expectation-based probabilistic program verification was pioneered by Kozen [1983, 1985] and McIver & Morgan [McIver and Morgan 2005]. Hurd et al. [2005] formalised the w(l)p calculus in Isabelle/HOL [Nipkow et al. 2002]. They focus on the calculus' meta theory and provide a verification-condition generator for proving partial correctness. Hölzl [2016] implemented the meta theory of Kaminski et al. [2016]'s ert calculus in Isabelle/HOL and verified bounds on expected runtimes of randomised algorithms. We focus on unifying verification techniques in a single infrastructure.
Deductive Verification Infrastructures. Boogie [Leino 2008] and Why3 [Filliâtre and Paskevich 2013] are prominent examples of IVLs for non-probabilistic programs that lie at the foundation of various modern verifiers, such as Dafny [Leino 2010] and Frama-C [Kirchner et al. 2015]. Neither of these IVLs targets reasoning about expectations or upper bounds (aka necessary preconditions [Cousot et al. 2011]). For example, Boogie's statements are specific to verifying lower bounds on Boolean predicates. Evaluating whether our implementation could benefit from encoding HeyLo formulae into Why3 is interesting future work.

CONCLUSION AND FUTURE WORK
We have presented a verification infrastructure for probabilistic programs based on a novel quantitative intermediate verification language that aids researchers with prototyping and automating their proof rules. As future work, we plan to automate more rules and explore the relationship between our language, particularly its dual operators, and (partial) incorrectness logic [O'Hearn 2020; Zhang and Kaminski 2022]. A further promising direction is to generalize our infrastructure for the verification of probabilistic pointer programs [Batz et al. 2022a, 2019] and weighted programs [Batz et al. 2022b].
Furthermore, establishing a formal "ground truth" for our intermediate language HeyVL in terms of an operational semantics that assigns precise meaning to quantitative Hoare triples, which we admittedly introduced ad hoc, is important future work. However, defining an operational semantics that yields a pleasant forward-reading intuition for all statements in HeyVL appears non-trivial. In particular, we are unaware of a semantics for (co)assume statements that is independent of the semantics of the remaining program. We believe that stochastic games might be an adequate formalism, but the details have not been worked out yet.

DATA-AVAILABILITY STATEMENT
The tool Caesar, our prototypical front-end for pGCL programs, as well as our benchmarks that we submitted for the artifact evaluation are available [Schroer et al. 2023].We also develop our tools as open-source software at https://github.com/moves-rwth/caesar.
The co cases are dual, but we show the coassume case for illustration. Case S = coassume f: for all σ ∈ States, the claim follows by unfolding the definition of vp⟦coassume f⟧. Now assume that the induction hypothesis holds for arbitrary but fixed S₁, S₂ ∈ HeyVL.
Applying definitions, we get the claim. Theorem 3.3 (Conservativity of HeyVL). Let C be a program in the programming language of Müller [2019] and let G be a postcondition. Moreover, let S be obtained by replacing every assert φ and every assume φ occurring in C by assert ?(φ) and assume ?(φ), respectively (cf. Boolean embeddings, Section 2.3). Then ?(vc_C(G)) ≡ vp⟦S⟧(?(G)). Proof. Let C be a program in the Boolean IVL of [Müller 2019]. Let G ∈ P be a predicate. We prove vp⟦S⟧(?(G)) = ?(vc_C(G)) by induction over the structure of C.
The other claim, vp⟦init; S_encoding; return⟧(f) ⊑ vp⟦init; S; return⟧(f), follows from the above and the definition of vp. □

C PROOF RULE ENCODINGS INTO HEYVL
This appendix details the HeyVL encodings mentioned in Section 4. These encodings are all implemented in our front-end that translates annotated pGCL programs to HeyVL. We follow Table 1 and present encodings for the various verification problems. For each encoding, we first state the formal proof rule on expectations. Then, we specify the encoding inputs that our front-end requires, as well as a schematic description of the encoding output. All encodings of loops require HeyVL encodings of their loop bodies. For loop-free programs, the encoding from Section 4.1 can be used. Furthermore, proof rule encodings from this section may be used to encode nested loops.
Formally, the encoding is given by:
Inspired by weakest preexpectations [Kaminski 2019; McIver and Morgan 2005], we give semantics to HeyVL statements as a backward-moving, continuation-passing-style HeyLo transformer vp⟦S⟧: HeyLo → HeyLo by induction on S in Figure 14. (Co)procedure calls are treated separately in Section 3.5. We call vp⟦S⟧(f) the verification preexpectation of S with respect to post f. Intuitively, vp⟦S⟧(f)(σ) is the expected value of f w.r.t. the distribution of final states obtained from "executing" S on σ. The post f is either given by the surrounding procedure declaration or can be thought of as the verification preexpectation described by the remaining HeyVL statement: for S = S₁; S₂, we first obtain the intermediate verification preexpectation vp⟦S₂⟧(f), the expected value of what remains after executing S₁, and pass this into vp⟦S₁⟧.

Fig. 16. Unsound encoding of a procedure call foo() in bar. Both procedures verify, but inlining the body of foo in bar does not, as it produces the (wrong) inequality x ⊑ x ⊓ (0.5 · ∞).
The call x₁, ..., xₙ = P(e₁, ..., eₘ) corresponds to (1) initializing P's formal input parameters with the arguments e₁, ..., eₘ, (2) inlining P's body S, and (3) assigning to x₁, ..., xₙ the values of P's outputs. The semantics of x₁, ..., xₙ = P(e₁, ..., eₘ) can thus be thought of as the statement init; S; return. The encoding satisfies vp⟦init; S_encoding; return⟧(f) ⊑ vp⟦init; S; return⟧(f): whenever we can verify a HeyVL program using the modular encoding, we could have also verified it using inlining. The advantage of the modular encoding is that S_encoding does not contain the procedure body; the body could be changed without requiring re-verification of call sites, so long as the updated procedure body still adheres to the procedure's specification. To construct S_encoding, we leverage only P's specification pre f_pre and post f_post, cf. Figure 15: assuming that P verifies, we can safely assume that P's verification condition, namely f_pre ⊑ vp⟦S⟧(f_post), holds. By monotonicity of vp, we have f_pre ⊑ vp⟦S⟧(f_post) ⊑ vp⟦S⟧(g) whenever f_post ⊑ g holds. To underapproximate vp⟦S⟧(g), we construct S_encoding such that vp⟦S_encoding⟧(g) is the known lower bound f_pre if f_post ⊑ g; otherwise, it is the trivial lower bound 0. So how do we construct S_encoding concretely? In classical verification infrastructures (cf. [Müller 2019]), S_encoding corresponds to the statement assert φ_pre; havoc y₁; ...; havoc yₖ; assume φ_post.
Fig. 17. Encoding of weakest preexpectation for pGCL, where tmp is a fresh variable.

Table 2. Benchmarks. Rows shaded in gray indicate HeyVL examples automatically generated from pGCL code with annotations using our front-end. Timeout (TO) was set to 10 seconds. Verification techniques correspond to those presented in Table 1. Lines of HeyVL code (LOC) are counted without comments. Features: user-defined uninterpreted functions (F1), multiple (co)procedures (F2), nested loops (F3).
Barthe et al. [2018] present Ellora, an assertion-based logic for probabilistic programs built on top of existing verification infrastructures. Its specifications are predicates over (sub)distributions instead of expectations. While Ellora employs specialised proof rules for loops and does not support nondeterminism or recursion, thus being more restrictive than HeyVL in this regard, Ellora embeds, e.g., logics for reasoning about probabilistic independence. As stated in [Barthe et al. 2018], an in-depth comparison of assertion- and expectation-based approaches is difficult. Pardo et al. [2022] propose a propositional dynamic logic for pGCL featuring reasoning about convergence of estimators. Their logic is not automated yet.