Programmatic Strategy Synthesis: Resolving Nondeterminism in Probabilistic Programs

We consider imperative programs that involve both randomization and pure nondeterminism. The central question is how to find a strategy resolving the pure nondeterminism such that the so-obtained determinized program satisfies a given quantitative specification, i.e., bounds on expected outcomes such as the expected final value of a program variable or the probability to terminate in a given set of states. We show how memoryless and deterministic (MD) strategies can be obtained in a semi-automatic fashion using deductive verification techniques. For loop-free programs, the MD strategies resulting from our weakest precondition-style framework are correct by construction. This extends to loopy programs, provided the loops are equipped with suitable loop invariants - just like in program verification. We show how our technique relates to the well-studied problem of obtaining strategies in countably infinite Markov decision processes with reachability-reward objectives. Finally, we apply our technique to several case studies.


INTRODUCTION
Nondeterministic probabilistic programs are like usual imperative programs with the additional abilities to (1) flip coins and (2) choose between multiple execution branches in a purely nondeterministic fashion. These programs have various applications: If the nondeterminism models an uncontrollable adversary, they can be used to reason about safe abstractions or underspecifications of (fully) probabilistic programs, processes, and systems [Kattenbelt et al. 2009, 2010; Kozine and Utkin 2002]. If, on the other hand, the nondeterminism models a controllable agent, such programs model games and security mechanisms under stochastic adversaries, and can be used for planning and control; see [Feng et al. 2015; Haesaert et al. 2017] for various applications.
In many of these settings, a central problem is to determine strategies - resolutions of the nondeterminism - satisfying a given specification. In this paper, we consider quantitative specifications imposing bounds on probabilities to terminate in a given set of states or on expected outcomes, e.g., "the expected final value of x is at most 5". We tackle this problem from a programmatic perspective: Given a nondeterministic probabilistic program C and a quantitative specification, if possible, resolve the nondeterminism in C to obtain a deterministic, but still probabilistic, program C′ that satisfies the given quantitative specification.
The idea is that the "determinized" program C′ is a symbolic representation of a strategy steering the program execution towards satisfying the given quantitative specification.
Strategies for Loop-Free Programs. Using the weakest preexpectation (wp) calculus by McIver and Morgan [2005], one can prove that the maximal probability to win is 2/3.
Our technique employs this wp calculus to construct, in a fully mechanizable way, a strategy in the form of strengthenings of the predicates guarding the nondeterministic choices, which attains this winning probability.
More precisely, we obtain a determinized program in which lines 1-2 are replaced by a statement equivalent to ℎ := true, as switching in stage (3) is indeed the best strategy. The choices in lines 3-5 are not resolved, as they do not affect the winning probability. In fact, our strategies are generally permissive [Dräger et al. 2015]: they do not remove more nondeterminism than necessary.
Loops, Unbounded Variables, and Invariants. The program from Figure 1 contains no loops and only bounded variables. However, our technique also yields strategies for possibly unbounded nondeterministic probabilistic loops, provided they are annotated with suitable quantitative loop invariants. To illustrate this, consider the program in Figure 2. It models a variant of the game Nim, a 2-player zero-sum game which goes as follows: n tokens are placed on a table. The players take turns; in each turn, the player has to remove 1, 2, or 3 tokens from the table. The player who removes the last token loses the game. We have annotated the loop with a quantitative invariant I_Nim (see Section 7 for details). This invariant certifies that the maximal winning probability of the controllable player 2 is at least 2/3. Given a quantitative specification and suitably strong quantitative loop invariants for all - possibly nested - loops in a program C, our technique automatically yields a program C′ where the nondeterministic choices are restricted in such a way that any strategy consistent with C′ satisfies the desired quantitative specification.
The result of applying this technique to our example is given in Figure 3. Any strategy playing the game according to this program wins with probability at least 2/3 for all values of n.
Contributions. In summary, this paper makes the following contributions:
• Program-level construction of strategies: A mechanizable technique to determinize loop-free nondeterministic probabilistic programs in an optimal manner (Theorem 6.3). Given quantitative loop invariants, our technique also determinizes programs with - possibly multiple nested and sequential - loops in a mechanizable way (Theorem 6.5).
• A novel proof rule for lower bounds on expected outcomes of loops: As a by-product of our results, we obtain a generalization of a powerful proof principle for verifying lower bounds on expected outcomes of deterministic probabilistic loops to nondeterministic probabilistic loops - a problem left open by Hark et al. [2020]. See Section 6.2.3 for details.
• A bridge to the world of Markov decision processes: We establish tight connections between our setting and the well-known problem of finding good, or even optimal, strategies for resolving nondeterminism in countably infinite MDPs.
• Case studies and examples to demonstrate the applicability of our technique (Section 7).
Limitations and Assumptions. We focus on quantitative specifications imposing lower or upper bounds on expected outcomes of a program C, i.e., bounds on expected values of random variables w.r.t. the distribution of final states obtained from running C (cf. Section 4.1 for a formal problem statement). We do not consider long-run properties such as mean-payoff objectives [Puterman 1994] or general ω-regular properties [Baier and Katoen 2008, Ch. 10]. Further, we restrict to memoryless and deterministic strategies (as opposed to history-dependent randomized strategies). For countable state spaces, this is sufficient for (ε-)optimality w.r.t. the considered objectives (cf. Section 2.2.4). For this reason, we assume that program variables range over a countable domain (cf. Section 2.1).
Regarding mechanizability, our approach automatically yields optimal strategies for loop-free programs. Hence, if it exists, we can compute a determinized program satisfying the given quantitative specification (cf. Section 4.2 and Theorem 6.3). In the presence of loops, we require that all loops are annotated with sufficiently strong invariants. We do not automate the synthesis or the verification of such invariants. Rather, we show that if a nondeterministic program is annotated with such loop invariants, then suitable determinizations can be computed (cf. Section 4.3 and Theorem 6.5).
Paper Structure. Sections 2 and 3 introduce the deductive verification techniques and the operational MDP semantics our technique is based on. In Section 4, we provide an informal, illustrative bird's eye view of our approach. Our technique is parameterized by quantitative verification condition generators, which we introduce in Section 5. Program-level synthesis of strategies - the main technical contribution of our paper - is described in Section 6. We provide further case studies in Section 7, discuss related work in Section 8, and conclude in Section 9.

NONDETERMINISTIC PROBABILISTIC PROGRAMS
In this section, we introduce the syntax as well as an operational Markov decision process semantics of a simple probabilistic programming language featuring nondeterministic choices.

Syntax
Let Vars = {x, y, z, ...} be a countably infinite set of (program) variables with values¹ from the set Vals = Q≥0. The countably infinite set States of (program) states consists of those mappings from Vars to Vals that assign non-zero values to at most finitely many variables. pGCL programs are built from skip, assignments, sequential composition, (possibly nondeterministic) guarded choices, probabilistic choices, and loops. The probabilistic choice {C1} [p] {C2} introduces randomization: In state σ, C1 is executed with probability p(σ) and C2 is executed with the remaining probability 1 − p(σ). Finally, the loop while φ {C} executes the loop body C as long as φ evaluates to true, which is the only possible source of non-termination. We conclude this section with an example.

¹ We have chosen the value domain Q≥0 for the sake of concreteness. Our results straightforwardly apply to more general (possibly many-sorted) countable domains.

Example 2.1. Program C from Figure 4 is a loop which contains both randomization and nondeterminism. In each iteration, a fair coin is flipped. Depending on the outcome, the loop either terminates or increments x nondeterministically by either 1 or 2. △

Markov Decision Process Semantics
In this section, we define a formal semantics of pGCL programs in terms of (countably infinite) Markov decision processes (MDPs), based on [Gretz et al. 2012] with adaptations from [Batz et al. 2019].
Formally, an MDP is a quadruple M = (S, S_init, Act, P) where S is a countable non-empty set of states, S_init ⊆ S is a set of initial states, Act is a finite non-empty set of action labels, and P : S × Act × S → [0, 1] is a transition probability function such that for all s ∈ S and all a ∈ Act, we have Σ_{s′∈S} P(s, a, s′) ∈ {0, 1}.

2.2.1 MDP Semantics of pGCL. We first define a small-step execution relation → between program configurations. These configurations consist of (i) either a pGCL program C that is still to be executed or a symbol ⇓ indicating termination, and (ii) a program state σ from States. Formally, the countable set Conf of program configurations is given by Conf = (pGCL ∪ {⇓}) × States. The small-step execution relation is of the form → ⊆ Conf × Act × [0, 1] × Conf, where Act consists of a default action label together with two labels distinguishing the left and right branch of a guarded command. Intuitively, we can think of the elements of → as labeled transitions between program configurations: the second component of → is an action label ℓ ∈ Act, and the third component is the transition's probability p ∈ [0, 1]. The formal definition of → is standard and given in Appendix B.1. In a nutshell, → realizes the intended semantics of pGCL programs as described in Section 2.1. The default action label is used for all transitions except for those corresponding to a (possibly nondeterministic) guarded command if φ1 {C1} □ φ2 {C2}; the two branch labels distinguish between the branches chosen by such a guarded command.

Example 2.3. A fragment of the MDP M(C) of program C from Example 2.1 is sketched in Figure 5, where we assume that x ∈ N and c ∈ {0, 1}. We depict the fragment reachable from the initial states where x = 0 (middle row).

Strategies attaining the extremal value in (2) are called optimal. It is known that optimal MD strategies always exist in the minimizing setting [Puterman 1994, Theorem 7.3.6a]. In fact, the theory established in the upcoming sections implies this result for the class of MDPs described by pGCL programs. The above problem is thus guaranteed to have a solution if ∼ is ≤ and the threshold is at least MinExpRew_M(rew).
The maximizing setting is fundamentally different and more subtle [Blackwell 1967; Ornstein 1969]. First, optimal maximizing strategies do not exist in general, i.e., the supremum in (2) might not be attained. For instance, the MDP M in Figure 6 (black states only, blue rewards, S_init = topmost row) satisfies MaxExpRew_M(rew) = 1, but for all strategies σ and initial states s ∈ S_init we have ExpRew^σ_M(rew)(s) < 1. To see this, observe that any strategy that reaches a reward-collecting state with positive probability must play some action (say the n-th one) with positive probability p > 0, resulting in an expected reward of at most p · r_{n+1} + (1 − p), where r_{n+1} < 1 is the reward collected via that action; this is clearly less than 1. This example shows that, unlike in the minimizing setting, the above problem does not necessarily have a solution if ∼ is ≥ and the threshold is at most MaxExpRew_M(rew), even if general strategies are allowed.
On the other hand, there are also situations where MD strategies are strictly less powerful than general strategies. For example, consider the MDP in Figure 6 (black states only, red rewards, S_init = topmost row). Here, MaxExpRew_M(rew) = ∞, as witnessed by the optimal randomizing strategy that plays each action with probability 1/2. In contrast, each MD strategy obviously yields only finite expected reward. MD strategies are still somewhat useful in this example because for every constant threshold r ∈ R≥0 - no matter how large - we can find an MD strategy that yields expected reward at least r for each initial state. However, this does not hold in general. In fact, Ornstein [1969] gives an MDP M where MaxExpRew_M(rew) = ∞ is attained by a randomizing strategy, but for all MD strategies σ and all ε > 0, there exists s ∈ S_init such that ExpRew^σ_M(rew)(s) ≤ ε (Figure 6, gray & black states, red reward function, S_init = gray states).
MD strategies are nonetheless guaranteed to be reasonably powerful under mild assumptions, which we summarize in the following theorem (where (2) is due to [Ornstein 1969]):

Theorem 2.5. Consider an MDP M with countable S and a reachability-reward function rew.
Note that the premise of (2) holds if rew is a bounded function.

Fig. 6. Example MDP described by Ornstein [1969]. Transition probabilities equal to 1 are omitted.

WEAKEST PREEXPECTATION REASONING
In this section, we introduce the program calculi we use throughout to reason about (minimal and maximal) expected outcomes of nondeterministic probabilistic programs. Expectation-based reasoning for deterministic probabilistic programs was pioneered by Kozen [Kozen 1983, 1985]. McIver and Morgan [2005] extended expectation-based reasoning to support programs with nondeterminism.

Expectations
Expectations² are the central objects the calculi considered in this paper operate on. They are the quantitative analogue of predicates: Instead of mapping program states to {true, false}, program states are mapped to R∞≥0 = R≥0 ∪ {∞}. Formally, the complete lattice (E, ⊑) of expectations is given by E = {f | f : States → R∞≥0}, where f ⊑ g holds iff f(σ) ≤ g(σ) for all σ ∈ States. Expectations are denoted by f, g, ... and variations thereof. Infima and suprema in this lattice are understood pointwise. In particular, pairwise minima and maxima are given by f ⊓ g = λσ. min{f(σ), g(σ)} and f ⊔ g = λσ. max{f(σ), g(σ)}.
Standard arithmetic operations addition + and multiplication · are also understood pointwise, i.e., for ∘ ∈ {+, ·}, we let f ∘ g = λσ. f(σ) ∘ g(σ), where we set 0 · ∞ = 0. The Iverson bracket [φ] casts a predicate φ into an expectation [Iverson 1962]: it evaluates to 1 on states satisfying φ and to 0 otherwise. Intuitively, the quantitative implication φ → f acts like a filter: if φ evaluates to true, the implication evaluates to the value of the right-hand side's expectation f. Otherwise, φ → f evaluates to the top element ∞ of the lattice (E, ⊑).
We agree on the following order of precedence for the connectives between expectations: · takes precedence over +, and so on. We use brackets to resolve ambiguities. Finally, given f ∈ E, x ∈ Vars, and an arithmetic expression E, we define the substitution of x in f by E as f[x := E] = λσ. f(σ[x ↦ σ(E)]), where σ(E) denotes the value of E in state σ.

Angelic and Demonic Weakest Preexpectations
To reason about minimal and maximal expected outcomes of programs, we introduce two program calculi - expectation transformers - which associate to each C ∈ pGCL a map of type E → E.
Definition 3.1 (Weakest preexpectation transformers). Let C ∈ pGCL and f ∈ E. Each of the following is defined by induction on the structure of C in Figure 7: (1) dwp_C(f) ∈ E is the demonic weakest preexpectation of C w.r.t. postexpectation f, and (2) awp_C(f) ∈ E is the angelic weakest preexpectation of C w.r.t. postexpectation f.
If C is deterministic, then dwp_C(f) and awp_C(f) coincide, in which case we often simply write wp_C(f). Let us briefly go over the individual rules for T ∈ {dwp, awp}.
For the effectless program skip, T_skip(f) is just f. For assignments x := E, we substitute x in f by the assignment's right-hand side E. For sequentially composed programs C1; C2, we first determine the intermediate preexpectation T_C2(f), which is then plugged into T_C1. The - possibly nondeterministic - guarded choice is treated in more detail below. For the probabilistic choice {C1} [p] {C2}, we determine the preexpectations of the two branches and add them up, weighing each branch according to its probability of being executed. Finally, preexpectations of a loop are given by a least fixpoint which, intuitively, is the limit of all finite loop unrollings. We refer to [Kaminski 2019] for an in-depth treatment of expectation-based reasoning.

Now, given C ∈ pGCL and f ∈ E, each T ∈ {dwp, awp} defines an expectation T_C(f), i.e., a map from program states σ to the quantity T_C(f)(σ). In what follows, we convey some intuition on this quantity. Assume for the moment that C is deterministic, i.e., C possibly contains randomization but no nondeterminism. In the absence of nondeterminism, dwp_C(f) and awp_C(f) coincide. Thinking of the postexpectation f as a random variable over C's state space, we have

wp_C(f)(σ) = expected value of f w.r.t. the (sub-)distribution of final states reached after executing C on initial state σ.

The distribution of final states is a sub-distribution whenever C does not terminate almost-surely on initial state σ, where the missing probability mass is the probability of divergence. As outlined above, and analogous to Dijkstra's weakest preconditions for standard programs [Dijkstra 1975], wp_C(f) is obtained by recursively applying the rules from Figure 7, i.e., we start with the postexpectation f at the end of the program and - as suggested by the rule for sequential composition - move backwards through C until we arrive at its beginning, obtaining wp_C(f). This is exemplified in Figure 8, where we annotate³ the given program for determining wp_C(x). Due to the backward-moving nature of the transformer, these annotations are best read from bottom to top. Complying with the above explanation, the result wp_C(x) = y + 3 tells us that the expected final value of x is given by the initial value of y plus 3. This is intuitive: As y is not initialized before it is read, the expected final value of x depends on the initial value of y.
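The backward-moving computation of wp can be sketched directly as higher-order functions. The concrete program from Figure 8 is not reproduced in this excerpt, so the sketch below uses a hypothetical stand-in program whose wp w.r.t. x also works out to y + 3; the rules for assignment, sequencing, and probabilistic choice follow Figure 7.

```python
# Weakest preexpectations as higher-order functions: an expectation maps
# program states (dicts) to non-negative reals; each program construct is a
# transformer of type expectation -> expectation, applied back to front.
# The program below is a hypothetical stand-in with wp_C(x) = y + 3.

def assign(var, expr):
    # wp(x := E)(f) = f[x := E]
    def t(f):
        return lambda s: f({**s, var: expr(s)})
    return t

def seq(t1, t2):
    # wp(C1; C2)(f) = wp_C1(wp_C2(f))
    return lambda f: t1(t2(f))

def pchoice(p, t1, t2):
    # wp({C1} [p] {C2})(f) = p * wp_C1(f) + (1 - p) * wp_C2(f)
    return lambda f: (lambda s: p * t1(f)(s) + (1 - p) * t2(f)(s))

# C  =  x := y + 1 ; { x := x + 1 } [1/2] { x := x + 3 }
C = seq(assign("x", lambda s: s["y"] + 1),
        pchoice(0.5,
                assign("x", lambda s: s["x"] + 1),
                assign("x", lambda s: s["x"] + 3)))

post_x = lambda s: s["x"]      # postexpectation f = x
pre = C(post_x)                # wp_C(x), computed backwards through C

print(pre({"x": 0, "y": 5}))   # expected final value of x: y + 3 = 8.0
```

Reading the composition inside out mirrors the bottom-to-top annotation style of Figure 8: the probabilistic choice turns x into x + 2, and the assignment then substitutes y + 1 for x, yielding y + 3.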
Let us now consider a possibly nondeterministic program C. In this case, the final distribution of states obtained from executing C on some initial state σ might not be unique: it generally depends on the resolution of the nondeterminism. It therefore no longer makes sense to speak about the expected value of f w.r.t. the distribution of final states. Instead, we reason about optimal values: dwp/awp_C(f)(σ) = minimal/maximal expected value of f w.r.t. all (sub-)distributions of final states reached after executing C on initial state σ.
To see this, consider the preexpectation of a guarded choice in more detail: if both guards φ1 and φ2 evaluate to true under σ, then the above quantity is the minimum (resp. maximum) of the preexpectations of the two branches C1 and C2. Consider the program C′ shown in Figure 9 as an example. It contains C from Figure 8 as a subprogram. The annotations on the left-hand side determine dwp_C′(x) - mapping initial states to the minimal expected final value of x - and the right-hand side's annotations determine awp_C′(x) - mapping initial states to the maximal expected final value of x. C′ first nondeterministically assigns either 1 or 4 to variable y, and then executes C. The minimal expected final value of x is 4, whereas the maximal one is 7.
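The demonic/angelic split can be made concrete with a small sketch. The subprogram C from Figure 9 is not reproduced in this excerpt, so `wp_C` below is a hypothetical stand-in with wp_C(x) = y + 3, which reproduces the values 4 and 7 from the text.

```python
# Demonic (dwp) and angelic (awp) preexpectations differ only in how a
# nondeterministic guarded choice is resolved: minimum vs. maximum over
# the enabled branches.

def wp_C(f):
    # hypothetical deterministic subprogram C with final x = y + 3
    return lambda s: f({**s, "x": s["y"] + 3})

def guarded_choice(opt, t1, t2):
    # dwp/awp of "if true {C1} [] true {C2}": both guards hold, so take
    # the minimum (demonic) or maximum (angelic) of the two branches.
    return lambda f: (lambda s: opt(t1(f)(s), t2(f)(s)))

set_y = lambda v: (lambda f: (lambda s: f({**s, "y": v})))

post_x = lambda s: s["x"]
f_mid = wp_C(post_x)   # intermediate preexpectation: y + 3

dwp = guarded_choice(min, set_y(1), set_y(4))(f_mid)
awp = guarded_choice(max, set_y(1), set_y(4))(f_mid)

s0 = {"x": 0, "y": 0}
print(dwp(s0), awp(s0))   # minimal resp. maximal expected final value of x
```

The nondeterministic assignment of 1 or 4 to y feeds into the intermediate preexpectation y + 3, so the demonic value is min(4, 7) = 4 and the angelic value is max(4, 7) = 7, matching the annotations described for Figure 9.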

MDP Semantics vs. Weakest Preexpectations
There is a tight connection between reachability-reward objectives in MDPs and weakest preexpectations. It is this connection which enables us to link our program-level strategy synthesis techniques to the well-known problem of synthesizing strategies in MDPs. More concretely, dwp_C(f)(σ) and awp_C(f)(σ) are the minimal and maximal expected rewards in the MDP M(C), respectively, where postexpectation f induces the reward function: upon reaching a terminal configuration (⇓, σ′), a reward of f(σ′) is collected. Formally:

Theorem 3.2 ([Batz et al. 2019; Gretz et al. 2012]). In particular, if C is deterministic, then so is M(C), and we have wp_C(f)(σ) = ExpRew_{M(C)}(rew_f)(C, σ).
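This correspondence can be checked operationally on a small loop. The sketch below uses an illustrative program (not taken from the paper's figures): while c ≠ 0 { {c := 0} [1/2] {x := x + 1} }, whose wp w.r.t. postexpectation x is, by a standard calculation, [c ≠ 0](x + 1) + [c = 0]x. Finite unrollings of the operational semantics approximate the expected reward from below.

```python
from fractions import Fraction

# Operational check of the wp-vs-expected-reward correspondence on
#   while c != 0 { { c := 0 } [1/2] { x := x + 1 } }
# Postexpectation f = x induces a reward of f(s') upon termination.

def expected_reward(c, x, horizon):
    # Finite unrolling of the small-step MDP semantics; terminal
    # configurations yield reward x. This converges to the true expected
    # reward as the horizon grows (the loop terminates almost surely).
    if c == 0:
        return Fraction(x)
    if horizon == 0:
        return Fraction(0)      # under-approximation of the least fixpoint
    half = Fraction(1, 2)
    return half * expected_reward(0, x, horizon - 1) \
         + half * expected_reward(c, x + 1, horizon - 1)

def wp_loop(c, x):
    # closed form of wp for this loop: [c != 0](x + 1) + [c == 0]x
    return Fraction(x + 1) if c != 0 else Fraction(x)

approx = expected_reward(1, 0, 60)
exact = wp_loop(1, 0)
print(float(approx), float(exact))   # the unrollings approach wp from below
```

The finite unrollings correspond exactly to the fixpoint iteration underlying the loop rule in Figure 7, which is why they approximate the expected reward of M(C) from below.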
The dwp and awp calculi can thus be understood as a means to reason deductively about reachability-reward objectives of possibly infinite-state MDPs modeled by pGCL programs.

A BIRD'S EYE VIEW: PROGRAMMATIC STRATEGY SYNTHESIS
Before we deal with the fully fledged formalization of our approach, we set the stage in this section by (i) formalizing our problem statement (Section 4.1), (ii) giving an informal description of our approach for loop-free programs (Section 4.2), and (iii) demonstrating how these techniques generalize to programs with loops (Section 4.3). All of this is done in an example-driven manner.

Problem Statement
Recall the synthesis problem for MDPs M from Section 2.2.3 of finding determinizations M′ which guarantee bounds on the expected rewards of interest. If the state space of M is finite, it is known that the problem can be solved in polynomial time via linear programming [Puterman 1994]. However, this technique is in general not applicable to countably infinite-state MDPs, which arise naturally from pGCL programs. Our goal is to obtain program-level strategy synthesis techniques by means of weakest preexpectation reasoning. Towards lifting the synthesis problem to programs, we formalize the notions of implementations and determinizations.

Definition 4.1. The implementation relation ⊸ ⊆ pGCL × pGCL is the smallest partial order on pGCL satisfying the rules given in Figure 10. If C′ ⊸ C, then we say that C′ implements C. If moreover C′ is deterministic, then we say that C′ is a determinization of C.

If C′ ⊸ C, then C′ and C coincide syntactically up to the guards occurring in the guarded choices. The guards in C′ may be strengthened to resolve some of the nondeterministic choices from C, which is formalized by the premises φ′1 |= φ1 and φ′2 |= φ2 in the rule for guarded choices. The premise |= φ′1 ∨ φ′2 ensures that C′ does not eliminate all choices from some guarded choice in C.

Example 4.2. Consider the programs C1 (left), C2 (middle), and C (right).

Since C is loop-free, these annotations are obtained in a syntactic manner by recursively applying the rules given in Figure 7. The topmost annotation tells us that dwp_C(x) = x ⊓ y, i.e., the minimal expected final value of x is the minimum of the initial values of x and y. Towards constructing C_min, consider the highlighted intermediate preexpectations x and y. These annotations tell us that the expected final value of x will be x if C executes the first branch, and that it will be y if C executes the second branch. Hence, we can readily read off strengthenings of the guards to resolve the nondeterminism in an optimal way: If x < y holds, the first branch should be taken. Conversely, if x > y holds, the second branch should be taken. In case x = y holds, both choices are optimal. We can thus construct the following implementation C′ of C: Program C′ is still nondeterministic if x = y holds initially. This nondeterminism can, however, be resolved arbitrarily in the sense that any determinization of C′ will be optimal. We call C′ an optimal permissive determinization of C w.r.t. postexpectation x. C′ is now easily determinized by, e.g., turning one of the inequalities, say x ≤ y, into a strict one, obtaining (one choice for) C_min. The construction for awp and C_max is dual by flipping the inequalities. △

We formalize our construction of such optimal (permissive) determinizations for arbitrary loop-free programs in Section 6. Now reconsider our problem statement from Section 4.1: Given C ∈ pGCL, postexpectation f, ∼ ∈ {⊑, ⊒}, and a threshold g ∈ E, if it exists, find a determinization C′ of C with wp_C′(f) ∼ g.
In Section 6 further below, we show that the solution for loop-free C is as follows: • If ∼ is ⊑, then C′ exists if and only if dwp_C(f) ⊑ g, in which case C′ is given by C_min.
• If ∼ is ⊒, then C′ exists if and only if awp_C(f) ⊒ g, in which case C′ is given by C_max.
That is, C′ exists iff the minimal (resp. maximal) expected final value of f is upper- (resp. lower-) bounded by g(σ) for every initial state σ. Moreover, C′ can be constructed as exemplified above.
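The "read off the strengthened guards" construction from Example 4.2 can be checked on sample states. Since the concrete branch bodies of C are not shown in this excerpt, the branches below (`skip` and `x := y`) are hypothetical bodies with exactly the intermediate preexpectations x and y discussed in the text.

```python
# Reading off an optimal determinization from the intermediate
# preexpectations: branch 1 yields x, branch 2 yields y, so the guards are
# strengthened to x <= y and y < x. (Branch bodies are hypothetical stand-ins
# with these preexpectations.)

def dwp_choice(s):
    # dwp of the nondeterministic choice: minimum over the enabled branches
    return min(s["x"], s["y"])

def wp_Cmin(s):
    # determinization C_min: strengthened guards pick a minimizing branch
    if s["x"] <= s["y"]:
        return s["x"]      # first branch (skip): final value of x stays x
    else:
        return s["y"]      # second branch (x := y): final value of x is y

states = [{"x": x, "y": y} for x in range(6) for y in range(6)]
assert all(wp_Cmin(s) == dwp_choice(s) for s in states)
print("C_min attains dwp on all sampled states")
```

The check confirms the correctness-by-construction claim on this example: the determinized program attains the demonic preexpectation x ⊓ y on every sampled state, including the tie case x = y where either branch is optimal.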
Remark 4.5. Our results do not imply decidability of the existence of the sought-after determinizations. Depending on the arithmetic necessary for expressing f, dwp_C(f), awp_C(f), or g, quantitative entailments of the form dwp_C(f) ⊑ g or awp_C(f) ⊒ g are often undecidable. △

Second Step: From Quantitative Loop Invariants to Determinizations of Loops

Naturally, constructing determinizations of loops is more involved. Reasoning about minimal and maximal expected outcomes of loops requires reasoning about least fixpoints (cf. Figure 7), which are uncomputable in general [Kaminski et al. 2019]. How can we nonetheless determinize loops as asked for by our problem statement? Our primary observation is that quantitative loop invariants yield determinizations of nondeterministic probabilistic loops.
These quantitative loop invariants are generally hard to find, let alone algorithmically. Our key insight is that once we have a quantitative loop invariant at hand, we can use it to find determinizations of loops. Our approach thus applies to any technique which verifies pGCL programs by means of quantitative loop invariant-based reasoning satisfying the assumptions formalized in Section 6. In what follows, we first introduce quantitative loop invariants, and then present an example of our construction for obtaining determinizations guided by these invariants.
On every iteration, C flips a fair coin and, depending on the outcome, either terminates by setting c to 0, or continues iterating after nondeterministically squaring x or incrementing x by one. Now suppose we wish to upper-bound dwp_C(x) - the expectation which maps every initial state to the minimal expected final value of x - by g = [c ≠ 0] · (x + 1) + [c = 0] · x. In words, if initially c ≠ 0 holds, then the minimal expected final value of x is upper-bounded by the initial value of x plus 1. Conversely, if initially c = 0 holds, then x is not modified at all. We employ Theorem 4.7 to prove that dwp_C(x) ⊑ g indeed holds⁵ by verifying that g is a dwp-superinvariant of C w.r.t. x. Obtaining lower bounds from subinvariants requires additional conditions, see Section 4.3.3.
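The superinvariance check amounts to one application of the loop's dwp characteristic functional Φ and a pointwise comparison Φ(I) ⊑ I, which can be sampled mechanically. The loop shape below is a reconstruction of the loop described above (while c ≠ 0 { {c := 0} [1/2] {x := x·x □ x := x + 1} }), with variable names chosen for illustration.

```python
from fractions import Fraction

# Superinvariant (Park induction) check for
#   while c != 0 { { c := 0 } [1/2] { x := x*x  []  x := x + 1 } }
# with postexpectation x and candidate invariant
#   I = [c != 0](x + 1) + [c == 0]x.
# Superinvariance means Phi(I) <= I pointwise.

def I(c, x):
    return x + 1 if c != 0 else x

def phi_I(c, x):
    if c == 0:                        # guard false: postexpectation x
        return x
    # dwp of the body applied to I: fair probabilistic choice, then a
    # demonic (minimizing) choice between squaring and incrementing
    return Fraction(1, 2) * I(0, x) \
         + Fraction(1, 2) * min(I(c, x * x), I(c, x + 1))

samples = [(c, Fraction(n, 4)) for c in (0, 1) for n in range(40)]
assert all(phi_I(c, x) <= I(c, x) for c, x in samples)
print("Phi(I) <= I on all sampled states")
```

On states with c ≠ 0 the check boils down to x/2 + min(x² + 1, x + 2)/2 ≤ x + 1, which holds because the minimum is at most x + 2; this is exactly the inequality a deductive proof of superinvariance would discharge.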

From Loop Invariants to Loop Determinization. How can we use the concept of superinvariants for finding determinizations as demanded by our problem statement? Let us, for the moment, focus on the conceptually simpler case of aiming for determinizations which guarantee upper bounds on expected outcomes, i.e., let C be a loop, let f, g ∈ E, and suppose we want to find a determinization C′ of C with wp_C′(f) ⊑ g. We proceed as follows: (1) Find a dwp-superinvariant I of C w.r.t. f such that I satisfies I ⊑ g, and (2) compute - guided by I - a determinization C′ of C such that I is a dwp-superinvariant of C′ w.r.t. f as well, i.e., determinizing C to C′ preserves superinvariance of I w.r.t. f.
Step (1) is undecidable in general. We assume that I is provided by some external means.
Step (2), on the other hand, is as syntactic as our construction for loop-free programs from Section 4.2. Steps (1) and (2) together solve the given problem instance because wp_C′(f) ⊑ I ⊑ g. For step (2), recall from Section 4.2 that we can determinize loop-free programs in an optimal permissive manner. In particular, for the loop-free body C_body of C, we can construct C′_body such that dwp_{C_body}(I) = wp_{C′_body}(I). Consequently, C′ = while c ≠ 0 {C′_body} will be a determinization of C satisfying wp_C′(x) ⊑ I (by Theorem 4.7) and therefore, by transitivity, also wp_C′(x) ⊑ g.
To construct C′_body, we proceed as in Section 4.2: Annotate C_body for determining dwp_{C_body}(I) and derive the strengthenings for the guards from the highlighted intermediate preexpectations, which yields the correct-by-construction determinization C′ of C satisfying wp_C′(x) ⊑ g. For lower bounds, we may proceed in a way analogous to the above procedure for upper bounds: (1) Find an awp-subinvariant I such that both vc_C(I) and I ⊒ g hold, and (2) compute - guided by I - a determinization C′ of C such that vc_C′(I) is satisfied as well, i.e., determinizing C into C′ preserves validity of the verification conditions. These side conditions come in different flavors⁶, and their restrictiveness typically depends on the expressive power of I and the postexpectation f. Moreover, finding such side conditions is an active field of research [Hark et al. 2020]. This motivates our general framework presented in Section 6, where we abstract from the specific side conditions under consideration. This framework may then be instantiated with verification conditions tailored to the specific problem instances at hand. Finally, we remark that our framework also handles determinizations of arbitrary pGCL programs - possibly containing nested and sequenced loops.
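Step (2) - determinizing the body guided by the invariant - can be sketched for the fair-coin loop discussed above (a reconstruction: while c ≠ 0 { {c := 0} [1/2] {x := x·x □ x := x + 1} } with invariant I = [c ≠ 0](x + 1) + [c = 0]x). The strengthened guard picks the squaring branch exactly when its intermediate preexpectation I[x := x·x] = x² + 1 is no larger than I[x := x + 1] = x + 2, and the claim is that I stays a superinvariant of the determinized loop.

```python
from fractions import Fraction

# Invariant-guided determinization of the loop body: the strengthened
# guard resolves the demonic choice by comparing the two intermediate
# preexpectations, and superinvariance of I must be preserved.

def I(c, x):
    return x + 1 if c != 0 else x

def phi_det_I(c, x):
    if c == 0:
        return x
    # deterministic body: the strengthened guard resolves the choice by
    # comparing I[x := x*x] = x*x + 1 with I[x := x + 1] = x + 2
    chosen = I(c, x * x) if x * x + 1 <= x + 2 else I(c, x + 1)
    return Fraction(1, 2) * I(0, x) + Fraction(1, 2) * chosen

samples = [(c, Fraction(n, 4)) for c in (0, 1) for n in range(40)]
assert all(phi_det_I(c, x) <= I(c, x) for c, x in samples)
print("I remains a superinvariant after determinization")
```

Because the strengthened guard always selects a branch whose intermediate preexpectation equals the demonic minimum, the characteristic functional of the determinized loop coincides with the demonic one, so superinvariance carries over by construction.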

On Completeness and Inherent Incompleteness of our Approach
We show in Section 6.1 that our approach for upper bounds as outlined in Section 4.3.2 is complete: For all pGCL programs C and all postexpectations f, our framework can (in theory) produce a determinization C′ ⊸ C such that wp_C′(f) = dwp_C(f). In words, we can find a determinization that actually realizes the minimal preexpectation. This matches a known result about countably infinite MDPs, namely that MD strategies minimizing expected rewards independently of the initial state always exist (cf. Section 2.2.4). The situation is fundamentally different for the maximizing expectation transformer awp: For some C and f there does not exist a deterministic C′ ⊸ C such that wp_C′(f) = awp_C(f). A simple counterexample is the (non-probabilistic) loop C = while c ≠ 0 {if true {x := x + 1} □ true {c := 0}} with postexpectation f = x/(x+1). Assuming that x ∈ N, this program realizes the MDP in Figure 6 (black states & blue reward function). It is clear that awp_C(f) = [c ≠ 0] · 1 + [c = 0] · x/(x+1), but no determinization C′ of C can actually attain the value 1, since f evaluates to x/(x+1) < 1 in every final state. However, recall that under the mild assumptions of Theorem 2.5, the existence of (ε-)optimal determinizations is guaranteed (cf. Theorem 6.9); if f or g are unbounded, more restrictions are needed [Hark et al. 2020].
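The non-attainment phenomenon in this counterexample can be made concrete: every memoryless deterministic resolution of the loop amounts to "increment some fixed number of times, then stop", and each such strategy falls strictly short of the supremum 1 while getting arbitrarily close to it. (Variable names follow the reconstruction above.)

```python
from fractions import Fraction

# Non-attainment of the angelic value for
#   C = while c != 0 { if true { x := x + 1 } [] true { c := 0 } }
# with postexpectation f = x/(x+1): the strategy "increment n times, then
# set c := 0" terminates with x = n and hence achieves f = n/(n+1).

def value_of_strategy(n, x0=0):
    x = x0 + n              # after n increments, the loop exits via c := 0
    return Fraction(x, x + 1)

vals = [value_of_strategy(n) for n in range(200)]
assert all(v < 1 for v in vals)          # every strategy falls short of 1 ...
assert 1 - vals[-1] < Fraction(1, 100)   # ... yet the values approach 1
print(float(vals[-1]))
```

This mirrors the MDP-level discussion of Figure 6: the supremum over all strategies is 1, but no single determinization attains it, so only ε-optimal determinizations exist.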

REASONING WITH VERIFICATION CONDITIONS
Towards our framework for determinizing pGCL programs, this section introduces a simple verification condition generator adapted from [Navarro and Olmedo 2022]. A distinguishing aspect of our formulation is that - following the motivation from the previous section - our verification condition generator is parameterized by the invariant-based proof rule for reasoning about loops.
Auxiliary Expectation Transformers. To simplify notation for reasoning about (possibly nested or sequenced) loops, recall from Section 2.1 that all loops occurring in a pGCL program C are annotated with a quantitative loop invariant I ∈ E, i.e., all loops in C are of the form while φ {C′} ⟨I⟩. We define for each T ∈ {dwp, awp} (and T = wp in case we deal with deterministic programs) an auxiliary expectation transformer T*_C. The inductive definition of T*_C(f) is completely analogous to the definition of T_C(f), except for loops, for which we define T*_{while φ {C′} ⟨I⟩}(f) = I.
In particular, for loop-free C, T*_C and T_C coincide. As is standard in verification condition-based program verification [Leino 2010], T*_C replaces the (generally uncomputable) least fixpoint by the (externally provided) annotated invariant, so that determining T*_C(f) reduces to syntactic reasoning. The idea is then to define suitable verification conditions for C and f in such a way that the validity of these conditions implies that T*_C(f) upper- or lower-bounds T_C(f).
The Parametric Verification Condition Generator. In order to parameterize our verification condition generator by the invariant-based proof rule employed for approximating expected outcomes of loops, we consider objects ℭ of type {C ∈ pGCL | C = while φ {C′} ⟨I⟩} × E → B and call them verification condition providers (vc-providers, for short). The truth value ℭ(C, f) indicates whether loop C with invariant I satisfies a verification condition w.r.t. postexpectation f. Now fix a vc-provider ℭ and some T ∈ {dwp, awp} for the remainder of this section. Intuitively, vc^{T,ℭ}_C(f) is true if and only if all loops occurring in C satisfy the verification condition given by the vc-provider ℭ. Notice that, for the sequential composition C1; C2, we employ the auxiliary transformer T* on C2 to determine the postexpectation for the vc of C1, since we reason with the annotated loop invariants instead of least fixpoints. The verification condition for the body C′ of a loop while φ {C′} ⟨I⟩ is taken w.r.t. postexpectation I since, for nested loops, an inner loop's verification condition depends on the outer loop's invariant (see Example 5.3).

Fig. 12. Program C with annotations for determining dwp*_C(x). The invariant of the inner loop is I′; annotations for the inner loop are omitted.

We
are interested in vc-providers ℭ so that validity of vc T ℭ  ( ) does indeed imply that T *  ( ) bounds T  ( ): Definition 5.2.We say that ℭ yields upper bounds for T ∈ {dwp, awp}, if for all  ∈ pGCL and all  ∈ E : vc T ℭ  ( ) implies T  ( ) ⊑ T *  ( ) .Analogously, we say that ℭ yields lower bounds for T ∈ {dwp, awp}, if for all  ∈ pGCL and all  ∈ E : vc T ℭ  ( ) implies T  ( ) ⊒ T *  ( ) .△ It is useful to note that, since both dwp  and awp  are monotonic (Theorem B.1.1), it actually suffices to show that ℭ yields upper (resp.lower) bounds for loops in order to conclude that ℭ yields upper (resp.lower) bounds for all pGCL programs.See Lemma B.2 for details.
Example 5.3 (An unbounded nested loop). Recall from Theorem 4.7 that dwp-superinvariants establish upper bounds on demonic weakest preexpectations of loops. Using Lemma B.2, it is thus straightforward to show that the vc-provider SUPERINV, which requires the annotated invariant to be a dwp-superinvariant of the loop, yields upper bounds for dwp. Notice that, for the loop body C′, we consider the auxiliary transformer dwp* to enable reasoning about nested loops. This comes at the cost of the typical dependency between inner and outer invariants: dwp*_{C′}(I) must not approximate dwp_{C′}(I) too coarsely, i.e., if C′ itself contains loops, then the inner loops' superinvariants must be tight enough for the outer invariant I to satisfy the superinvariant condition. Now consider program C from Figure 12, which we annotate for determining dwp*_C(x). Program C contains a nested loop. In every iteration, the outer loop, which we denote by L₁, nondeterministically chooses a value for y and a probability p. It then flips a fair coin and either terminates or executes the inner loop, which we denote by L₂. Loop L₂ depends on the choices made by L₁: it keeps flipping a coin with bias p and either terminates or keeps incrementing x by y.
The annotations yield the conditions for vc^dwp_SUPERINV(C, x) to be true. Recall that both loops must satisfy their respective verification conditions: the postexpectation for the outer loop L₁ is x, which yields the condition SUPERINV(L₁, x); the postexpectation for the inner loop L₂ is the outer loop's invariant I, which yields the condition SUPERINV(L₂, I). Both conditions can be shown to be true. Since SUPERINV yields upper bounds for dwp, we thus get dwp_C(x) ⊑ dwp*_C(x) = 3 ⊓ y, i.e., the minimal expected final value of x is upper-bounded by the minimum of 3 and y. We will see in the next section how the validity of vc^dwp_SUPERINV(C, x) and the annotation of L₁'s body yield a correct-by-construction determinization of C which preserves this bound. △

DETERMINIZATIONS OF pGCL PROGRAMS FROM WEAKEST PREEXPECTATIONS
In this section, we formalize our approach for obtaining determinizations of pGCL programs from weakest preexpectations based on our parametric verification conditions from Section 5.
First, recall from Section 4 that our approach is based on (pointwise) comparisons of intermediate preexpectations obtained from applying the rules in Figure 7. To formalize this, we define two functions ⪯, ⪰ : E × E → P; we often write f ⪯ g and f ⪰ g instead of ⪯(f, g) and ⪰(f, g). In words, f ⪯ g (resp. f ⪰ g) yields a predicate which evaluates to true on state σ iff f(σ) is upper- (resp. lower-) bounded by g(σ). These predicates form our sought-after strengthenings of guards to resolve nondeterminism. Based on the above notion, we define the following expectation-based program transformation. Let us go over the rules in Figure 13. skip statements and assignments are never transformed. For the sequential composition C₁ ; C₂, we transform C₂ w.r.t. f and C₁ w.r.t. T*_{C₂}(f). Notice that we employ the auxiliary transformer from Section 5 since our determinizations are guided by the annotated loop invariants. The guarded choice is the most interesting case. Recall from our informal description in Section 4 that we construct strengthenings for the guards φ₁ and φ₂ by comparing the intermediate preexpectations T*_{C₁}(f) and T*_{C₂}(f) of the two branches. Whenever both φ₁ and φ₂ hold, there is nondeterminism that is to be resolved, which is realized by applying the function ⊲⊳ ∈ {⪯, ⪰} to the intermediate preexpectations. Here, ⇒ is the standard implication between predicates, i.e., ψ₁ ⇒ ψ₂ is false on state σ iff σ |= ψ₁ and σ ̸|= ψ₂. We refer to Section 4.2 for an illustrative example, where the corresponding intermediate preexpectations are highlighted. For the probabilistic choice, we simply transform the two respective branches. Finally, for loops, we transform the loop body w.r.t. the annotated loop invariant (cf. Example 6.6).
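The guard-strengthening step for a single guarded choice can be sketched as follows; predicates and expectations are plain Python callables on states, and the branch preexpectations and guards are our own toy examples (for the demonic case, i.e., ⊲⊳ = ⪯).

```python
def strengthen_demonic(g1, g2, w1, w2):
    """Strengthen the guards of  if g1 {C1} [] g2 {C2}  for minimization:
    a branch stays enabled only where its intermediate preexpectation is
    no worse (i.e., no larger) than the other enabled branch's."""
    implies = lambda a, b: (not a) or b
    ng1 = lambda s: g1(s) and implies(g2(s), w1(s) <= w2(s))  # g1 and (g2 => w1 "<=" w2)
    ng2 = lambda s: g2(s) and implies(g1(s), w2(s) <= w1(s))
    return ng1, ng2

# Toy branches with intermediate preexpectations w1 = x and w2 = 10 - x
# for postexpectation f = x (both original guards are 'true' here):
w1 = lambda s: s['x']
w2 = lambda s: 10 - s['x']
ng1, ng2 = strengthen_demonic(lambda s: True, lambda s: True, w1, w2)
```

On states with x < 5 only the first branch remains enabled, on x > 5 only the second, and on the tie x = 5 both do: exactly the residual nondeterminism that may still be resolved arbitrarily, since min(w1, w2) is preserved on every state by any remaining choice.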
As demonstrated in Section 4.2, the program transformations generally yield programs which are still nondeterministic. This remaining nondeterminism can, however, be resolved in an arbitrary manner in the sense that dtrans(C, f) and atrans(C, f) yield optimal permissive determinizations (cf. Section 4.2) w.r.t. the auxiliary transformers dwp* and awp*:

Theorem 6.3. Let C, C′ ∈ pGCL and f ∈ E. We have C′ ⊸ dtrans(C, f) implies dwp*_{C′}(f) = dwp*_C(f). Moreover, we have C′ ⊸ atrans(C, f) implies awp*_{C′}(f) = awp*_C(f).

Proof. By induction on C. See Appendix A.3 for details. □

In particular, if C is loop-free, then dwp*_C(f) and dwp_C(f) (resp. awp*_C(f) and awp_C(f)) coincide, so the above theorem indeed yields that our transformation is optimal and correct for loop-free programs. In the presence of loops, we need to ensure that the respective annotated invariants soundly bound the sought-after preexpectations. For that, recall from Section 5 that we introduced vc-providers ℭ, and from Section 4.3 that our idea is to construct determinizations which preserve validity of the so-defined verification conditions. This motivates the following definition:

Definition 6.4. We say that dtrans preserves ℭ if for all C, C′ ∈ pGCL and all f ∈ E, vc^dwp_ℭ(C, f) and C′ ⊸ dtrans(C, f) implies vc^dwp_ℭ(C′, f). Moreover, we say that atrans preserves ℭ if for all C, C′ ∈ pGCL and all f ∈ E, vc^awp_ℭ(C, f) and C′ ⊸ atrans(C, f) implies vc^awp_ℭ(C′, f). △

In words, all implementations C′ of dtrans(C, f) (including all determinizations, and dtrans(C, f) itself) must preserve validity of the verification condition given by the vc-provider ℭ (analogously for atrans(C, f)).
Provided that they preserve ℭ, the respective program transformations also preserve the bounds obtained from the annotated loop invariants, i.e., dwp*_C(f) and awp*_C(f), respectively. This is the main result of this section:

Theorem 6.5. Assume that dtrans preserves ℭ. If ℭ yields upper bounds for dwp, then for all C, C′ ∈ pGCL and all f ∈ E, vc^dwp_ℭ(C, f) and C′ ⊸ dtrans(C, f) imply dwp_{C′}(f) ⊑ dwp*_C(f). Dually, assume that atrans preserves ℭ. If ℭ yields lower bounds for awp, then for all C, C′ ∈ pGCL and all f ∈ E, vc^awp_ℭ(C, f) and C′ ⊸ atrans(C, f) imply awp_{C′}(f) ⊒ awp*_C(f).

Proof. We prove the claim for dwp; the reasoning for awp is analogous. First, dwp_C(f) ⊑ dwp_{C′}(f) holds by Theorem 6.2 and Lemma 4.3. Moreover, since dtrans preserves ℭ, we have vc^dwp_ℭ(C′, f). Hence, since ℭ yields upper bounds for dwp, we get dwp_{C′}(f) ⊑ dwp*_{C′}(f) = dwp*_C(f), where the last equality is Theorem 6.3. □

Example 6.6. Recall the vc-provider SUPERINV from Example 5.3, which yields upper bounds for dwp. Using Theorem 6.3, it is immediate that dtrans preserves SUPERINV. Now reconsider program C from Figure 12. Since dwp* is obtained using the annotated loop invariants, the highlighted intermediate preexpectations are obtained in a purely syntactic manner. Applying minor arithmetic simplifications to the guard strengthenings obtained from these highlighted preexpectations, we thus also obtain dtrans(C, x) in a purely syntactic manner. The resulting nested loop still behaves nondeterministically whenever 6 = y holds at the beginning of an outer loop iteration. However, thanks to Theorem 6.5, any determinization C′ of dtrans(C, x) (obtained by, e.g., turning the first inequality 6 ≤ y into a strict one) satisfies dwp_{C′}(x) ⊑ 3 ⊓ y, i.e., the expected final value of x under any determinization C′ is upper-bounded by 3 ⊓ y. △

Completeness
Recall from Section 4.4 that our approach is inherently incomplete when it comes to finding determinizations which guarantee lower bounds on expected outcomes of pGCL programs. Yet, Theorem 6.5 yields a sufficient condition for our approach to be complete for subclasses of problem instances. Let Π ⊆ pGCL and Ξ ⊆ E. We call a vc-provider ℭ demonically (resp. angelically) complete w.r.t. (Π, Ξ) if for every C ∈ Π and f ∈ Ξ there are invariant annotations for C such that vc^dwp_ℭ(C, f) holds and dwp*_C(f) = dwp_C(f) (resp. vc^awp_ℭ(C, f) holds and awp*_C(f) = awp_C(f)).

Theorem 6.7. Let ℭ be demonically complete w.r.t. (Π, Ξ) and preserved by dtrans. Then, for all C ∈ Π and f ∈ Ξ, there exist invariant annotations for C such that for all C′ ∈ pGCL, C′ ⊸ dtrans(C, f) implies dwp_{C′}(f) = dwp_C(f). Analogously, let ℭ be angelically complete w.r.t. (Π, Ξ) and preserved by atrans. Then, for every C ∈ Π and f ∈ Ξ, there exist invariant annotations for C such that for all C′ ∈ pGCL, C′ ⊸ atrans(C, f) implies awp_{C′}(f) = awp_C(f).

As mentioned in Section 4.4, Theorem 6.7 immediately yields that our approach is complete w.r.t. upper bounds: the vc-provider SUPERINV from Examples 5.3 and 6.6 is demonically complete w.r.t. (pGCL, E), since we may annotate each loop occurring in a program C with its respective least fixpoint (cf. Figure 7). We provide a complete subclass for lower bounds in the next section.
6.2 Instances of our Framework

6.2.1 Upper-Bounded Determinizations. We restate the vc-provider SUPERINV for later reference.
Theorem 6.8. The vc-provider SUPERINV yields upper bounds for dwp and is preserved by dtrans. We briefly discuss additional instances employing more advanced proof rules in Section 9.
6.2.2 Lower-Bounded Determinizations. Lower-bounding weakest preexpectations of loops is more involved. There exist specialized proof rules, which typically restrict the expressive power of the invariants, the postexpectation, and the termination behavior of the loop under consideration.
We employ two such vc-providers: dASTSUBINV, which applies to demonically almost-surely terminating (dAST) loops, and dPASTSUBINV, which applies to demonically positively almost-surely terminating (dPAST) loops.

Proof. Preservation by atrans follows immediately from Lemma 4.3 and Theorem 6.3 together with the fact that implementations C′ of C remain dAST and dPAST, respectively. The fact that both providers yield lower bounds for awp follows from [Hark et al. 2020, Theorem 38]. □

Notice that dPASTSUBINV does not require I or f to be bounded, which comes at the cost of requiring the stronger termination criterion dPAST and of being suitable for optional stopping.

CASE STUDIES
In this section, we apply our framework to synthesize bound-guaranteeing determinizations of pGCL programs, all of which can be understood as countably infinite-state MDPs.

Game of Nim
We consider a variant of the game Nim (see, e.g., [Wikipedia 2023]), a 2-player zero-sum game which goes as follows: n tokens are placed on a table. The players take turns; in each turn, the player has to remove 1, 2, or 3 tokens from the table. The player who removes the last token loses the game. Suppose we are interested in finding a strategy performing reasonably well against an opponent that plays randomly. We model this situation as the pGCL program C_Nim in Figure 2 on page 3. Variable x counts the number of tokens that have been removed so far. Variable t ∈ {1, 2} indicates which player takes the next token(s). The randomized opponent is player 1. If the program terminates in a state where t = i, then player i wins the game.
We claim that the controllable player 2 can guarantee the following when starting the game with x < n removed tokens:
• If it is player 1's turn and x + 1 ≡₄ n (i.e., x + 1 − n is a multiple of 4), or if it is player 2's turn and x + 1 ≢₄ n, then player 2 can win with probability one.
• If it is player 1's turn but x + 1 ≢₄ n, then player 2 can still win with probability at least 2/3.
Formally, the maximal probability that player 2 wins is given by awp_{C_Nim}([t = 2]). We wish to find a determinization C″_Nim of C_Nim, i.e., a strategy for player 2, which guarantees the above lower bound on player 2's probability to win; call this requirement (†). For that, we annotate C_Nim with the subinvariant I_Nim (see Appendix C.4 for a proof of subinvariance). Applying atrans(C_Nim, [t = 2]), we obtain C′_Nim ⊸ C_Nim as shown in Figure 3 on page 3. Since C_Nim is clearly dAST (it iterates at most n − x times), we may employ the vc-provider from Theorem 6.9.1 to conclude that C′_Nim represents a permissive controller for player 2 which guarantees the above bound, i.e., each deterministic C″_Nim ⊸ C′_Nim satisfies requirement (†).
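The two claims can be checked exactly for concrete token counts. The following sketch is our own encoding of the game as a recursion (not the paper's pGCL program): player 1 picks uniformly from {1, 2, 3}, player 2 maximizes, and a move that takes the last remaining token(s) loses.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def win2(r: int, turn: int) -> Fraction:
    """Exact probability that player 2 wins when r >= 1 tokens remain and it
    is `turn`'s move; player 1 is uniformly random, player 2 plays greedily."""
    if turn == 1:   # random opponent: average over the three moves
        return sum((Fraction(1) if k >= r else win2(r - k, 2))
                   for k in (1, 2, 3)) / 3
    # player 2: maximize over the three moves
    return max((Fraction(0) if k >= r else win2(r - k, 1))
               for k in (1, 2, 3))
```

With r = n − x remaining tokens, r ≡ 1 (mod 4) corresponds to x + 1 ≡₄ n. On small instances, the recursion confirms the claims: win2(r, 1) = 1 whenever r ≡ 1 (mod 4), win2(r, 2) = 1 whenever r ≢ 1 (mod 4), and win2(r, 1) ≥ 2/3 otherwise.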
7.2 Optimal Gambling

7.2.1 Maximizing the Winning Probability. Consider the following gambling situation. A gambler has to collect n tokens to win a prize. The game is played in rounds: in each round, the gambler has to choose between flipping one of two coins with different biases. Suppose the coins yield heads with probability p and q, respectively. If the result of the coin flip is tails, then the game is immediately lost. Otherwise, the gambler wins one token in case of the bias-p coin, and two tokens in case of the bias-q coin. The goal is to maximize the probability of winning given an initial budget x of tokens and the game parameters p, q, and n. We briefly describe the optimal way to play this game (note that the probability to win two tokens in two consecutive rounds with the bias-p coin is p²):
• If p² ≥ q, then always playing with the bias-p coin is optimal.
• Similarly, if p ≤ q, then the gambler should always choose the bias-q coin.
• Otherwise, p² < q < p. In this case, the optimal choice depends on the current budget x: if only n − x = 1 token is needed to win the game, then it is better to choose the bias-p coin. More generally, if n − x is odd, then the best strategy is to play once with the bias-p coin and up to (n − x − 1)/2 times with the bias-q coin (the order is irrelevant).
The gamble is readily modeled as the pGCL program C_Gamb in Figure 14 (we assume that x, n ∈ ℕ, c ∈ {0, 1}, and p, q ∈ [0, 1]). Note that the game is lost as soon as c = 1. Finding an optimal gambling strategy that minimizes the probability of losing thus amounts to determining dwp_{C_Gamb}([c = 1]) together with a determinization attaining it.

Proc. ACM Program. Lang., Vol. 8, No. POPL, Article 93. Publication date: January 2024.

For that, we annotate C_Gamb with the superinvariant I_Gamb (see Appendix C.2 for a proof of superinvariance). Now, applying dtrans(C_Gamb, [c = 1]), we obtain C′_Gamb ⊸ C_Gamb as shown in Figure 15. We may thus apply the vc-provider from Theorem 6.8 to conclude that dwp_{C_Gamb}([c = 1]) ⊑ dwp_{C′_Gamb}([c = 1]). In fact, it can be shown that I_Gamb is the exact least fixpoint of the characteristic function of the loop in C_Gamb, so we obtain that C′_Gamb is a correct-by-construction optimal implementation of C_Gamb w.r.t. minimizing the probability that c = 1 holds when the program terminates.
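The optimal winning probabilities behind the case analysis above can be recomputed independently with a short backward recursion over the budget. This is our own sketch of the game, not the program C_Gamb, and the closed form below covers the interesting regime p² < q < p.

```python
from fractions import Fraction

def max_win_prob(x: int, n: int, p: Fraction, q: Fraction) -> Fraction:
    """Maximal probability of reaching n tokens from budget x, where each
    round either wins 1 token w.p. p or 2 tokens w.p. q, and tails loses."""
    win = [Fraction(0)] * (n + 2)
    win[n], win[n + 1] = Fraction(1), Fraction(1)   # target reached
    for b in range(n - 1, -1, -1):                  # backward over budgets
        win[b] = max(p * win[b + 1], q * win[b + 2])
    return win[x]

def closed_form(x: int, n: int, p: Fraction, q: Fraction) -> Fraction:
    """Regime p^2 < q < p: flip the bias-p coin once iff n - x is odd,
    and the bias-q coin for the remaining (even) distance."""
    d = n - x
    return q ** (d // 2) * (p if d % 2 else 1)
```

In the regime p² ≥ q the recursion instead yields p^(n−x), i.e., always playing the bias-p coin, matching the first bullet of the case analysis.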

RELATED WORK
Strategy Synthesis in Markov Decision Processes. MDPs have a rich mathematical theory [Puterman 1994] and widespread applications across different fields. In machine learning, the well-known reinforcement learning problem is typically phrased in terms of MDPs, see e.g. [van Otterlo and Wiering 2012]. However, RL usually does not provide strict guarantees about the resulting strategy. Exact analysis of MDPs, i.e., finding provably correct strategies as we do in this paper, is one of the primary problems studied in the Probabilistic Model Checking (PMC) community; see [Katoen 2016] and the references therein for an overview. PMC tools such as PRISM [Kwiatkowska et al. 2002] or Storm [Hensel et al. 2022] support strategy synthesis for MDPs given in pGCL-like modeling languages. Compact symbolic representations of such strategies have been studied as well [Ashok et al. 2020]. The main difference to our approach is that these tools, and in fact most of the PMC literature, support only finite MDPs and work by exploring the full state space. Several subclasses of infinite MDPs have also been studied, including solvency games [Berger et al. 2008], 1-counter and recursive MDPs [Brázdil et al. 2010; Etessami and Yannakakis 2015], and parametric MDPs [Junges et al. 2021]. Some of these works yield efficient algorithms; e.g., deciding whether a target state can be reached with probability one in an MDP with one unbounded counter is possible in PTIME [Brázdil et al. 2010]. Our pGCL programs subsume these models, but their high expressivity comes at the cost of general undecidability.
Beyond PMC, researchers in AI have studied symbolic dynamic programming [Boutilier et al. 2001; Sanner et al. 2011], a class of logic-based representations and solution methods for MDPs. These methods can be seen as a symbolic variant of value iteration. In contrast, our approach is based on programs and uses invariants rather than explicit iteration.
Program Refinement. Our approach is related to program refinement, where, originally, the goal is to refine an abstract model or specification into an executable (non-probabilistic) program [Abrial 2010; Back and von Wright 1998; Lamport 2002]. Later, the concept of program refinement was generalized to the probabilistic setting [Aouadhi et al. 2019; Hoang et al. 2005; McIver and Morgan 2005]. Our problem statement can be understood as a program refinement problem: given a nondeterministic pGCL program C (the abstract model) satisfying some specification (i.e., dwp_C(f) ⊑ g or awp_C(f) ⊒ g), refine C into (a determinization) C′ which preserves this specification. The aforementioned approaches typically allow for more general specifications, which, however, comes with a loss of mechanizability. Given loop invariants satisfying their respective verification conditions, our approach is highly constructive, as we obtain refinements in a syntactic manner. Rather than being a formal system for deriving provably correct algorithms, our approach is thus more tailored to a planning/AI setting, where the nondeterminism models different ways for an agent/adversary to behave, and where one wants to find suitable strategies.

Fig. 2. Program modeling the Nim game.
The set of program states is States = {σ | σ : Vars → Vals such that σ(x) = 0 for all but finitely many x ∈ Vars}. The set of predicates over States is P = {ψ | ψ : States → {true, false}}. We write σ |= ψ instead of ψ(σ) = true. A predicate ψ is valid, denoted |= ψ, if σ |= ψ for every σ, and it is called unsatisfiable if ¬ψ is valid. Programs C in the probabilistic guarded command language pGCL adhere to the grammar

C ::= skip | x := e | C₁ ; C₂
    | if φ₁ { C₁ } □ φ₂ { C₂ }    (guarded choice)
    | { C₁ } [p] { C₂ }           (probabilistic choice)
    | while φ { C } ⟨I⟩           (loop with invariant annotation I)

where e is a function of type States → Vals called an expression, φ₁, φ₂, and φ are predicates from P, and p is a probability expression of type States → [0, 1]. Loops are annotated with quantitative invariants I, which are functions of type States → R∞≥0 (see Section 4.3 for details). We often omit the invariant I if it is irrelevant in the current context. We briefly describe each pGCL construct. skip does nothing. x := e evaluates expression e in the current state and assigns the resulting value to variable x. C₁ ; C₂ first executes C₁ and then, if C₁ terminates, C₂. The guarded choice if φ₁ { C₁ } □ φ₂ { C₂ } first checks which of the guards φ₁ and φ₂ evaluate to true in the current state. If only one of the guards, say φ₁, evaluates to true, then the guarded choice deterministically executes C₁. If both φ₁ and φ₂ evaluate to true, then the guarded choice behaves nondeterministically by either executing C₁ or C₂. Notice that the standard conditional if φ { C₁ } else { C₂ } is the special case if φ { C₁ } □ ¬φ { C₂ }. For the guarded choice, we require that for every state σ, either σ |= φ₁ or σ |= φ₂ or both hold, i.e., that φ₁ ∨ φ₂ is valid. We call a program C ∈ pGCL deterministic if for all if φ₁ { C′₁ } □ φ₂ { C′₂ } occurring in C, φ₁ ∧ φ₂ is unsatisfiable.
Transition probabilities equal to 1 are omitted.

Prob_𝔖(s₀ … s_m) = ∏_{k=0}^{m−1} P(s_k, ℓ_k, s_{k+1}), where ℓ_k is the action chosen by strategy 𝔖 and P(s_k, ℓ_k, s_{k+1}) is the probability to move in one step from s_k to s_{k+1} under strategy 𝔖, with the convention that the empty product (for m = 0) is equal to 1.
Memoryless deterministic (MD) strategies can be identified with maps of type S → Act. An MD strategy 𝔖 : S → Act for M induces a deterministic MDP M_𝔖 = (S, S_init, Act, P′), where for s, s′ ∈ S and ℓ ∈ Act, P′(s, ℓ, s′) = P(s, ℓ, s′) if ℓ = 𝔖(s), and P′(s, ℓ, s′) = 0 otherwise. We call an MDP M′ a determinization of M if there exists an MD strategy 𝔖 such that M′ = M_𝔖.

2.2.3 Objectives. The fundamental objective considered in this paper is called reachability-reward, a mixture of reachability and expected-reward objectives. Intuitively, reachability-reward is like standard reachability, with the difference that each state in the target set T is a sink and carries an R∞≥0-valued reward that is collected when that state is reached for the first time. The goal is to either minimize or maximize the expected reward. We now formalize reachability-reward. For T ⊆ S, the set of finite paths eventually reaching T is ♢T = {s₀ … s_m ∈ S⁺ | s_m ∈ T and ∀k ∈ {0, …, m − 1}: s_k ∉ T}. Moreover, given s₀ … s_m ∈ S⁺ and a strategy 𝔖, Prob_𝔖(s₀ … s_m) denotes the probability of the path s₀ … s_m under 𝔖 (with Prob_𝔖(s₀) = 1). Finally, given rew : T → R∞≥0, we define the function ExpRew^𝔖_M(rew) : S_init → R∞≥0, which maps every initial state s_init to its expected (reachability-)reward under strategy 𝔖:

ExpRew^𝔖_M(rew)(s_init) = Σ_{s_init s₁ … s_m ∈ ♢T} Prob_𝔖(s_init s₁ … s_m) · rew(s_m).

Example 2.4. Reconsider the MDP M(C) from Figure 5. Suppose that the reward function rew assigns the value of x to each terminal state in the top row. Then MaxExpRew_{M(C)}(rew)(σ, C) = σ(x) + 2 for every initial state (σ, C) with σ(c) = 0. This maximal expected reward is attained by the MD strategy that always chooses the same action in the nondeterministic states. △

The comparison relation ∼ in the problem statement above refers to the order ≤ (resp. ≥) on R∞≥0, lifted pointwise to maps of type S_init → R∞≥0. Note that we are interested in finding strategies that guarantee a given threshold for all initial states. Such strategies are called uniform in the literature.
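For a finite MDP (or a finite fragment of an infinite one), maximal expected reachability-rewards can be computed by standard value iteration. The encoding below is our own minimal sketch, with transition probabilities given explicitly and target states treated as reward-carrying sinks.

```python
def max_exp_rew(states, actions, P, target_rew, iters=200):
    """Value iteration for MaxExpRew: P[(s, a)] lists (prob, successor)
    pairs, target_rew maps sink states in T to their reward, and actions(s)
    lists the actions enabled in s. Returns state -> max expected reward."""
    V = {s: 0.0 for s in states}
    V.update(target_rew)                      # rewards sit at the sinks
    for _ in range(iters):                    # Bellman iteration from below
        for s in states:
            if s in target_rew:
                continue
            V[s] = max(sum(pr * V[t] for pr, t in P[(s, a)])
                       for a in actions(s))
    return V
```

Iterating upward from 0 converges to the least fixpoint of the Bellman operator, which for this objective is exactly the maximal expected reachability-reward on finite MDPs.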
The strengthened guards x ≤ y and x ≥ y in C₂ resolve some of the nondeterminism from C. Program C₂ is, however, not a determinization of C, since C₂ behaves nondeterministically whenever x = y holds initially. Program C₁, on the other hand, is a determinization of both C₂ and C, since the guards x ≤ y and x > y are mutually exclusive. Due to the tight connection between weakest preexpectations and reachability-rewards in MDPs (Theorem 3.2), the above synthesis problem can be understood as a synthesis problem for MDPs: each program C induces a countable MDP M(C). If C′ is a determinization of C, then M(C′) is a determinization of M(C). Hence, from an MDP perspective, our problem statement reads:

First Step: Optimal Determinizations for Loop-Free Programs. Our first key insight is that, for loop-free programs C, we can compute, in a purely syntactic manner, optimal determinizations of C. Put more formally, given a loop-free C and a postexpectation f, there are effectively constructible determinizations C_min and C_max of C with dwp_C(f) = wp_{C_min}(f) and awp_C(f) = wp_{C_max}(f). We exemplify the construction of C_min; the construction for C_max is dual.

Example 4.4. Reconsider the nondeterministic program C from Example 4.2 and fix the postexpectation x. Below we give annotations for determining dwp_C(x), the expectation which maps every initial state to the minimal expected final value of x. △

Since C′ ⊸ C possibly permits fewer nondeterministic choices than C, it is an immediate, yet important, characteristic of C′ that minimal (resp. maximal) expected outcomes of C′ only get larger (resp. smaller) when compared to the expected outcomes of C. Formally:

Lemma 4.3. Let C, C′ ∈ pGCL and f ∈ E.
We have: C′ ⊸ C implies dwp_C(f) ⊑ dwp_{C′}(f) and awp_C(f) ⊒ awp_{C′}(f).

Proof. See Appendix A.1. □

With the notions of determinizations at hand, we formalize our problem statement: given C ∈ pGCL, a postexpectation f ∈ E, ∼ ∈ {⊑, ⊒}, and a threshold g ∈ E, if it exists, find a determinization C′ of C with wp_{C′}(f) ∼ g. In MDP terms: given C ∈ pGCL, a postexpectation f ∈ E, ∼ ∈ {≤, ≥}, and a threshold g ∈ E, if it exists, find C′ such that M(C′) is a determinization of M(C) and for all σ ∈ States: ExpRew_{M(C′)}(rew_f)(C′, σ) ∼ g(σ).
Reconsider the loop C from Example 4.8 with postexpectation x. Now fix the expectation g = x + 1 and suppose we wish to find a determinization C′ of C with wp_{C′}(x) ⊑ g, i.e., the expected final value of x shall be no greater than one plus the initial value of x, for all initial states. Recall from Example 4.8 that the expectation I = [c ≠ 0] · (x + 1) + [c = 0] · x is a dwp-superinvariant of C w.r.t. x. Moreover, we have I ⊑ g. Hence, if we can determinize C to C′ in such a way that I remains a superinvariant of C′ w.r.t. x, we are done.
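Superinvariance of such a candidate I can be sanity-checked mechanically by evaluating one unfolding of the loop's characteristic function on sampled states. The concrete loop body used below, { c := 0 } [1/2] { x := x + 1 } under guard c ≠ 0, is our assumption for illustration; only the candidate I and postexpectation x are taken from the text.

```python
def I(s):
    """Candidate superinvariant I = [c != 0]*(x + 1) + [c = 0]*x."""
    return s['x'] + 1 if s['c'] != 0 else s['x']

def phi(I, s):
    """One unfolding of the loop's characteristic function, applied to I,
    for the assumed loop  while (c != 0) { { c := 0 } [1/2] { x := x + 1 } }
    with postexpectation f = x."""
    if s['c'] == 0:                       # guard false: yield f = x
        return s['x']
    left  = I({**s, 'c': 0})              # effect of  c := 0
    right = I({**s, 'x': s['x'] + 1})     # effect of  x := x + 1
    return 0.5 * left + 0.5 * right       # fair probabilistic choice

# Superinvariant condition Phi_f(I) <= I, checked pointwise on samples:
ok = all(phi(I, {'x': x, 'c': c}) <= I({'x': x, 'c': c})
         for x in range(50) for c in (0, 1))
```

Under this assumed loop body the check even holds with equality, i.e., I is a fixpoint of the characteristic function. Such pointwise checks do not replace the symbolic proof, but they catch broken invariant candidates quickly.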
4.3.3 Subinvariants and Lower Bounds. We have so far disregarded the case where we want to find determinizations which guarantee lower bounds on expected outcomes, i.e., for a loop C and f, g ∈ E, find a determinization C′ of C with g ⊑ wp_{C′}(f). Compared to upper-bound reasoning, lower-bounding preexpectations of loops is more involved: T-subinvariants I of C do not necessarily lower-bound T_C(f). This is not surprising, as the same phenomenon already occurs for standard weakest preconditions of non-probabilistic programs, which are subsumed by weakest preexpectations. In the realm of weakest preconditions, expectations are replaced by predicates and ⊑ corresponds to |=, i.e., entailment between predicates. Proving G |= wp_C(F) corresponds to establishing a total correctness property of C, which generally requires additional side conditions such as termination.
The vc-provider dASTSUBINV is angelically complete w.r.t. ({C ∈ pGCL | C is dAST}, {f ∈ E | f is bounded by some b ∈ R≥0}): annotate every loop in a given program C with its respective least fixpoint, which is also bounded by Theorem B.1.2. Theorem 6.7 thus yields completeness of our approach for this class of problem instances. Determining the class for which dPASTSUBINV is complete is an open problem.
For a guarded choice if φ₁ { C₁ } □ φ₂ { C₂ }, the execution relation has one transition for each enabled branch in the current configuration. Recall that exactly one or both branches are enabled. Two dedicated action labels are used to mark which of the two branches was taken. For a probabilistic choice { C₁ } [p] { C₂ } where C₁ ≠ C₂ are different programs, there are two possible executions: with probability p we execute C₁, and with probability 1 − p we execute C₂. If the programs C₁ and C₂ happen to be equal, i.e., C₁ = C₂ = C, then the execution proceeds deterministically with program C. Note that a default action label is used in all transitions except those that correspond to executing a guarded choice statement.

B.2 Suitable for Optional Stopping. [Hark et al. 2020] employ the Optional Stopping Theorem to obtain a proof rule for lower bounds on expected outcomes of loops. This proof rule relies on the following side condition: a loop while φ { body } ⟨I⟩ is suitable for optional stopping w.r.t. f if (i) I = [φ] · I′ + [¬φ] · f for some I′ ∈ E, (ii) f, I, and [φ] · awp_body(I) + [¬φ] · f are all pointwise smaller than ∞, and (iii) I is demonically conditionally difference bounded, i.e., there is a c ∈ R≥0 such that for all states σ we have ([φ] · awp_body(|I − I(σ)|))(σ) ≤ c.

B.3 Facts about dwp, awp, and vc. Theorem B.1 ([McIver and Morgan 2005]).
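Condition (iii) can likewise be sanity-checked on concrete states. The sketch below does this for the assumed, illustration-only loop while (c ≠ 0) { { c := 0 } [1/2] { x := x + 1 } } with invariant I = [c ≠ 0]·(x + 1) + [c = 0]·x; the loop body and the bound c = 1 are our assumptions.

```python
def I(s):
    """Invariant I = [c != 0]*(x + 1) + [c = 0]*x."""
    return s['x'] + 1 if s['c'] != 0 else s['x']

def cond_diff(s):
    """[guard] * E_body(|I - I(s)|) at state s (0 where the guard is false),
    for the assumed body  { c := 0 } [1/2] { x := x + 1 }."""
    if s['c'] == 0:
        return 0.0
    base = I(s)
    return 0.5 * abs(I({**s, 'c': 0}) - base) \
         + 0.5 * abs(I({**s, 'x': s['x'] + 1}) - base)

# The expected one-step change of I is bounded by c = 1 on sampled states:
bound = max(cond_diff({'x': x, 'c': c}) for x in range(100) for c in (0, 1))
```

Here every enabled body branch changes I by exactly 1, so the conditional difference boundedness condition holds with c = 1.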