Probabilistic Programming Interfaces for Random Graphs: Markov Categories, Graphons, and Nominal Sets

We study semantic models of probabilistic programming languages over graphs, and establish a connection to graphons from graph theory and combinatorics. We show that every well-behaved equational theory for our graph probabilistic programming language corresponds to a graphon, and conversely, every graphon arises in this way. We provide three constructions for showing that every graphon arises from an equational theory. The first is an abstract construction, using Markov categories and monoidal indeterminates. The second and third are more concrete. The second is in terms of traditional measure theoretic probability, which covers 'black-and-white' graphons. The third is in terms of probability monads on the nominal sets of Gabbay and Pitts. Specifically, we use a variation of nominal sets induced by the theory of graphs, which covers Erdős-Rényi graphons. In this way, we build new models of graph probabilistic programming from graphons.


INTRODUCTION
This paper is about the semantic structures underlying probabilistic programming with random graphs. Random graphs have applications in statistical modelling across biology, chemistry, epidemiology, and so on, as well as theoretical interest in graph theory and combinatorics (e.g. [Bornholdt and Schuster 2002]). Probabilistic programming, i.e. programming for statistical modelling [van de Meent et al. 2018], is useful for building the statistical models for these applications. Moreover, as we show (Theorem 23 and Corollary 26), the semantic aspects of programming languages for random graphs correspond to graphons [Lovász 2012], a core structure in graph theory and combinatorics.
To set the scene more precisely, we recall the setting of probabilistic programming with real-valued distributions, and contrast it with the setting with graphs. Many probabilistic programming languages provide a type of real numbers (real) and distributions such as the normal distribution

normal : real * real → real (1)

together with arithmetic operations such as

(+) : real * real → real. (2)

Even if we encounter an unfamiliar distribution over (real) in a library, we have a rough idea of how to explain what it could be, in terms of probability densities and measures.
In this paper, we consider the setting of probabilistic programming with graphs, where the probabilistic programming language or library provides a type (vertex) and some distribution

new : unit → vertex (3)

together with a test

edge : vertex * vertex → bool. (4)

Our goal is to analyze the interface (vertex, new, edge) for graphs semantically, and answer, for instance, what they could be and what they could do. We give one example analysis in Section 1.1 first, and the general one later in Theorem 23 and Corollary 26, which says that to give an implementation of (vertex, new, edge), satisfying the laws of probabilistic programming, is to give a graphon. In doing so, we connect the theory of probabilistic programming with graph theory and combinatorics.
Probabilistic programming is generally used for statistical inference, in which we describe a generative model by writing a program using primitives such as (1)-(4) above, and then infer a distribution on certain parameters, given particular observed data. This paper is focused on the generative-model aspect, and not inference (although for simple examples, generic inference methods apply immediately; see §1.5).

Example of an Implementation of a Random Graph: Geometric Random Graphs
To illustrate the interface (vertex, new, edge) of (3)-(4), we consider a random geometric graph (e.g. [Bubeck et al. 2016; Penrose 2003]) where the vertices are points on the surface of the unit sphere, chosen uniformly at random, and where there is an edge between two vertices if the angle between them is less than some fixed θ. This random graph might be used, for instance, to model the connections between people on the earth.
For example, a simple statistical inference problem might start from the observed connectivity in Figure 1(a). We might ask for the distribution on θ given that this graph arose from the spherical random geometric graph. One sample from this posterior distribution on random geometric graphs with θ = π/3 is shown in Figure 1(b). Another, unconditioned sample from the random geometric graph with θ = π/6 is shown in Figure 1(c). We can regard this example as an implementation of the interface (vertex, new, edge) as follows: we implement (vertex) as the surface of the sphere (e.g. implemented as Euclidean coordinates).
• new() : vertex randomly picks a new vertex as a point on the sphere uniformly at random. Figure 1(c) shows the progress after calling new() 15 times.
• edge : vertex * vertex → bool checks whether there is an edge between two vertices; this amounts to checking whether the angle between the two points is less than θ.
For example, the program

let a = new() in let b = new() in let c = new() in edge(a, b) & edge(b, c) & edge(a, c) (5)

randomly returns true or false; the probability of true is the probability of a triangle. This implementation using the sphere is only one way to implement (vertex, new, edge). There are implementations using higher-dimensional spheres, or other geometric objects. We can also consider random equivalence relations as graphs, i.e. disjoint unions of complete graphs, or random bipartite graphs, which are triangle-free. We can consider the Erdős-Rényi random graph, where the chance of an edge between two vertices is independent of the other edges, and has a fixed probability. These are all different implementations of the same abstract interface, (vertex, new, edge), and programs such as (5) make sense for all of them. The point of this paper is to characterize all these implementations, as graphons.
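For intuition, the spherical implementation can be sketched in a few lines of ordinary Python. The names new and edge mirror the interface; the default angle θ = π/3 and the function bodies are our own illustrative choices, not part of the formal development.

```python
import math
import random

def new():
    """Sample a uniformly random vertex: a point on the unit sphere S^2.

    Normalizing a standard Gaussian vector gives the uniform distribution
    on the sphere.
    """
    x, y, z = (random.gauss(0, 1) for _ in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    return (x / r, y / r, z / r)

def edge(u, v, theta=math.pi / 3):
    """There is an edge iff the angle between u and v is less than theta."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot))) < theta

# The triangle-testing program: three fresh vertices, three edge queries.
a, b, c = new(), new(), new()
triangle = edge(a, b) and edge(b, c) and edge(a, c)
```

Note that edge here is deterministic: asking the same question twice about the same pair of vertices always gives the same answer, as required later in Section 1.2.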

Implementations Regarded as Equational Theories
The key method of this paper is to treat implementations of the interface (vertex, new, edge) extensionally, as equational theories. That is, rather than looking at specific implementation details, we look at the equations between programs that a user of the implementation would rely on. (This is analogous to the idea in model theory of studying first-order theories rather than specific models; similar ideas arise in the algebraic theory of computational effects [Plotkin and Power 2002].) For example, if an implementation always provides a bipartite random graph, we have the equation

Program (5) ≡ false

between programs, because a triangle is never generated. This equation does not hold in the example of Figure 1(b-c), since triangles are possible.

We focus on a class of equational theories that are well behaved, as follows. First, we suppose that they contain basic laws for probabilistic programming (eqns. (7)-(11), §2.2). This basic structure already appears broadly in different guises, including in Moggi's monadic metalanguage [Moggi 1989], in linear logic [Ehrhard and Tasson 2019], and in synthetic probability theory [Fritz 2020]. Second, we suppose that the equational theories are equipped with a 'Bernoulli base', which means that although we do not specify an implementation for the type (vertex), each closed program of type (bool) is equated with some ordinary Bernoulli distribution, in such a way as to satisfy the classical laws of traditional finite probability theory (§2.4). Finally, we suppose that the edge relation is symmetric (the graphs are undirected) and that it doesn't change when the same question is asked multiple times ('deterministic'), e.g.

let x = new() in let y = new() in edge(x, y) & ¬edge(x, y) ≡ false. (6)

A graphon is a symmetric measurable function [0, 1]² → [0, 1]. We show that every equational theory for the interface (vertex, new, edge) gives rise to a graphon (Theorem 23), and conversely that every graphon arises in this way (Corollary 26).
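The bipartite equation can be tested empirically. The sketch below is our own illustrative code, not part of the formal development: it implements the interface with a random bipartite graph and checks that the triangle-testing program never returns true.

```python
import random

def run_triangle(new, edge, trials=10000):
    """Estimate the probability that the triangle program returns true."""
    hits = 0
    for _ in range(trials):
        a, b, c = new(), new(), new()
        if edge(a, b) and edge(b, c) and edge(a, c):
            hits += 1
    return hits / trials

# Bipartite implementation: each fresh vertex gets a random side,
# and edges only ever connect the two sides.
bip_new = lambda: random.choice(("left", "right"))
bip_edge = lambda u, v: u != v

# With only two sides, three vertices cannot be pairwise on different
# sides, so a triangle is never generated: Program (5) ≡ false.
assert run_triangle(bip_new, bip_edge) == 0.0
```

By contrast, an Erdős-Rényi-style implementation with edge probability 1/2 would give the triangle program probability 1/8, so the two implementations satisfy different equational theories.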
We emphasize that this abstract treatment of implementations, in terms of equational theories, is very open-ended, and permits a diverse range of implementation methods. Indeed, we show in Section 5 that any implementation using traditional measure-theoretic methods will only produce black-and-white graphons, so this abstract treatment is crucial.

From Equational Theories to Graphons
In Section 3, we show how an equational theory over programs in the interface (vertex, new, edge) gives rise to a graphon. The key first step is that graphons (modulo equivalence) can be characterized in terms of sequences of finite random graphs that satisfy three conditions: exchangeability, consistency, and locality.
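In the other direction, a graphon W determines such a sequence of finite random graphs directly: sample latent values u_1, ..., u_n uniformly on [0, 1] and connect vertices i and j with probability W(u_i, u_j). A minimal sketch (the particular graphon W chosen below is an arbitrary illustration):

```python
import random

def sample_graph(W, n):
    """Sample an n-vertex random graph from a graphon W : [0,1]^2 -> [0,1]."""
    u = [random.random() for _ in range(n)]   # latent value per vertex
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = random.random() < W(u[i], u[j])
            adj[i][j] = adj[j][i] = bit       # symmetric: undirected graph
    return adj

W = lambda x, y: min(x, y)                    # an example graphon
g = sample_graph(W, 5)
```

Exchangeability and consistency hold by construction: the vertices are sampled independently and identically, and the first k rows and columns of a sample for n vertices are distributed as a sample for k vertices.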
To define a graphon, we show how to define programs that describe finite random graphs, by using new and edge to build boolean-valued n × n adjacency matrices, for all n (shown in (18)). Assuming that the equational theory of programs is Bernoulli-based, these programs can be interpreted as probability distributions on the finite spaces of adjacency matrices, which, we show, are finite random graphs.
It remains to show that the induced sequence of random graphs satisfies the three conditions for graphons (exchangeability, consistency, and locality). These can be formulated as equational properties, and so they can be verified by equational reasoning in the equational theory. This is Theorem 23. A key part of the proof is the observation that exchangeability for graphons connects to the commutativity of let (9): we can permute the order in which vertices are instantiated without changing the distributions.

From Graphons to Equational Theories
We also show the converse: every graphon arises from a good equational theory for the interface (vertex, new, edge). We look at this from three angles: first, we prove this in the general case using an abstract method, and then we use concrete methods for two special cases.
Fixing a graphon, we build an equational theory by following a categorical viewpoint. A good equational theory for probabilistic programming amounts to a 'distributive Markov category', which is a monoidal category with coproducts that is well-suited to probability (§2.2 and [Fritz 2020]). The idea that distributive categories are a good way to analyze abstract interfaces goes back at least to [Walters 1989], which used distributive categories to study interfaces for stacks and storage. We can thus use now-standard abstract methods for building monoidal and distributive categories to build an equational theory for the programming language.
We proceed in two steps. We first use methods such as [Hermida and Tennent 2012; Hu and Tholen 1995] to build an abstract distributive Markov category that supports the interface (vertex, new, edge) in a generic way. This equational theory is generic and not Bernoulli-based: although it satisfies the equational laws of probabilistic programming, there is no given connection to traditional probability. The second step is to show that (a) it is possible to quotient this generic category to get Bernoulli-based equational theories; and (b) the choices of quotient are in bijective correspondence with graphons. Thus, we can build an equational theory from which any given graphon arises, via (18): this is Corollary 26. (The framework of Bernoulli-based Markov categories is new here, and the techniques of [Hermida and Tennent 2012; Hu and Tholen 1995] have not previously been applied in categorical probability, so a challenge for future work is to investigate these ideas in other aspects of categorical probability.)

Although this is a general method, it is an abstract method involving quotient constructions. The ideal form of denotational semantics is to explain what programs are by regarding them as functions between certain kinds of spaces. Although Corollary 26 demonstrates that every graphon arises from an equational theory, the type (vertex) is interpreted as an object of an abstract category, and programs are equivalence classes of abstract morphisms. In the remainder of the paper, we give two situations where we can interpret (vertex) as a genuine concrete space, and programs are functions or distributions on spaces. Such an interpretation immediately yields an equational theory, where two programs are equal if they have the same interpretation.
• Section 5: For 'black-and-white' graphons, we present measure-theoretic models of the interface, based on a standard measure-theoretic interpretation of probabilistic programming (e.g. [Kozen 1981]). We interpret (vertex) as a measurable space, (new) as a probability measure on it, and (edge) in terms of a measurable predicate. Then, the composition of programs is defined in terms of probability kernels and Lebesgue integration. This kind of model exactly captures the black-and-white graphons (Prop. 29).
• Section 6: For 'Erdős-Rényi' graphons, which are constantly gray, and not black-and-white, we present a model based on Rado-nominal sets (§6.1). These are a variant of nominal sets ([Gabbay and Pitts 1999; Pitts 2013]) where the atoms are vertices of the Rado graph (following [Bojańczyk et al. 2014]). We consider a new notion of 'internal probability measure' in this setting, and use it to give a compositional semantics that gives rise to the Erdős-Rényi graphons (Corollary 45).
Together, these more concrete sections then provide further intuition for the correspondence between equational theories and graphons.

Connection to Practice
We conclude this introduction with remarks on the connection to practical modelling. In practice, the graph interface might form part of a generative model, on which we perform inference. The structure is clearest in a typed language, and one example is the LazyPPL library for Haskell [Dash et al. 2023]. We can use this interface as a building block for more complex models. For a simple example, we generated Figure 1(b) by using the generic Metropolis-Hastings inference of the LazyPPL library to infer θ given a particular graph (Fig. 1(a)). We have also implemented other random graphs; our implementation of the Erdős-Rényi graph uses stochastic memoization [Kaddar and Staton 2023; Roy et al. 2008].
Summary and context. As we have discussed, our main result is that equational theories for the programming interface (§1.1) give rise to graphons (§1.3) and that every graphon arises in this way (§1.4). These results open up new ways to study random graphs, by using programming semantics. On the other hand, our results here put the abstractions of practical probabilistic programming on a solid theoretical foundation (see also §7).

PROGRAMMING INTERFACES FOR RANDOM GRAPHS: EQUATIONAL THEORIES AND MARKOV CATEGORIES
In Section 1.1, we considered probabilistic programming over a graph interface. To make this formal, we now recall syntax, types, and equational reasoning for simple probabilistic programming languages. We begin with a general syntax (§2.1), which can accommodate various interfaces in the form of type and term constants, including the interface for graphs (Ex. 1(3)).
We study different instantiations of the probabilistic programming language in terms of the equational theories that they satisfy. We consider two equivalent ways of understanding equational theories: as distributive Markov categories (§2.2) and in terms of affine monads (§2.3). Markov categories are a categorical formulation of probability theory (e.g. [Fritz 2020]), and affine monads arise in the categorical analysis of probability (e.g. [Fritz et al. 2023; Jacobs 2018; Kock 2012]) as well as in the semantics of probabilistic programming (e.g. [Azevedo de Amorim 2023; Dahlqvist et al. 2018; Dash et al. 2023]). We make a connection with traditional probability via the notion of Bernoulli base (§2.4).
Much of this section will be unsurprising to experts: the main purpose is to collect definitions and results. The definition of distributive Markov category appears to be novel, and so we go over that definition and its correspondence with monads (Propositions 8 and 13). In Section 2.5, we give a construction for quotienting a distributive Markov category, which we will need in Section 4. We include the result in this section because it may be of independent interest.

Syntax for a Generic Probabilistic Programming Language
Our generic probabilistic programming language is, very roughly, an idealized, typed fragment of a typical language like Church [Goodman et al. 2008]. We start with a simple programming language (following [Ehrhard and Tasson 2019; Staton 2017; Stein 2021] but also [Moggi 1989]) with at least the product and sum type constructors

σ, τ ::= unit | σ * τ | 0 | σ + τ

and terms, including the typical constructors and destructors but also explicit sequencing (let x = t in u). We consider the standard typing rules (where i ∈ {1, 2} indexes projections and injections). In what follows, we use shorthands such as bool = unit + unit, and if-then-else instead of case.

This language is intended to be a generic probabilistic programming language, but so far there is nothing specifically probabilistic about this syntax. Different probabilistic programming languages support distributions over different kinds of structures. Thus, our language is extended according to an 'interface' by specifying type constants and typed term constants c : σ → τ. For each term constant c : σ → τ, we include a new typing rule: from Γ ⊢ t : σ, infer Γ ⊢ c(t) : τ.

Example 1. We consider the following examples of interfaces.
(1) For probabilistic programming over finite domains, we may have term constants such as bernoulli 0.5 : unit → bool, intuitively a fair coin toss.
(2) For probabilistic programming over real numbers, we may have a type constant real and term constants such as normal : real * real → real, intuitively a parameterized normal distribution, and arithmetic operations such as (+) : real * real → real.
(3) The main interface of this paper is for random graphs: this has a type constant vertex and term constants new : unit → vertex and edge : vertex * vertex → bool.
(We have kept this language as simple as possible, to focus on the interesting aspects. A practical probabilistic programming language will include other features, which are largely orthogonal; indeed, within our implementation in Haskell (§1.5), programming features like higher-order functions and recursion are present and useful. See also the discussion in §2.3.4.)

Equational Theories and Markov Categories
Section 2.1 introduced a syntax for various probabilistic programming interfaces. The idea is that this is a generic language which applies to different interfaces, with different distributions that are implemented in different ways. Rather than considering various ad hoc operational semantics, we study the instances of interfaces via the program equations that they support.
Regardless of the specifics of a particular implementation, we expect basic equational reasoning principles for probabilistic programming to hold, such as the following laws:

let x = t in x ≡ t (7)
let y = (let x = t in u) in v ≡ let x = t in let y = u in v (where x ∉ fv(v)) (8)
let x = t in let x′ = t′ in u ≡ let x′ = t′ in let x = t in u (where x ∉ fv(t′) and x′ ∉ fv(t)) (9)
let x = t in u ≡ u (where x ∉ fv(u)) (10)

The following law does not always hold, but does hold when t is 'deterministic':

let x = t in u ≡ u[t/x]. (11)
Equations (9) and (10) say that parts of programs can be re-ordered and discarded, as long as the dataflow is respected. This is a feature of probabilistic programming. For example, coins do not remember the order in which, nor how many times, they have been tossed. But these equations would typically not hold in a language with state.
The cleanest way to study equational theories of programs is via a categorical semantics, and here Markov categories have arisen as a canonical setting for categorical probability. Informally, a category is a structure for composition, and this matches the composition structure of let in our language. We also have monoidal structure, which allows for the type constructor σ * τ and for the compound contexts Γ; comonoid structure, which allows duplication of variables; and distributive coproduct structure, which allows for the sum types.

Definition 2. A symmetric monoidal category (C, ⊗, I) is a category C equipped with a functor ⊗ : C × C → C and an object I together with associativity, unit and symmetry structure ([Mac Lane 1998, XI.1]). A Markov category ([Fritz 2020]) is a symmetric monoidal category in which
• the monoidal unit I is a terminal object (I = 1), and
• every object X is equipped with a comonoid Δ_X : X → X ⊗ X, compatible with the tensor product (Δ_{X⊗Y} = (X ⊗ swp ⊗ Y) ∘ (Δ_X ⊗ Δ_Y), where swp is the swap map of C).
A morphism f : X → Y in a Markov category is deterministic if it commutes with the comonoids: Δ_Y ∘ f = (f ⊗ f) ∘ Δ_X. A distributive symmetric monoidal category (e.g. [Jay 1993; Walters 1989]) is a symmetric monoidal category equipped with chosen finite coproducts such that the canonical maps X ⊗ Z + Y ⊗ Z → (X + Y) ⊗ Z and 0 → 0 ⊗ Z are isomorphisms. A distributive Markov category is a Markov category whose underlying monoidal category is also distributive and whose chosen coproduct injections X → X + Y ← Y are deterministic. A distributive category [Carboni et al. 1993; Cockett 1993] is a distributive Markov category in which all morphisms are deterministic.
A (strict) distributive Markov functor is a functor F : C → D between distributive Markov categories which strictly preserves the chosen symmetric monoidal, coproduct, and comonoid structures.
In this paper we mainly focus on functors between distributive Markov categories that strictly preserve the relevant structure, so we elide 'strict'. (Nonetheless, non-strict functors are important, e.g. [Fritz 2020, §10.2] and Prop. 13.)

We interpret the language of Section 2.1 in a distributive Markov category C by interpreting types σ and type contexts Γ as objects ⟦σ⟧ and ⟦Γ⟧, and typed terms Γ ⊢ t : σ as morphisms ⟦t⟧ : ⟦Γ⟧ → ⟦σ⟧. (See e.g. [Pitts 2001] for a general discussion of terms as morphisms.) In more detail, to give such an interpretation, type constants must first be given chosen interpretations as objects of C. We can then interpret types and contexts using the monoidal and coproduct structure of C. Following this, term constants c : σ → τ must be given chosen interpretations as morphisms ⟦c⟧ : ⟦σ⟧ → ⟦τ⟧ in C. The interpretation of other terms is made by induction on the structure of typing derivations in a standard manner, using the structure of the distributive Markov category (e.g. [Benton et al. 1992], [Stein 2021, §7.2]).

An interpretation in a Markov category induces an equational theory between programs: let t ≡ u whenever ⟦t⟧ = ⟦u⟧.

Proposition 3 (e.g. [Stein 2021], §7.1). The equational theory induced by the interpretation in a distributive Markov category, with given interpretations of type and term constants, always includes the equations (7)-(10), and also (11) whenever ⟦t⟧ is a deterministic morphism.
Example 4. The category (FinSet, ×, 1) of finite sets is a distributive Markov category. As in any category with products, each object has a unique comonoid structure, and all morphisms are deterministic. This is a good Markov category for interpreting the plain language with no type or term constants. For example, bool is a set with two elements.
Example 5. The category FinStoch has natural numbers as objects and stochastic matrices as morphisms. In more detail, a morphism m → n is a matrix in (ℝ≥0)^{m×n} such that each row sums to 1. Composition is by matrix multiplication. The monoidal structure is given on objects by multiplication of numbers, and on morphisms by the Kronecker product of matrices. By choosing an enumeration of each finite set, we get a functor FinSet → FinStoch that converts a function to the corresponding (0/1)-valued matrix. So every object of FinStoch can be regarded with the comonoid structure from FinSet. The deterministic morphisms in FinStoch are exactly the morphisms from FinSet [Fritz 2020, 10.3]. This is a good Markov category for interpreting the language with Bernoulli distributions (Ex. 1(1)). We interpret the fair coin as the 1 × 2 matrix (0.5, 0.5).
We can also give some interpretations for the graph interface (Ex. 1(3)) in FinStoch. For instance, consider random graphs made of two disjoint complete subgraphs, as is typical in a clustering model. We can interpret this by putting ⟦vertex⟧ = 2, ⟦new⟧ = (0.5, 0.5), and ⟦edge⟧ = (1 0 0 1; 0 1 1 0)ᵀ, the 4 × 2 stochastic matrix that sends a pair of vertices to true exactly when they lie in the same cluster.
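This clustering interpretation can be checked by direct matrix arithmetic. In the sketch below (our own illustrative code), composition in FinStoch is matrix multiplication and the monoidal structure on morphisms is the Kronecker product; composing ⟦new⟧ ⊗ ⟦new⟧ with ⟦edge⟧ shows that two fresh vertices are connected with probability 1/2.

```python
def mat_mul(A, B):
    """Compose stochastic matrices (rows of the result again sum to 1)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    """Kronecker product: the monoidal structure of FinStoch on morphisms."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

new = [[0.5, 0.5]]                      # 1x2: a uniformly random cluster
# 4x2, rows indexed by pairs (u, v) in order (1,1),(1,2),(2,1),(2,2);
# first column = true: edge iff u and v are in the same cluster.
edge = [[1, 0], [0, 1], [0, 1], [1, 0]]

# Interpretation of: let x = new() in let y = new() in edge(x, y)
prog = mat_mul(kron(new, new), edge)
# prog == [[0.5, 0.5]]: two fresh vertices are connected with probability 1/2
```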
We look at other examples of distributive Markov categories and interpretations of these interfaces in Sections 2.3.2 and 2.3.3, and then in Sections 4-6.

Equational Theories and Affine Monads

Distributive Markov Categories from Affine Monads. One way to generate equational theories via Markov categories is by considering certain kinds of monads, following Moggi [Moggi 1989].

Definition 6. A strong monad T on a category A with finite products is given by
• for each object X, an object T(X);
• for each object X, a morphism η_X : X → T(X);
• for objects Γ, X, Y, a family of functions (>>=) : A(Γ, T(X)) × A(Γ × X, T(Y)) → A(Γ, T(Y)), natural in Γ, such that (>>=) is associative with unit η.
(There are various different formulations of this structure. When A is cartesian closed, as in Defs. 9 and 41, the bind (>>=) is represented by a morphism (>>=) : T(X) × (X ⇒ T(Y)) → T(Y), by the Yoneda lemma.)

Definition 7 ([Jacobs 1994; Kock 1970; Lindner 1979]). Given a strong monad T, we say that T is commutative if, for all morphisms f : Γ → T(X) and g : Γ → T(Y), the two canonical morphisms Γ → T(X × Y) that sequence f and g in either order are equal; and T is affine if T(1) ≅ 1.

The Kleisli category Kl(T) of a strong monad T has the same objects as A, but the morphisms are different: Kl(T)(X, Y) = A(X, T(Y)). There is a functor J : A → Kl(T), given on morphisms by composing with η (e.g. [Mac Lane 1998, §VI.5], [Moggi 1989]).

Proposition 8. Let T be a strong monad on a category A. If T is commutative and affine and A has finite products, then the Kleisli category Kl(T) has a canonical structure of a Markov category. Furthermore, if A is distributive, then Kl(T) can be regarded as a distributive Markov category.
Proof notes. The Markov structure follows [Fritz 2020, §3]. Since T is commutative, the product structure of A extends to a symmetric monoidal structure on Kl(T). Since T(1) = 1, the monoidal unit (1) is terminal in Kl(T). Every object in A has a comonoid structure, and this is extended to Kl(T) via J. The morphisms in the image of J are deterministic, although this need not be a full characterization of determinism.
For the distributive structure, recall that J preserves coproducts, and indeed it has a right adjoint. Hence, the coproduct injections will be deterministic. □

We can thus interpret the language of Section 2.1 using any strong monad, interpreting the types σ as objects ⟦σ⟧ of A, and a term Γ ⊢ t : σ as a morphism ⟦t⟧ : ⟦Γ⟧ → T(⟦σ⟧). This interpretation matches Moggi's interpretation of the language of Section 2.1 in a strong monad.

Example Affine Monad: Distribution Monad.
Definition 9 (e.g. [Jacobs 2016], §4.1). The distribution monad D on Set is defined as follows:
• On objects: each set X is mapped to the set of all finitely-supported discrete probability measures on X, that is, all functions p : X → ℝ≥0 that are non-zero for only finitely many elements and satisfy Σ_{x∈X} p(x) = 1.
• The unit η_X : X → D(X) maps x ∈ X to the indicator function λy.[y = x], i.e. the Dirac distribution δ_x.
• The bind function (>>=) sequences a distribution with a kernel: (p >>= k)(y) = Σ_{x∈X} p(x) · k(x)(y).
By the standard construction for strong monads, each morphism f : X → Y gets mapped to D(f) : D(X) → D(Y), which in this case is the pushforward: D(f)(p)(y) = Σ_{x∈f⁻¹(y)} p(x).

Consider the language with no type constants, and just the term constant bernoulli 0.5 (Ex. 1(1)). This can be interpreted in the distribution monad. Since every type σ is interpreted as a finite set ⟦σ⟧, and every context Γ as a finite set ⟦Γ⟧, a term Γ ⊢ t : σ is interpreted as a function ⟦Γ⟧ → D⟦σ⟧. To give a Kleisli morphism between finite sets is to give a stochastic matrix, and so the induced equational theory is the same as for the interpretation in FinStoch (Ex. 5).
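Definition 9 transcribes almost directly into code. The following minimal sketch (our own, representing finitely-supported distributions as dictionaries) implements the unit and bind, and interprets the program let x = bernoulli 0.5 in let y = bernoulli 0.5 in x & y.

```python
def unit(x):
    """Dirac distribution on x."""
    return {x: 1.0}

def bind(p, k):
    """(p >>= k)(y) = sum over x of p(x) * k(x)(y)."""
    out = {}
    for x, px in p.items():
        for y, py in k(x).items():
            out[y] = out.get(y, 0.0) + px * py
    return out

bernoulli_half = {True: 0.5, False: 0.5}

# let x = bernoulli 0.5 in let y = bernoulli 0.5 in x & y
prog = bind(bernoulli_half, lambda x:
       bind(bernoulli_half, lambda y: unit(x and y)))
# prog == {True: 0.25, False: 0.75}
```

This agrees with the FinStoch interpretation: the program denotes the 1 × 2 stochastic matrix (0.25, 0.75).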

Example Affine Monad: Giry Monad. We recall some rudiments of measure-theoretic probability.
Definition 10. A σ-algebra on a set is a non-empty collection of subsets that contains the empty set and is closed under countable unions and complements. A measurable space is a pair (X, Σ) of a set and a σ-algebra on it. A measurable function f : (X, Σ_X) → (Y, Σ_Y) is a function f : X → Y such that f⁻¹(U) ∈ Σ_X for every U ∈ Σ_Y. A probability measure on a measurable space (X, Σ) is a function μ : Σ → [0, 1] that has total mass 1 (μ(X) = 1) and that is σ-additive: μ(⋃_{i=1}^∞ U_i) = Σ_{i=1}^∞ μ(U_i) for any sequence of disjoint U_i.

Examples of measurable spaces include: the finite sets equipped with their powerset σ-algebras; the unit interval [0, 1] equipped with its Borel σ-algebra, which is the least σ-algebra containing the open sets. Examples of probability measures include: discrete probability measures (Def. 9); the uniform measure on [0, 1]; the Dirac distribution δ_x.

The product of two measurable spaces (X, Σ_X) × (Y, Σ_Y) = (X × Y, Σ_X ⊗ Σ_Y) comprises the product of the sets with the least σ-algebra making the projections X ← X × Y → Y measurable. The category of measurable spaces and measurable functions is a distributive category.
A probability kernel between measurable spaces (X, Σ_X) and (Y, Σ_Y) is a function k : X × Σ_Y → [0, 1] that is measurable in the first argument and that is σ-additive and has mass 1 in the second argument.
To compose probability kernels, we briefly recall Lebesgue integration. Consider a measurable space (X, Σ_X), a measure μ : Σ_X → [0, 1], and a measurable function f : X → [0, 1]. When f is a simple function, i.e. a finite weighted sum of indicators f(x) = Σ_{i=1}^k r_i [x ∈ U_i] with the U_i ∈ Σ_X disjoint, the integral is the finite sum ∫ f dμ = Σ_{i=1}^k r_i μ(U_i). If f is not a simple function, there exists a sequence of increasing simple functions f_1, f_2, ... : X → [0, 1] such that sup_n f_n(x) = f(x) (for example, by taking f_n(x) = ⌊10^n f(x)⌋/10^n). In that case, the integral is defined to be the limit of the integrals of the f_n's (which exists by monotone convergence).
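The approximation f_n(x) = ⌊10^n f(x)⌋/10^n can be watched converging numerically. The sketch below is our own illustrative setup: it uses a finitely-supported measure, so that every integral is a finite sum, and checks that the integrals of the f_n increase towards the integral of f.

```python
import math

def simple_approx(f, n):
    """The n-th simple approximation f_n(x) = floor(10^n * f(x)) / 10^n."""
    return lambda x: math.floor(10 ** n * f(x)) / 10 ** n

def integral_discrete(g, mu):
    """Integrate g against a finitely-supported measure mu = {point: mass}."""
    return sum(mass * g(x) for x, mass in mu.items())

f = lambda x: x * x                       # a measurable f : [0,1] -> [0,1]
mu = {0.1: 0.25, 0.3: 0.25, 0.6: 0.25, 0.9: 0.25}   # uniform on 4 points

exact = integral_discrete(f, mu)
approx = [integral_discrete(simple_approx(f, n), mu) for n in range(1, 6)]
# approx is non-decreasing and within 10^-5 of exact at n = 5
```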
Probability kernels can be equivalently formulated as morphisms X → G(Y), where G is the Giry monad:

Definition 11 ([Giry 1980]). The Giry monad G is a strong monad on the category Meas of measurable spaces given by
• G(X) is the set of probability measures on X, with the least σ-algebra making ∫ f d(−) : G(X) → [0, 1] measurable for all measurable f : X → [0, 1];
• the unit η maps x to the Dirac distribution δ_x;
• the bind is given by composing kernels: (μ >>= k)(U) = ∫ k(x)(U) μ(dx).

Proposition 12. The monad G is commutative and affine.
Proof notes. Commutativity boils down to Fubini's theorem for reordering integrals, and affineness is marginalization (since probability measures have mass 1). See also [Jacobs 2018]. □

Consider the real-numbers language (Ex. 1(2)). Let ⟦real⟧ = ℝ, with the Borel sets, and interpret normal as the normal probability measure on ℝ. The basic arithmetic operations are all measurable.
Consider the following three programs (writing normal(m, v) for the normal distribution with mean m and variance v):

let x = normal(0, 1) in x + x (13)
let x = normal(0, 1) in let y = normal(0, 1) in x + y (14)
normal(0, 2) (15)

The programs (14) and (15) denote the same normal distribution with variance 2, whereas (13) denotes a distribution with variance 4. Notice that we cannot use (11) to equate all three programs, because normal is not deterministic. We can also interpret the Bernoulli language (Ex. 1(1)) in the Giry monad; this interpretation gives the same equational theory as the interpretations in FinStoch and in the distribution monad in Section 2.3.2.
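The gap between variance 4 and variance 2 is easy to see by simulation. This Monte Carlo sketch (our own code, with a fixed seed) estimates the variance of 'one normal sample used twice' against 'two independent normal samples'.

```python
import random

random.seed(0)

def var(samples):
    """Sample variance (biased estimator is fine at this sample size)."""
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / len(samples)

N = 100_000
# One sample, used twice (as in program (13)): variance 4
doubled = [2 * random.gauss(0, 1) for _ in range(N)]
# Two independent samples (as in program (14)): variance 2
summed = [random.gauss(0, 1) + random.gauss(0, 1) for _ in range(N)]

assert abs(var(doubled) - 4) < 0.3
assert abs(var(summed) - 2) < 0.3
```

This is exactly why substituting a non-deterministic term, as in law (11), is unsound: it silently converts one shared sample into two independent ones.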
We can also give some interpretations for the graph interface (Ex. 1(3)) in the Giry monad. For an informal example, consider the geometric example from Section 1.1: let ⟦vertex⟧ = S² (the sphere), and define ⟦new⟧ to be the uniform distribution on the sphere. (See also Section 5.2.)

Affine Monads from Distributive Markov Categories. The following result, a converse to Proposition 8, demonstrates that the new notion of distributive Markov category (Def. 2) is a canonical one, and emphasizes the close relationship between semantics with distributive Markov categories and semantics with commutative affine monads.
Proposition 13. Let C be a small distributive Markov category. Then there is a distributive category A with a commutative affine monad T on it and a full and faithful functor C → Kl(T) that preserves symmetric monoidal structure, comonoids, and sums.
Proof notes. Our proof is essentially a recasting of [Power 2006b, §7] to this different situation, as follows.
Let C_det be the wide subcategory of C comprising the deterministic morphisms, and write J : C_det → C for the identity-on-objects inclusion functor. Note that C_det is a distributive category. We would like to exhibit C as the Kleisli category for a monad on C_det, but this might not be possible: intuitively, C_det might be too small for the monad to exist. Instead, we first embed C_det in a larger category A and construct a monad on A.
The main construction in our proof is the idea that if X is a small distributive monoidal category, then the category FP(X^op, Set) of finite-product-preserving functors is such that
• FP(X^op, Set) is cocomplete and moreover total ([Street and Walters 1978]) as a category;
• FP(X^op, Set) admits a distributive monoidal structure;
• the Yoneda embedding X → [X^op, Set], which is full and faithful, factors through FP(X^op, Set), and this embedding X → FP(X^op, Set) preserves finite sums and is strongly monoidal;
• the Yoneda embedding exhibits FP(X^op, Set) as a free colimit completion of X as a monoidal category that already has finite coproducts.

So we let A = FP(C_det^op, Set) comprise the finite-product-preserving functors C_det^op → Set. This is a distributive category. To get a monad on A, we note that since FP(C^op, Set) has finite coproducts and C_det → C → FP(C^op, Set) preserves finite coproducts and is monoidal, the monoidal structure induces a canonical colimit-preserving monoidal functor J_! : FP(C_det^op, Set) → FP(C^op, Set). Any colimit-preserving functor J_! out of a total category has a right adjoint J^*, and hence a monoidal monad (J^* J_!) is induced on A.
It remains for us to check that the embedding C → FP(C^op, Set) factors through the comparison functor Kl(J^* J_!) → FP(C^op, Set), which follows from the fact that J : C_det → C is identity on objects. □
As an aside, we note that, although our simple language in Section 2.1 did not include higher-order functions, the category A constructed in the proof of Proposition 13 is cartesian closed, and since the embedding is full and faithful, this shows that higher-order functions would be a conservative extension of our language. Indeed, this kind of conservativity result was part of the motivation of [Power 2006b]. For the same reason, inductive types (lists, and so on) would also be a conservative extension. We leave conservativity with other language features for future work. Recursion in probabilistic programming is still under investigation [Ehrhard et al. 2018; Goubault-Larrecq et al. 2021; Jia et al. 2021; Matache et al. 2022; Vákár et al. 2019]; there is also the question of conservativity with respect to combining Markov categories, e.g. combining real number distributions ((1)-(2)) with graph programming ((3)-(4)).

Bernoulli Bases, Numerals and Observation
Although an interface may have different type constants, it will always have the 'numeral' types, sometimes called 'finite' types: the types built from 1 by finite sums, such as bool = 1 + 1. For probabilistic programming languages, there is a clear expectation of what will happen when we run a program of type bool: it will randomly produce either true or false, each with some probability. Similarly for other numeral types. For type constants, we might not have evident notions of observation or expected outcomes. But for numeral types, it should be routine. We now make this precise via the notion of Bernoulli base.
On the semantic side, distributive Markov categories will always have 'numeral' objects, the finite sums of copies of the monoidal unit. For any type τ formed without type constants, and any Markov category, we have that ⟦τ⟧ ≅ n for some numeral object n. Any equational theory for the programming language induces in particular an equational theory for the sub-language without any type constants.
Proposition 14. For any distributive Markov category C, let C_N be the category whose objects are natural numbers, and where the morphisms are the morphisms in C between the corresponding numeral objects. This is again a distributive Markov category.
(1) FinSet_N = Set_N is equivalent to FinSet as a category.
Recall that a functor is faithful if it is injective on hom-sets.
Definition 16. A Bernoulli base for a distributive Markov category C is a faithful distributive Markov functor Ψ : C_N ↣ FinStoch.
Thus, for any distributive Markov category with a Bernoulli base, for any closed term ⊢ t : τ of numeral type (⟦τ⟧ = n), we can regard its interpretation ⟦t⟧ : 1 → n as nothing but a probability distribution Ψ(⟦t⟧) on n outcomes. This is the case even if t uses term constants and has intermediate subterms using type constants.
Example 17. All the examples seen so far can be given Bernoulli bases. In fact, for FinStoch, Kl(D) and Kl(G), the functor Ψ : C_N ↣ FinStoch is an isomorphism of distributive Markov categories.
When Ψ is an isomorphism of categories, that means that all the finite probabilities are present in C. This is slightly stronger than we need in general. For instance, when C = FinSet, there is a unique Bernoulli base Ψ : FinSet_N ↣ FinStoch, taking a function to a 0/1-valued matrix, but it is not full. We could also consider variations on FinStoch. For example, consider the subcategory FinQStoch of FinStoch where the matrices are rational-valued; this has a Bernoulli base that is not an isomorphism.
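Concretely, a morphism m → n in FinStoch is an n × m column-stochastic matrix, composition is matrix multiplication, and the Bernoulli base of FinSet sends a function to its 0/1-valued matrix. A minimal sketch (the function names `compose` and `from_function` are ours, for illustration only):

```python
def compose(g, f):
    """FinStoch composition: multiply column-stochastic matrices (first f, then g)."""
    rows, mid, cols = len(g), len(f), len(f[0])
    return [[sum(g[i][k] * f[k][j] for k in range(mid)) for j in range(cols)]
            for i in range(rows)]

def from_function(fn, m, n):
    """The Bernoulli base FinSet_N -> FinStoch: a function m -> n as a 0/1 matrix."""
    return [[1.0 if fn(j) == i else 0.0 for j in range(m)] for i in range(n)]

# A fair coin as a morphism 1 -> 2, and boolean negation 2 -> 2:
coin = [[0.5], [0.5]]
neg = from_function(lambda b: 1 - b, 2, 2)
assert compose(neg, coin) == coin  # negating a fair coin changes nothing
```

Faithfulness of the base is visible here: distinct functions give distinct 0/1 matrices, but not every stochastic matrix is of this form, so the base is not full.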

Quotients of Distributive Markov Categories
We provide a new, general method for constructing a Bernoulli-based Markov category out of a distributive Markov category. Our construction is a categorical formulation of the notion of contextual equivalence.
Recall that, in general, contextual equivalence for a programming language starts with a notion of basic observation for closed programs at ground types. We then say that programs Γ ⊢ t, u : τ at other types are contextually equivalent if for every context C with ⊢ C[t], C[u] : γ, for some ground type γ, we have that C[t] and C[u] satisfy the same observations. In the categorical setting, the notion of observation is given by a distributive Markov functor C_N → FinStoch, and the notion of context C is replaced by suitable morphisms (h, k below). We now introduce a quotient construction that will be key in showing that every graphon arises from a distributive Markov category (Corollary 26), via Theorem 23. We note that this is a general new method for building Markov categories.
Proposition 18. Let C be a distributive Markov category, and let Ψ : C_N → FinStoch be a distributive Markov functor. Suppose that for every object X ∈ C, either X ≅ 0 or there exists a morphism 1 → X. Then, there is a distributive Markov category C/Ψ with a Bernoulli base, equipped with a distributive Markov functor C → C/Ψ through which Ψ factors as a composite of distributive Markov functors.
The quotient relates morphisms f, g : X → Y, written f ∼ g, when Ψ(k ∘ (f ⊗ Z) ∘ h) = Ψ(k ∘ (g ⊗ Z) ∘ h) for every ancillary object Z and all morphisms h : 1 → X ⊗ Z and k : Y ⊗ Z → n into a numeral object. Informally, our equivalence relation considers all ways of generating X's via precomposition (h), all ways of testing Y's via postcomposition (k), and all ways of combining with some ancillary data (Z). It is essential that we consider all these kinds of composition in order for the quotient category to have the categorical structure.
It is immediate that composition of morphisms respects ∼, and hence we have a category: the objects are the same as C, and the morphisms are ∼-equivalence classes. This is our category C/Ψ.
The functor Ψ : C_N → FinStoch clearly factors through (C/Ψ)_N, but it remains to check that the functor (C/Ψ)_N → FinStoch is now faithful (Bernoulli base). So suppose that Ψ(f) = Ψ(g). To show that f ∼ g : 1 → X, we consider h : 1 → 1 ⊗ Y and k : X ⊗ Y → n, and must show that the two resulting composites have the same image under Ψ.

FROM PROGRAM EQUATIONS TO GRAPHONS
The graph interface for the probabilistic programming language (Ex. 1(3)) does not have one fixed equational theory. Rather, we want to consider different equational theories for the language, corresponding to different implementations of the interface for the graph (see also §1.2). We now show how the different equational theories for the graph language each give rise to a graphon, by building adjacency matrices for finite graphs (shown in (18)). To do this, we set up the well-behaved equational theories (§2.4), recall the connection between graphons and finite random graphs (§3.1), and then show the main result (§3.2, Theorem 23).

Graphons as Consistent and Local Random Graph Models
For all n ≥ 1, let [n] be the set {1, . . . , n}. (We sometimes omit the square brackets, when it is clear.) A simple undirected graph G with n nodes can be represented by its adjacency matrix A_G ∈ 2^([n]^2) such that A_G(i, i) = 0 and A_G(i, j) = A_G(j, i). Henceforth, we will assume that finite graphs are simple and undirected, unless otherwise stated. A random finite graph, then, has a probability distribution in D(2^([n]^2)) that only assigns non-zero probability to adjacency matrices.
Definition 19 (e.g. [Lovász 2012, §11.2.1]). A random graph model is a sequence of distributions of random finite graphs of the form G_n ∈ D(2^([n]^2)), for n ≥ 1. We say such a sequence is
• exchangeable if each of its elements is invariant under permuting nodes: for every n and bijection σ : [n] → [n], we have D(2^(σ^2))(G_n) = G_n (where 2^(σ^2) : 2^([n]^2) → 2^([n]^2) is the function that permutes the rows and columns according to σ; we are regarding D as a covariant functor, Def. 9, and 2^(−) as a contravariant functor);
• consistent if the sequence is related by marginals: for every n and for the inclusion function ι : [n] → [n + 1], we have D(2^(ι^2))(G_{n+1}) = G_n (where 2^(ι^2) : 2^([n+1]^2) → 2^([n]^2) is the evident projection);
• local if the subgraphs are independent: if U ⊆ [n] and V ⊆ [n] are disjoint, then for the injective function i : U^2 + V^2 → [n]^2, the distribution D(2^i)(G_n) is a product of its two marginals (where 2^i : 2^([n]^2) → 2^(U^2) × 2^(V^2) is the evident pairing of projections).
Note. There are various methods for constructing a graphon W from an exchangeable, consistent and local random graph model; however, all are highly non-trivial. A general idea is that W is a kind of limit object. For examples see e.g. [Lovász and Szegedy 2006, §11.3] or [Tao 2013]. Fortunately though, we will not need explicit constructions in this paper. □
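In the other direction, a graphon W determines an exchangeable, consistent, local random graph model concretely: sample i.i.d. latent values u_i uniformly, then flip an independent Bernoulli(W(u_i, u_j)) coin for each edge. A minimal sketch (the name `sample_graph` is ours):

```python
import random

def sample_graph(W, n, rng=random):
    """Sample an n-vertex graph from the graphon W : [0,1]^2 -> [0,1].
    Latent values u_i are drawn uniformly; each edge i~j is an independent
    Bernoulli(W(u_i, u_j)) coin flip."""
    u = [rng.random() for _ in range(n)]
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < W(u[i], u[j]):
                adj[i][j] = adj[j][i] = 1  # simple, undirected: symmetric, zero diagonal
    return adj

G = sample_graph(lambda x, y: 0.5, 4)  # the constant graphon
assert all(G[i][i] == 0 for i in range(4))
assert all(G[i][j] == G[j][i] for i in range(4) for j in range(4))
```

Exchangeability, consistency, and locality are all visible in the sampler: the latent values are i.i.d., so relabelling, deleting, or splitting vertices does not change the distribution of the rest.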

Theories of Program Equivalence Induce Graphons
In this section we consider the instance of the generic language with the graph interface (Ex. 1(3)). We consider a theory of program equivalence, i.e. a distributive Markov category with a distinguished object vertex and morphisms new : 1 → vertex and edge : vertex ⊗ vertex → 1 + 1. We make two assumptions about the theory:
• The graphs are simple and undirected: x : vertex ⊢ edge(x, x) ≡ false and x, y : vertex ⊢ edge(x, y) ≡ edge(y, x), and edge is deterministic.
• The theory is Bernoulli based (§2.4).
For each n ∈ N, we can build a random graph with n vertices as follows. We consider the program A_n that samples v_1, . . . , v_n with new and returns the matrix of results (edge(v_i, v_j))_{i,j ≤ n} (18). (Here we use syntactic sugar, writing a matrix instead of iteratively using pairs.) Because the equational theory is Bernoulli-based, the interpretation ⟦A_n⟧ induces a probability distribution Ψ(⟦A_n⟧) on 2^([n]^2). For clarity, we elide Ψ in what follows, since it is faithful.
Proposition 22. Each random matrix in (18) is a random adjacency matrix, i.e. a random graph.
Proof note. This follows from (17). □
Theorem 23. For any Bernoulli-based equational theory, the random graph model (⟦A_n⟧)_n in (18) is exchangeable, consistent, and local. Thus, the equational theory induces a graphon.
Exchangeability. We show that the distribution ⟦A_n⟧ is invariant under relabelling the nodes. By commutativity of the let construct (9), permuting the order in which the vertices are sampled leaves the program unchanged. Hence, D(2^(σ^2))(⟦A_n⟧) = ⟦A_n⟧, for every n and bijection σ : [n] → [n].
Consistency. We define a macro subm_U in the graph programming language to extract a submatrix at the index set U ⊆ [n], for which we have a definitional equality. We need to show that, if we delete the last node from a graph sampled from ⟦A_{n+1}⟧, the resulting graph has distribution ⟦A_n⟧. This amounts to the affineness property (10), as follows. Let G ∼ ⟦A_{n+1}⟧ be a random graph, and let G′ =def G|_[n] be the graph obtained by deleting the last node from G. Then clearly, the adjacency matrix of G′ is the adjacency matrix of G where the last row and column have been removed, i.e. G′ is sampled from the interpretation of the program that samples v_1, . . . , v_{n+1} but only reports the edges among the first n vertices, which (by (10)) ≡ A_n.
Locality. Without loss of generality (by exchangeability and consistency), we need to show that for every random graph G ∼ ⟦A_n⟧ and 1 < m < n, the subgraphs G|_{1,...,m} and G|_{m+1,...,n} are independent. By commutativity (7), the program A_n can be rearranged so that the two submatrices are computed from disjoint groups of sampled vertices, and G′ ∼ D(2^i)(⟦A_n⟧) is indeed sampled from the interpretation of the latter program, which yields the result. □
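The program A_n can be phrased against the abstract interface (new, edge). A minimal sketch, where `BWGraphonInterface` is one hypothetical implementation of our own devising (a deterministic 0/1 edge test on latent points of [0, 1]); any implementation exposing `new` and a deterministic `edge` could be substituted:

```python
import random

class BWGraphonInterface:
    """Hypothetical implementation of the graph interface: vertices are
    latent points of [0,1], new samples uniformly, and edge is a
    deterministic 0/1 test (here: the points are close)."""
    def new(self):
        return random.random()
    def edge(self, v, w):
        return v != w and abs(v - w) < 0.3  # deterministic, symmetric, irreflexive

def A(impl, n):
    """The program A_n: sample v_1..v_n with new, return all edge(v_i, v_j)."""
    vs = [impl.new() for _ in range(n)]
    return [[int(impl.edge(vs[i], vs[j])) for j in range(n)] for i in range(n)]

M = A(BWGraphonInterface(), 6)
assert all(M[i][i] == 0 for i in range(6))                          # simple
assert all(M[i][j] == M[j][i] for i in range(6) for j in range(6))  # undirected
```

The two assumptions of the theory appear as the assertions: the diagonal is zero and the matrix is symmetric, because edge is deterministic and symmetric.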

FROM GRAPHONS TO PROGRAM EQUATIONS
In Section 3, we showed how a distributive Markov category modelling the graph interface (Ex. 1(3)) gives rise to a graphon. In this section, we establish a converse: every graphon arises in this way (Corollary 26). Theorem 25 will establish slightly more: there is a 'generic' distributive Markov category (§4.1) modelling the graph interface whose Bernoulli-based quotients are in precise correspondence with graphons (§4.2). This approach also suggests an operational way of implementing the graph interface for any graphon (§4.3).

A Generic Distributive Markov Category for the Graph Interface
We construct this generic category in two steps. We first create a distributive Markov category, actually a distributive category, Fam(G^op), that supports (vertex, edge). We then add new using the monoidal indeterminates method of [Hermida and Tennent 2012].

4.1.1
Step 1: A Distributive Category with edge. We first define a distributive category that supports (vertex, edge). Let G be the category of finite graphs and functions that preserve and reflect the edge relation. That is, a morphism f : G → G′ is a function f : V_G → V_{G′} such that for all v, w ∈ V_G we have E_G(v, w) if and only if E_{G′}(f(v), f(w)).
(1) The free coproduct completion Fam(G^op) is a distributive category, with the product vertex^n being the sequence of all graphs with n vertices. In particular, vertex^2 is a sequence with two components, the complete graph and the edgeless graph with two vertices.
(2) Let edge : vertex × vertex → 1 + 1 be the morphism (id, {!, !}), intuitively returning true for the complete graph, and false for the edgeless graph. Here the terminal object 1 of Fam(G^op) is the singleton tuple of the empty graph. This interpretation satisfies (17).
Proof notes. Item (1) follows from [Hu and Tholen 1995], which shows that limits in Fam(G^op) amount to "multi-colimits" in G. For example, the family of all graphs with n vertices is a multi-coproduct of the one-vertex graph in G, hence forms a product in Fam(G^op). Item (2) is then a quick calculation. All morphisms in Fam(G^op) are deterministic. □
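The components of the product vertex^n, i.e. all simple graphs on n vertices, can be enumerated directly: one graph per subset of the n(n-1)/2 possible edges. A small sketch (the name `graphs_on` is ours):

```python
from itertools import combinations, product

def graphs_on(n):
    """Enumerate all simple undirected graphs on vertices 0..n-1, i.e. the
    components of vertex^n in Fam(G^op): one graph (as an edge set) per
    subset of the n(n-1)/2 possible edges."""
    pairs = list(combinations(range(n), 2))
    for bits in product([0, 1], repeat=len(pairs)):
        yield {pair for pair, b in zip(pairs, bits) if b}

assert len(list(graphs_on(2))) == 2   # vertex^2: complete and edgeless
assert len(list(graphs_on(3))) == 8   # 2^(3 choose 2)
```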

4.1.2
Step 2: Adjoining new. In Section 4.1.1, we introduced a distributive category that interprets the interface (vertex, edge). But it does not support new, and indeed there are no morphisms 1 → vertex. To additionally interpret new, we freely adjoin it. We essentially use the 'monoidal indeterminates' method of Hermida and Tennent [Hermida and Tennent 2012] to do this. Their work was motivated by semantics of dynamic memory allocation, but has also been related to quantum phenomena [Andrés-Martínez et al. 2022; Huot and Staton 2018] and to categorical gradient/probabilistic methods [Cruttwell et al. 2021; Fong et al. 2021; Shiebler 2021], where it is known as the 'para construction'. It is connected to earlier methods for the action calculus [Pavlović 1997].
Let FinSetInj be the category of finite sets and injections. It is a monoidal category with the disjoint union monoidal structure (e.g. [Fiore 2005; Power 2006a]). Consider the functor F : FinSetInj^op → Fam(G^op), with F(n) = vertex^n, and where the functorial action is by exchange and projection. This is a strong monoidal functor. (Indeed, it is the unique monoidal functor with F(1) = vertex.) For any such monoidal functor, Hermida and Tennent [Hermida and Tennent 2012] construct a category of monoidal indeterminates: here, Fam(G^op)[x] has the same objects as Fam(G^op), and a morphism X → Y is given (up to a suitable equivalence) by a natural number n together with a morphism X × vertex^n → Y of Fam(G^op), where (x_1, . . . , x_n) is a sequence of indeterminate vertices. In particular, when n = 1, i.e. the sequence is a singleton x, we have morphisms X × vertex → Y. Composition and monoidal structure accumulate indeterminates in vertex^n, as usual in the monoidal indeterminates ('para') construction. In summary:
• Fam(G^op)[x] is a distributive Markov category.
• Fam(G^op)[x] supports the graph interface, via the interpretation of (vertex, edge) in Fam(G^op), but also with the interpretation new = x : 1 → vertex.

Bernoulli Bases for Random Graph Models
The following gives a precise characterization of graphons in terms of the numerals of Fam(G^op)[x].
Theorem 25. To give a distributive Markov functor Fam(G^op)[x]_N → FinStoch is to give a graphon.
Proof outline. We begin by showing a related characterization: that graphons correspond to certain natural transformations. Observe that any distributive Markov category C gives rise to a symmetric monoidal functor C(1, −) : FinSet_N → Set, regarding the numerals of FinSet_N as objects of C (§2.4). Let G_n = 2^(n(n−1)/2) be the set of n-vertex graphs. We can characterize the natural transformations Fam(G^op)[x](1, −) → FinStoch(1, −) as a limit of sets. An element of this limit of sets is by definition a sequence of distributions μ_n on G_n that is invariant under reindexing by FinSetInj^op. Since injections are generated by inclusions and permutations, this is then a sequence that is consistent and exchangeable (Def. 19), respectively. Such a natural transformation α is monoidal if and only if the sequence is also local. Hence a monoidal natural transformation is the same thing as a random graph model.
In fact, every monoidal natural transformation α : Fam(G^op)[x](1, −) → FinStoch(1, −) arises uniquely by restricting a distributive Markov functor F : Fam(G^op)[x]_N → FinStoch. We now show this, to conclude our proof. Given α, we define the components F_{m,n} : Fam(G^op)[x]_N(m, n) → FinStoch(m, n) from α. It is immediate that this F preserves the symmetric monoidal structure and coproduct structure, but not that F is a functor. However, the naturality of α in FinSet_N gives us that F preserves postcomposition by morphisms of FinSet_N. All of this implies that general categorical composition is preserved as well, since, in any distributive Markov category of the form C_N, for f : m → n and g : n → p, the composite g • f : m → p can be rewritten as a composite built from f, the tupling of the g_i = g • κ_i for i = 1, . . . , n, and eval, where eval is just the evaluation map n × p^n → p in FinSet. □
Corollary 26. Every graphon arises from a distributive Markov category via the random graph model in (18).
Proof summary. Given a graphon, we consider the distributive Markov functor that corresponds to it, Ψ : Fam(G^op)[x]_N → FinStoch, by Theorem 25. Using the quotient construction of Proposition 18, we get a distributive Markov category with a Bernoulli base. It is straightforward to verify that the random graph model induced by (18) is the original graphon. □

Remark on Operational Semantics
The interpretation in this section suggests a general purpose operational semantics for closed programs at ground type, ⊢ t : n, along the following lines: (1) Calculate the interpretation ⟦t⟧ in Fam(G^op)[x]. There are no probabilistic choices in this step; it is a symbolic manipulation, because the morphisms of the Markov category Fam(G^op)[x] are built from tuples of finite graph homomorphisms. In effect, this interpretation pulls all the new's to the front of the term. (2) Apply the Markov functor Ψ to ⟦t⟧ to obtain a probability distribution on n, and sample from this distribution to return a result.
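The two phases can be sketched as follows. We assume the symbolic phase has already been done, leaving a count of new's and a deterministic map from a finite graph to a result; the sampling phase then draws a finite graph (here from the Erdős-Rényi distribution, standing in for Ψ) and applies the map. The name `run` and the representation of programs as Python functions are ours:

```python
import random

def run(sym, n_new, p=0.5, rng=random):
    """Sketch of the two-phase operational semantics: `sym` is the residue of
    the symbolic phase, a deterministic map from an n_new-vertex graph
    (presented as an edge oracle) to a result; the sampling phase draws the
    graph and applies the map."""
    edges = {(i, j): rng.random() < p
             for i in range(n_new) for j in range(i + 1, n_new)}
    return sym(lambda i, j: i != j and edges[(min(i, j), max(i, j))])

# Example: "let v1 = new in let v2 = new in edge(v1, v2)" after phase (1):
result = run(lambda edge: edge(0, 1), n_new=2)
assert result in (True, False)
```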

INTERPRETATION: BLACK-AND-WHITE GRAPHONS VIA MEASURE-THEORETIC PROBABILITY
In Section 4, we gave a general syntactic construction for building an equational theory from a graphon. Since that definition is based on free constructions and quotients, a priori, it does not 'explain' what the type vertex stands for. Like contextual equivalence of programs, a priori, it does not give useful compositional reasoning methods. To prove two programs are equal, according to the construction of Prop. 18, one needs to quantify over all the morphisms h and k and ancillary objects, in general.
In this section, we show that one class of graphons, black-and-white graphons (Def. 27), admits a straightforward measure-theoretic semantics, and we can thus use the equational theory induced by this semantics, rather than the method of Section 4. This measure-theoretic semantics is close to previous measure-theoretic work on probabilistic programming languages (e.g. [Kozen 1981; Staton 2017]).

Black-and-White Graphons from Equational Theories
Definition 27 (e.g. [Janson 2013]). A graphon W is black-and-white if it is 0/1-valued almost everywhere.
Recall that the Giry monad (Def. 11) gives rise to a Bernoulli-based distributive Markov category (§2.3.3, Ex. 15). For any black-and-white graphon W, we define an interpretation ⟦−⟧_W of the graph interface for the probabilistic programming language using G, as follows: vertex is the measurable space [0, 1], new is the uniform distribution on [0, 1], and edge(x, y) tests whether W(x, y) = 1.
Proposition 28. Let W be a black-and-white graphon. The equational theory induced by ⟦−⟧_W induces the graphon W according to the construction in Section 3.2.
Proof. Suppose that W corresponds to the sequence of random graphs G_1, G_2, . . . as in Section 3.1. Consider the term A_n in (18), and directly calculate its interpretation. Then, we get ⟦A_n⟧ = G_n, via (16), as required.
The choice of representative for W does not matter in the interpretation of these terms, because any two representatives agree almost everywhere. □

All Measure-Theoretic Interpretations are Black-and-White
Although the model in Section 5.1 is fairly canonical, there are sometimes other enlightening interpretations using the Giry monad. These also correspond to black-and-white graphons. For example, consider the geometric-graph example from Figure 1. We interpret this using the Giry monad, putting
• vertex = S^2, the sphere; bool = 2;
• new() = Uniform(S^2), the uniform distribution on the sphere;
• edge(v, w) = (d(v, w) < θ), i.e. an edge if their distance is less than θ.
This will again induce a graphon, via (18). We briefly look at theories that arise in this more flexible way:
Proposition 29. Consider any interpretation of the graph interface in the Giry monad: a measurable space vertex, a measurable set edge ⊆ vertex^2, and a probability measure new() on vertex. The induced graphon is black-and-white.
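The geometric interpretation above can be sketched directly: sample uniform points on the sphere via normalized Gaussian vectors, and test the deterministic edge relation. The function names and the threshold value are ours, for illustration:

```python
import math, random

def new():
    """Uniform point on the sphere S^2, via a normalized Gaussian vector."""
    x, y, z = (random.gauss(0, 1) for _ in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    return (x / r, y / r, z / r)

def edge(v, w, theta=1.0):
    """Deterministic edge test: Euclidean distance below a threshold theta
    (theta = 1.0 is an illustrative choice; the text leaves it a parameter)."""
    return v != w and math.dist(v, w) < theta

vs = [new() for _ in range(10)]
assert all(abs(sum(c * c for c in v) - 1.0) < 1e-9 for v in vs)  # on the sphere
assert all(edge(v, w) == edge(w, v) for v in vs for w in vs)     # symmetric
```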
If vertex is not standard Borel, we note that there is an equivalent interpretation where it is, because there exists a measure-preserving map vertex → Ω to a standard Borel space Ω and a measurable set E ⊆ Ω^2 that pulls back to edge, giving rise to the same graphon (e.g. [Janson 2013, Lemma 7.3]). □
Discussion. Proposition 29 demonstrates that this measure-theoretic interpretation has limitations.
Definition 30. For p ∈ (0, 1), the Erdős-Rényi graphon is the constant graphon W(x, y) = p.
The Erdős-Rényi graphons cannot arise from measure-theoretic interpretations of the graph interface, because they are not black-and-white. In Section 6, we give an alternative interpretation for the Erdős-Rényi graphons.
A naive alternative would be to interpret edge(v, w) as a fresh Bernoulli(p) choice. However, this interpretation does not provide a model for the basic equations of the language, because this edge is not deterministic, and derivable equations such as (6) will fail. Intuitively, once an edge has been sampled between two given nodes, its presence (or absence) should remain unchanged in the rest of the program, i.e. the edge should not be resampled again: it should be memoized (see also [Kaddar and Staton 2023; Roy et al. 2008]).
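The memoization the text describes can be sketched operationally: edge outcomes are Bernoulli(p), but once sampled for a pair of vertices the outcome is cached, so edge behaves deterministically thereafter. This is an informal implementation idea of our own, not the semantic account, which is the subject of Section 6:

```python
import random

class MemoizedER:
    """Memoized Erdos-Renyi implementation of (new, edge): each edge is a
    Bernoulli(p) choice, sampled lazily and cached so that repeated queries
    agree (the edge is never resampled)."""
    def __init__(self, p):
        self.p = p
        self.count = 0
        self.cache = {}
    def new(self):
        self.count += 1
        return self.count  # fresh vertex identifier
    def edge(self, v, w):
        if v == w:
            return False
        key = (min(v, w), max(v, w))
        if key not in self.cache:
            self.cache[key] = random.random() < self.p
        return self.cache[key]

g = MemoizedER(0.5)
a, b = g.new(), g.new()
assert g.edge(a, b) == g.edge(a, b) == g.edge(b, a)  # deterministic after sampling
```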
Although not all graphons are black-and-white, these are still a widely studied and useful class. They are often called 'random-free'. For example, an alternative characterization is that the random graph model of Prop. 21 has subquadratic entropy function [Janson 2013, §10.6].

INTERPRETATION: ERDŐS-RÉNYI GRAPHONS VIA RADO-NOMINAL SETS
In Section 4, we gave a general construction to show that every graphon arises from a Bernoulli-based equational theory. In Section 5, we gave a more concrete interpretation, based on measure theory, for black-and-white graphons. We now consider the Erdős-Rényi graphons (Def. 30), which are not black-and-white.
Rado-nominal sets (§6.1) are sets that are equipped with an action of the automorphisms of the Rado graph, which is an infinite graph that contains every finite graph. There is a particular Rado-nominal set V of the vertices of the Rado graph. The type vertex will be interpreted as V; edge is interpreted using the edge relation E on V. The equational theory induced by this interpretation gives rise to the Erdős-Rényi graphons (Def. 30).
Since Rado-nominal sets form a model of ZFA set theory (Prop. 36), we revisit probability theory internal to this setting. We consider internal probability measures on Rado-nominal sets (§6.3), and we show that there are internal probability measures on V that give rise to Erdős-Rényi graphons (§6.3). The key starting point here is that, internal to Rado-nominal sets, the only functions V → 2 are the sets of vertices that are definable in the language of graphs (§6.2).
We organize the probability measures (Def. 37) into a probability monad on Rado-nominal sets (§6.4), analogous to the Giry monad. Fubini does not routinely hold in this setting (§6.4.4), but we use a standard technique to cut down to a commutative affine monad (§6.4.5). This gives rise to a Bernoulli-based equational theory, and in fact, this theory corresponds to the Erdős-Rényi graphon (Corollary 45).

Definition and First Examples
The Rado graph (V, E) ([Ackermann 1937; Rado 1964], also known as the 'random graph' [Erdős and Rényi 1959]) is the unique graph, up to isomorphism, with a countably infinite set of vertices that has the extension property: if A, B are disjoint finite subsets of V, then there is a vertex v ∈ V \ (A ∪ B) with an edge to all the vertices in A but none of the vertices in B.
The Rado graph embeds every finite graph, which can be shown by using the extension property inductively.
An automorphism of the Rado graph is a graph isomorphism V → V. The automorphisms of the Rado graph relate to isomorphisms between finite graphs, as follows. First, if A is a finite graph regarded as a subset of V, then any automorphism π induces an isomorphism of finite graphs A ≅ π[A]. Conversely, if ϕ : A ≅ B is an isomorphism of finite graphs, and we regard A and B as disjoint subsets of V, then there exists an automorphism π of V that restricts to ϕ (i.e. ϕ = π|_A).
We write Aut(Rado) for the group of automorphisms of (V, E). (This has been extensively studied in model theory and descriptive set theory, e.g. [Angel et al. 2014; Kechris et al. 2005].) A Rado-nominal set is a set X equipped with an action of Aut(Rado) in which every element has finite support. Here, an element x ∈ X is defined to have finite support if there is a finite set A ⊆ V such that for all automorphisms π, if π fixes A (i.e. π|_A = id_A), it also fixes x (i.e. π • x = x).
(1) The set V of vertices is a Rado-nominal set, with π • v = π(v). The support of a vertex v is {v}.
(2) The product V × V is a Rado-nominal set, with the coordinate-wise action. The support of (v, w) is {v, w}. More generally, a finite product of Rado-nominal sets has a coordinate-wise group action.
(3) The edge relation E ⊆ V × V is a Rado-nominal subset (which is formally defined in §6.2) because automorphisms preserve the edge relation.
(4) Any set X can be regarded with the discrete action, π • x = x, and then every element has empty support. We regard these sets with the discrete action: 1 = {★}; 2 = {0, 1}; N; and the unit interval [0, 1].

Powersets and Definable Sets
For any subset S ⊆ X of a Rado-nominal set, we can define the action π • S = {π • x | x ∈ S}, and we write 2^X for the set of finitely supported subsets of X (20). This is a Rado-nominal set.
Example 34. We give some concrete examples of subsets.
(1) For vertices a and b in V with no edge between them, the set {v ∈ V | E(a, v) ∧ E(b, v)} is the set of ways of forming a horn. It has support {a, b}.
In fact, the finitely supported subsets correspond exactly to the definable sets in first-order logic over the theory of graphs. The following results may be folklore.
(⇐) This is a consequence of the Ryll-Nardzewski theorem for the theory of the Rado graph (which can be shown to be ω-categorical by a back-and-forth argument, using the extension property of the Rado graph). But we give here a more direct proof, assuming n = 1 for simplicity. Suppose A ⊆ V is a finite support for S. Then, for any v, v′ ∈ V \ A, if v and v′ have the same connectivity to A, then they are either both in or both not in S since, by the extension property, we can find an automorphism fixing A and sending v to v′. The set of vertices with the same connectivity to A as v is definable, and there are only 2^|A| such sets. Hence, S \ A is a union of finitely many definable sets, and as S ∩ A is definable (being finite), so is S = (S \ A) ∪ (S ∩ A). □
We note that 2^X in (20) is a canonical notion of internal powerset, from a categorical perspective.
Proof notes. RadoNom can be regarded as the continuous actions of Aut(Rado), regarded as a topological group with the product topology, and then we invoke standard methods [Johnstone 2002, Ex. A2.1.6]. It is also equivalent to the category of sheaves over finite graphs and embeddings with the atomic topology. See [Caramello 2013, 2014] for general discussion. □

Probability Measures on Rado-Nominal Sets
The finitely supported sets S ⊆ V can be regarded as 'events' to which we would assign a probability. For example, if we already have vertices a and b, we may want to know the chance of picking a vertex that forms a horn, and this would be the probability of the set in Ex. 34(1).
Definition 37. A sequence S_1, S_2, · · · ⊆ X is said to be support-bounded if there is one finite set A ⊆ V that supports all the sets S_i. An internal probability measure on a Rado-nominal set X is an equivariant function μ : 2^X → [0, 1] such that μ(X) = 1 and μ is countably additive on support-bounded sequences of pairwise disjoint sets.
We remark that there are two subtleties here. First, we restrict to support-bounded sequences. These are the correctly internalized notion of sequence in Rado-nominal sets, since they correspond precisely to finitely-supported functions N → 2^X. Second, we consider a Rado-nominal set to be equipped with its internal powerset 2^X, rather than considering sub-σ-algebras.
Measures on the space of vertices. We define an internal probability measure (Def. 37) on the space V of vertices, which, we will show, corresponds to the Erdős-Rényi graphon. Fix p ∈ [0, 1], the chance of an edge.
We define the measure μ_p of a definable set S ∈ 2^V as follows. Suppose that S has support {a_1, . . . , a_k}. We choose an enumeration of vertices (b_1, . . . , b_{2^k}) in V (disjoint from {a_1, . . . , a_k}) that covers all the 2^k possible edge relationships that a vertex could have with the a_i's. (For example, b_1 has no edges to any a_i, and b_{2^k} has an edge to every a_i, and the other b_i's have the other possible edge relationships.) Let:
μ_p(S) = Σ_{i : b_i ∈ S} p^{e_i} (1 − p)^{k − e_i},   (21)
where e_i is the number of edges between b_i and {a_1, . . . , a_k}.
Proposition 38. The assignment given in (21) is an internal probability measure (Def. 37) on V.
Proof. The function μ_p is well-defined: it does not depend on the choice of the b_i's (by Prop. 35), nor on the choice of support (by direct calculation). It is equivariant, since for π • S, a valid enumeration of vertices is given by (π(b_1), . . . , π(b_{2^k})). □
The definitions and results of this section appear to be novel. However, the general idea of considering measures on formulas which are invariant to substitutions that permute the variables goes back to work of Gaifman [Gaifman 1964]. The paper [Ackerman et al. 2016a] characterizes those countably infinite graphs that can arise with probability 1 in that framework; see [Ackerman et al. 2017b] for a discussion of how Gaifman's work connects to Prop. 21.
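Under our reading of (21), a definable set with support of size k is determined by which of the 2^k connectivity patterns to the support it contains, and each pattern with e edges weighs p^e (1-p)^(k-e). A sketch of the computation (the function name `mu_p` and the `member` encoding of the set are ours):

```python
from itertools import product

def mu_p(p, k, member):
    """Evaluate the measure of (21): sum p^e (1-p)^(k-e) over the
    connectivity patterns (0/1 tuples of length k) that lie in the set,
    where e is the number of edges in the pattern and `member(pattern)`
    says whether a vertex with that pattern belongs to the set."""
    total = 0.0
    for pattern in product([0, 1], repeat=k):
        if member(pattern):
            e = sum(pattern)
            total += p ** e * (1 - p) ** (k - e)
    return total

# The horn of Ex. 34(1): vertices adjacent to both a and b (support size 2):
assert mu_p(0.5, 2, lambda pat: pat == (1, 1)) == 0.25
# The whole space V has measure 1, for any p:
assert abs(mu_p(0.3, 2, lambda pat: True) - 1.0) < 1e-12
```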

Nominal Probability Monads
Since RadoNom is a Boolean topos with natural numbers object (Prop. 36), we can interpret measure-theoretic notions in the internal language of the topos, as long as they do not require the axiom of choice. We now spell out the resulting development, without assuming familiarity with topos theory. By doing this, we build new probability monads on RadoNom.
(For example, the powerobject 2^X (§6.2) can be regarded as [X ⇒ 2], if we regard a set as its characteristic function.) In Def. 37, we focused on equivariant probability measures. We generalize this to finitely supported measures. For example, pick a vertex v ∈ V. Then, the Dirac measure δ_v on V (i.e. δ_v(S) = 1 if v ∈ S, and δ_v(S) = 0 if v ∉ S) has support {v}.
Definition 39. For a Rado-nominal set X, let P(X) comprise the finitely supported functions μ : 2^X → [0, 1] that are internally countably additive, and satisfy μ(X) = 1. This is a Rado-nominal set, as a subset of [2^X ⇒ [0, 1]]. Functions in P(X) are called finitely supported probability measures.
6.4.2 Internal Integration. We revisit some basic integration theory in this nominal setting. In traditional measure theory, one can define the Lebesgue integral of a measurable function f : X → [0, 1] by ∫ f(x) μ(dx) = sup Σ_{i=1}^n r_i μ(S_i), where the supremum ranges over simple functions Σ_{i=1}^n r_i [− ∈ S_i] with S_i measurable in X and bounded above by f (§2.3.3). The same construction works in the internal logic of RadoNom.
Note that the following does not mention f being measurable: since X is considered to have its internal powerset σ-algebra, finite-supportedness implies 'measurability' here.
Proof. If S_1, . . . , S_n ⊆ X are finitely supported, r_1, . . . , r_n ∈ [0, 1], and Σ_i r_i [− ∈ S_i] ≤ f, then by ordinary additivity of μ, we have Σ_i r_i μ(S_i) ∈ [0, 1]. By ordinary real analysis, the supremum of all such values exists and is in [0, 1]. For equivariance, recall that [0, 1] is equipped with the trivial action of Aut(Rado), and use the fact that the action of an automorphism π carries the simple functions below f bijectively to the simple functions below π • f. The last claim is the monotone convergence theorem internalized to RadoNom. □
6.4.3 Kernels and a Monad. We can regard a 'probability kernel' as a finitely supported function k : X → P(Y). Equivalently, k is a finitely supported function k : X × 2^Y → [0, 1] that is countably additive and has mass 1 in its second argument. (In traditional measure theory, one would explicitly ask that k is measurable in its first argument, but as we observed, finite-supportedness already implies it.) As usual, probability kernels compose, and this allows us to regard them as Kleisli morphisms for a monad (Def. 6), defined as follows.
Definition 41.We define the strong monad P on RadoNom as follows.

Commuting Integrals (Fubini).
For measures μ_1 ∈ P(X) and μ_2 ∈ P(Y), the monad structure allows us to define a product measure on X × Y by iterated integration. Although this iterated integration is reminiscent of the traditional approach, in general we cannot reorder integrals ('Fubini does not hold'). For example, given the two measures μ_p and μ_q for p ≠ q, and f being the characteristic function of the set {(v, w) : E(v, w)}, the two iterated integrals evaluate to p and q respectively. However, it does hold when we consider only copies of the same measure.
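The failure of Fubini can be seen concretely, under our reading of (21): for any fixed w, the set {v : E(v, w)} has support {w} and contains exactly the single-edge connectivity pattern, so integrating the edge indicator over v against μ_p gives the constant p; the outer integral against μ_q then changes nothing, and the other order gives q. A small check (the function name `mu` and the pattern encoding are ours):

```python
from itertools import product

def mu(r, k, member):
    """Pattern-based evaluation of mu_r as in (21): sum of r^e (1-r)^(k-e)
    over the connectivity patterns lying in the set."""
    return sum(r ** sum(pat) * (1 - r) ** (k - sum(pat))
               for pat in product([0, 1], repeat=k) if member(pat))

p, q = 0.3, 0.7
# f(v, w) = [E(v, w)].  Inner integral over v against mu_p, for any fixed w,
# is mu_p of {v : E(v, w)}: support size 1, containing only the pattern (1,).
inner_p = mu(p, 1, lambda pat: pat == (1,))  # = p, independent of w
inner_q = mu(q, 1, lambda pat: pat == (1,))  # = q, independent of w
# Integrating a constant against the other measure changes nothing, so the
# two iterated integrals are p and q: Fubini fails whenever p != q.
assert (inner_p, inner_q) == (p, q)
```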
Here we have not constructed product σ-algebras, but rather always take the internal powerset as the σ-algebra. This allows us to view all the definable sets as measurable on V^n (Prop. 35), which is very useful. We remark that alternative product spaces also arise in non-standard approaches to graphons (see [Tao 2013, §6] for an overview), and also in quasi-Borel spaces [Heunen et al. 2017] for different reasons.
6.4.5 A Commutative Monad. We now use Prop. 42 to build a commutative affine submonad P_ν of the monad P, which we will use to model the graph interface for the probabilistic programming language. With Prop. 36, we use the following general result.

Proposition 43. Let T be a strong monad on a Grothendieck topos. Consider a family of morphisms {σ_i : X_i → T(Y_i)}_{i ∈ I}.
• There is a least strong submonad T_σ ⊆ T through which all the σ_i factor.
• If the morphisms σ_i all commute with each other, then T_σ is a commutative monad (Def. 7).
Proof. We let T_σ be the least subfunctor of T that contains the images of the σ_i's and of the unit η, and is closed under the image of monadic bind (>>=). To show that this exists, we proceed as follows. First, fix a regular cardinal κ such that the X_i's are all κ-presentable and the topos is locally κ-presentable (e.g. [Adámek and Rosický 1994]). Consider the poset Sub_κ(T) of κ-accessible subfunctors of T; the cardinality bound κ ensures it is small. Ordered by pointwise inclusion, this is a complete lattice: the non-empty meets are immediate, and the empty meet requires us to consider the κ-accessible coreflection of T. We defined T_σ by a monotone property, which we can regard as a monotone operator on this complete lattice Sub_κ(T), and so the least κ-accessible subfunctor exists. This is T_σ. Concretely, it is a least upper bound of an ordinal-indexed chain. The chain starts with the functor T_0(Y) = ⋃_{i ∈ I, f : Y_i → Y} image(T(f) ∘ σ_i) ⊆ T(Y), which is κ-accessible because the X_i's are κ-presentable. The chain iteratively closes under the image of monadic bind, until we reach a subfunctor that is a submonad of T.
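The existence argument above is a least-fixed-point construction: a monotone operator on a complete lattice has a least fixed point, reached as the limit of an increasing chain. A finite Python analogue, computing the least subset of a seed closed under an operation (a stand-in for closing the seed subfunctor under bind); the example operation is invented for illustration:

```python
def least_closed_set(seed, step):
    """Least superset of `seed` closed under `step`, computed as the
    limit of an increasing chain (Kleene iteration on the powerset lattice)."""
    current = set(seed)
    while True:
        bigger = current | {y for x in current for y in step(x)}
        if bigger == current:      # fixed point reached
            return current
        current = bigger

# Toy example: close {1} under doubling, cut off so the closure stays finite.
closure = least_closed_set({1}, lambda x: {2 * x} if x < 16 else set())
```

In the proof, the role of the cut-off is played by the cardinality bound κ, which keeps the chain inside a small complete lattice so that the limit exists.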
To see that T_σ is commutative, we appeal to (transfinite) induction. Say that a subfunctor F of T is commutative if all morphisms that factor through F commute (Def. 7), and then note that the property of being commutative is preserved along the ordinal-indexed chain. □

With this in mind, fixing a measure ν as in (21), we form the least submonad P_ν of P induced by the morphisms

ν : 1 → P(V)    bernoulli : [0, 1] → P(2)    (24)

where bernoulli(r) = r · δ(0) + (1 − r) · δ(1).
Corollary 44. The least submonad P_ν of the probability monad P induced by the morphisms in (24) is a commutative affine monad (Def. 7).
Proof notes. It is easy to show that bernoulli commutes with every morphism X → P(Y). Moreover, ν commutes with itself (Prop. 42). Finally, P_ν is affine since P is. □
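In the finite fragment, the claim that bernoulli commutes can be checked concretely: sampling two independent bernoullis in either order yields the same joint distribution. A Python sketch, using the convention bernoulli(r) = r · δ(0) + (1 − r) · δ(1) from the text:

```python
from collections import defaultdict

def bind(dist, kernel):
    """Monadic bind for finite distributions."""
    out = defaultdict(float)
    for x, p in dist.items():
        for y, q in kernel(x).items():
            out[y] += p * q
    return dict(out)

def bernoulli(r):
    """bernoulli(r) = r . delta(0) + (1 - r) . delta(1), as in (24)."""
    return {0: r, 1: 1 - r}

def seq_lr(r, s):
    """Sample bernoulli(r) first, then bernoulli(s)."""
    return bind(bernoulli(r), lambda x: bind(bernoulli(s), lambda y: {(x, y): 1.0}))

def seq_rl(r, s):
    """Sample bernoulli(s) first, then bernoulli(r)."""
    return bind(bernoulli(s), lambda y: bind(bernoulli(r), lambda x: {(x, y): 1.0}))
```

Both orders give the same joint distribution on 2 × 2, which is the equational content of commutativity (Def. 7) restricted to bernoulli.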
Proof notes. The semantics interprets ground types as finite sets with the discrete Aut(Rado) action, in which case internal probability kernels correspond to stochastic matrices, agreeing with FinStoch. Thus, the theory is Bernoulli-based. To see that the graphon arises, consider for instance the case n = 2: for G_2 as in (18), the interpretation is a product of independent bernoulli samples in P(2^4). For general n, this corresponds to the random graph model G(n, p), for the Erdős-Rényi graphon with parameter p. □
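The observation that internal probability kernels between finite discrete sets are just stochastic matrices, composing as in FinStoch, has a direct finite analogue in code (the matrices below are illustrative, not taken from the paper):

```python
def compose_stochastic(A, B):
    """Compose kernels X -> P(Y) and Y -> P(Z) given as row-stochastic
    matrices: (A ; B)[x][z] = sum over y of A[x][y] * B[y][z]."""
    return [[sum(A[x][y] * B[y][z] for y in range(len(B)))
             for z in range(len(B[0]))]
            for x in range(len(A))]

# Illustrative row-stochastic matrices (each row sums to 1).
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.5, 0.5], [0.0, 1.0]]
C = compose_stochastic(A, B)
```

Kernel composition is exactly matrix multiplication, and row-stochasticity (total mass 1) is preserved, which is the FinStoch composition law.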

CONCLUSION
Summary. We have shown that equational theories for the graph interface to the probabilistic programming language (Ex. 1) give rise to graphons (Theorem 23). Conversely, every graphon arises in this way. We showed this generally using an abstract construction based on Markov categories (Corollary 26) and methods from category theory [Hermida and Tennent 2012; Hu and Tholen 1995]. Since this is an abstract method, we also considered two concrete styles of semantic interpretation that give rise to classes of graphons: traditional measure-theoretic interpretations give rise to black-and-white graphons (Prop. 28), and an interpretation using the internal probability theory of Rado-nominal sets gives rise to Erdős-Rényi graphons (Corollary 45).
Further context, and future work. The idea of studying exchangeable structures through program equations is perhaps first discussed in the abstract [Staton et al. 2017], whose §3.2 ends with an open question about semantics of languages with graphs that the present paper addresses. Subsequent work addressed the simpler setting of exchangeable sequences and beta-bernoulli conjugacy through program equations [Staton et al. 2018], and stochastic memoization [Kaddar and Staton 2023]; the latter uses a category similar to RadoNom, although the monad is different. Beyond sequences [Staton et al. 2018] and graphs (this paper), a natural question is how to generalize to arbitrary exchangeable interfaces (see e.g. [Orbanz and Roy 2015]). For example, we could consider exchangeable random boolean arrays via the interface new-row : unit → row, new-column : unit → column, entry : row * column → bool, and random hypergraphs with the interface new : unit → vertex, hyperedge_k : vertex^k → bool.
We could also consider interfaces for hierarchical structures, such as arrays where every entry contains a graph. Diverse exchangeable random structures have been considered from the model-theoretic viewpoint [Ackerman 2015; Crane and Towsner 2018] and from the perspective of probability theory (e.g. [Campbell et al. 2023; Jung et al. 2021; Kallenberg 2010]), but it remains to be seen whether the programming perspective here can provide a unifying view. Another point is that graphons correspond to dense graphs, and so a question is how to accommodate sparse graphs from a programming perspective (e.g. [Caron and Fox 2017; Veitch and Roy 2019]).
This paper has focused on a very simple programming language (§2.1). As mentioned in Section 1.5, several implementations of probabilistic programming languages do support various Bayesian nonparametric primitives based on exchangeable sequences, partitions, and relations (e.g. [Dash et al. 2023; Goodman et al. 2008; Kiselyov and Shan 2010; Mansinghka et al. 2014; Roy et al. 2008; Wood et al. 2014]). In particular, the 'exchangeable random primitive' (XRP) interface [Ackerman et al. 2016b; Wu 2013] provides a built-in abstract data type for representing exchangeable sequences. This aids model design by its abstraction, but also aids inference performance by clarifying the independence relationships.
Aside from practical inference performance, we can ask whether representation and inference are computable. For the simpler setting of exchangeable sequences, this is dealt with positively by [Freer and Roy 2010, 2012]. The question of computability for graphons and exchangeable graphs is considerably subtler, and some standard representations are noncomputable [Ackerman et al. 2019] (see also [Ackerman et al. 2017a]). This suggests several natural questions about whether certain natural classes of computable exchangeable graphs can be identified by program analyses in the present context.
For a simple example, we can write a program over the interface to calculate the probability of three random vertices forming a triangle:

let a = new() in let b = new() in let c = new() in edge(a, b) & edge(b, c) & edge(a, c) : bool    (5)
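Under the Erdős-Rényi graphon with parameter p, each of the three edge queries on distinct fresh vertices is an independent bernoulli(p), so the program above returns true with probability p³. A Python sketch estimating this by sampling; the sampler is our own illustration of the intended semantics, not the paper's formal interpretation:

```python
import random

def triangle_prob_estimate(p, trials=20000, seed=0):
    """Monte Carlo estimate of the probability that three fresh vertices
    form a triangle under the Erdos-Renyi graphon with edge probability p:
    the three edge queries are independent bernoulli(p) draws."""
    rng = random.Random(seed)
    hits = sum(
        all(rng.random() < p for _ in range(3))  # edge(a,b) & edge(b,c) & edge(a,c)
        for _ in range(trials)
    )
    return hits / trials
```

For p = 1/2 the true value is 1/8, and the estimate concentrates around it as the number of trials grows.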
provide monoidal indeterminates by introducing a 'polynomial category', by analogy with a polynomial ring. Unfortunately, a general version for distributive monoidal categories is not yet known, so we focus on the specific case of a functor FinSetInj^op → Fam(G^op). We build a new category Fam(G^op)[v : FinSetInj^op], which we abbreviate Fam(G^op)[v]. It has the same objects as Fam(G^op), but the morphisms A → B are equivalence classes of morphisms [m, f] : vertex^m × A → B in Fam(G^op), modulo reindexing. The reindexing equivalence relation is generated by putting [m, f] ∼ [n, g] when there exist injections i_1, …, i_k : m → n such that