Bit Blasting Probabilistic Programs

Probabilistic programming languages (PPLs) are expressive means for creating and reasoning about probabilistic models. Unfortunately, hybrid probabilistic programs, involving both continuous and discrete structures, are not well supported by today's PPLs. In this paper we develop a new approximate inference algorithm for hybrid probabilistic programs that first discretizes the continuous distributions and then performs discrete inference on the resulting program. The key novelty is a form of discretization that we call bit blasting, which uses a binary representation of numbers such that a domain of $2^b$ discretized points can be succinctly represented as a discrete probabilistic program over poly($b$) Boolean random variables. Surprisingly, we prove that many common continuous distributions can be bit blasted in a manner that incurs no loss of accuracy over an explicit discretization and supports efficient probabilistic inference. We have built a probabilistic programming system for hybrid programs called HyBit, which employs bit blasting followed by discrete probabilistic inference. We empirically demonstrate the benefits of our approach over existing sampling-based and symbolic inference approaches.


INTRODUCTION
Probabilistic programming languages (PPLs) are an expressive means for creating and reasoning about probabilistic models. Many such models are naturally hybrid, involving both continuous (e.g., Gaussian distributions) and discrete structures (e.g., Bernoulli random variables, if statements and other control flow). For example, hybrid models arise in applications such as medical diagnosis, gene expression and cyber-physical systems [Chen et al. 2020; Lee and Seshia 2017].
Unfortunately, hybrid programs are not well supported by today's PPLs. The primary analysis task in probabilistic programming languages is probabilistic inference: computing the probability that an event occurs according to the distribution defined by the program. Existing inference algorithms employ forms of sampling to perform approximate inference. Some approaches, notably Hamiltonian Monte Carlo, used in the PPLs Pyro and Stan [Bingham et al. 2019; Gorinova et al. 2021], do not support discrete random variables, instead requiring them to be (manually or automatically) marginalized out. However, this approach has numerous fatal cases that explode exponentially

Authors' addresses: Poorva Garg, University of California, Los Angeles, USA, poorvagarg@cs.ucla.edu; Steven Holtzen, Northeastern University, Boston, USA, s.holtzen@northeastern.edu; Guy Van den Broeck, University of California, Los Angeles, USA, guyvdb@cs.ucla.edu; Todd Millstein, University of California, Los Angeles, USA, todd@cs.ucla.edu.
inference [Chavira and Darwiche 2008; Chavira et al. 2006; De Raedt et al. 2007; Fierens et al. 2015; Holtzen et al. 2020], which reduces inference to weighted model counting on a Boolean formula, has this property for bit blasted mixed-gamma distributions. Therefore, the process of bit blasting a mixed-gamma distribution followed by discrete inference via knowledge compilation is guaranteed to take time that is polynomial in the bitwidth used for discretization.
Third, we have used the above theoretical results to design a new probabilistic programming system called HyBit that performs non-stochastic approximate inference for hybrid probabilistic programs via bit blasting. HyBit is a discrete probabilistic language that includes support for fixed-point binary numbers and associated arithmetic operations, and it is implemented as an embedded domain-specific language in Julia [Bezanson et al. 2017]. The HyBit API allows users to produce bit blasted fixed-point approximations of arbitrary continuous distributions. Mixed-gamma distributions can be represented by their sound bit blasted discrete distributions. For other distributions, the API enables users to employ piece-wise discrete approximations, where each piece is itself a bit blasted mixed-gamma distribution.
HyBit leverages knowledge compilation to perform exact inference on the given discrete probabilistic program. In the worst case, inference for an arbitrary hybrid probabilistic program via bit blasting can be exponential in the number of bits, but we empirically demonstrate the benefits of our approach. We show that HyBit performs better than existing sampling-based and symbolic inference approaches on a comprehensive benchmark suite of hybrid probabilistic programs.
Overall, we present the following contributions in this paper:
(1) We motivate the challenges for inference on hybrid probabilistic programs in Section 2.
(2) We present a new form of discretization called bit blasting that is characterized by its succinctness in the number of discrete intervals in Section 3.
(3) We present a class of continuous distributions, namely mixed-gamma distributions, for which a sound bit blasted representation exists. We formalize this construction and prove its properties in Section 3. We further prove that knowledge compilation based inference scales polynomially in the bitwidth for these distributions.
(4) We describe the HyBit PPL and its new inference algorithm via bit blasting in Section 4.
(5) In Section 5, we empirically compare HyBit with other PPLs on benchmarks obtained from the existing literature. We also characterize the behavior of HyBit with respect to its hyperparameters, i.e., the number of bits and pieces.
HyBit is available at https://github.com/Tractables/Dice.jl/tree/hybit. The full version of this paper, which contains full proofs, is available on arXiv [Garg et al. 2023].

MOTIVATING EXAMPLES
This section motivates the challenge of inference for hybrid probabilistic programs using three examples. First, we present an example from computational biology with inherent logical structure. Next, we show an example from the literature with a multimodal posterior arising due to discrete control flow. Finally, we show an example of low-probability observations through conjugate Gaussians. We then investigate the performance of various inference algorithms, including HyBit. The examples demonstrate the advantages of HyBit over other approximate inference algorithms. We provide a detailed comparison with several approximate and exact baselines in Section 5.

Logical Structure
We present a simplified example from computational biology that relates genetic expression to blood sugar levels. Figure 1 shows the probabilistic program, where the task is to compute the updated belief of a gene's occurrence in a patient given their blood sugar levels.
The first four lines of the probabilistic program in Figure 1 use a beta distribution as the prior probability of each of the genes occurring in the general population. The syntax flip(θ) denotes a Bernoulli random variable with success probability θ. On line 5, the program uses the syntax reduce(|, gene) to denote the disjunction gene[1] | gene[2] | ... | gene[T]. In other words, the patient is considered to have diabetes if at least one of the genes is expressed. What follows is multiple readings of the patient's blood sugar level. For each reading, a random variable first defines the blood sugar depending on whether the patient has diabetes. Then we use the syntax observe(x, v) to condition on the random variable x having the value v; in the program this is used to condition on actual blood-sugar readings from the patient. Finally, on line 12, the program queries for the expectation of the posterior distribution of the occurrence of the first gene. Figure 2 shows the results of inference with a timeout of 20 minutes on this program using different inference algorithms, as the number of genes (T) increases. Stan uses Hamiltonian Monte Carlo, which does not directly support discrete random variables. Instead, they are marginalized out, either manually or automatically using variable elimination [Gorinova et al. 2020]. As shown in the figure, Stan times out when there are more than 15 genes, as its strategy scales exponentially in T. The same issue of exponential blowup plagues GuBPI [Beutner et al. 2022], which employs a combination of symbolic evaluation and discretization to find upper and lower bounds. As the figure shows, the universal sampling methods MCMC with a Metropolis-Hastings kernel (WebPPL MH) and Sequential Monte Carlo (WebPPL SMC) can scale and provide reasonable accuracy. The exact inference strategy Psi [Gehr et al. 2016] also scales well with the number of discrete random variables when the program is written to avoid a large discrete state space. AQUA discretizes hybrid programs [Huang et al. 2021] but does not support this program. Our system and approach, HyBit, scales to 50 genes and has the least absolute error among approximate inference algorithms. The program, when written in HyBit, is represented by its discrete bit-level abstraction. While writing the program, the user can employ the HyBit API to replace all continuous distributions (specifically on Lines 2, 6, 9 in Figure 1) with their bit blasted discrete approximations. As a result, we now have a discrete program with distributions over Boolean and fixed-point numbers. Note that now the observe on Line 8 conditions on the fact that the value of the discrete distribution for blood_sugar1 lies in the interval corresponding to the value 79.
In the experiment shown we use a bitwidth of 25: each continuous distribution is discretized as a program over a tuple of 25 bits interpreted as a fixed-point number. Further, we approximate these continuous distributions using 4096 pieces, each of which is a bit blasted exponential distribution.
Naive discretization with 25 bits would be prohibitively slow, as it yields $2^{25}$ intervals, i.e., roughly 34M intervals. However, our bit blasted program only uses 53K coin flips (Boolean random variables) to represent them. Moreover, knowledge compilation based inference [Fierens et al. 2015; Holtzen et al. 2020] automatically identifies and exploits conditional independences in the program's logical structure and helps to scale inference. More details of this experiment can be found in the appendix.

Handling Multimodality
This section presents an example of a multimodal distribution to highlight another challenge for inference on hybrid probabilistic programs. Multimodal distributions have multiple peaks separated by low-probability regions. These distributions commonly emerge in various applications such as sensor network localization, cosmology and many more [Shaw et al. 2007; Tak et al. 2018]. We adapt an example from the existing literature [Yao et al. 2021], as shown in Figure 3. The probabilistic program shown in Figure 3 is hard for existing probabilistic inference approaches. The datapts on Line 3 has nine entries, with two-thirds being 5 and the other one-third being −5. This leads to the posterior for $(\mu_1, \mu_2)$ being bimodal around (5, −5) and (−5, 5). As the number of data points increases with the same proportion, the posterior of $(\mu_1, \mu_2)$ converges to (5, −5). However, in the presence of limited data points, the posterior for $\mu_1$ is bimodal around 5 and −5.
The existence of multiple modes challenges sampling-based algorithms, as they tend to get stuck in one of the modes. Specifically, WebPPL using MCMC with a Metropolis-Hastings kernel and both Stan and WebPPL using HMC end up arbitrarily in one of the modes and fail to explore the other mode. Figures 4a and 4b show the results obtained using WebPPL MCMC and Stan HMC respectively, where two different runs get stuck in two different modes. On the other hand, HyBit performs exact inference on its discrete abstraction and so explores the distribution globally, allowing it to identify both modes. Sequential Monte Carlo (SMC) (Figure 4c), Psi and GuBPI also explore the distribution globally and thus are able to find both modes. Finally, to address the computational challenge of direct discretization, AQUA adapts its discretizing intervals to focus on high-probability regions, and ends up only identifying the higher-probability mode (Figure 4d).


Handling Low Probability Observations
This section presents conjugate Gaussians (Figure 5) with low-probability observations. In Figure 5, the posterior distribution for the random variable mu is queried after conditioning on low-probability data on Lines 2 and 3.

Why are low-probability observations hard? Intuitively, general-purpose sampling algorithms begin sampling from the prior distribution and struggle to find samples with considerable weight. Only after a very large number of samples do these algorithms manage to sample from the true posterior. On the other hand, HyBit explores the domain of the posterior distribution globally, via exact inference on a bit blasted abstraction, and so it is insensitive to this issue. Figure 6 plots the true prior and posterior distributions along with results from different inference algorithms. For the sampling-based algorithms MCMC with a Metropolis-Hastings kernel and SMC, we obtained and plotted 1000 samples after running the corresponding WebPPL [Goodman and Stuhlmüller 2014] program. The importance sampling algorithm was not able to obtain any sample with non-negligible weight for this program. The samplers do shift the posterior towards the true posterior distribution but require many more samples to get there. Even after drawing about 16M and 65K samples respectively, the expectations of the samples obtained from MCMC and SMC have absolute errors of 0.549798 and 1.520776. GuBPI reports upper and lower bounds on the probability for each interval and incurs an absolute error of 2.33. On the other hand, the posterior distribution from HyBit overlaps perfectly with the ground truth. Stan HMC handles low-probability observations well and obtains high accuracy. Psi also obtains the exact symbolic expression for the posterior distribution. Finally, the mean of AQUA's reported posterior had an error of 5.66, as it fails to make any update to the prior of mu.

BIT BLASTING: KEY INSIGHTS
To scale inference on hybrid probabilistic programs with respect to discrete structure, we need an algorithm that treats discreteness as first class and that discretizes away continuous structure. This section defines the semantic notion of bit blasting and sets it up as a special case of discretization with desirable properties. Then, we provide bit blasting functions for common classes of continuous distributions. We provide discretization techniques that are sound (accurate up to $b$ bits), succinct, and amenable to efficient inference.

Discretization and Bit Blasting
In the standard terminology of probability theory [Rosenthal 2006], a probability space $(\Omega, \Sigma, \mu)$ consists of a sample space $\Omega$, a $\sigma$-algebra on $\Omega$ denoted $\Sigma$, and a probability measure on $\Sigma$ denoted $\mu$. In a general sense, a discretization function takes as input such a probability space $(\Omega, \Sigma, \mu)$ and outputs a discrete probability space $(\Omega', \Sigma', \mu')$ where $\Omega'$ is a countable set.
We will study a more specific notion of discretization: one that takes as input a continuous distribution over a finite interval and outputs a discrete distribution over $2^b$ points for some number of bits $b$. Formally, let $[l, u)$ be an interval with $l, u \in \mathbb{R}$. For the input probability space, we use $\mathcal{B}([l, u))$ to denote the Borel $\sigma$-algebra of subsets of the interval $[l, u)$ [Rosenthal 2006]. For the output probability space, we write $\mathcal{P}(S)$ to refer to the power set (a $\sigma$-algebra) of a set $S$. Moreover, we will assume the sample space to be discretized as follows.
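As a concrete sketch (our own illustrative code, not part of HyBit), a $b$-bit discretization of $[l, u)$ can be taken to use the $2^b$ evenly spaced left endpoints of the subintervals, which matches the 2-bit sample set {0, 0.25, 0.5, 0.75} used in the examples later in this section:

```python
def discretize_points(l, u, b):
    """Sample space of a b-bit discretization of [l, u): the 2**b evenly
    spaced left endpoints of the subintervals."""
    width = (u - l) / 2 ** b
    return [l + k * width for k in range(2 ** b)]

print(discretize_points(0.0, 1.0, 2))  # [0.0, 0.25, 0.5, 0.75]
```
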
We are now ready to define the notion of discretization function used in this paper.
As the example shows, one can come up with an arbitrary $b$-bit discretization function. To qualify them further, we need a notion of accuracy: soundness up to $b$ bits, defined as follows.
Before we define a $b$-bit blasting function, we need to fix a generic representation of discrete probability distributions. To that purpose, we define the concept of a discrete probabilistic closure, akin to probabilistic Turing machines [Arora and Barak 2006].
Each probabilistic closure is a deterministic function from a set of biased coin flips to a discrete set. This induces a probability distribution on the output of the function through the probabilities associated with the coin flips. It also consists of an accepting Boolean formula that handles observations and limits the set of values that the input flips can take. We provide a formal definition below:

Definition 4 (discrete probabilistic closure). A discrete probabilistic closure is defined as the tuple $(f, g, \varphi)$ where $f$ is a vector of Boolean random variables (or biased coin flips), $g$ is a deterministic function from $\{T, F\}^{|f|}$ to a finitely countable set $S$, and $\varphi$ is a deterministic Boolean formula over the variables in $f$.
The semantics of a discrete probabilistic closure, i.e., $\langle(f, g, \varphi)\rangle$, defines a probability space $(S, \mathcal{P}(S), \mu)$ such that

Example 3. From Example 2, consider $\mu$ = {0 → 0.25, 0.25 → 0.25, 0.5 → 0.25, 0.75 → 0.25}. $\mu$ can be represented as a discrete probabilistic closure using the function naïve_unif, the weight function and the accepting Boolean formula as follows: Now, one can calculate the probability that naïve_unif returns 0.5 as follows: Dice [Holtzen et al. 2020] and Problog [Fierens et al. 2015] are examples of PPLs that directly fit into the paradigm of a discrete probabilistic closure.
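The closure semantics can be made concrete with a small enumeration sketch (our own code; the nested if-then-else shape of naïve_unif below is one plausible reading of Example 3, not the paper's exact program): the probability of a value is the accepted weight mapped to that value, renormalized by the total accepted weight.

```python
from itertools import product

def closure_prob(flip_probs, g, accept, value):
    """Pr[g(f) = value] under the closure (f, g, accept): enumerate all
    assignments to the coin flips and renormalize by the accepted weight."""
    num = den = 0.0
    for bits in product([False, True], repeat=len(flip_probs)):
        w = 1.0
        for bit, p in zip(bits, flip_probs):
            w *= p if bit else 1 - p
        if accept(bits):
            den += w
            if g(bits) == value:
                num += w
    return num / den

# One possible shape for naive_unif: 3 fair flips select among 4 values
# via nested if-then-else; the accepting formula is trivially true.
g = lambda f: (0.0 if f[1] else 0.25) if f[0] else (0.5 if f[2] else 0.75)
accept = lambda f: True
print(closure_prob([0.5] * 3, g, accept, 0.5))  # 0.25
```
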
We want a discrete probabilistic closure to be succinct, i.e., of size polynomial in the number of bits of precision $b$. To this purpose, we define a bit blasting function.
Definition 5 ($b$-bit blasting function). A $b$-bit blasting function is a $b$-bit discretization function that outputs a discrete probabilistic closure $(f, g, \varphi)$ that uses a number of Boolean random variables that is polynomial in the number of bits $b$, that is, $|f| \in O(\text{poly}(b))$. It follows from Definitions 3, 4 and 5 that for an integer $b > 0$, a $b$-bit blasting function is sound for a given probability space.

Example 4. Again, consider the result of the sound 2-bit discretization function from Example 2, that is, $\mu$ = {0 → 0.25, 0.25 → 0.25, 0.5 → 0.25, 0.75 → 0.25}. The measure $\mu$ can alternatively be represented as a discrete probabilistic closure using the function bitblast_unif, the weight function and the accepting Boolean formula as follows: Note that in bitblast_unif, the flips are being used as bits in the binary representation of the variable val. Now, one can calculate the probability that bitblast_unif returns 0.5 as follows: We would like to point out that bitblast_unif uses only 2 coin flips to represent a distribution over 4 values. It can be generalized to use $b$ coin flips to represent a uniform distribution over $2^b$ values. On the other hand, naïve_unif uses 3 coin flips and would generalize to use $2^b - 1$ coin flips. Hence, bitblast_unif is a viable output for a $b$-bit blasting function. In the next section, we provide the $b$-bit blasting functions for exponential and mixed-gamma distributions.
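The succinctness gap can be seen directly in a small sketch (our own code): interpreting $b$ fair flips as binary digits yields the uniform distribution over $2^b$ values, so the closure needs only $b$ flips rather than the $2^b - 1$ of the naïve enumeration.

```python
from itertools import product
from collections import Counter

def bitblast_unif(bits):
    """Interpret b coin flips as the binary digits of a fixed-point
    number in [0, 1): b flips cover 2**b equally likely values."""
    val = 0
    for bit in bits:
        val = (val << 1) | bit
    return val / 2 ** len(bits)

b = 2
pmf = Counter(bitblast_unif(f) for f in product([0, 1], repeat=b))
dist = {v: c / 2 ** b for v, c in sorted(pmf.items())}
print(dist)  # {0.0: 0.25, 0.25: 0.25, 0.5: 0.25, 0.75: 0.25}
```
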

Concrete Bit Blasting Function: Preliminaries
Next, our goal is to provide a concrete instantiation of a sound bit blasting function for mixed-gamma distributions, which have probability density functions as defined below.
Definition 6 (generalized-gamma density). Given parameters $\alpha \in \mathbb{Z}_{\geq 0}$ and $\beta \in \mathbb{R}$, a generalized-gamma density $\gamma_{\alpha,\beta}$ is a probability density function over the interval $[0, 1)$ of the form $\gamma_{\alpha,\beta}(x) \propto x^{\alpha} e^{\beta x}$.

Definition 7 (mixed-gamma density). Given a collection of $m \in \mathbb{Z}^{+}$ generalized-gamma densities $\gamma_{\alpha_i,\beta_i}$ with their associated weights $w_i \in [0, 1]$ such that $\sum_{i=1}^{m} w_i = 1$, a mixed-gamma density $\Upsilon$ is a probability density function over the interval $[0, 1)$ of the form $\Upsilon(x) = \sum_{i=1}^{m} w_i\, \gamma_{\alpha_i,\beta_i}(x)$.

For notational convenience, we confine the continuous distributions to the unit interval to get discrete distributions over a $b$-bit unit interval. We generalize our approach to any finite interval when building the probabilistic programming system HyBit based on bit blasting.
To describe our construction of the bit blasting function, we make use of Dice [Holtzen et al. 2020]. Dice already compiles its programs to weighted Boolean formulas (via the ⇝ judgement) that fit the definition of a discrete probabilistic closure. This allows us to only define a ⇝ judgment from probability density functions to Dice programs to specify a bit blasting function. Dice also defines a distributional semantics function ⟦·⟧ that takes as input a Dice program p and outputs a normalized probability distribution (described as a function from the set of values to probabilities). We use ⟦·⟧ to argue about the soundness of our construction later. More details about the syntax and semantics of Dice can be found in the appendix.
The compilation judgements for mixed-gamma densities are of the form $\Upsilon$ ⇝ p where p is a Dice program. We further provide the following definitions for $b$-equivalence between densities and Dice programs and $b$-succinctness of Dice programs.
Definition 10. The flip count function flip_count(·) takes as input a Dice program p and outputs the number of Boolean random variables (coin flips) it uses.
We define $b$-succinctness such that the Dice program p employs a number of coin flips linear in the number of bits $b$. Observe that $b$-succinctness imposes a stricter condition than that required by a $b$-bit blasting function (which requires poly($b$) coin flips). This implies that if we have a $b$-succinct judgment for a mixed-gamma density, then we can obtain a $b$-bit blasting function for that distribution. We describe this in more detail later.

Judgment ⇝ and bit blasting
This section describes the rules for the judgement ⇝. We first describe the rule for an exponential distribution and then move on to generalized gamma distributions. Finally, we describe the rule for mixed-gamma densities. For each of the rules $\Upsilon$ ⇝ p, we prove the $b$-equivalence of the density $\Upsilon$ and the Dice program p, and the $b$-succinctness of p. Detailed proofs can be found in the appendix. We also describe how ⇝ allows us to construct a bit blasting function for mixed-gamma densities.

Exponential Distribution, $\gamma_{0,\beta}$
Let us first consider the uniform distribution ($\gamma_{0,0}$), a special case of an exponential distribution. If we bit blast a uniform distribution using $b$ bits into $2^b$ intervals, we end up with a discrete distribution over $[0, 1)$ with $2^b$ discrete points, each having probability $1/2^b$. A straightforward discretization strategy enumerates the $2^b$ values using $2^b - 1$ coin flips. But the same can be achieved using a tuple of $b$ bits, where each bit is an unbiased coin flip(0.5).
The strategy to bit blast a uniform distribution using its binary representation works because of the independence between binary digits. The same strategy can be extended to general exponential distributions as well. This fact was shown in a classic paper in the statistics literature [Marsaglia 1971]. We formalize that idea using the following rule.

Lemma 1. $\forall b \in \mathbb{Z}^{+}, \beta \in \mathbb{R}$, p: if $\gamma_{0,\beta}$ ⇝ p, then $\gamma_{0,\beta}$ and p are $b$-equivalent.
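The independence property can be checked numerically with a short sketch (our own code, following the [Marsaglia 1971] observation): if point $k/2^b$ of the discretized exponential carries mass proportional to $e^{\beta k/2^b}$, that mass factors exactly into a product of $b$ independent per-bit Bernoulli probabilities, since $e^{\beta \sum_i b_i 2^{-i}} = \prod_i e^{\beta b_i 2^{-i}}$.

```python
import math

def exp_flip_probs(beta, b):
    """Per-bit flip probabilities for a b-bit discretized exponential:
    bit i (MSB first) is true with probability sigmoid-like weight
    exp(beta * 2**-(i+1)) / (1 + exp(beta * 2**-(i+1)))."""
    return [math.exp(beta * 2 ** -(i + 1)) / (1 + math.exp(beta * 2 ** -(i + 1)))
            for i in range(b)]

def pmf_from_flips(probs, k, b):
    """Probability of point k/2**b as a product of independent bit flips."""
    p = 1.0
    for i in range(b):
        bit = (k >> (b - 1 - i)) & 1
        p *= probs[i] if bit else 1 - probs[i]
    return p

beta, b = -2.0, 4
probs = exp_flip_probs(beta, b)
masses = [math.exp(beta * k / 2 ** b) for k in range(2 ** b)]
Z = sum(masses)
max_err = max(abs(pmf_from_flips(probs, k, b) - masses[k] / Z)
              for k in range(2 ** b))
print(max_err)  # b independent flips reproduce all 2**b masses exactly
```
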
We provide detailed proofs of the above lemmas in the appendix. For the rest of the paper, we consider exponential distributions as the primitives from which to build other distributions, since these are the only ones that enjoy the property of independent bits. But sound bit blasting functions are still possible for other distributions, as we show next.

Gamma Distribution, $\gamma_{1,\beta}$
To come up with a sound bit blasting function for $\gamma_{1,\beta}$, we present a key mathematical insight. Consider the program in Figure 8a. Continuous random variables X and Y have an exponential ($\gamma_{0,\beta}$) and a uniform ($\gamma_{0,0}$) distribution respectively. It returns the new distribution of X after conditioning on the inequality Y < X. It turns out that the posterior distribution is a specific gamma distribution $\gamma_{1,\beta}$. We show the resulting calculation below, where pdf refers to the probability density function.
What happens if we discretize the program in Figure 8a using $b$ bits? We get the program in Figure 8b, where each continuous random variable has been replaced with its bit blasted counterpart. We have already seen that for uniform and exponential distributions, $b$-equivalent Dice programs exist. But what about the other constructs? As Figure 8d demonstrates, the discrete observe(Y < X) incurs error over its continuous counterpart. The good news is that we can account for the error, as shown by the following equations: The correction term in the above equations, when computed algebraically, turns out to be proportional to the exponential density ($\gamma_{0,\beta}$). So the resulting discrete probabilistic program after correction looks as shown in Figure 8c, where $\theta$ = Pr(Y == X | Y < X). The rules Expo1 and Trans-expo1zero (in the appendix) capture the above intuition. Here, unifObs(y, b) = p is a helper judgment where p constructs a uniform distribution that it conditions, through observe, on being less than y.

fresh y1, y2, y3    $\beta \neq 0$

We prove the $b$-equivalence and $b$-succinctness. The proof of $b$-equivalence is more involved, but it is easy to see that the Dice program in the above rule uses $3b + 1$ coin flips: $b$ coin flips in each occurrence of p1, $b$ coin flips in p3 and 1 coin flip in the if-then-else guard used to create a mixture.
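The conditioning insight can be checked numerically with a rejection-sampling sketch (our own code, not HyBit). Here we take X to be the truncated exponential and Y the uniform, the assignment under which Pr(Y < x) = x contributes the linear factor, so the conditioned samples should match $\gamma_{1,\beta}(x) \propto x\, e^{\beta x}$:

```python
import math
import random

random.seed(0)
beta, N = 1.5, 200_000

def sample_exponential(beta):
    """Inverse-CDF sample of the density ∝ exp(beta*x) truncated to [0, 1)."""
    u = random.random()
    return math.log(1 + u * (math.exp(beta) - 1)) / beta

# Figure 8a, numerically: X ~ exponential, Y ~ uniform, observe(Y < X)
kept = [x for x in (sample_exponential(beta) for _ in range(N))
        if random.random() < x]
mc_mean = sum(kept) / len(kept)

# analytic mean of gamma_{1,beta}(x) ∝ x * exp(beta*x), by midpoint quadrature
n = 20_000
xs = [(k + 0.5) / n for k in range(n)]
w = [x * math.exp(beta * x) for x in xs]
target_mean = sum(x * wi for x, wi in zip(xs, w)) / sum(w)

print(mc_mean, target_mean)  # agree up to Monte Carlo error (~1e-3)
```
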

Generalized Gamma Distribution, $\gamma_{\alpha,\beta}$
The previous subsection shows how conditioning on the inequality (Y < X) introduces a linear factor to $\gamma_{0,\beta}$ to obtain $\gamma_{1,\beta}$. Note that conditioning on (Y < X) introduces a linear factor regardless of what initial probability density X had. That is, if X's initial probability density was p(x), the resulting density of the corresponding probabilistic program would be proportional to x · p(x). This implies that if we can bit blast $\gamma_{\alpha,\beta}$, we can bit blast $\gamma_{\alpha+1,\beta}$. We still need to account for the error incurred by the discrete observe(Y < X), i.e., the correction term Pr(X | Y == X, Y < X). It turns out that the correction term is a mixture of gamma distributions, which can be bit blasted soundly as well. We provide the judgement rule and the proof of the following lemma in the appendix.

Mixture of Gamma Distributions, $\Upsilon$
Since generalized gamma densities $\gamma_{\alpha,\beta}$ can be bit blasted, mixed-gamma densities can be bit blasted as well. One bit blasts each individual generalized gamma density and creates their mixture using if-then-else constructs as follows.

fresh y1, y2, y3    $m > 1$
let y1 = flip(w) in let y2 = p1 in let y3 = p2 in if y1 then y2 else y3    (Trans-mix)

We prove the following theorems, with details in the appendix.
Theorem 8. ⇝ is $b$-succinct for all mixed-gamma densities $\Upsilon$.

Now that we have specified the rules for the judgment ⇝, we can define a sound bit blasting function for all mixed-gamma densities $\Upsilon$: the composition ⇝ • ⇝ of our judgment with Dice's compilation judgment.

Theorem 9. ⇝ • ⇝ is a sound $b$-bit blasting function.
Earlier work [Holtzen et al. 2020] defines the judgment ⇝ that takes as input a Dice program and outputs a weighted Boolean formula $(\gamma, \varphi, w)$ that aligns with Definition 4 of a discrete probabilistic closure. And since ⇝ is $b$-succinct for all mixed-gamma densities by Theorem 8, ⇝ always outputs a closure with poly($b$) coin flips. Thus, ⇝ • ⇝ is a $b$-bit blasting function. Earlier work [Holtzen et al. 2020] also proves the correctness of the compilation to weighted Boolean formulas with respect to the semantics of the Dice program. This fact, combined with Theorem 7, establishes that ⇝ • ⇝ is a sound $b$-bit discretization function. Detailed proofs can be found in the appendix.

3.3.5 Example: Laplace Distribution.
The previous sections described how mixed-gamma densities can be bit blasted when they are confined to the unit interval. But how can distributions that are shifted or scaled to other finite intervals be bit blasted? We explain this through the example of a Laplace distribution. A Laplace distribution has two parameters, location ($\mu$) and scale ($s$), and has the probability density function described below, where $x \in \mathbb{R}$. Let us consider the Laplace distribution truncated to the interval $[\mu - \epsilon, \mu + \epsilon)$. We assume that $\epsilon$ is a suitable power of 2, allowing the product with $\epsilon$ (denoted by $\epsilon$ × p) to be just a shift of the binary point, and that $\mu$ is a $b$-bit representable number, allowing precise shifting of p. First we generate the exponentials scaled for an interval of width $\epsilon$ instead of width 1: $\gamma_{0,-\beta}$ ⇝ p1 and $\gamma_{0,\beta}$ ⇝ p2. And then we create a mixture of them: Since p1 and p2 use $b$ coin flips each, the program shown above uses $2b$ coin flips, and ⇝ • ⇝ is a sound bit blasting function for Laplace distributions as well.
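The mixture construction can be verified analytically with a sketch (our own code; parameter names mu, s, eps are ours): a Laplace density truncated to a symmetric interval around its location is exactly a 50/50 mixture of a rising exponential piece on the left half and a decaying one on the right half, each rescaled from the unit interval.

```python
import math

mu, s, eps = 2.0, 0.5, 1.0   # location, scale, half-width (eps a power of 2)
beta = eps / s               # exponential rate after rescaling [0,1) to width eps

def gamma0(beta, v):
    """Normalized gamma_{0,beta} density on [0, 1), proportional to exp(beta*v)."""
    if beta == 0:
        return 1.0
    return beta * math.exp(beta * v) / (math.exp(beta) - 1)

def laplace_mix(x):
    """Truncated Laplace on [mu-eps, mu+eps) as a 50/50 mixture of two
    rescaled exponential pieces: rising on the left, decaying on the right."""
    if x < mu:
        return 0.5 * gamma0(beta, (x - mu + eps) / eps) / eps
    return 0.5 * gamma0(-beta, (x - mu) / eps) / eps

def laplace_trunc(x):
    """Laplace(mu, s) density renormalized to [mu-eps, mu+eps)."""
    Z = 1 - math.exp(-eps / s)   # Laplace mass inside the interval
    return math.exp(-abs(x - mu) / s) / (2 * s * Z)

grid = [mu - eps + 2 * eps * (k + 0.5) / 1000 for k in range(1000)]
max_err = max(abs(laplace_mix(x) - laplace_trunc(x)) for x in grid)
print(max_err)  # the two densities coincide up to floating-point error
```
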

How does bit blasting help in inference?
We have demonstrated sound bit blasting functions for mixed-gamma distributions. But how does that help in inference for probabilistic programs with these distributions? To answer this question, we focus on a particular inference strategy: knowledge compilation. We first describe the necessary preliminaries about knowledge compilation and then argue that the programs obtained through ⇝ are efficient for knowledge compilation.
Knowledge compilation based approaches [Fierens et al. 2015; Holtzen et al. 2020] to exact discrete probabilistic inference compile discrete probabilistic programs into weighted Boolean formulas that are represented using (reduced) ordered binary decision diagrams (OBDDs). These OBDDs are single-rooted when a single Boolean random variable is returned and multi-rooted when a tuple of Boolean random variables is returned. By fixing a value for all the flips (biased coin flips with associated weights) in the program, and by traversing the OBDD following those values (solid line for true, dashed for false), we reach the terminal corresponding to the value of each bit. The operation of weighted model counting computes the probability of reaching the 1-terminal for each bit in the returned value. It is a dynamic programming algorithm that runs in time linear in the OBDD size. The size of an OBDD for a Boolean formula $\varphi$, denoted OBDD($\varphi$), is the number of nodes in the OBDD. Thus, if we can obtain a smaller OBDD representation for a distribution, we can efficiently compose it with other constructs in a discrete probabilistic program.
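The linear-time claim is easy to see in a minimal sketch (our own code, not the Dice implementation): weighted model counting visits each BDD node once, combining the low and high branch probabilities, so its cost is proportional to the diagram's size.

```python
from functools import lru_cache

# A (reduced, ordered) BDD node is (var, lo, hi); terminals are True/False.
# Hand-built BDD for the formula (f1 & f2) | f3 with variable order f1 < f2 < f3.
n3 = ('f3', False, True)
n2 = ('f2', n3, True)
n1 = ('f1', n3, n2)

probs = {'f1': 0.5, 'f2': 0.25, 'f3': 0.1}

@lru_cache(maxsize=None)
def wmc(node):
    """Weighted model counting: one memoized pass, linear in the BDD size."""
    if node is True:
        return 1.0
    if node is False:
        return 0.0
    var, lo, hi = node
    return (1 - probs[var]) * wmc(lo) + probs[var] * wmc(hi)

print(wmc(n1))  # 0.2125 = P(f1)P(f2) + (1 - P(f1)P(f2)) * P(f3)
```
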
We discuss and prove formally that every program obtained through the judgement ⇝ compiles into a weighted Boolean formula whose multi-rooted OBDD grows linearly in the number of bits, as opposed to the worst-case exponential growth. Recall that Dice programs compile to weighted Boolean formulas $(\gamma, \varphi, w)$ where $\gamma$ is the (tuple of) Boolean formulae corresponding to the return value of the program, $\varphi$ is the accepting Boolean formula that encodes observations, and $w$ is the weight function with the flip probabilities.
Theorem 10. $\forall \Upsilon$, p, $\gamma$, $\varphi$, $w$, $\exists c$, $\forall b$: if $\Upsilon$ ⇝ p and p ⇝ $(\gamma, \varphi, w)$, then there exists a variable order $\Pi$ of the Boolean random variables such that OBDD($\gamma$) + OBDD($\varphi$) $\leq c \cdot b$.

We now provide intuition for the proof of the above theorem. Note that in the programs obtained through the judgement ⇝, there are only two constructs that depend on the number of bits: (1) the construction of the exponential distribution $\gamma_{0,\beta}$, and (2) conditioning on the inequality between an exponential and a uniform distribution through unifObs(y, b). We provide intuition for how the OBDD size for these constructs increases linearly in the number of bits $b$.

Exponential Distribution.
In Figure 9, we 3-bit blast an exponential distribution, i.e., we have a discrete exponential distribution over 8 values (0, 0.125, 0.25, . . ., 0.875). The figure shows a 3-rooted OBDD where each root, labeled $\gamma_1$, $\gamma_2$ and $\gamma_3$, represents a bit in the 3-bit returned value. Suppose we fix the value of the flip corresponding to the root node of $\gamma_1$ to be true; then for $\gamma_1$ we reach the terminal 1 and its bit is assigned 1. The operation of weighted model counting (WMC) would calculate the probability of $\gamma_1$ being 1 as $6.14 \times 10^{-6}$, as that node has probability $6.14 \times 10^{-6}$ of being true. Similarly, WMC can be used for the other roots of this OBDD. Since each bit needs one OBDD node, the overall OBDD size grows linearly with the number of bits. Since WMC runs in time linear in the OBDD size, probabilistic inference for an exponential distribution runs in time linear in the number of bits, O($b$). Another example of an OBDD, for a uniform distribution, can be found in the appendix. For all programs p obtained through the judgement ⇝, if p ⇝ $(\gamma, \varphi, w)$, then $\gamma$ is the tuple of Boolean formulae representing the return value of the program. We argue that the return value of p is always a mixture of exponential distributions, making its OBDD size linear in the number of bits.

Conditioning on inequality between an exponential and a uniform distribution.
The rules for the judgement ⇝ use the helper judgement unifObs(y, b) = p to condition on an inequality between the binary representations of a uniform distribution and an exponential distribution.
Definition 13 (inequality function). A b-bit inequality function LT_b : {0, 1}^b × {0, 1}^b → {0, 1} takes as input two b-bit numbers x and y and outputs 1 if x < y and 0 otherwise. We prove the following lemma, which states that the OBDD size for the inequality function grows linearly with the number of bits.
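The intuition for the linear OBDD size is that a most-significant-bit-first comparison only ever needs to remember whether the numbers are still tied; no other state accumulates. A sketch (bit vectors MSB-first; not the paper's exact encoding):

```python
def lt(xs, ys):
    """LT_b over two equal-length bit lists, most significant bit first.
    Scanning MSB-first, the only state carried forward is 'tied so far',
    which is why the decision diagram needs only O(b) nodes."""
    for x, y in zip(xs, ys):
        if x != y:            # the first differing bit decides the comparison
            return 1 if x < y else 0
    return 0                  # equal numbers: x < y is false

def bits(n, b):
    """n as a b-bit list, most significant bit first."""
    return [(n >> (b - 1 - i)) & 1 for i in range(b)]

# Exhaustively check LT_4 against integer comparison.
b = 4
for x in range(2**b):
    for y in range(2**b):
        assert lt(bits(x, b), bits(y, b)) == (1 if x < y else 0)
```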
Since the only constructs that depend on the number of bits (exponential distributions and inequalities) have OBDDs that grow linearly with the number of bits, Theorem 10 follows intuitively. We provide a formal proof in the appendix.
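The effect of this conditioning can be checked by brute force at small bit widths: condition a discretized exponential X on X < U for an independent discrete uniform U and renormalize. A Python sketch by exhaustive enumeration (parameters and the orientation of the inequality are illustrative assumptions, not the paper's exact setup):

```python
import math

b, beta = 3, 0.7
N = 2**b

# Discretized exponential X and uniform U over {0, ..., N-1}.
Z = sum(math.exp(-beta * k) for k in range(N))
exp_pmf = [math.exp(-beta * k) / Z for k in range(N)]
unif_pmf = [1.0 / N] * N

# Pr(X = k | X < U) by enumerating the joint distribution.
joint = [exp_pmf[k] * sum(unif_pmf[u] for u in range(N) if k < u) for k in range(N)]
evidence = sum(joint)
posterior = [p / evidence for p in joint]

assert abs(sum(posterior) - 1.0) < 1e-12
assert posterior[N - 1] == 0.0   # the largest value can never lie below a uniform draw
```

Bit blasting replaces this 2^b-sized enumeration with WMC over the linear-size OBDD for LT_b described above.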

HYBIT: A PROBABILISTIC PROGRAMMING SYSTEM
The previous section described how to bit blast mixed-gamma distributions. We now use this capability to build a probabilistic programming system, HyBit, for hybrid probabilistic programs. This section describes its syntax and implementation and elaborates on two important aspects: piece-wise approximations of continuous distributions and the advantages of a binary representation.

HyBit -Syntax and Implementation Details
We build a probabilistic programming system HyBit around sound bit blasting of mixed-gamma densities and approximate bit blasting of other continuous distributions. HyBit has been implemented as a shallowly embedded domain-specific language in Julia [Bezanson et al. 2017].
The core syntax of HyBit expressions is given in Figure 10. It provides support for distributions over Booleans (flip θ) and fixed-point numbers (general_gamma and bitblast). It supports Boolean operations (¬, ∧, ∨) and arithmetic operations (+, -, *, /, %, <, ==) over these distributions, as well as hard observations for probabilistic conditioning (observe). For all the constructs in Figure 10, HyBit performs a non-standard execution and compiles them to OBDDs to perform probabilistic inference. Since HyBit is implemented as a library in Julia, HyBit programmers can also use Julia constructs such as (bounded) loops, tuples, and functions. For example, a Julia for loop with HyBit constructs in the loop body can be used to build a probabilistic model. HyBit is available as an open source repository with a comprehensive set of examples.⁴
Figure 11 contains more details on the API of HyBit. DistFix{W, F} is the type of fixed-point numbers of bitwidth W with F bits after the binary point. The function general_gamma performs sound bit blasting of a specified generalized gamma density to the given fixed-point W and F. Sound bit blasting of mixed-gamma densities is achieved by using the if-then-else construct over generalized gamma densities. The function bitblast is used for bit blasting any arbitrary continuous distribution using a piece-wise approximation, employing the bits and pieces specified via the parameters W, F, and pieces. The API also allows the user to choose the type of discrete distribution, linear or exponential, for the piece-wise approximation. The parameters of linear (slope) and exponential (β) pieces are automatically chosen such that the ratio of probabilities of the first and last interval is the same as that of the naïve discretization. Finally, the API also provides functions for querying the probability distribution, expectation, and variance of a random variable. The next subsections describe piece-wise approximations and the computation of expectation and variance in more detail.

Piece-wise Approximations
Even though mixed-gamma distributions capture many natural distributions, there are other commonly occurring ones, such as the Gaussian. It is an open problem whether Gaussians admit a sound bit blasting function, let alone one that compiles to a compact OBDD. For such distributions, one can instead use a piece-wise approximation.
Let C be an arbitrary continuous probability distribution over the interval [l, u). To bit blast C using a piece-wise distribution with m pieces, C is approximated using a mixture of m discrete probability distributions over disjoint intervals. For every piece, one creates a shifted and scaled instance of a bit blasted mixed-gamma density and then forms their mixture. Note that since each piece uses O(b) coin flips, a piece-wise distribution with m pieces uses O(mb) coin flips. Section 5 shows the empirical advantages of this approach. This piece-wise approximation using linear or exponential pieces can be easily built using the bitblast API available in HyBit (Figure 11). Figure 12 shows bit blasting of a Gaussian distribution using 2, 4, 8, and 16 pieces, where each piece is a bit blasted exponential. This provides the user with the conventional trade-off between accuracy and performance. We elaborate on this in Section 5.2.
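A rough Python sketch of the idea (not the HyBit implementation; the rule for choosing each piece's β follows the description above, matching the first/last interval probability ratio of the naïve discretization, and the midpoint-rule target is an illustrative assumption):

```python
import math

def naive_masses(pdf, lo, hi, n):
    """Naïve discretization: per-interval mass via the midpoint rule, normalized."""
    w = (hi - lo) / n
    m = [pdf(lo + (i + 0.5) * w) * w for i in range(n)]
    s = sum(m)
    return [x / s for x in m]

def piecewise_exponential(pdf, lo, hi, bits, pieces):
    """Approximate pdf over [lo, hi) with `pieces` exponential pieces over 2^bits points."""
    n = 2**bits
    per = n // pieces
    target = naive_masses(pdf, lo, hi, n)
    pmf = []
    for p in range(pieces):
        chunk = target[p * per : (p + 1) * per]
        # Pick β so the exponential's first/last interval ratio matches the target's.
        ratio = chunk[-1] / chunk[0]
        beta = -math.log(ratio) / (per - 1) if per > 1 else 0.0
        piece = [math.exp(-beta * k) for k in range(per)]
        z = sum(piece)
        weight = sum(chunk)          # mixture weight = total mass of the piece
        pmf.extend(weight * q / z for q in piece)
    return pmf

gauss = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
pmf = piecewise_exponential(gauss, -4.0, 4.0, bits=6, pieces=8)
assert abs(sum(pmf) - 1.0) < 1e-9
assert all(q >= 0 for q in pmf)
```

Each exponential piece corresponds to O(b) coin flips in the bit blasted encoding, so the whole mixture stays succinct while more pieces tighten the fit.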

Advantages of the Binary Representation
The binary representation has important advantages for probabilistic reasoning beyond the succinctness that bit blasting provides. First, many hybrid probabilistic models involve arithmetic operations on continuous random variables. Since we use a binary representation of fixed-point numbers, arithmetic operations such as +, *, /, < are compiled as Boolean formulas over binary numbers (similar to ALU circuits in computer architecture). This representation allows probabilistic inference (specifically the knowledge compilation approach that we employ) to identify and exploit the structure that exists in arithmetic, such as conditional independences among the resulting bits of a computation. Recent work [Cao et al. 2023] described this compilation and showed its advantage empirically for integers; HyBit leverages these advantages for computations over fixed-point numbers.
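For instance, addition over b-bit values reduces to a ripple-carry circuit of Boolean connectives, and it is this per-bit structure that knowledge compilation can exploit. A concrete-valued sketch (over Python booleans rather than symbolic flips; the wrap-around behavior is an assumption matching fixed-point arithmetic):

```python
def add_bits(xs, ys):
    """Ripple-carry addition over little-endian bit lists, built only from
    Boolean connectives — the same formulae apply when each bit is a
    probabilistic Boolean formula instead of a concrete value."""
    out, carry = [], False
    for x, y in zip(xs, ys):
        out.append(x ^ y ^ carry)                     # sum bit
        carry = (x and y) or (carry and (x ^ y))      # carry bit
    return out  # wraps around at 2^b

def to_bits(n, b):
    return [bool((n >> i) & 1) for i in range(b)]

def from_bits(bs):
    return sum(1 << i for i, x in enumerate(bs) if x)

# Exhaustively check 5-bit addition against integer addition mod 2^5.
b = 5
for x in range(2**b):
    for y in range(2**b):
        assert from_bits(add_bits(to_bits(x, b), to_bits(y, b))) == (x + y) % 2**b
```

Note how each output bit depends only on the corresponding input bits and a single carry, the kind of conditional independence the compiled OBDDs can exploit.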
The binary representation also enables an optimized computation of expectation and variance. Naïve computation of expectation and variance for a distribution over 2^b values requires computing the probability of all 2^b values. The bitwise representation allows one to compute these quantities from the probabilities of the individual bits, an exponential improvement. Note that in the worst case, for an arbitrary hybrid probabilistic program, obtaining the corresponding OBDD for the binary representation can itself be exponential in the number of bits. But for the class of mixed-gamma distributions, this conversion is linear in the number of bits (Theorem 10). We formalize the computation of expectation and variance in the following two theorems and provide proofs in the appendix.
Theorem 12. Let D be a discrete probability distribution over the interval [0, 2^b) represented as a distribution over bits (x_b, x_{b−1}, . . ., x_1). Then the expectation of D can be computed using linearity of expectation as

E[D] = Σ_{i=1}^{b} 2^{i−1} · Pr(x_i = 1).

Theorem 13. Let D be a discrete probability distribution over the interval [0, 2^b) represented as a distribution over bits (x_b, x_{b−1}, . . ., x_1). Then the variance of D can be computed as

Var(D) = Σ_{i=1}^{b} Σ_{j=1}^{b} 2^{i+j−2} · (Pr(x_i = 1, x_j = 1) − Pr(x_i = 1) · Pr(x_j = 1)).

Example. Consider a discrete uniform distribution over the integers {0, 1, 2, 3} represented using two bits (x_2, x_1). The direct way of calculating the expectation and variance of this uniform distribution requires inferring the probability of every integer in the domain. Theorems 12 and 13 instead let us compute these quantities from the probabilities of the individual bits (and, for the variance, pairs of bits).
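The two theorems can be checked directly on the running example. A sketch (bit marginals and pairwise probabilities obtained here by enumeration; in HyBit they would come from WMC queries on individual bits):

```python
def expectation(bit_marginals):
    """E[D] = Σ 2^(i−1) · Pr(x_i = 1); bit_marginals[i] = Pr(x_{i+1} = 1)."""
    return sum((1 << i) * p for i, p in enumerate(bit_marginals))

def variance(pair_probs, bit_marginals):
    """Var(D) = Σ_i Σ_j 2^(i+j−2) · (Pr(x_i=1, x_j=1) − Pr(x_i=1)·Pr(x_j=1))."""
    b = len(bit_marginals)
    return sum(
        (1 << i) * (1 << j) * (pair_probs[i][j] - bit_marginals[i] * bit_marginals[j])
        for i in range(b) for j in range(b)
    )

# Uniform over {0,1,2,3}: derive bit marginals and pairwise probabilities.
b, pmf = 2, [0.25] * 4
marg = [sum(p for k, p in enumerate(pmf) if (k >> i) & 1) for i in range(b)]
pair = [[sum(p for k, p in enumerate(pmf) if (k >> i) & 1 and (k >> j) & 1)
         for j in range(b)] for i in range(b)]

assert expectation(marg) == 1.5   # matches Σ k·Pr(k) = (0+1+2+3)/4
assert variance(pair, marg) == 1.25
```

Only O(b) bit marginals (and O(b²) pairwise probabilities for the variance) are queried, rather than all 2^b point probabilities.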
Figure 13 empirically shows the performance benefits in computing expectation and variance of a distribution as we increase the number of bits in bit blasting a standard normal distribution.

EMPIRICAL EVALUATION
We evaluate the practicality of bit blasting on real-life probabilistic programs. We have carried out experiments to investigate the following questions: Q1: How does HyBit perform in comparison to existing inference algorithms? (Section 5.1) Q2: How effective is the piece-wise approximation? (Section 5.2)

Comparison with existing inference algorithms
5.1.1 Approximate Inference Algorithms. We evaluate HyBit against two classes of approximate inference algorithms. Sampling methods: we compare against WebPPL rejection sampling, MCMC sampling (with a Metropolis-Hastings kernel), SMC sampling, and Stan HMC as representatives of this class.
Discretization methods: we compare against AQUA and GuBPI in this class of inference algorithms [Beutner et al. 2022; Huang et al. 2021].
Comparing the performance of different probabilistic programming systems is challenging since performance is directly affected by the structure of the program. We write equivalent programs for these benchmarks in each system and put in our best effort to optimize them. The tables in this section and subsequent sections report the mean absolute error over 10 runs for stochastic algorithms. For other inference algorithms, we report the output of a single run. All experiments were single-threaded and were carried out on a server with a 2.4 GHz CPU and 512 GB RAM.
Table 2 reports the results of the performance evaluation of HyBit against other approximate inference algorithms. We take all the hybrid and continuous benchmarks on which Psi [Gehr et al. 2016] was evaluated and a few more relevant benchmarks from existing work [Huang et al. 2021]. We put in our best effort to compute ground truth for these benchmarks either analytically or using computer algebra systems, and include only those benchmarks for which we were able to compute the ground truth reliably. We report the absolute error with respect to the ground truth for all the benchmarks. For benchmarks that return a non-Boolean value, we compute the absolute error of the expectation for each approach. We report the minimum error achieved by each inference algorithm within a timeout of 20 minutes.
For all the benchmarks, HyBit replaced mixed-gamma distributions with their sound bit blasted counterparts and all other distributions with their linear piece-wise approximations. The bits and pieces employed for each benchmark are reported in Table 2. To run Stan on these benchmarks, we use SlicStan [Gorinova et al. 2020] to obtain the Stan program with discrete random variables marginalized out. For all WebPPL baselines, default settings were used for all the sampling algorithms with the maximum number of samples obtainable within 20 minutes. As Table 2 shows, HyBit with bit blasting is comparable with the existing approaches on all the benchmarks, and better on 11 of the 19. For the other 8 benchmarks, HyBit achieves very close accuracy. AQUA performs better on only four of the benchmarks, and GuBPI fails to obtain good accuracy. This is primarily because their enumerative discretization does not scale well to higher precision. WebPPL and Stan (equipped with automated marginalization through SlicStan) support most of the benchmarks but do not achieve good accuracy within the time limit. This is because sampling based algorithms are stochastic and cannot obtain sufficiently many samples from the true posterior in limited time.

Table 3 compares HyBit against a probabilistic programming system that performs exact inference using algebraic methods, namely Psi [Gehr et al. 2016]. We put in our best effort to translate the benchmarks for optimal performance in Psi. It would often output a symbolic expression, which we would feed to Mathematica for further simplification. Computing and simplifying these algebraic expressions is not a trivial task; hence, Psi timed out on 6 of these benchmarks and Mathematica failed to simplify 4 of them. HyBit works for all 19 benchmarks, as it reduces the computation to discrete inference on Boolean random variables and approximates the inference query.

How effective is piece-wise approximation?
We analyze the tradeoff between performance and accuracy when using different numbers of pieces to approximate a continuous distribution. Figure 14 shows the trends of runtime and accuracy as the number of pieces increases, for different bitwidths, on four benchmarks. As the number of linear pieces increases, the runtime tends to first decrease and then increase. As the number of pieces increases, the accuracy tends to improve, as shown by the lower four plots. This is because as we increase the number of pieces, continuous distributions are replaced with more accurate bit blasted distributions. That accuracy improvement comes at the cost of increased runtime after a certain sweet spot. The appendix provides additional experiments that also justify the use of piece-wise approximations over an approach based on the central limit theorem.

RELATED WORK
Probabilistic programming has been an active area of research, both from the perspective of semantics and of inference [Dahlqvist et al. 2023; Milch et al. 2005]. This section positions HyBit with respect to related work. At a high level, the key distinction of HyBit is its development of bit blasting for succinct discretization of hybrid probabilistic programs.
Discretization approaches. Prior approaches that discretize continuous or hybrid probabilistic programs estimate the posterior by exhaustively enumerating all of the discretized values [Huang et al. 2021], which in many cases does not scale to sufficient accuracy. One prior discretization technique also employs a bit representation [Claret et al. 2013]. However, their approach is not a form of bit blasting, since it is not succinct and in general still produces a representation that is proportional to the number of discretized points. Finally, a recent approach uses discretization to produce upper and lower bounds on the posterior of a probabilistic program [Beutner et al. 2022].
Inference algorithms for hybrid probabilistic programs. Other research specifically targets hybrid probabilistic programs. Leios [Laurel and Misailovic 2020] continualizes the hybrid probabilistic program in order to harness the power of existing continuous inference algorithms. HyBit, on the other hand, discretizes hybrid programs, which helps scale inference, particularly with respect to their discrete structure. SPPL supports hybrid programs by translating them to specific representations for inference [Saad et al. 2021]. However, these representations constrain the hybrid programs that can be supported; for instance, SPPL does not support arithmetic on continuous random variables while HyBit does. Finally, probabilistic logic programming languages have been extended to support hybrid models using interval traces [Gutmann et al. 2011].
Algebraic approaches. Some PPL inference algorithms produce closed-form algebraic expressions to encode probability distributions and then use symbolic techniques to perform exact inference [Gehr et al. 2016; Hur et al. 2014; Narayanan et al. 2016]. However, these systems are necessarily limited in their expressivity and in the programs that they can handle, as shown in Table 3.
Path based inference algorithms. A common class of inference algorithms for PPLs are operational: they record traces of the program using concrete values of the random variables. This class includes sampling algorithms and variational approximations [Bingham et al. 2019; Carpenter et al. 2017; Chaganty et al. 2013; Dillon et al. 2017; Goodman et al. 2008; Hur et al. 2015; Kucukelbir et al. 2015; Mansinghka et al. 2013, 2018; Minka et al. 2014; Pfeffer 2007; Saad and Mansinghka 2016; Tristan et al. 2014; van de Meent et al. 2015; Wingate and Weber 2013; Wood et al. 2014]. Sampling algorithms like rejection sampling and MCMC methods are universal but have known limitations, such as difficulty handling multi-modality and low-probability evidence, as described in Section 2. More sophisticated techniques like Hamiltonian Monte Carlo and variational approximation address these limitations but impose constraints of continuity and almost-everywhere differentiability, so they must resort to marginalizing out all discrete structure.
Use of a binary representation. Bit blasting has been a widespread technique in software verification, used in constraint solvers to reason about arithmetic using a bit representation [Bruttomesso and Sharygina 2009]. Recent work in scaling inference for probabilistic programs over integers also employs a binary representation of numbers [Cao et al. 2023], in order to exploit conditional independences in that representation. The bit blasting in HyBit is inspired by these techniques but has a different purpose and hence a very different technical approach: to develop succinct, and in many cases provably sound, approximations of continuous probability distributions.

CONCLUSION AND FUTURE WORK
In this work, we motivated the need for new inference methods for hybrid probabilistic programs. We described bit blasting, whereby hybrid probabilistic programs are succinctly discretized and then analyzed using algorithms for discrete inference. We characterized a class of continuous densities, the mixed-gamma densities, for which bit blasting is not only succinct but also sound relative to an explicit discretization approach, as well as provably efficient to analyze. We then presented a new PPL, HyBit, that employs a novel inference algorithm for hybrid programs based on bit blasting. We demonstrated the performance benefits of HyBit over existing approximate inference algorithms.
In future work, we hope to expand the class of distributions that can be bit blasted soundly. We plan to investigate how HyBit can be extended to support hierarchical Bayesian models, and to enhance its usability by not requiring the user to specify hyperparameters for every probabilistic program. We are also interested in exploring the integration of HyBit with other inference approaches, to leverage their relative strengths in support of a wider range of hybrid probabilistic programs.

Figure 2. Scaling on Logical Constraints. HyBit scales to 50 genes with the least absolute error. The graphs for Psi and HyBit overlap closely.

Figure 4. Posterior from different baselines compared with that of HyBit for Figure 3. HyBit and WebPPL SMC are able to identify both modes in the posterior distribution. (a) Different runs of WebPPL MCMC with a Metropolis-Hastings kernel converge to different modes. (b) Different runs of Stan HMC converge to different modes. (c) Different runs of WebPPL SMC are able to find both modes. (d) AQUA with its adaptive interval strategy finds only the more probable mode.

Figure 6. Results for Example 3. The HyBit-estimated posterior overlaps closely with the true posterior distribution.

Figure 7. Commutative diagram for a b-sound bit blasting function.
Sound b-bit blasting. Prior densities for X and Y (shown in blue), when conditioned on X < Y (shown in violet), yield the posterior for X (shown in orange).

Figure 8. Key insight in bit blasting a probability density. [·] refers to discretization of a continuous density into 2^b intervals, and X_b refers to the discretization of X.
Figure 10. Syntax for the core HyBit expressions. DistFix{W, F} refers to the type of fixed-point numbers. The metavariables range over real numbers, integers, Booleans, variable names, and probabilities in the range [0, 1], respectively. The metavariable pdf ranges over continuous density functions pdf : R → R, and op ranges over arithmetic operations of each arity. We explain the APIs DistFix, general_gamma, and bitblast and their arguments in Figure 11.
Figure 13. Speedup in computing expectation and variance.

Figure 14. Runtime and accuracy trends with respect to the number of linear pieces, for different bitwidths.

Table 2. Comparison of HyBit against other approximate inference algorithms. Each row contains one entry in bold indicating the lowest absolute error achieved among all inference algorithms. A '✗' denotes that the baseline does not support inference for the benchmark, an '∞' denotes infinite bounds, and a timeout marker denotes that the baseline timed out.