Probabilistic Query Evaluation: The Combined FPRAS Landscape

We consider the problem of computing the probability of a query over a tuple-independent probabilistic database, known as the probabilistic query evaluation (PQE) problem. The problem is well-known to be #P-hard in data complexity for conjunctive queries in general, as well as for several subclasses of conjunctive queries. Existing approximation approaches for dealing with hard queries have centred on computing the lineage of the query over the database, which can be intractable for all but the smallest of queries due to the exponential dependence of the lineage size on the query length. In this paper, we take a first step towards bridging this gap, by showing how to construct a fully polynomial-time randomized approximation scheme (FPRAS) for the PQE problem for any class of self-join-free conjunctive queries of bounded hypertree width, that runs in time polynomial in both the query length and database instance size. An interesting consequence of our result is the existence of classes of queries that are #P-hard in data complexity to evaluate exactly, yet easy to approximate both in terms of query length and database size.


INTRODUCTION
A wide range of applications exist that require querying answers over structured datasets in the presence of uncertainty or imprecision. Consider, for example, knowledge extracted from text using an imperfect NLP system, or data collected from noisy sensors. Such information is often specified in a relational format suitable for use in a relational database system, but the standard relational model makes no provisions for modeling the uncertainty inherent in the data. Probabilistic databases have been proposed as a simple, principled formalism for filling this need [12, 23]. In this model, each fact appearing in an underlying relational database is annotated with an independent probability, which intuitively represents the probability of that fact's presence.
A substantial body of research exists studying the probabilistic query evaluation (PQE) problem, the canonical problem in the context of probabilistic databases. Given a query specified in some logical language, and a probabilistic database as described above, the goal is to determine the likelihood of the query holding on a randomly sampled database, in which each row is included independently with its annotated probability. Following the framework of data and query complexity introduced in the seminal work of Vardi [24], the problem of PQE has mainly been studied through the lens of data complexity (barring limited exceptions [4]), in which we are interested in time complexity with respect to the size of the database instance for a fixed query. The pioneering work of Dalvi and Suciu [8][9][10] eventually culminated in the well-known dichotomy theorem [11] for PQE of unions of conjunctive queries (UCQs). The dichotomy theorem states that, given a fixed UCQ, computing its probability over some input probabilistic database is either in FP or #P-hard, depending on the query in question. This dichotomy result was later extended beyond UCQs to the more general setting of homomorphism-closed queries, in the special case of probabilistic graphs (essentially probabilistic databases over schemas limited to binary relations) [2].
While extensive effort has been undertaken to classify queries by their tractability, the body of work studying how queries that are known to be intractable can be dealt with in practice has been more limited. Indeed, the primary approach, exact or otherwise, to dealing with queries that are #P-hard in data complexity has been to take the so-called intensional approach to PQE [20], which involves computing the lineage of the query over the database as a propositional formula, and computing the weighted model count of this formula (either exactly or approximately). Unfortunately, the size of this lineage can be exponential in the length of the query, thus rendering the intensional approach of limited practical utility for all but the smallest of queries. For example, evaluating a conjunctive query of only five atoms over a database with just a few hundred rows can yield a propositional DNF formula with over 10^12 (one trillion!) clauses, out of reach of even the most cutting-edge approximate model counters. Consequently, it is desirable to avoid the above intensional approach, and instead develop algorithms that can avoid exponential dependence on the query length.
Table 1 (note): The top two rightmost cells highlighted in bold indicate the contribution of this paper. The "Bounded HW?" and "Self-Join-Free?" columns, respectively, denote bounded hypertree width and self-join-freeness for all queries in the class. The "Safe?" column denotes the syntactic notion of safety for queries in the class, as defined by Dalvi and Suciu [11].
This raises the question:
Can we design an approximation scheme with guarantees for the PQE problem, whose runtime is polynomial in both the query length and database size?
Note that we likely cannot hope to get a fully polynomial-time randomized approximation scheme (FPRAS) applicable to all conjunctive queries. Recall that Boolean conjunctive query evaluation on deterministic databases is NP-complete in combined complexity [7]. If such an FPRAS for probabilistic query evaluation were possible, we could answer any Boolean conjunctive query on a deterministic database with high probability in polynomial combined complexity (just set all probabilities to 1), hence implying NP ⊆ BPP.
Therefore, we instead answer the question posed above in the affirmative for a large class of queries whose deterministic query evaluation problem is tractable. In particular, we propose an FPRAS for the PQE problem, applicable to any class of conjunctive queries of bounded hypertree width [18], so long as queries in the class do not contain repeated relation symbols (in other words, they are self-join-free). Crucially, our FPRAS runs in time polynomial in both the query length and database instance size, setting it apart from the classical intensional approaches described above, which suffer from an exponential dependence on the query length. Indeed, although we are not the first to study approximation for the PQE problem [14, 15, 16, 22], so far no other techniques have been proposed that have both polynomial runtime in combined complexity for a wide class of queries, as well as rigorous guarantees on the quality of the probability computed.
Finally, it is worth remarking that the study of bounded hypertree width queries is motivated by the observation that conjunctive queries found in real-world benchmarks typically have very low hypertree width in practice (usually no more than 3) [17]. Therefore, we believe that our approach could serve as a useful starting point for the development of practical, scalable algorithms for the probabilistic query evaluation problem.

Technical Contributions
The primary contribution of our work is to establish the following result.
Theorem 1. Let Q be a self-join-free conjunctive query of bounded hypertree width, and H a probabilistic database instance. Then there exists an algorithm PQEEstimate such that, for all ε ∈ (0, 1):
$$(1 - \varepsilon)\,\Pr_H(Q) \le \mathrm{PQEEstimate}(Q, H) \le (1 + \varepsilon)\,\Pr_H(Q)$$
with high probability. Moreover, PQEEstimate has runtime polynomial in |Q|, the size of H, and 1/ε.
An interesting consequence of the FPRAS presented here is the existence of classes of queries for which PQE is provably #P-hard even in data complexity alone, yet are tractable to approximate in both database size and query length. For example, consider the class 3Path = ∪_{n ≥ 3} {Q_n} of self-join-free path queries of length at least three:
$$Q_n = R_1(x_1, x_2), R_2(x_2, x_3), \ldots, R_n(x_n, x_{n+1})$$
It is easy to check that every query in the class 3Path is nonhierarchical [11], which is known to be an equivalent condition to #P-hardness in data complexity for self-join-free conjunctive queries. Attempts to approximate its probability by computing its lineage as a weighted DNF formula are unlikely to succeed: the lineage of Q_n over a database D expressed as a propositional formula can have size Θ(|D|^n). However, path queries have bounded hypertree width; in fact, since they are acyclic they have hypertree width 1. The FPRAS given here therefore shows that the probability of any query in the class can be tractably approximated in a manner polynomial not only in terms of |D|, but also n, thereby eliminating the exponential dependence. We hence obtain the following corollary.
Corollary 1. There exists a class of queries C such that: (1) for an arbitrary Q ∈ C, PQE for Q on an input probabilistic database instance is #P-hard; (2) for an arbitrary Q ∈ C and probabilistic database instance H, approximating Pr_H(Q) to a (1 ± ε)-factor with high probability can be performed in time polynomial in |Q|, the size of H, and 1/ε.
We contextualize our main result in Table 1, by placing our FPRAS among some existing tractability results for PQE.
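To make the lineage blow-up for path queries concrete, the following Python sketch counts the witnesses of a path query of length n on a dense toy instance; each witness contributes one clause to the DNF lineage. The function name count_witnesses and the toy instance are illustrative assumptions and not part of the construction in this paper. Note that the count itself is obtained by a simple dynamic program; it is only the explicit lineage that is large.

```python
def count_witnesses(relations, domain):
    """Count witnesses of the path query R_1(x1,x2), ..., R_n(xn,xn+1).

    relations: list of sets of (a, b) pairs, one set per atom (hypothetical encoding).
    Each witness corresponds to one clause of the DNF lineage.
    """
    # paths_ending_at[c] = number of partial witnesses whose last constant is c
    paths_ending_at = {c: 1 for c in domain}
    for rel in relations:
        nxt = {c: 0 for c in domain}
        for a, b in rel:
            nxt[b] += paths_ending_at[a]
        paths_ending_at = nxt
    return sum(paths_ending_at.values())


# Toy instance: 10 constants, every relation is the complete binary relation.
domain = range(10)
complete = {(a, b) for a in domain for b in domain}
n = 5
num_facts = n * len(complete)                          # 500 facts in total
num_clauses = count_witnesses([complete] * n, domain)  # 10**6 lineage clauses
print(num_facts, num_clauses)
```

On this instance the number of lineage clauses grows as 10^(n+1) while the database only grows linearly in n, which is the kind of gap the combined-complexity FPRAS is meant to avoid.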
Key Ideas. Our approach follows the spirit of seminal work by Kolaitis and Vardi [21] that connected two fundamental, yet seemingly unrelated problems: conjunctive query containment and constraint satisfaction. While conjunctive query containment and constraint satisfaction are decision problems, the PQE problem is a counting problem, and therefore, we must turn our attention to corresponding counting problems. In this work, we focus on uncovering a fundamental relationship between PQE and counting problems in the context of tree automata. In particular, we demonstrate that PQE for self-join-free conjunctive queries of bounded hypertree width can be reduced to counting the number of trees accepted by a non-deterministic finite tree automaton (NFTA). This NFTA is constructed from both the query and database instance together. We can then leverage a recent breakthrough FPRAS for counting these trees, which was originally designed to tackle the entirely separate problem of answer counting for (bounded hypertree width) conjunctive queries [6]. Although some aspects of the reduction employed there are similar to the one here, the fact that the same approximation result can also be leveraged for PQE is not at all obvious. In the context of PQE there are new challenges that we must address: namely, dealing with the exponential number of subinstances possible, as well as incorporating individual fact probabilities.
At a high level, our procedure as it applies to counting subinstances of D that satisfy Q (a special case of the PQE problem, known as uniform reliability [3, 19]) is as follows. We take the hypertree decomposition of Q, which intuitively gives us an efficient evaluation plan for Q on any database D. We then note the first vertex in the hypertree decomposition that is a "covering vertex" for some atom in Q; essentially, this means that once we have reached that point in the decomposition, we have fixed our witness for that atom. Since Q is self-join-free, we know that upon fixing that fact we are free to make any selection of the remaining non-witnessing facts for that relation in D. Thus, we can design our tree automaton to accept all the possible traversals of the hypertree decomposition with assignments from facts in D. A key point to note here is that even though the number of satisfying subinstances of D may be exponentially large, the number of witnesses in D of any atom in Q is at most the size of D, making such a construction feasible.
In order to extend this approach to the PQE problem in general, we also need a way to incorporate the fact probabilities into the reduction. To this end, we substitute every transition with a new automaton gadget through a construction based on binary comparators, in the process scaling the number of trees accepted proportional to the fact probabilities. We can then apply the aforementioned FPRAS to count the trees accepted by this NFTA, thereby yielding our desired result.
Organization. The rest of the paper is organized as follows: we review some technical background in Section 2. Next, in Section 3, we build some intuition on how probabilistic databases and automata are connected, by proving a simplified theorem pertaining to path queries on graphs that is a special case of our main result. Section 4 is split into two parts: in Section 4.1, we introduce augmented NFTAs, which form a syntactic building block we use in the main result as it applies to uniform reliability in Section 4.2. We then follow a similar approach in Section 5, by first introducing NFTAs with multipliers in Section 5.1, which we then use in proving the primary theorem of this paper for PQE in Section 5.2. We finally conclude in Section 6, and discuss some possible directions for future work.

PRELIMINARIES
We begin by reviewing some background on probabilistic databases, conjunctive queries, and automata.
Probabilistic Databases. A relational schema σ is a collection of relation names, each with an associated arity. We assume the existence of a countably infinite universe of constants U that serve as values that can appear in our databases. A database instance (or simply database) D over σ is a finite set of facts of the form R_i(a_1, …, a_k), where R_i is some relation name in σ with arity k, and a_1, …, a_k ∈ U. We define the size |D| of a database instance as the number of facts appearing in it. A database instance D′ is said to be a subinstance of D if D′ ⊆ D.
A probabilistic database instance H = (D, π) is a database instance D equipped with a probability function π : D → [0, 1], mapping each fact in D to an independent probability label. For the sake of more easily formalizing the representation of H, we assume in this paper that probability labels are rational (that is, of the form π(f_i) = w_i/d_i for integers w_i and d_i). The labelling π induces a probability distribution on the subinstances D′ ⊆ D as follows:
$$\Pr_H(D') = \prod_{f \in D'} \pi(f) \prod_{f \in D \setminus D'} \bigl(1 - \pi(f)\bigr)$$
Conjunctive Queries. We focus on (Boolean) conjunctive queries: existentially quantified constant-free first-order sentences comprising conjunctions of atoms, which we write in the form Q = R_1(x̄_1), …, R_n(x̄_n). The set of variables occurring in Q is denoted by vars(Q), and the set of atoms by atoms(Q). Similarly, for any atom A ∈ atoms(Q), vars(A) denotes the set of variables occurring in A. The notation is likewise extended to sets of atoms: for a set of atoms 𝒜 ⊆ atoms(Q), we define vars(𝒜) = ∪_{A ∈ 𝒜} vars(A). We define the length of a query |Q| as the number of atoms it contains. If Q contains no repeated relation names, then Q is said to be self-join-free. A path query is a conjunctive query Q, comprising only binary atoms, of the form:
$$Q = R_1(x_1, x_2), R_2(x_2, x_3), \ldots, R_n(x_n, x_{n+1})$$
We use the usual semantics to determine if a database instance D satisfies a conjunctive query Q, and write D |= Q to indicate this. The probability of a query Q on a probabilistic database instance H = (D, π) is:
$$\Pr_H(Q) = \sum_{D' \subseteq D,\ D' \models Q} \Pr_H(D')$$
Computing the probability of a query over a given probabilistic database instance is known as the probabilistic query evaluation (PQE) problem. For a conjunctive query Q and database D, we denote by UR(Q, D) the uniform reliability of Q on D: the number of subinstances of D that satisfy Q. Note that this is equivalent (up to a factor of 2^|D|) to computing Pr_H(Q), in the special case where H is the probabilistic database comprising D equipped with uniform tuple probabilities of 0.5.
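The definitions above can be made concrete with a brute-force Python sketch that enumerates all subinstances. It takes time exponential in the database size and is meant only as a reference point for the semantics, not as the approach taken in this paper. The encoding of facts as (relation, constants) pairs and all function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations


def satisfies(facts, query):
    """Decide whether a set of facts satisfies a Boolean conjunctive query.

    facts: set of (relation_name, tuple_of_constants)
    query: list of (relation_name, tuple_of_variables)
    """
    def extend(i, assignment):
        if i == len(query):
            return True
        rel, variables = query[i]
        for frel, consts in facts:
            if frel != rel or len(consts) != len(variables):
                continue
            new = dict(assignment)
            # setdefault binds an unbound variable, or returns its current value
            if all(new.setdefault(v, c) == c for v, c in zip(variables, consts)):
                if extend(i + 1, new):
                    return True
        return False
    return extend(0, {})


def subinstances(facts):
    facts = list(facts)
    for r in range(len(facts) + 1):
        for subset in combinations(facts, r):
            yield set(subset)


def pqe_bruteforce(prob, query):
    """prob: dict mapping each fact to its probability label (a Fraction)."""
    total = Fraction(0)
    for sub in subinstances(prob):
        if satisfies(sub, query):
            weight = Fraction(1)
            for fact, p in prob.items():
                weight *= p if fact in sub else 1 - p
            total += weight
    return total


def uniform_reliability(facts, query):
    return sum(1 for sub in subinstances(facts) if satisfies(sub, query))


# Toy example: Q = R(x, y), S(y, z) over three facts.
Q = [("R", ("x", "y")), ("S", ("y", "z"))]
D = {("R", ("a", "b")), ("S", ("b", "c")), ("S", ("b", "d"))}
half = {f: Fraction(1, 2) for f in D}
ur = uniform_reliability(D, Q)
pr_half = pqe_bruteforce(half, Q)
```

On this toy instance, ur multiplied by 2^(-|D|) coincides with pr_half, illustrating the relationship between uniform reliability and PQE with uniform probabilities 0.5.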

Strings and String Automata.
A string over an alphabet Σ is a sequence  1 . . .  of symbols, with each   ∈ Σ.We denote the empty string with no symbols by .We denote by Σ * the set of all strings over Σ.A set  of strings over Σ is said to be prefix-closed, if  •  ∈  implies  ∈  for any ,  ∈ Σ * , where  •  denotes the concatenation of  and .Further, denote by   the set of strings obtainable by concatenation of  strings selected from , and define  0 = {}.
A non-deterministic finite (string) automaton (NFA) is a tuple M = (S, Σ, δ, I, F) where S is a finite set of states, Σ is a finite alphabet of input symbols, δ : S × Σ → 2^S is a transition function, I ⊆ S is a set of initial states, and F ⊆ S is a set of accepting states. We define the size of an automaton M, denoted |M|, as the size of the encoding of its transition relation δ over a suitable alphabet. We assume the standard semantics for deciding whether a string lies in the language L(M) of strings accepted by M. We denote by L_n(M) the (finite) language of strings of length n accepted by M. By the result in [5], we assume the existence of an FPRAS CountNFA for approximating |L_n(M)|, running in time polynomial in n and |M|.
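To clarify which quantity CountNFA approximates, the following Python sketch computes the number of accepted strings of a given length exactly, using an on-the-fly subset construction. This is not the FPRAS of [5]: it is a worst-case exponential baseline, included only to emphasize that accepted strings (rather than accepting runs) are being counted. The dictionary encoding of the transition function and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def count_accepted_exact(delta, initial, accepting, alphabet, n):
    """Exactly count the strings of length n accepted by an NFA.

    delta: dict mapping (state, symbol) to a set of successor states.
    Each prefix is tracked by the set of states reachable on it, so a string
    with several accepting runs is still counted only once.
    """
    layer = {frozenset(initial): 1}
    for _ in range(n):
        nxt = defaultdict(int)
        for states, count in layer.items():
            for a in alphabet:
                succ = frozenset(t for s in states for t in delta.get((s, a), ()))
                if succ:
                    nxt[succ] += count
        layer = nxt
    return sum(c for states, c in layer.items() if states & accepting)


# NFA over {0, 1} guessing a position that holds a 1 (strings with at least one 1).
delta = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q1"},
    ("q1", "1"): {"q1"},
}
num_strings = count_accepted_exact(delta, {"q0"}, {"q1"}, "01", 3)
```

Here the automaton has more accepting runs than accepted strings, since a string with several 1s can be accepted in several ways; the subset construction avoids this double counting at the cost of possibly exponentially many state sets.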
Trees and Tree Automata.For  ∈ N, a -tree (or simply tree) is a prefix-closed non-empty finite subset  ⊆ [] * .A path is a 1-tree.The root of a tree  is the empty string  ∈ , and the maximal elements of  under prefix order are called leaves.For ,  ∈ ,  is said to be a parent of , and The size of  is simply its cardinality | |.Given a finite alphabet Σ, we denote by Trees  [Σ] the language of -trees in which each node  ∈  is labelled with a symbol from Σ.We abuse notation slightly and write  () to denote the label of a node  ∈ .We then define a (top-down) non-deterministic finite tree automaton (NFTA) in the standard manner, as a tuple T = (, Σ, Δ,  init ), where  is a finite set of states, Σ is a finite alphabet of input symbols, Δ ⊆  ×Σ× (∪  =0   ) is the transition relation, and  init ∈  is the initial state.Without loss of generality, we also slightly abuse notation by allowing transitions of the form (, , ) for  ∈  and  ∈ ∪  =0   , noting that such an automaton can easily be converted to an equivalent one without -transitions using standard procedures.We sometimes refer to NFTAs as ordinary NFTAs, to emphasize their distinction from augmented NFTAs and NFTAs with multipliers defined later in the paper.Like for NFAs, the size of an NFTA T , denoted |T |, is defined as the size of the encoding of its transition relation Δ over some suitable alphabet.
A run ρ of T over a labelled tree t ∈ Trees_k[Σ] is a function ρ : t → S such that for every u ∈ t with children u · 1, …, u · m, we have (ρ(u), t(u), ρ(u · 1) … ρ(u · m)) ∈ Δ. In particular, if u is a leaf, then we require (ρ(u), t(u), λ) ∈ Δ. We say T accepts t if there exists a run ρ of T over t with ρ(λ) = s_init, and write L(T) to denote the language of all labelled trees accepted by T, and L_n(T) for the (finite) language of labelled trees of size n accepted by T. By the result in [6], we assume the existence of an FPRAS CountNFTA for approximating |L_n(T)|, running in time polynomial in n and |T|.
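The acceptance condition for top-down NFTAs can be illustrated with a short Python sketch. Trees are encoded as nested (label, children) pairs and the transition relation as a set of (state, symbol, tuple of child states) triples, with the empty tuple playing the role of the leaf case; this encoding and the function name are illustrative assumptions.

```python
def accepts(delta, s_init, tree):
    """Check whether a top-down NFTA accepts a labelled tree.

    delta: set of (state, symbol, tuple_of_child_states) transitions.
    tree:  (label, children), where children is a tuple of trees.
    """
    def can_run(state, node):
        label, children = node
        for (s, a, targets) in delta:
            if s != state or a != label or len(targets) != len(children):
                continue
            # every child subtree must admit a run from its assigned state
            if all(can_run(t, c) for t, c in zip(targets, children)):
                return True
        return False

    return can_run(s_init, tree)


# Automaton accepting exactly the tree with root 'a' and two 'b' leaves.
delta = {("s0", "a", ("s1", "s1")), ("s1", "b", ())}
good = ("a", (("b", ()), ("b", ())))
bad = ("a", (("b", ()),))
```

With these definitions, accepts(delta, "s0", good) holds while accepts(delta, "s0", bad) does not, since no transition from s0 on 'a' has exactly one child state.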
Hypertree Decompositions. We briefly review some background on hypertree decompositions, and refer the reader to the comprehensive paper by Gottlob et al. [18] for more details. A hypertree for a conjunctive query Q is a tuple ⟨𝑇, χ, ξ⟩, where 𝑇 = (N, E) is a tree on vertices N with edges E, and χ and ξ are labelling functions mapping each p ∈ N to sets of variables χ(p) ⊆ vars(Q) and atoms ξ(p) ⊆ atoms(Q). We use vertices(𝑇) as shorthand to denote N, the set of vertices of 𝑇. Moreover, for a set of vertices P ⊆ vertices(𝑇), we define χ(P) = ∪_{p ∈ P} χ(p).
A hypertree decomposition for a conjunctive query Q is a hypertree ⟨𝑇, χ, ξ⟩ that satisfies the following conditions:
(1) for each atom A ∈ atoms(Q), there exists p ∈ vertices(𝑇) such that vars(A) ⊆ χ(p);
(2) for each variable x ∈ vars(Q), the set {p ∈ vertices(𝑇) | x ∈ χ(p)} induces a connected subtree of 𝑇;
(3) for each vertex p ∈ vertices(𝑇), χ(p) ⊆ vars(ξ(p));
(4) for each vertex p ∈ vertices(𝑇), vars(ξ(p)) ∩ χ(𝑇_p) ⊆ χ(p), where 𝑇_p is the subtree of 𝑇 rooted at p.
A hypertree decomposition is said to have width k if max_{p ∈ vertices(𝑇)} |ξ(p)| = k. The hypertree width of a conjunctive query Q is the minimal width across all its possible hypertree decompositions. A class of conjunctive queries is said to have bounded hypertree width if all queries in the class have hypertree width at most k, for some constant k. For any input conjunctive query of hypertree width k, one can compute in time polynomial in the query size a hypertree decomposition of width at most k.
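As an illustration of conditions (1) to (4), the following Python sketch checks whether a given labelled rooted tree is a hypertree decomposition of a query, and applies it to a width-1 decomposition of a path query of length three. The dictionary-based encoding (atoms, children, chi, xi) and the function name are assumptions made for this example only.

```python
def is_hypertree_decomposition(atoms, children, root, chi, xi):
    """Check conditions (1)-(4) for a candidate hypertree decomposition.

    atoms:    dict atom_name -> set of variables
    children: dict vertex -> list of child vertices (rooted tree)
    chi, xi:  dict vertex -> set of variables / set of atom names
    """
    parent = {c: p for p, cs in children.items() for c in cs}

    def subtree(p):
        nodes = [p]
        for c in children.get(p, []):
            nodes.extend(subtree(c))
        return nodes

    def vars_of(atom_names):
        return set().union(*(atoms[a] for a in atom_names))

    vertices = subtree(root)
    # (1) every atom has its variables contained in some chi(p)
    for vs in atoms.values():
        if not any(vs.issubset(chi[p]) for p in vertices):
            return False
    # (2) vertices holding a variable form a connected subtree: exactly one
    #     of them has no parent that also holds the variable
    for x in vars_of(atoms):
        holders = {p for p in vertices if x in chi[p]}
        tops = [p for p in holders if parent.get(p) not in holders]
        if len(tops) != 1:
            return False
    for p in vertices:
        # (3) chi(p) is covered by the variables of the atoms in xi(p)
        if not chi[p].issubset(vars_of(xi[p])):
            return False
        # (4) variables of xi(p) occurring below p must occur in chi(p)
        below = set().union(*(chi[q] for q in subtree(p)))
        if not (vars_of(xi[p]) & below).issubset(chi[p]):
            return False
    return True


# Path query R1(x1,x2), R2(x2,x3), R3(x3,x4) with a chain-shaped decomposition.
atoms = {"R1": {"x1", "x2"}, "R2": {"x2", "x3"}, "R3": {"x3", "x4"}}
children = {"p1": ["p2"], "p2": ["p3"]}
chi = {"p1": {"x1", "x2"}, "p2": {"x2", "x3"}, "p3": {"x3", "x4"}}
xi = {"p1": {"R1"}, "p2": {"R2"}, "p3": {"R3"}}
width = max(len(xi[p]) for p in xi)
valid = is_hypertree_decomposition(atoms, children, "p1", chi, xi)
```

Every vertex is labelled with a single atom, so the decomposition has width 1, in line with the acyclicity of path queries.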
We say a hypertree decomposition ⟨𝑇, χ, ξ⟩ is complete if every atom A ∈ atoms(Q) has a covering vertex in 𝑇. Any decomposition of width k for a conjunctive query Q can be transformed in logspace to a complete decomposition of equal width by the following process: for any atom A ∈ atoms(Q) that does not have a corresponding covering vertex, create a new vertex p_A with χ(p_A) = vars(A) and ξ(p_A) = {A}. Then attach p_A as a child of some vertex p that satisfies vars(A) ⊆ χ(p) (such a vertex must exist by condition (1) in the definition of a hypertree decomposition).
We finish by remarking that removing condition (4) in the definition of a hypertree decomposition above results in defining the closely related notion of a generalized hypertree decomposition. However, testing whether a query has generalized hypertree width at most k (for a fixed constant k ≥ 3) is NP-complete. Moreover, for any conjunctive query Q it is known that ghtw(Q) ≤ htw(Q) ≤ 3 · ghtw(Q) + 1, where (g)htw denotes (generalized) hypertree width [1]. Hence, we write our results here in terms of hypertree decompositions, bearing in mind that our results apply equally to queries of bounded generalized hypertree width.

A WARM-UP: PATH QUERIES ON GRAPHS
In this section, we prove a special case of the main result of this paper. In particular, we show the existence of an FPRAS for computing the uniform reliability of self-join-free path queries on database instances where all relations are binary (in other words, labelled graphs). In doing so, we build some intuition on the proof of the full theorem given later.
Theorem 2. Let D be a database instance in which all relations are binary, and Q be a self-join-free path query. Then there exists an algorithm PathEstimate such that, for all ε ∈ (0, 1),
$$(1 - \varepsilon)\,\mathrm{UR}(Q, D) \le \mathrm{PathEstimate}(Q, D) \le (1 + \varepsilon)\,\mathrm{UR}(Q, D)$$
with high probability. Moreover, PathEstimate has runtime polynomial in |Q|, |D| and 1/ε.
Proof intuition. Let D be our input database instance, and Q the self-join-free path query, taking the form:
$$Q = R_1(x_1, x_2), R_2(x_2, x_3), \ldots, R_n(x_n, x_{n+1})$$
with all of the R_i (for i ∈ [n]) distinct.
Our ultimate goal will be to construct an NFA M = (S, Σ, δ, I, F) whose accepted strings correspond one-to-one with the subinstances of D that satisfy Q, upon which we can apply the CountNFA FPRAS. Before doing so, we first construct a slightly different NFA M′ = (S′, Σ′, δ′, I, F), as follows: let the alphabet be Σ′ = {R_i(a, b) | R_i(a, b) ∈ D}. It is not difficult to see that every string accepted by M′ takes the form R_1(a_1, a_2) R_2(a_2, a_3) … R_n(a_n, a_{n+1}), corresponding one-to-one to the possible sequences of witnessing facts for Q on D. Every subinstance D′ ⊆ D must therefore contain all of the facts in one of these strings to be a satisfying subinstance for Q. Moreover, as long as our selection of facts contains the facts in one of these strings, we can make any choice as to the presence of the remaining facts in D. However, we want to do this in a way that avoids representing the same subinstance twice. We can avoid this issue by: (1) indicating the absence of a fact from our subinstance, rather than only the presence (so all strings accepted should have length |D|); and (2) ensuring a consistent ordering in which the symbols indicating each fact's presence or absence in D′ appear in any accepted string. Notice that M′ already accepts strings with atoms appearing only in the order they appear in the query, R_1 ≺ ⋯ ≺ R_n. It will therefore suffice to fix an arbitrary total ordering ≺_i on the R_i-facts for each atom R_i. We are now ready to begin the construction of M = (S, Σ, δ, I, F). We first expand Σ′ to allow us to indicate the absence of a fact by defining Σ = Σ′ ∪ {¬R_i(a, b) | R_i(a, b) ∈ D}. We also define an expanded state set:
Thus, we can apply the FPRAS CountNFA [5] on M to approximate the number of strings accepted of length |D|, which yields with high probability a (1 ± ε)-approximation of the uniform reliability of Q on D. Hence, we can realize the algorithm PathEstimate as above. □

UNIFORM RELIABILITY
In this section, we build on the intuition presented in the previous section, generalizing the approach there so that it applies to computing uniform reliability of self-join-free queries of bounded hypertree width over arbitrary instances. To do so, we move from the setting of string automata to tree automata for the remainder of this paper.

Augmented NFTAs
We first introduce augmented NFTAs, which augment NFTAs with two additional constructs:
(1) (String Annotations) First, we extend the definition of an NFTA to allow a transition to be annotated with a string of symbols a_1 … a_k (rather than a single symbol), with the implicit meaning that an additional k − 1 fresh intermediate states are inserted between the start and end states of the transition so that the string a_1 … a_k is accepted.
(2) (? Symbols) Second, we also allow a symbol a_i appearing in this string to be annotated with a ?, as shorthand expressing that either the symbol a_i or ¬a_i should be accepted (note that this adds no additional states).
We distil the two ideas above into the definition of an augmented NFTA below, and then define its semantics in terms of a translation into an ordinary NFTA.
Definition 1 (Augmented NFTA).An augmented (top-down) nondeterministic finite tree automaton (augmented NFTA, for short) is a tuple T + = (, Σ, Δ,  init ), where  is a finite set of states, Σ is a finite alphabet of input symbols, and  init ∈  is the initial state.The transition relation is defined as Δ ⊆  × Γ × (∪  =0   ), where Γ = { |  ∈  * } \  and  = {,  ?|  ∈ Σ}.In other words, Γ is the set of non-empty strings formed by symbols from Σ, where some of these symbols may be annotated with a ? .Analogously to ordinary NFTAs, we define the size of an augmented NFTA as the size of the encoding of its transition relation.
We then replace every transition in Δ ′ of the form: where  ∈ Σ with the two transitions: to get a new transition relation Δ ′′ , thereby obtaining our final NFTA T = (, Σ ′ , Δ ′′ ,  init ).
Finally, we remark that the translation outlined above does not lead to any material blow-up in size of the translated NFTA.
Remark 1. The translation defined above from an augmented NFTA T+ to its corresponding ordinary NFTA T can be performed in time O(poly(|T+|)).
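The two constructs of augmented NFTAs can be desugared along the following lines. The Python sketch below is a minimal illustration under stated assumptions: transitions are encoded as (state, annotated string, tuple of child states), an annotated string is a list of (symbol, has_question_mark) pairs, intermediate nodes on the inserted path are assumed to be unary, and negated symbols are written with a leading ¬. None of these encoding choices, nor the name desugar, are taken from the formal translation itself.

```python
from itertools import count


def desugar(aug_delta):
    """Translate string-annotated transitions with '?' symbols into
    ordinary single-symbol transitions.

    aug_delta: iterable of (state, word, children), where word is a non-empty
               list of (symbol, optional) pairs and children is a tuple of states.
    """
    fresh = count()
    delta = set()
    for (state, word, children) in aug_delta:
        src = state
        for i, (sym, optional) in enumerate(word):
            last = (i == len(word) - 1)
            # only the final symbol branches to the original child states;
            # earlier symbols lead to a fresh intermediate state
            targets = children if last else (("fresh", next(fresh)),)
            delta.add((src, sym, targets))
            if optional:
                # a? accepts either a or its negation, without new states
                delta.add((src, "¬" + sym, targets))
            if not last:
                src = targets[0]
    return delta


# One transition annotated with the string a b? c and no children.
ordinary = desugar([("s", [("a", False), ("b", True), ("c", False)], ())])
fresh_states = {t for (_, _, targets) in ordinary for t in targets}
```

For a string of length k the sketch introduces k − 1 fresh states, and each ? symbol adds one transition but no state, so the output size stays linear in the size of the annotated transition.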

Result
In this section, we show our approximation scheme as far as it applies to computing uniform reliability, which hinges on the construction of an augmented NFTA from the query-database pair, such that there is a bijection between the trees accepted by the NFTA and the subinstances of the database satisfying the query.
To do so, we traverse the hypertree decomposition in a manner analogous to [6, Theorem 3.2], but take into account a number of adaptations necessary for the uniform reliability problem that are motivated in the previous section.
Proof. Let Q be a self-join-free query of bounded hypertree width, and D be a database instance defined only over relations occurring in Q. We will construct an augmented NFTA T+ = (S, Σ, Δ, s_init) satisfying the desired properties. For each relation R_i occurring in Q, fix some total ordering ≺_i over the R_i-facts in D. Fix also a total ordering ≺_atoms on atoms(Q). By the results discussed earlier, we can construct a complete hypertree decomposition ⟨𝑇, χ, ξ⟩ of Q of constant width in polynomial time. We fix another total ordering ≺_vertices, this one over vertices(𝑇), with the requirement that for any p, p′ ∈ vertices(𝑇), we have p ≺_vertices p′ if and only if depth(p) ≤ depth(p′) (where depth(p) denotes the distance of p from the root vertex). We are now ready to begin the construction.
We start by fixing the alphabet Σ = {  () |   () ∈  }, that is, one symbol per fact in .Given a tuple of variables  = ( 1 , . . .,   ) and constants  = ( 1 , . . .,   ), we use the notation  ↦ →  to denote the assignment of variables in  to the constants in .We assume any assignment  ↦ →  is always well-behaved, in the sense that if a variable  appears more than once in , it always gets assigned to the same value.We say two assignments  ↦ →  and  ↦ →  are consistent if every variable  that occurs in both  and  gets assigned to the same value in both  and .Finally, we use the notation   to denote the variable tuple ( 1 , . . .,   ) for every atom   ( 1 , . . .,   ) appearing in .
It is clear that T+ can be constructed in time poly(|Q|, |D|), observing that the number of transitions in Δ is polynomial in |Q| and |D| (assuming that the hypertree width is constant), and that each transition can be encoded in O(|D|) symbols. Thus, the only thing remaining to be proved is the existence of a bijection between the labelled trees of size |D| accepted by T+, and the subinstances of D satisfying Q. Consider some D′ ⊆ D such that D′ |= Q. Construct a labelled tree t from D′ as follows. Start with a tree with the same structure as 𝑇, but perform the following procedure: contract any vertex that is not a ≺_vertices-minimal covering vertex for some atom in Q by deleting the vertex, and connecting its children (if any) to its parent (such a parent must always exist, since the root node must be a covering vertex by definition of a hypertree decomposition, and the root node is clearly ≺_vertices-minimal), repeating this process until every vertex remaining in the tree is a ≺_vertices-minimal covering vertex. Now expand each vertex p in this tree by replacing it with a path in which each vertex is labelled with an R_i-fact or its negation (depending on its presence in D′) for every atom R_i covered by p, ensuring that the orderings on the facts in D and on atoms(Q) are respected. One can check that, by construction, t ∈ L(T+). Clearly, the mapping just described is injective, as two distinct subinstances will lead to trees with different labellings. It is also surjective, since given any t ∈ L(T+), one can simply read off the labels on the vertices of t to reconstruct the corresponding subinstance D′ ⊆ D. This shows the existence of a bijection, as desired. □
Using this construction we can now state our result for self-join-free conjunctive queries of bounded hypertree width, as far as it applies to uniform reliability. Note that in the theorem statement below we have dropped the condition on the database schema that was present in Proposition 1.
Theorem 3. Let Q be a self-join-free conjunctive query of bounded hypertree width, and D a database instance. Then there exists an algorithm UREstimate such that, for all ε ∈ (0, 1),
$$(1 - \varepsilon)\,\mathrm{UR}(Q, D) \le \mathrm{UREstimate}(Q, D) \le (1 + \varepsilon)\,\mathrm{UR}(Q, D)$$
with high probability. Moreover, UREstimate has runtime polynomial in |Q|, |D| and 1/ε.
Proof. We will show how to realize the algorithm UREstimate, given a conjunctive query Q satisfying the specified conditions and database instance D.
Consider the subinstance D′ ⊆ D "projected" on the relations in Q, obtained from D by removing all facts over relations that do not occur in Q. By Proposition 1, we can construct an augmented NFTA T+ such that |L_{|D′|}(T+)| = UR(Q, D′) in time polynomial in |Q| and |D′|. Observing that the semantics for this augmented NFTA T+ is determined by translation to an ordinary NFTA T accepting the same trees as described in Section 4.1, and that this translation can be performed in polynomial time, we can tractably approximate UR(Q, D′) by applying the NFTA counting algorithm CountNFTA presented in [6] to T. Thus, we have:

PROBABILISTIC QUERY EVALUATION
The approximation scheme presented in the previous section is applicable only for computing the uniform reliability of a (bounded hypertree width, self-join-free) conjunctive query. We now consider how to extend the approach above, to allow for arbitrary rational probability values on the individual facts of the database instance. We do this by attaching a series of extra states and transitions to the states in the NFTA constructed in Proposition 1, in order to multiply the number of trees accepted proportionally to each subinstance's weight.

NFTAs with multipliers
Since the addition of these extra states is rather notation-heavy, we again use some syntactic sugar. We define a notion of NFTAs with multipliers below, which allow the annotation of transitions with a positive integer n, which we call a "multiplier", indicating the number of extra trees that should be induced upon taking that transition. Syntactically, NFTAs with multipliers are otherwise identical to NFTAs.
Definition 2 (NFTA with multipliers). A (top-down) nondeterministic finite tree automaton with multipliers (NFTA with multipliers, for short) is a tuple T c = (S, Σ, Δ, s_init), where S is a finite set of states, Σ is a finite alphabet of input symbols, Δ ⊆ S × Σ × N × (∪_{i=0}^{k} S^i) is a set of transition tuples, and s_init ∈ S is the initial state. As for ordinary NFTAs, we define the size of an NFTA with multipliers as the size of the encoding of its transition relation.
For every tuple (, , ,  1 . . .  ) in the NFTA with multipliers, the transitions2 added by the translation above cause the number of trees accepted to be multiplied by .The additional trees are obtained by gluing on paths corresponding to binary strings of length  = log 2 ( − 1) + 1 (for  > 1), starting from: 0 . . .0  times up to  1 . . .  .Moreover, the number of new states added to the translated automaton for each multiplier value is logarithmic in .
Remark 2. The translation defined above from an NFTA with multipliers T c to its corresponding ordinary NFTA T can be performed in time  (poly(|T c |)).
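The counting argument behind multipliers can be illustrated with a small sketch. The code below builds a string automaton that accepts exactly the binary strings of length $\ell$ whose value is at most $n - 1$, so that it accepts $n$ strings while using a number of states linear in $\ell$, hence logarithmic in $n$. This is an assumed, simplified gadget written for illustration; the function names are hypothetical and the construction is not claimed to be the exact translation used in the paper.

```python
from itertools import product

def path_length(n):
    """Number of bits l = floor(log2(n - 1)) + 1 needed to write n - 1."""
    if n <= 1:
        raise ValueError("a gadget is only needed for multipliers n > 1")
    return (n - 1).bit_length()

def build_gadget(n):
    """Transitions of an automaton accepting the l-bit strings <= n - 1.

    A state is a pair (position, mode). In mode "tight" the prefix read so
    far equals the prefix of the bound; in mode "free" it is already smaller.
    """
    l = path_length(n)
    bound = format(n - 1, f"0{l}b")
    delta = {}
    for i, b in enumerate(bound):
        # Copying the bound's bit keeps the run tight.
        delta[((i, "tight"), b)] = (i + 1, "tight")
        # Reading 0 where the bound has 1 makes the run free.
        if b == "1":
            delta[((i, "tight"), "0")] = (i + 1, "free")
        # A free run may read any bit.
        for bit in "01":
            delta[((i, "free"), bit)] = (i + 1, "free")
    return l, delta

def accepts(l, delta, word):
    """Run the gadget on word, starting in the tight state at position 0."""
    if len(word) != l:
        return False
    state = (0, "tight")
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return True

def count_accepted(n):
    """Brute-force count of the strings accepted by the gadget for n."""
    l, delta = build_gadget(n)
    return sum(accepts(l, delta, "".join(bits)) for bits in product("01", repeat=l))

def state_count(n):
    """Number of distinct states mentioned by the gadget's transitions."""
    _, delta = build_gadget(n)
    return len({s for (s, _) in delta} | set(delta.values()))
```

Here `count_accepted(n)` returns `n` for every multiplier `n > 1`, while `state_count(n)` stays within `2 * (path_length(n) + 1)`.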

Result
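We can now prove the theorem from the introduction to the paper.

Theorem 1. Let $Q$ be a self-join-free conjunctive query of bounded hypertree width, and $H$ a probabilistic database instance. Then there exists an algorithm PQEEstimate such that, for all $\epsilon \in (0, 1)$:

$$(1 - \epsilon) \Pr_H(Q) \le \mathrm{PQEEstimate}(Q, H, \epsilon) \le (1 + \epsilon) \Pr_H(Q)$$

with high probability. Moreover, PQEEstimate has runtime:

$$O(\mathrm{poly}(|Q|, |H|, \epsilon^{-1})).$$

Proof. As we did for Theorem 3, we will again show how to realize the algorithm PQEEstimate, given a conjunctive query $Q$ satisfying the specified conditions and probabilistic database instance $H = (I, \pi)$. Recall that probability labels are rational numbers taking the form $\pi(f_i) = w_i / d_i$. Without loss of generality, we can assume that $H$ is defined only on relations occurring in $Q$, since the probabilities of the additional subinstances marginalize to 1.

Let $d$ denote the product of the denominators of all fact labels in $H$, that is, $d = \prod_{f_i \in I} d_i$. For any $(I', \pi) = H' \subseteq H$, we have:

$$\Pr_H(I') = \prod_{f_i \in I'} \frac{w_i}{d_i} \prod_{f_i \in I \setminus I'} \left(1 - \frac{w_i}{d_i}\right) = \frac{1}{d} \prod_{f_i \in I'} w_i \prod_{f_i \in I \setminus I'} (d_i - w_i),$$

and hence:

$$\Pr_H(Q) = \sum_{I' \subseteq I,\ I' \models Q} \Pr_H(I') = \frac{1}{d} \sum_{I' \subseteq I,\ I' \models Q} \ \prod_{f_i \in I'} w_i \prod_{f_i \in I \setminus I'} (d_i - w_i).$$

Noting that $d$ is a known constant, it will suffice to approximate the sum term in the last equation, which we do below. Take the augmented NFTA $T^+$ constructed in Proposition 1 from $Q$ and $I$, and consider the corresponding ordinary NFTA obtained from its translation $T = (S, \Sigma, \Delta, s_{\mathrm{init}})$. We will construct an NFTA with multipliers $T^c = (S, \Sigma, \Delta', s_{\mathrm{init}})$ from $T$ as follows. The alphabet, state set, and initial state are identical to those of $T$.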
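The rescaling by the product of denominators can be sanity-checked on a toy instance. The sketch below compares the query probability computed directly with exact fractions against the integer sum $\sum_{I' \models Q} \prod_{f_i \in I'} w_i \prod_{f_i \notin I'} (d_i - w_i)$ divided by $d = \prod_i d_i$. The facts, their weights, and the predicate `query_holds` are hypothetical stand-ins chosen for illustration; they are not taken from the paper.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Hypothetical toy instance: fact -> (numerator w_i, denominator d_i).
weights = {"R(a)": (1, 2), "R(b)": (2, 3), "S(a,b)": (3, 5)}

def query_holds(present):
    """Hypothetical stand-in for I' |= Q: R(a) and S(a,b) are both present."""
    return "R(a)" in present and "S(a,b)" in present

def subinstances(facts):
    """Enumerate every subset of the given facts."""
    for mask in product([False, True], repeat=len(facts)):
        yield {f for f, keep in zip(facts, mask) if keep}

def direct_probability(weights):
    """Sum the probabilities of all satisfying subinstances exactly."""
    total = Fraction(0)
    for sub in subinstances(list(weights)):
        if query_holds(sub):
            total += prod(
                Fraction(w, d) if f in sub else 1 - Fraction(w, d)
                for f, (w, d) in weights.items()
            )
    return total

def integer_sum(weights):
    """Same sum with every factor scaled by its denominator d_i."""
    return sum(
        prod(w if f in sub else d - w for f, (w, d) in weights.items())
        for sub in subinstances(list(weights))
        if query_holds(sub)
    )

# Product of all denominators.
d = prod(den for _, den in weights.values())
```

For these weights, `Fraction(integer_sum(weights), d)` and `direct_probability(weights)` agree, so any relative-error guarantee on the integer sum carries over unchanged to the probability.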
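The transition set $\Delta'$ is obtained from $\Delta$ as follows. It is clear that every transition in $\Delta$ must take one of the two forms $(s, R(\bar{a}), s')$ or $(s, \neg R(\bar{a}), s')$. We add to $\Delta'$ the tuple $(s, R(\bar{a}), w_i, s')$ for every such tuple of the former type, where $w_i$ is the numerator of the weight of the fact $f_i = R(\bar{a})$ appearing in $H$. Similarly, we add to $\Delta'$ the tuple $(s, \neg R(\bar{a}), d_i - w_i, s')$ for every tuple of the latter type, where $d_i$ is the denominator of that weight. Since every fact in the database appears exactly once either in its positive or negated form in each of the trees accepted by $T$, it follows that the number of trees accepted by $T^c$ equals

$$\sum_{I' \subseteq I,\ I' \models Q} \ \prod_{f_i \in I'} w_i \prod_{f_i \in I \setminus I'} (d_i - w_i) = d \cdot \Pr_H(Q),$$

where $d$ is the product of the denominators of all fact labels in $H$. Applying the NFTA counting procedure CountNFTA [6] on the translated tree automaton $T'$ derived from $T^c$, and dividing the result by $d$, we therefore obtain an estimate that is within a multiplicative factor of $(1 \pm \epsilon)$ of $\Pr_H(Q)$ with high probability.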
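To make the effect of the multipliers concrete, the sketch below attaches numerators to positive transitions and denominator-minus-numerator values to negated ones, then counts accepted runs weighted by the product of the multipliers along each run. It is a deliberately simplified illustration: the automaton is a hypothetical chain (every transition has a single successor state) for the toy query "at least one fact is present", counting is exact rather than approximate, and none of the names come from the paper or from CountNFTA.

```python
from functools import lru_cache
from math import prod

def chain_automaton(facts):
    """Toy chain automaton; a state (i, seen) records whether a fact was kept."""
    delta = set()
    for i, fact in enumerate(facts):
        for seen in (False, True):
            delta.add(((i, seen), (fact, True), (i + 1, True)))   # fact present
            delta.add(((i, seen), (fact, False), (i + 1, seen)))  # fact absent
    return delta, (0, False), {(len(facts), True)}

def add_multipliers(delta, weights):
    """Annotate each transition with w_i (positive) or d_i - w_i (negated)."""
    out = set()
    for source, (fact, positive), target in delta:
        w, d = weights[fact]
        out.add((source, (fact, positive), w if positive else d - w, target))
    return out

def weighted_count(delta_c, start, accepting):
    """Sum over accepting runs of the product of multipliers along the run."""
    successors = {}
    for source, _, n, target in delta_c:
        successors.setdefault(source, []).append((n, target))

    @lru_cache(maxsize=None)
    def count(state):
        if state in accepting:
            return 1
        return sum(n * count(t) for n, t in successors.get(state, []))

    return count(start)

# Hypothetical weights: fact -> (numerator w_i, denominator d_i).
weights = {"f1": (1, 2), "f2": (2, 3), "f3": (3, 5)}
delta, start, accepting = chain_automaton(list(weights))
total = weighted_count(add_multipliers(delta, weights), start, accepting)
d = prod(den for _, den in weights.values())
```

Dividing `total` by `d` gives the probability that at least one of the three facts is present, matching the closed form $1 - \prod_i (1 - w_i / d_i)$ for this toy query.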

CONCLUSIONS AND FUTURE WORK
We showed how to construct a combined FPRAS for the PQE problem, by taking advantage of recent results in counting trees of a fixed size accepted by an NFTA. There are two key lines of future work we would like to explore. First, we would like to expand our results to a wider class of queries, for example by relaxing the self-join-free condition in Theorem 1. Initial results we have obtained in this direction suggest that this may come at the cost of necessitating some mild structural constraints on the input database. The second avenue we wish to explore is integration of the proposed FPRAS procedure into practical systems for probabilistic databases. Such an integration would require both a tool for computing hypertree decompositions, as well as a practical implementation of the CountNFTA algorithm. The former task has been relatively well-studied, with ready-to-use tools available: see, for example, the 2019 PACE challenge [13]. However, practically effective approximation methods for counting fixed-size trees accepted by an NFTA remain limited, given the recency of the corresponding theoretical result. Nevertheless, we are optimistic that future work will bring the constants in this algorithm down, paving the way for a practical implementation of the approach presented here.
and they have the same labelling function $\pi$. The size $|H|$ of a probabilistic database instance is defined as the size of its underlying database instance $|I|$, plus the aggregate size of the bit encodings of its fact probabilities.

Proposition 1. Let $Q$ be a self-join-free conjunctive query of bounded hypertree width, and $I$ a database instance defined only over relations occurring in $Q$. Then we can construct an augmented NFTA $T^+$ in time $\mathrm{poly}(|Q|, |I|)$ such that $|L_{|I|}(T^+)| = \mathrm{UR}(Q, I)$.