A faster FPRAS for #NFA

Given a non-deterministic finite automaton (NFA) A with m states, and a natural number n (presented in unary), the #NFA problem asks to determine the size of the set L(A,n) of words of length n accepted by A. While the corresponding decision problem of checking the emptiness of L(A,n) is solvable in polynomial time, the #NFA problem is known to be #P-hard. Recently, the long-standing open question --- whether there is an FPRAS (fully polynomial time randomized approximation scheme) for #NFA --- was resolved by Arenas, Croquevielle, Jayaram, and Riveros in [ACJR19]. The authors demonstrated the existence of a fully polynomial randomized approximation scheme with a time complexity of Õ(m^17 n^17 · 1/ε^14 · log(1/δ)), for a given tolerance ε and confidence parameter δ. Given the prohibitively high time complexity in terms of each of the input parameters, and considering the widespread application of approximate counting (and sampling) in various tasks in Computer Science, a natural question arises: is there a faster FPRAS for #NFA that can pave the way for the practical implementation of approximate #NFA tools? In this work, we answer this question in the positive. We demonstrate that significant improvements in time complexity are achievable, and propose an FPRAS for #NFA that is more efficient in terms of both time and sample complexity. A key ingredient in the FPRAS due to Arenas, Croquevielle, Jayaram, and Riveros [ACJR19] is the inter-reducibility of sampling and counting, which necessitates a closer look at the more informative measure --- the number of samples maintained for each pair of state q and length i ≤ n. In particular, the scheme of [ACJR19] maintains O(m^7 n^7/ε^7) samples per pair of state and length. In the FPRAS we propose, we systematically reduce the number of samples required for each state to be only poly-logarithmically dependent on m, with significantly less dependence on n and ε, maintaining only Õ(n^4/ε^2) samples per state.
Consequently, our FPRAS runs in time Õ((m^2 n^10 + m^3 n^6) · 1/ε^4 · log^2(1/δ)). The FPRAS and its analysis use several novel insights. First, our FPRAS maintains a weaker invariant about the quality of the estimate of the number of samples for each state q and length i ≤ n. Second, our FPRAS only requires that the distribution of the samples maintained be close to the uniform distribution in total variation distance (instead of the maximum norm). We believe our insights may lead to further reductions in time complexity and thus open up a promising avenue for future work towards the practical implementation of tools for approximate #NFA.

CCS Concepts: • Theory of computation → Theory and algorithms for application domains.

INTRODUCTION
In this paper, we focus on the following computational problem: #NFA: Given an NFA A = (Q, q_0, Δ, F) over the binary alphabet Σ = {0, 1} with m states, and a number n in unary, determine |L(A, n)|, where L(A, n) is the set of strings of length n that are accepted by A.
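To make the problem concrete, the following is a minimal exact (not approximate) counter for |L(A, n)|: it performs the subset construction on the fly, so it counts distinct words rather than accepting paths, but it can take time exponential in the number of states in the worst case. The dict-based encoding of the NFA is purely illustrative.

```python
def count_nfa_words(delta, q0, finals, n):
    """Exact |L(A, n)| for an NFA over {0, 1}.

    delta: dict mapping (state, symbol) -> set of successor states.
    Tracks, for each set of simultaneously reachable states S, the number
    of length-i words leading to exactly S (on-the-fly determinization),
    so nondeterministic path multiplicities are not over-counted.
    """
    counts = {frozenset([q0]): 1}
    for _ in range(n):
        nxt = {}
        for S, c in counts.items():
            for b in "01":
                T = frozenset(q for p in S for q in delta.get((p, b), ()))
                if T:
                    nxt[T] = nxt.get(T, 0) + c
        counts = nxt
    return sum(c for S, c in counts.items() if S & finals)
```

For instance, for a two-state NFA accepting exactly the words ending in 1, the count for length n is 2^(n-1).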
The problem of #NFA is a fundamental problem in Computer Science, with a range of applications in database and information extraction systems; we outline a few below.
Probabilistic Query Evaluation. Given a query Q and a database D, the problem of probabilistic query evaluation (PQE) is to determine the probability of the query holding on a randomly sampled database, in which each row is included independently with the probability given by its annotated value. PQE is known to be #P-hard, even in the data complexity setting. For a large class of queries, PQE can directly be reduced to the #NFA problem. In particular, when the schema of the database consists only of binary relations, and when the query is a self-join-free path query, the PQE problem for Q and D reduces to an instance of #NFA whose size is linear in the size of Q and linear in the size of D [17]; further, the time to compute the reduction is also linear in Q and D. Thus an efficient algorithm for solving #NFA directly yields an efficient algorithm for probabilistic query evaluation.
Counting Answers to Regular Path Queries. Regular path queries (or property paths) [2] form a rich fragment of graph query languages such as SPARQL designed for graph database systems.
Here, one abstracts information on edges in the graph database using an alphabet Σ and asks a variety of questions about the set of paths that start from a given source node s and end at a target node t. In particular, a regular path query (s, r, t) asks to enumerate, count, or uniformly sample the set of paths (bounded in length by some fixed number n) from s to t whose labels match the regular expression r. Therefore, the problem of counting the number of answers to such queries reduces to the #NFA problem, for the NFA obtained as the cross product of the NFA represented by the database (with s as the initial state and t as the final state) and the NFA that r gets compiled down to. This reduced instance is again linear in the size of the database as well as the query, and the reduction takes the same time [3].

Probabilistic Graph Homomorphism.
A probabilistic graph is a pair (G, π), where G is a graph and π is the associated probability labeling over edges. The associated probability space is defined as the set of all subgraphs of G wherein every edge e is sampled independently with probability π(e). Given a query graph H and a probabilistic graph (G, π), the probabilistic graph homomorphism problem asks to compute the probability that a randomly sampled subgraph of G admits a homomorphism from H. In the case of 1-way (and 2-way) path queries, the problem was again shown to reduce to #NFA [1].
The direct applications of the #NFA problem bring up the central computational question: how do we count the number of elements of L(A, n)? It turns out that #NFA is, in fact, provably computationally intractable: it is known to be a #P-hard problem [18]. Given the intractability, and given the widespread applications of the #NFA problem, the next natural question arises: can we approximate #NFA using a polynomial time algorithm? Until very recently, the only approximation algorithms known for #NFA had running times that were quasi-polynomial in the size of the inputs [8, 11]. In a recent breakthrough by Arenas, Croquevielle, Jayaram and Riveros [3, 4], #NFA was finally shown to admit a fully polynomial time randomized approximation scheme (FPRAS).
One would assume that the discovery of an FPRAS for #NFA would lead to the design of scalable tools for all the different applications outlined above. Such has, however, not been the case, and the primary hurdle to achieving practical adoption has been the prohibitively large time complexity of the proposed FPRAS: it takes time Õ(m^17 n^17 · 1/ε^14 · log(1/δ)), where Õ hides the poly-logarithmic multiplicative factors in m and n. It is worth highlighting that for most applications, the reduction to a #NFA instance often only incurs a blow-up that is a low-degree polynomial (linear or quadratic) factor larger than the size of the original problem. This means that the bottleneck to solving the original counting problem is really the high runtime complexity of the (approximate) counting algorithm and not the reduction.
In this work, we pursue the goal of designing a faster algorithm, and propose an FPRAS which runs in time Õ((m^2 n^10 + m^3 n^6) · 1/ε^4 · log^2(1/δ)). The reliance of both FPRASes on the inter-reducibility of sampling and counting makes the number of samples maintained for each state a more informative measure. The scheme proposed in [4] maintains O(m^7 n^7/ε^7) samples per state, while the scheme proposed in this work maintains only Õ(n^4/ε^2) samples, which is independent of m and has a significantly reduced dependence on n and ε.

Overview of our FPRAS
At a high level, our approach exploits the tight connection between approximate counting and the related problem of (almost) uniform generation. In the context of an NFA A and a length n, the almost uniform generation problem asks for a randomized algorithm (generator) G such that, on each invocation, the probability with which G returns any given string w ∈ L(A, n) lies in the range [1/((1+ε)|L(A, n)|), (1+ε)/|L(A, n)|]; here ε ∈ (0, 1) is a parameter. Such a random generator is a polynomial time generator if its running time is polynomial in |A|, n, and 1/ε. The existence of a polynomial time almost uniform generator for self-reducible [10] problems implies the existence of an FPRAS, and vice versa.
This tight connection between counting and sampling was also the high-level idea behind the work of [4]. Consequently, our algorithm fits the template procedure of Fig. 1, also outlined in [4]. For technical convenience, the algorithm unrolls the automaton A into an acyclic graph with n levels, each level ℓ containing the ℓth copy q^ℓ of each state q of A. A key component in the algorithm is to incrementally compute an estimate N(q^ℓ) of the size of the set L(q^ℓ) of words of length ℓ that have a run ending in state q. In order to inductively estimate N(q^ℓ), we in turn rely on the estimates of the sets L(q_0^(ℓ−1)) and L(q_1^(ℓ−1)); here q_b is a b-predecessor state of q (b ∈ Σ = {0, 1}), i.e., (q_b, b, q) ∈ Δ is a transition of A. These estimates can be put to use effectively because L(q^ℓ) = ⋃_{b∈{0,1}} ⋃_{q_b ∈ Pred(q,b)} L(q_b^(ℓ−1)) · b. We remark that the sets {L(q_b^(ℓ−1))} may not be disjoint for different predecessor states q_b. Hence, we cannot estimate |L(q^ℓ)| by simply summing the estimates {N(q_b^(ℓ−1)) | (q_b, b, q) ∈ Δ}.
To address this, we maintain a set S(q^(ℓ−1)) for each q, which contains samples of strings from L(q^(ℓ−1)). One possible idea is to construct the set S(q^(ℓ−1)) using strings that have been sampled uniformly at random from L(q^(ℓ−1)), but such a process introduces dependence among the sets S(q^(ℓ−1)) for different q. The crucial observation in [4] is the following: for any word w of length n − ℓ, all words from L(A, n) with the suffix w can be expressed as ⋃_{q∈P} L(q^ℓ) · w for some set P of states, where P depends only on w, ℓ, and A. This property is referred to as the self-reducible union property. This insight allows us to utilize the self-reducibility property for sampling. We can partition the words under consideration into those whose last character is 0 and those whose last character is 1, and compute estimates for each part, since each part can be represented as a union of sets for which we (inductively) have access to both samples and estimates. We then sample the final character in proportion to the size of these estimates. Similarly, we can sample strings, character by character. That is, we can sample strings by progressively extending suffixes backwards! Of course, we must also account for the errors that propagate in the estimates of N(q^ℓ).
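The backward, suffix-extending sampling can be illustrated with exact quantities: the sketch below materializes the slice languages of a tiny NFA and draws a uniformly random accepted word by sampling each unfixed last character in proportion to the exact number of accepted words with the extended suffix. The FPRAS replaces these exact counts with the estimates N(q^ℓ); the function names here are illustrative, not the paper's.

```python
import random

def lang_slices(delta, q0, n):
    """L[ell][q]: the set of length-ell words with a run from q0 ending at q."""
    L = [{q0: {""}}]
    for _ in range(n):
        nxt = {}
        for p, words in L[-1].items():
            for b in "01":
                for q in delta.get((p, b), ()):
                    nxt.setdefault(q, set()).update(w + b for w in words)
        L.append(nxt)
    return L

def sample_backward(delta, q0, qf, n, rng):
    """Uniform sample from L(A, n), built by extending a suffix backwards."""
    full = lang_slices(delta, q0, n)[n].get(qf, set())
    w = ""
    for _ in range(n):
        # number of accepted words whose suffix is b + w, for b in {0, 1}
        counts = {b: sum(1 for x in full if x.endswith(b + w)) for b in "01"}
        r = rng.randrange(counts["0"] + counts["1"])
        b = "0" if r < counts["0"] else "1"
        w = b + w
    return w
```

Since each character is drawn with its exact conditional probability, the returned word is exactly uniform over L(A, n); the analysis in the paper accounts for replacing these exact counts with estimates.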
Since our algorithm also relies on the self-reducible union property, the basic structure of our algorithm is similar to that of [4] at a high level. There are, however, significant and crucial technical differences that allow us to obtain an FPRAS with significantly lower time complexity. We highlight these differences by first recalling the two key properties underpinning the algorithm and the ensuing analysis in [4]:

(ACJR-1). For every level ℓ, the following condition, denoted E(ℓ), holds with high probability: every estimate N(q^ℓ) is a close multiplicative approximation of |L(q^ℓ)|.

(ACJR-2). The distribution of S(q^ℓ) is close, in the ℓ∞ norm, to the distribution of the multi-set obtained by independently sampling (with replacement) every element of L(q^ℓ) uniformly at random.

In contrast, the FPRAS we propose maintains the following two weaker invariants:

(Inv-1). For every state q and level ℓ, the event that N(q^ℓ) ∈ (1 ± ε/(2n^2)) · |L(q^ℓ)|, which we denote as AccurateN_{q,ℓ}, happens with high probability.

(Inv-2). The distribution of S(q^ℓ) is close, in total variation distance, to the distribution of a multi-set constructed by sampling (with replacement) every element of L(q^ℓ) uniformly at random.

Two remarks are in order. First, E(ℓ) implies AccurateN_{q,ℓ}, and closeness in ℓ∞ is a more stringent condition than closeness in total variation distance. Second, the technical arguments of [4] crucially rely on their invariants and do not work with the aforementioned weaker invariants. Accordingly, this necessitates a significantly different analysis that depends on a coupling of the distributions. It is also noteworthy that, due to our reliance on the weaker condition, our method for estimating N(q^ℓ) differs from that in [4]. In fact, our approach is based on an adaptation of the classical Monte Carlo-based computation of the union of sets [12].
For a given state q and a subset of states, achieving a sufficiently small additive approximation of the corresponding ratio with a probability of 1 − δ requires, according to standard concentration bounds, that the size of S(q^ℓ) grow as a high (sixth) power of the relevant parameters times log δ^{-1}. To ensure this additive approximation for all states and subsets, one must take a union bound over exponentially many events. Specifically, for the analysis to be successful, one must have δ^{-1} ∈ Ω(2^m). Consequently, the FPRAS of [4] maintains O(m^7 n^7/ε^7) samples for each state q. In contrast, for AccurateN_{q,ℓ} to hold with a probability of 1 − δ, our FPRAS only maintains O((n^4/ε^2) · log δ^{-1}) samples in S(q^ℓ). Furthermore, a union bound over only polynomially many events is required; thus setting δ^{-1} = poly(m, n) is adequate.
The time complexity of an algorithm that follows the template outlined in Fig. 1 (including both the FPRAS of [4] as well as the FPRAS developed in this paper) has a quadratic dependence on |S(q^ℓ)|. Accordingly, it is instructive to compare the Õ(n^4/ε^2) bound on the number of samples in our approach with the O(m^7 n^7/ε^7) bound of [4]. Remarkably, the number of samples for every state in our approach is independent of m, the size of the automaton.

Organization.
In Section 2 we recall some helpful background and introduce a key concept (distribution entanglement) that is useful for the rest of the paper. In Section 3, we describe our algorithm together with two auxiliary algorithms and state their formal correctness guarantees, whose analysis we present in Section 4. We conclude in Section 5.

PRELIMINARIES
NFA and words. We consider the binary alphabet Σ = {0, 1}. All results in this paper extend to alphabets of arbitrary but fixed constant size. A string over Σ is either the empty string (of length 0) or a non-empty finite sequence a_1 a_2 … a_k where each a_i ∈ Σ. The set of all strings over Σ is denoted by Σ*. A non-deterministic finite automaton is a tuple A = (Q, q_0, Δ, F) where Q is a finite set of states, q_0 ∈ Q is the initial state, Δ ⊆ Q × Σ × Q is the transition relation, and F is the set of accepting states. A run of a string w on A is a sequence ρ = p_0, p_1, …, p_k such that k = |w|, p_0 = q_0 and for every i < k, (p_i, w_{i+1}, p_{i+1}) ∈ Δ; ρ is accepting if p_k ∈ F. The word w is accepted by A if there is an accepting run of w on A. The set of strings accepted by A is L(A). For a natural number n ∈ N, the nth slice of A's language, L(A, n), is the set of strings of length n in L(A). For simplicity, this paper assumes a singleton set of accepting states, i.e., F = {q_f} for some q_f ∈ Q. For q ∈ Q and b ∈ Σ, the b-predecessors of q are given by the set Pred(q, b) = {p ∈ Q | (p, b, q) ∈ Δ}. Given the automaton A and a number n in unary, we construct the unrolled automaton A_unroll, and assume that all states in A_unroll are reachable from the initial state. We use the notation q^ℓ to denote the ℓth copy of state q ∈ Q. For each state q and level 0 ≤ ℓ ≤ n, we use L(q^ℓ) to denote the set of all distinct strings of length ℓ for which there is a path (labeled by the string) from the starting state to q^ℓ.
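The unrolling and the b-predecessor sets can be sketched as follows; this is a toy construction under the conventions above, and the tuple encoding of edges is an assumption of the sketch:

```python
def unroll(delta, q0, n):
    """Levels of A_unroll: levels[ell] holds the states q whose copy q^ell
    is reachable from q0^0; edges records (ell-1, p, b, ell, q) transitions."""
    levels = [{q0}]
    edges = []
    for ell in range(1, n + 1):
        cur = set()
        for p in levels[-1]:
            for b in "01":
                for q in delta.get((p, b), ()):
                    cur.add(q)
                    edges.append((ell - 1, p, b, ell, q))
        levels.append(cur)
    return levels, edges

def pred(states, delta, q, b):
    """Pred(q, b) = {p in Q | (p, b, q) in Delta}."""
    return {p for p in states if q in delta.get((p, b), ())}
```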
Total Variation Distance. The total variation distance between two random variables X, Y over domain Ω is defined as:

TV(X, Y) = (1/2) · Σ_{ω∈Ω} | Pr[X = ω] − Pr[Y = ω] |.
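For finite distributions represented explicitly, the total variation distance is half the L1 distance between the probability vectors, as in this small illustrative helper:

```python
def tv_distance(p, q):
    """TV distance between two distributions given as dicts outcome -> prob."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in support)
```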

A New Notation: Distribution Entanglement
We introduce a new notation, referred to as distribution entanglement, to argue about the probability of events in cases where the distribution of a certain random variable were to mimic the distribution of another random variable. For an event E and random variables X and Y, we use Pr[E; X ← Y] to denote the probability of E when X has distribution entangled to Y, which is defined as follows:

Pr[E; X ← Y] = Σ_{σ∈Ω} Pr[E | X = σ] · Pr[Y = σ],

where Ω is the support of Y. The above notion is well-defined only when, for all σ ∈ Ω, we have Pr[X = σ] > 0. Likewise, we can also extend this notion to conditioned events. That is, for events E, F and random variables X and Y, the conditional probability of E given F when X has distribution entangled to Y is defined as follows:

Pr[E | F; X ← Y] = Σ_{σ∈Ω} Pr[E | F, X = σ] · Pr[Y = σ].

Whenever X is clear from the context, we will omit it from the notation.
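For finite supports, this definition (as reconstructed above) is just a reweighting: E's conditional probabilities are evaluated under X, but the outcomes are weighted by Y's distribution. A minimal sketch, assuming that reading:

```python
def entangled_prob(pr_E_given_X, dist_Y):
    """Pr[E; X <- Y] = sum over sigma of Pr[E | X = sigma] * Pr[Y = sigma].

    pr_E_given_X: dict sigma -> Pr[E | X = sigma] (requires Pr[X = sigma] > 0)
    dist_Y:       dict sigma -> Pr[Y = sigma], over the support of Y
    """
    return sum(pr_E_given_X[s] * p for s, p in dist_Y.items())
```

When dist_Y coincides with the distribution of X itself, this recovers the ordinary Pr[E] by the law of total probability.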

A FASTER FPRAS
In this section we formally present our FPRAS for #NFA, together with the statement of its correctness and running time (Theorem 3). Our FPRAS, Algorithm 3, calls two subroutines, AppUnion (Algorithm 1) and sample (Algorithm 2), each of which is presented next, together with informal descriptions and formal statements of correctness. Algorithm 3 presents the main procedure for approximating the size of L(A, n), where the NFA A and the number n are given as input. The algorithm works in a dynamic programming fashion and maintains, for each state q and each length 1 ≤ ℓ ≤ n, a sampled subset S(q^ℓ) of L(q^ℓ). As a key step, it uses the sampled sets corresponding to Pred(q, 0) and Pred(q, 1) to construct the desired sampled subset of L(q^ℓ). We first describe the two subroutines AppUnion and sample.

Approximating Union of Sets
AppUnion (Algorithm 1) takes parameters ε and δ, and as inputs the size estimates sz_1, …, sz_k and (access to) sets T_1, …, T_k. For each set T_i the access is given in the form of a membership oracle O_i, a subset S_i (obtained by sampling with replacement) presented as a list, and the estimate sz_i of the size of T_i. Using these, Algorithm 1 outputs an (ε, δ) estimate of the size of the union ∪_{i=1}^k T_i.
The precise algorithm presented here is a modification of the classic Monte Carlo-based scheme due to Karp and Luby [12]. The proof of its correctness also shares similarities with [12], and we summarize the intuition here. For x ∈ ∪_{i=1}^k T_i, let low(x) denote the smallest index i ∈ {1, 2, …, k} such that x ∈ T_i. Now, let U_unique be the set of all pairs (x, low(x)) and let U_multiple be the set of all pairs (x, i) such that x ∈ T_i. Consequently, the cardinality of U_unique is |∪_{i=1}^k T_i|, while |U_multiple| = Σ_{i=1}^k |T_i|. By drawing sufficiently many (N, to be precise) samples from U_multiple and assessing the fraction belonging to U_unique, one can estimate |U_unique|/|U_multiple|, which, when multiplied by |U_multiple|, provides an estimate for the size of U_unique, which is the desired output.
After sampling an index i, in Line 7 an element is chosen (and removed) from the sampled list S_i. In the case when the set S_i is constructed by uniformly selecting elements from T_i, this step mimics drawing a random sample from the actual set T_i. Initially, each set (more precisely, list) S_i is guaranteed to have a size of at least thresh, a quantity chosen to surpass the expected number of samples required from S_i during the N iterations of the loop. If the algorithm ever needs more than |S_i| samples during the iterations, this counts as an error, but the probability of this occurrence will be demonstrated to be very low. Finally, Line 9 verifies whether the sample from U_multiple belongs to U_unique by asking membership questions to the oracles O_j. If all the sampled sets S_i are constructed uniformly, then the variable C tallies the number of samples from U_unique, and after N iterations, C/N provides an estimation of |U_unique|/|U_multiple|. The final output is the product of this value with Σ_i sz_i (which is the estimate for |U_multiple|).
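The core of the Karp-Luby scheme can be sketched as follows. For simplicity, this sketch uses exact set sizes and fresh uniform draws where Algorithm 1 uses the estimates sz_i and the pre-drawn lists S_i, and plain membership tests where Algorithm 1 queries the oracles O_i:

```python
import random

def approx_union(sets, rng, num_rounds):
    """Monte Carlo estimate of |T_1 u ... u T_k| (Karp-Luby style)."""
    sizes = [len(T) for T in sets]
    total = sum(sizes)  # |U_multiple|
    hits = 0
    for _ in range(num_rounds):
        # draw (x, i) uniformly from U_multiple: pick i w.p. |T_i|/total,
        # then x uniformly from T_i
        i = rng.choices(range(len(sets)), weights=sizes)[0]
        x = rng.choice(sorted(sets[i]))
        # (x, i) lies in U_unique iff i is the smallest index containing x
        if all(x not in sets[j] for j in range(i)):
            hits += 1
    return total * hits / num_rounds
```

Since |U_unique|/|U_multiple| ≥ 1/k, standard Chernoff bounds show that O(k/ε^2 · log(1/δ)) rounds suffice for a (1 ± ε) estimate with probability 1 − δ.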
We next present (in Theorem 1) the formal correctness statement of this algorithm using distribution entanglement (see Section 2.1). For this, we set the source random variable to be the one that corresponds to the product distribution of the input sampled sets S_1, …, S_k, while the target random variable corresponds to the product distribution of U_1, …, U_k, where U_i is the random variable that obeys the distribution obtained when constructing a subset of T_i (of at least thresh many elements) by picking each element uniformly at random from T_i. Since the random variables S_1, …, S_k are clear from context, we will use Pr[E; U_{1,…,k}] to denote the probability of event E in the resulting entanglement distribution. For well-definedness, we require that Pr[S_i = A] > 0 for every subset A ⊆ T_i (with |A| ≥ thresh), and we will ensure this each time we invoke this statement.

Sampling Subroutine
The sampling subroutine (Algorithm 2) takes as input a number ℓ, a set of states P (at level ℓ of the unrolled automaton A_unroll), a string w (of length n − ℓ), a real number pr (representing a probability value), an error parameter ε and a confidence parameter δ, and outputs either a string sampled from the set ⋃_{q∈P} L(q^ℓ) · w or the failure symbol ⊥. Algorithm 2 is a recursive algorithm. In the base case (ℓ = 0), it outputs the input string w with probability pr, and with the remaining probability it outputs ⊥. In the inductive case (ℓ > 0), the algorithm computes estimates {sz_b}_{b∈{0,1}}, where sz_b is the estimate of the size of ⋃_{p ∈ Pred(P, b)} L(p^(ℓ−1)). For this, it calls AppUnion(ε, δ) with the previously computed estimates {N(p^(ℓ−1))} of states at previous levels and the sampled subsets of strings {S(p^(ℓ−1))}, and uses the unrolled automaton A_unroll as a proxy for the membership oracles (see the last argument to AppUnion in Line 11). Next, b is chosen randomly to be 0 or 1 with probability proportional to the estimates sz_0 and sz_1, respectively (Line 13). Once b is chosen, the set of states is updated to Pred(P, b) and the string w is updated to b · w. Subsequently, the sample() subroutine is called recursively with the updated set of states and the updated w.
We next proceed to formally characterize the guarantee of Algorithm 2. To this end, we introduce a few notations:

• For each 1 ≤ j ≤ ℓ and each q ∈ Q, let AccurateN_{q,j} be the event that N(q^j) ∈ (1 ± ε/(2n^2)) · |L(q^j)|.

• For a state q^j, we use U(q^j) to represent the random variable denoting the sequence of samples (with possible repetitions) constructed by repeatedly sampling |S(q^j)| many elements uniformly from L(q^j). Let S_{≤j} be the random variable corresponding to the product of {S(q^i)}_{q∈Q, 0≤i≤j}. Also, let U_{≤j} represent the random variable obtained by taking the ordered product of all {U(q^i)}_{q∈Q, 0≤i≤j}.

• For each 1 ≤ j ≤ ℓ and b ∈ {0, 1}, by AccurateAppUnion_{j,b} we denote the event that the bth call to AppUnion corresponding to level j returns a value sz_b that satisfies the multiplicative accuracy guarantee of Theorem 1 for the union ⋃_{p ∈ Pred(q_j, b)} L(p^(j−1)). Here, q_j is the argument of the call at level j, and the states p are the b-predecessors of q_j. For ease of notation, we also define AccurateAppUnion = ⋂_{1≤j≤ℓ} ⋂_{b∈{0,1}} AccurateAppUnion_{j,b}.

• Let Fail_1 be the event that ret_sample = ⊥ is returned because pr > 1 at the time of return (Line 5). Let Fail_2 be the event that ret_sample = ⊥ is returned at Line 6. Finally, let Fail = Fail_1 ⊎ Fail_2.
The following presents the correctness of Algorithm 2, while its proof is presented in Section 4.2.

Main Algorithm
Algorithm 3 first constructs (in Line 4) the labeled directed acyclic graph A_unroll. The algorithm then goes over all the states and all levels 0 ≤ ℓ ≤ n to construct the sampled sets and the size estimates inductively. In Lines 6-10, the algorithm caters for the base case (ℓ = 0). In the inductive case (Lines 12-15), the algorithm computes the estimates sz_0 and sz_1, where sz_b is the estimate of the size of ⋃_{p ∈ Pred(q, b)} L(p^(ℓ−1)). The size estimates of, and sampled subsets of, the sets {L(p^(ℓ−1))} are available inductively at this point. Further, membership in these languages can be easily checked. As a result, the algorithm uses the subroutine AppUnion to compute the size of the union (in Line 15), with A_unroll as the membership oracle as before. Once the size estimates sz_0 and sz_1 are determined, the algorithm constructs the sampled subset of L(q^ℓ) by making xns calls to the subroutine sample (Lines 21-25); if fewer than ns many strings were sampled, the algorithm simply pads with one fixed string from L(q^ℓ) so that S(q^ℓ) eventually has size ns (see Lines 27-30). Theorem 3 presents the desired correctness statement of our FPRAS, Algorithm 3, and its proof is presented in Section 4.3.
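The overall dynamic-programming structure of the main procedure can be seen in this simplified instantiation, where the calls to AppUnion and sample are replaced by exact oracles (true sizes and true uniform draws), so the returned value is exact; only the control flow mirrors Algorithm 3, and the names are illustrative:

```python
import random

def fpras_template_exact(states, delta, q0, qf, n, ns):
    """Skeleton of the level-by-level DP with exact oracles."""
    rng = random.Random(1)
    L = {(q0, 0): {""}}      # true slice languages (exact stand-in)
    N = {(q0, 0): 1}         # size "estimates" (here exact)
    S = {(q0, 0): [""] * ns} # sample sets, kept at size ns
    for ell in range(1, n + 1):
        reach = {q for (p, l) in L if l == ell - 1
                 for b in "01" for q in delta.get((p, b), ())}
        for q in reach:
            # inductive case: union over b-predecessors, extended by b
            words = set()
            for b in "01":
                for p in (p for p in states if q in delta.get((p, b), ())):
                    words |= {w + b for w in L.get((p, ell - 1), set())}
            L[(q, ell)] = words
            N[(q, ell)] = len(words)
            # a real run would draw via sample() and pad up to ns
            S[(q, ell)] = [rng.choice(sorted(words)) for _ in range(ns)]
    return N.get((qf, n), 0)
```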

Theorem 3.
Given an NFA A with m states and n ∈ N (presented in unary), Algorithm 3 returns Est such that the following holds:

Pr[ Est ∈ (1 ± ε) · |L(A, n)| ] ≥ 1 − δ.

Moreover, the algorithm has time complexity Õ((m^2 n^10 + m^3 n^6) · 1/ε^4 · log^2(1/δ)), where the tilde hides polynomial factors of log(m + n).

TECHNICAL ANALYSIS
We now present the detailed technical analyses of the algorithms presented in Section 3.

Correctness of Algorithm 1
The proof is along the same lines as that of Karp and Luby [12]. For the sake of completeness, we present a full proof.

Proof (of Theorem 1).
Let Fail be the event that the output of the algorithm is not within a (1 ± ε) factor of |∪_{i=1}^k T_i|. We will show that Pr[Fail; U_{1,…,k}] is upper-bounded by δ.
Note that in each of the N runs of the loop (Lines 5-9), the algorithm tries to draw an element from some S_i. So if we assume that the size of the sets S_i for all i is more than N, then the condition in the if (in Line 7) will be satisfied, and in that case the else clause in Line 8 is redundant. We will prove the correctness of Algorithm 1 in two parts. In the first part, we will prove that if the size of each set S_i is greater than N, then Pr[Fail; U_{1,…,k}] ≤ δ/2. In the second part, we will prove that if the sets S_i have size thresh (much smaller than N), then the probability that the algorithm will ever need the else clause in Line 8 is less than δ/2. The two parts combined prove that if the sets S_i are of size thresh, then Pr[Fail; U_{1,…,k}] ≤ δ, which is what is claimed in the theorem.
For all x ∈ ∪_{i=1}^k T_i, let low(x) denote the smallest index i such that x ∈ T_i. In other words, x is in T_{low(x)} and for all i < low(x), x ∉ T_i. Observe that U_unique := {(x, low(x)) | x ∈ ∪_{i=1}^k T_i}. If |S_i| ≥ N for all i, then the condition in the if clause in Line 7 is always satisfied. For a run of the loop (Lines 5-9), consider Lines 6-7. We say the pair (i, x) is sampled in round j if, in the jth iteration of the loop, i is sampled in Line 6 and then, in that same iteration, x is obtained from S_i.dequeue() in Line 7.
For any i_0 ∈ {1, …, k}, x_0 ∈ T_{i_0} and j_0 ∈ {1, …, N}, we have

Pr[(i_0, x_0) is sampled in round j_0; U_{1,…,k}] = Pr[i_0 is sampled in Line 6 in round j_0; U_{1,…,k}] × Pr[x_0 is obtained in Line 7 in round j_0 | i_0 is sampled in Line 6; U_{1,…,k}]. (1)

Since the choice of the index i_0 is independent of the distributions of any of the S_i's, the first term equals sz_{i_0} / Σ_{j=1}^k sz_j. Next observe that the second term in Equation (1) is Pr[x_0 is sampled in round j_0 from S_{i_0}; U_{1,…,k}]. This term, in turn, can be computed by considering the disjoint union of the events corresponding to how many times i_0 has been chosen so far:

Pr[x_0 is sampled in round j_0 from S_{i_0}; U_{1,…,k}] = Σ_{t=1}^{j_0} Pr[i_0 was sampled t − 1 times before round j_0 and x_0 is sampled in round j_0 from S_{i_0}; U_{1,…,k}].

For any i_0 ∈ [k] and x_0 ∈ T_{i_0}, we thus have

Pr[(i_0, x_0) is sampled; U_{1,…,k}] = (sz_{i_0} / Σ_{j=1}^k sz_j) · (1 / |T_{i_0}|).

The probability that i_0 is sampled in Line 6 is independent of the distributions of the samples. The last equality holds because we have assumed that |S_i| ≥ N for all i, so the if condition in Line 7 is always satisfied. Hence, once i_0 is sampled in Line 6, the probability that x_0 is obtained is exactly 1/|T_{i_0}|, since the list S_{i_0} contains elements sampled uniformly (with replacement) from T_{i_0}.

Let Z_j be the random variable denoting whether the counter C is increased in the jth iteration of the loop. Note that Z_j = 1 if the pair sampled in the jth iteration is in U_unique; this is because Line 9 checks whether the sampled pair is in U_unique. So, E[Z_j; U_{1,…,k}] = Σ_{(x,i)∈U_unique} Pr[(x, i) is sampled in round j; U_{1,…,k}]; here the ';' in the expectation has an analogous definition to the case when used for probability. Thus, at the end of the algorithm, the expected value of C = Σ_j Z_j equals N · Σ_{(x,i)∈U_unique} (sz_i / Σ_j sz_j) · (1/|T_i|), and by the Chernoff bound C concentrates around this expectation. We also know that each sz_i is a good multiplicative estimate of |T_i|; the required concentration follows from the setting of the parameter N. Thus, by the Chernoff bound, the probability that the output of the algorithm is not within a (1 ± ε) factor of |∪_{i=1}^k T_i| is at most δ/2, so we have Pr[Fail; U_{1,…,k}] ≤ δ/2.
We first observe that since the probability of reaching Line 8 is independent of the distributions of the samples, we have Pr[Line 8 is ever reached; U_{1,…,k}] = Pr[Line 8 is ever reached], and thus we will upper-bound the latter quantity instead. Note that the else clause in Line 8 is ever reached only if, for some i, the number of times i is sampled in the N iterations is more than thresh.
Let Bad_i denote the event that the index i is sampled (in Line 6) more than thresh times during the course of the algorithm, that is, during the N runs of the loop (Lines 5-9), and let the event Bad be ∪_{i=1}^k Bad_i. We will now upper bound Pr[Bad_i]. Let X_i be the random variable counting the number of times i is sampled; its expectation is N · sz_i / Σ_j sz_j, which is less than half of thresh by the choice of thresh. By a simple application of the Chernoff bound (with the deviation parameter Δ ≥ 2, which holds for large N, as desired), we obtain Pr[Bad_i] ≤ δ/(2k). So, by the union bound, we have: Pr[Bad; U_{1,…,k}] = Pr[Bad] ≤ δ/2.

Correctness of Algorithm 2
We prove the three parts of Theorem 2 individually.

Proof of Theorem 2(1)
Consider an execution of Algorithm 2 and let w = w_1 w_2 … w_ℓ be the string constructed right before entering the branch at Line 5. Let pr_{0,j} be the value of the variable pr_0 at level j and let pr be the value of the probability variable before the function returns; pr is then the product of the per-level factors. Let sz_0^ℓ, sz_1^ℓ, sz_0^(ℓ−1), sz_1^(ℓ−1), …, sz_0^1, sz_1^1 be the estimates obtained in the 2ℓ calls to AppUnion during the run of sample(). Recall that q_j is the argument of the call at level j, and q_b is its b-predecessor. Observe that the relevant language at each level partitions according to the last character read, and hence, under the event AccurateAppUnion_{j,0} ∩ AccurateAppUnion_{j,1}, the ratio sz_{w_j}^j / (sz_0^j + sz_1^j) closely approximates the true conditional probability of the character w_j. Thus, under the event AccurateAppUnion = ⋂_{j,b} AccurateAppUnion_{j,b}, and assuming that the event AccurateN at level ℓ also holds, a telescoping argument using the inequality (1 + 1/(4n^2))^n ≤ e^{1/(4n)} shows that the probability of returning any fixed string in the target set is within the required multiplicative factors of uniform. We will now estimate the (conditional) probability of the event Fail. First, observe that for each string w′ outside the target set, we have Pr[ret_sample = w′] = 0, because (a) any string that the algorithm outputs is such that there is a path from the initial state to q^ℓ labeled with it, and (b) the unrolled automaton is such that any path from the start state to some state p^j corresponds to a string in L(p^j).
We observe that the complement of Fail is the event that ret_sample is some string in L(q^ℓ), and the events {ret_sample = w} for w ∈ L(q^ℓ) are pairwise disjoint. Thus, Pr[¬Fail] = Σ_{w ∈ L(q^ℓ)} Pr[ret_sample = w].

Proof of Theorem 2(3)
We would like to use Theorem 1 to prove Theorem 2(3). Towards this, we first establish that the pre-conditions of Theorem 1 hold.
(1) For each j ≤ ℓ, the pre-conditions hold after substituting the appropriate parameters. From Theorem 1, it thus follows that, for all j, the corresponding estimate is accurate with the stated probability. Therefore, applying the union bound, we obtain the claim.

Correctness of Algorithm 3
We prove the main result (Theorem 3) by induction on the level ℓ. For each level ℓ, we first characterize the accuracy of the computed estimates {N(q^ℓ)}_{q∈Q} and the quality of the sampled sets {S(q^ℓ)}_{q∈Q}, assuming that these were computed when the samples at the lower levels 1, 2, …, ℓ-1 are perfectly uniform and have size thresh. After this, we characterize the real computed estimates and samples by arguing that the distance of the corresponding random variables from those computed using perfectly uniform samples is small. We first establish some helpful auxiliary results (Lemma 4 and Lemma 5) towards our main proof. As before, we will often invoke distribution entanglement of S^{≤ℓ-1} by U^{≤ℓ-1} for each level ℓ. Now, we would like to apply Theorem 1 to show that sz_b (at level ℓ) is a good estimate. To this end, we first show that the pre-conditions stated in Theorem 1 hold. This is because, (1) under the event AccurateN^{≤ℓ-1}, the estimates {N(q^{ℓ-1})}_{q∈Q} satisfy the desired requirements. Next, (2) for each level up to ℓ-1, the samples {S(q^{ℓ-1})}_{q∈Q} also obey the desired requirements, because thresh (in Algorithm 1), after substituting the relevant parameters, satisfies the required lower bound. Therefore, from Theorem 1, we have that sz_b is accurate with the stated probability. In addition to the previously introduced notation Pr[E; U^{≤j}], we will also use the notation TV(X, Y; U^{≤j}) (resp. TV(X, Y | F; U^{≤j})) to denote the total variation distance between the distributions induced by the random variables X and Y (resp. induced by X and Y conditioned on the event F) when S^{≤j} is distribution entangled to U^{≤j}. In the defining expression, the summation variable ranges over the union of the domains of X and Y, while the conditioning variable ranges over the union of the domains of S^{≤j} and U^{≤j}.
Lemma 5. For each ℓ ≤ n and q ∈ Q, we have TV(S(q^ℓ), U(q^ℓ) | AccurateN^{≤ℓ}; U^{≤ℓ-1}) ≤ P.

Proof. Fix ℓ and q. Let Fail_t denote the event that the t-th call to sample (in the loop starting at Line 21) for this q returns ⊥. Similarly, let AccurateAppUnion_t denote the event AccurateAppUnion corresponding to the t-th call to sample for q. We use Iverson bracket notation [9]. Our proof of Theorem 3 relies on upper bounding the probability with which the estimate of Algorithm 3 falls outside the desired range. For this, we will set up new random variables and relate their joint distribution to the statistical distance between the samples constructed by the algorithm and an ideal set of uniform samples. Recall that {S(q^ℓ)}_{ℓ∈[0,n], q∈Q} represents the set of random variables corresponding to the (multi)sets of samples we obtain during a run of Algorithm 3. Furthermore, {U(q^ℓ)}_{ℓ∈[0,n], q∈Q} represents the set of random variables that correspond to repeatedly sampling ns elements (with replacement) uniformly at random from L(q^ℓ). It is worth observing that, for all q, q′, ℓ, ℓ′, S(q^ℓ) and S(q′^{ℓ′}) are dependent, whereas U(q^ℓ) and U(q′^{ℓ′}) are independent. Also, the definition of the distribution U(q^ℓ) is independent of Algorithm 3, as it is defined solely in terms of the set L(q^ℓ). Since we are talking about sets of random variables, there is an implicit assumption that these sets are ordered.
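For concreteness, the (unconditioned) total variation distance between two empirical sample multisets can be computed directly; the helper below is a generic sketch, not code from the paper.

```python
from collections import Counter

def tv_distance(xs, ys):
    """Total variation distance between the empirical distributions of two
    sample multisets: TV = (1/2) * sum_x |P(x) - Q(x)|."""
    px, py = Counter(xs), Counter(ys)
    nx, ny = len(xs), len(ys)
    return 0.5 * sum(abs(px[x] / nx - py[x] / ny) for x in set(px) | set(py))

# P = {a: 1/2, b: 1/2}, Q = {a: 1/4, b: 3/4}  =>  TV = 1/4
print(tv_distance(list("aabb"), list("abbb")))   # prints 0.25
```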
We will now define two sequences of random variables, {T(q^ℓ)}_{q∈Q} and {T̂(q^ℓ)}_{q∈Q}. Instead of specifying them explicitly, we will specify properties of their joint distribution. We will use the notation T^{≤ℓ} (resp. T̂^{≤ℓ}) to denote the corresponding ordered set of variables up to level ℓ. The joint distribution of (T, T̂) has the following properties: (1) For every q ∈ Q, T(q^0) = T̂(q^0) = S(q^0).
(3) For all sets, sequences of sets, ℓ ∈ [1, n], and q ∈ Q, the stated consistency condition holds. We remark that such a joint distribution always exists. This is because (3) only constrains the outcomes in which the two sequences agree at level ℓ, thereby providing enough freedom in assigning probability values to the outcomes in which they disagree so that T̂^ℓ has a well-defined probability distribution. In Appendix A.1 we give a concrete distribution that realizes these properties.
We will begin with the following simple observation, whose proof is deferred to the Appendix.

Claim 6. Pr[(
Next, we relate the random variables defined above to the desired statistical distance. We then focus on deriving an upper bound involving AccurateN, as stated in the following claim, whose proof is deferred to the Appendix.

Claim 8. AccurateN
We can now finish the proof of Theorem 3. We first observe the following chain of (in)equalities, in which the final step uses the observation that, given S^{≤ℓ-1} and AccurateN^{≤ℓ-1}, the probability of the event AccurateN_{q,ℓ} depends only on the randomness of the algorithm AppUnion. Thus, given S^{≤ℓ-1} and AccurateN^{≤ℓ-1}, the event AccurateN_{q,ℓ} is conditionally independent of the variables at levels ≤ ℓ-1.
Putting everything together, by Lemma 4 and Lemma 5 and applying the union bound over all ℓ ∈ [1, n] and all q ∈ Q, we obtain the claimed bound. To calculate the time complexity of Algorithm 3, we first observe that the outer for loop iterates over all pairs of a level ℓ and a state q. For each ℓ and q there is one call to AppUnion, with the number of sets being O(m), and xns calls to sample(). By Theorem 1, the time taken by AppUnion is dominated by its calls to the membership oracle; the number of such calls is O(n^4/ε² · log(1/δ)), ignoring factors of log(m + n). The method sample(), on the other hand, is a recursive function that calls AppUnion in every recursive step, and the depth of the recursion is O(n). So the time complexity for each of the xns calls to sample() is the time it takes for O(n^5/ε² · log(1/δ)) calls to the oracle. Now, we note that the time complexity of the membership calls can be amortized as follows. First, for every string ∈
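The amortization of membership calls can be made concrete: simulating the NFA on a word visits its prefixes in order, so returning the reachable-state set for every prefix lets queries on words that share prefixes reuse work. The sketch below uses our own hypothetical names (`prefix_reach`, `member`); it is not the paper's implementation.

```python
def prefix_reach(delta, q0, word):
    """Reachable-state sets for every prefix of `word`; a membership query
    on any prefix reads off the corresponding entry, so queries on words
    sharing prefixes can be amortized."""
    S = {q0}
    sets = [frozenset(S)]
    for a in word:
        S = {q2 for q in S for q2 in delta.get((q, a), ())}
        sets.append(frozenset(S))
    return sets

def member(delta, q0, finals, word):
    """Membership oracle: does the NFA accept `word`?"""
    return bool(prefix_reach(delta, q0, word)[-1] & finals)

# Toy NFA accepting words that end in 'a'.
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}}
print(member(delta, 0, {1}, "aba"), member(delta, 0, {1}, "ab"))  # prints True False
```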

CONCLUSIONS AND FUTURE WORK
We consider the approximate counting problem #NFA and propose a fully polynomial time randomized approximation scheme (FPRAS) that significantly improves over the prior FPRAS [3]. Given the wide range of applications of counting and sampling from the language of an NFA, in databases, program analysis, testing, and more broadly in Computer Science, we envision that further improvements in the complexity of approximating #NFA are a worthwhile avenue for future work.

Algorithm 3: Algorithm for estimating |L(A, n)|. Input: NFA A with m states, n ∈ N in unary. Parameters: accuracy parameter ε and confidence parameter δ.
Algorithm Template: FPRAS for #NFA. Input: NFA A with m states and a single final state, n ∈ N (in unary).
1: Unroll the automaton A into an acyclic graph A_unroll by making n + 1 copies {q^ℓ}_{ℓ∈[0,n], q∈Q} of the states and adding transitions between immediate layers.

Throughout our algorithms, we will use Pr[E; U^{≤j}] (resp. Pr[E | F; U^{≤j}]) to denote the probability (resp. conditional probability) of E (resp. E conditioned on F) when S^{≤j} is distribution entangled to U^{≤j}. Consider the random trial corresponding to the call to the procedure sample(ℓ, {q^ℓ}, …) for ℓ > 0, and let ret_sample denote the random variable representing the return value thus obtained. Observe that, in each such trial, the function AppUnion is called 2ℓ times, twice for each level 1 ≤ j ≤ ℓ. Let us now define some events of interest.

Proc. ACM Manag. Data, Vol. 2, No. 2 (PODS), Article 112. Publication date: May 2024.
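The unrolling step in the template can be sketched directly; `unroll` below is an illustrative helper under our own naming, producing the layered acyclic graph with n + 1 copies of every state.

```python
def unroll(states, delta, n):
    """Layered DAG A_unroll: node (q, l) is the copy of state q at level l;
    each NFA transition q -a-> q2 becomes an edge between adjacent levels."""
    nodes = [(q, l) for l in range(n + 1) for q in states]
    edges = [((q, l), a, (q2, l + 1))
             for l in range(n)
             for (q, a), succs in delta.items()
             for q2 in succs]
    return nodes, edges

# Toy NFA: 2 states, 3 transitions; unrolled to length n = 2.
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}}
nodes, edges = unroll([0, 1], delta, 2)
print(len(nodes), len(edges))   # prints 6 6
```

By construction the graph is acyclic, since every edge goes from level l to level l + 1.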