QuanDA: GPU Accelerated Quantitative Deep Neural Network Analysis

Over the past years, numerous studies have demonstrated the vulnerability of deep neural networks (DNNs) to small input noise, which often causes them to make incorrect classifications. This has motivated the formal analysis of DNNs to ensure that they delineate acceptable behavior. However, in the case that a DNN's behavior is unacceptable for the desired application, these qualitative approaches are ill-equipped to determine the precise degree to which the DNN behaves unacceptably. We propose a novel quantitative DNN analysis framework, QuanDA, which not only checks whether the DNN delineates certain behavior but also provides the estimated probability that the DNN delineates this particular behavior. Unlike the (few) available quantitative DNN analysis frameworks, QuanDA does not use any implicit assumptions on the probability distribution of the hidden nodes, which enables the framework to propagate close-to-real probability distributions of the hidden node values to each succeeding DNN layer. Furthermore, our framework leverages CUDA to parallelize the analysis, enabling a high-speed GPU implementation for fast analysis. The applicability of the framework is demonstrated using the ACAS Xu benchmark, providing reachability probability estimates for all network nodes. This paper also provides potential applications of QuanDA for the analysis of DNN safety properties.


INTRODUCTION
Deep neural networks (DNNs) are an actively researched domain of machine learning (ML) and a popular choice in numerous real-world applications due to their ability to perform exemplary classification and decision-making without explicit programming. This is achieved by the network learning input-output relations using sample data (for instance, the training and validation data in the case of supervised learning). The resulting trained networks find use cases in numerous real-world applications, such as smart devices, robotics, smart homes and industries, smart healthcare, and autonomous driving [10,19,26,34].
However, these networks, trained on finite datasets, often fail to provide correct results in the real world, where the possibilities of inputs are (often) infinite. This has been frequently observed in the literature, for instance, in that pertaining to adversarial inputs [24,32], where DNNs are observed to provide incorrect results for inputs that resemble, but differ from, those in the available dataset. This is undesirable, of course, particularly for DNNs deployed in safety-critical applications.
This has fueled an interest towards the formal analysis of DNNs in the past two decades. The aim of these qualitative analysis works [5,17,25] is to identify whether certain desirable DNN properties, for instance, the safety and robustness of the trained networks, hold for the trained DNNs. Such analysis, of course, provides a binary result, i.e., it tells whether the properties hold for the DNN or they do not. However, as observed in these works, the DNN properties under observation often do not hold for the network. This, in turn, raises the question of identifying the degree to which the DNN fails to delineate the desired properties.
Recent works explore this problem using quantitative DNN analysis [7,36]. They attempt to estimate the probability of properties such as the robustness of the trained network in the presence of small noise around seed inputs. The results, in turn, provide the probability of the computed output remaining within close proximity of the exact output obtained for clean inputs. The notion of probability in these works is taken to be the ratio between the volume of reachable outputs and the volume of the entire valid output domain. However, such a notion of probability implicitly assumes that all regions of the output domain are equally likely, i.e., that the output values of the network nodes follow a uniform distribution. As will be shown by our detailed analysis and discussion in later sections of the article, this is a simplistic notion of probability, which does not hold for practical DNNs.
In contrast, we propose a novel quantitative DNN analysis framework, QuanDA, which does not make any assumptions on the probability distribution of the output at DNN nodes. However, this is not a straightforward task: even if the probability distribution is assumed to be uniform at the input nodes, the same (i.e., a uniform probability distribution) does not hold for the hidden network nodes due to the computations involved in the DNN. To the best of our knowledge, this is the first work estimating realistic probability distributions for all DNN nodes, without any a priori assumptions regarding the probability distribution of the hidden nodes. Apart from the computational challenge of probability distribution estimation, the propagation of the estimated distributions along the DNN layers poses the additional challenge of high timing overhead. This is tackled by QuanDA via group formation and parallelization, leveraging efficient use of both CPU and GPU capabilities. This ultimately provides the precise probability of correct, safe, or robust DNN outputs under bounded input and/or noise. In summary, the contributions of this work (also provided in Figure 1) are as follows:

(1) Providing a novel framework, QuanDA, for quantitatively analyzing DNN properties (Section 3).
(2) Computing exact reachability bounds for all DNN nodes, at all network layers, hence identifying the subset of the real domain involved in network computations and making the DNN analysis tractable (Section 3.1).
(3) Adapting stratified sampling and weighted sum models to ensure better precision and coverage of the input domain, for realistic probability estimates (Section 3.2).
(4) Using statistical methods for realistically propagating and estimating the output probabilities of node values at each network layer, without using any assumptions about the probability distributions at the hidden nodes (Section 3.3).
(5) Leveraging Hoeffding's inequality to provide the deviation from the exact probabilities and a precise confidence level for the probability estimates obtained at each DNN layer (Section 3.4).
(6) Utilizing QuanDA to check the reachability property of the nodes in the benchmark ACAS Xu networks, also demonstrating the potential application of the framework for the analysis of the safety properties of these networks (Section 4).
(7) Using the benchmark networks, demonstrating that the outputs of the hidden nodes in the DNNs do not follow any univariate probability distribution, including the uniform distribution (Section 4).

The rest of the article is organized as follows. Section 2 provides an overview of the general DNN architecture. Section 3 describes in detail our novel QuanDA framework for quantitative DNN analysis. Section 4 presents our experimental setup along with results and findings from the analysis of reachability and safety properties for the ACAS Xu networks. Section 5 gives an overview of the other DNN analysis approaches available in the literature, highlighting the unique challenges targeted by QuanDA. Section 6 concludes the article, while highlighting some open challenges for future research.

DEEP NEURAL NETWORKS (DNNS)
DNNs [31] are an interconnection of nodes arranged in layers, as shown in Figure 2. The input is fed to the network via the input layer, whereas the classification or network's decision is available at the output layer. Embedded between them are one or more hidden layers. To ensure that the input to a DNN always stays within predefined input bounds, the input is often normalized before being fed to the network:

$x_{norm} = \frac{x_{raw} - \mu}{\sigma}$

Here, $x_{raw}$ and $x_{norm}$ are the actual (unprocessed) and normalized inputs, respectively, while $\mu$ and $\sigma$ represent the mean and standard deviation of the input distribution. The inverse process of obtaining the unprocessed output from the normalized output is often also used at the output layer. This article considers DNNs in which the input propagates only in one direction, i.e., from the input to the output layer. Such networks are known as feed-forward neural networks. If all the nodes in a layer are connected to all those of the preceding layer, the network is called fully connected.
In a fully-connected feed-forward DNN, nodes at each layer of the network are connected to those of another via two major transformations: the affine transform and a non-linear activation. The affine transform is a deterministic function involving the multiplication of the inputs $x_i$ with the weights of the layer $w_{ij}$ and translating the result by a bias value $b_j$:

$x_j = b_j + \sum_{i} w_{ij} \, x_i$
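As a quick illustration (a minimal NumPy sketch with illustrative names, not the QuanDA implementation), the affine transform of a fully connected layer can be written as:

```python
import numpy as np

def affine(x, W, b):
    # x_j = b_j + sum_i w_ij * x_i, computed for all output nodes j at once;
    # W has shape (output_nodes, input_nodes), b has shape (output_nodes,)
    return W @ x + b
```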
Among the most popular non-linear activations in practical DNNs [11,14] are the rectified linear unit (ReLU) and maxpool/minpool (also shown in Figure 2). As suggested by the name, ReLU works analogously to an electrical rectifier, mapping all negative inputs to zero while passing the positive inputs to the outputs without alteration. Consider N to be the number of input nodes x in the layer. The output of ReLU can then be expressed mathematically as

$ReLU(x_i) = \max(0, x_i), \quad i \in \{1, ..., N\}$

The maxpool (or minpool) activation function instead maps the maximum (or minimum) value among the inputs to the output.

QUANDA: QUANTITATIVE DEEP NEURAL NETWORK ANALYSIS

In general, quantitative analysis is a branch of mathematics leveraging statistical methods to collect, evaluate, and analyze numeric data. For DNNs, this could provide insight into the behavior of trained networks under varying inputs, as well as their precise correctness, safety, and robustness estimates. This section systematically explains our quantitative DNN analysis framework, QuanDA, also shown in Figure 3. The framework accepts a trained DNN and input/noise bounds (dictated by the desired network properties under study) as inputs. These are used for exact bound computation for all DNN nodes across all network layers. The bounds are divided into non-overlapping strata for stratified sampling. Efficient group formation of nodes, for DNN layers with a large number of nodes, is also deployed to reduce the timing overhead of the analysis. This is followed by weighted score computation using weighted sum models (WSMs) for each DNN node. This aids realistic probability estimation using the framework. The probability estimates are propagated sequentially over all network layers. The efficient design of QuanDA enables the parallelization of the computations in each layer of the DNN, further improving the timing efficiency. The current version of the framework focuses on reachability and safety properties for the trained DNNs.
Moreover, the user-defined confidence interval and the maximum deviation between the exact and estimated probability values, provided at the input, ensure precise probability estimates. The final result of the framework is the probability estimates for the reachability property at all network nodes, and for the safety properties at the network's output layer. These are available to users as .npy files at the output.

Bound Computation
The input bounds refer to the extrema of the valid input domain of the given DNN. Fundamentally, these are dictated either by the span of the data available to the DNN at its input layer or by the subset of the input domain of interest for the desired DNN property. For instance, the valid input bounds for a pixel in a grayscale image are [0, 255], and the subset of interest for the distance between two aircraft could be [0, 1000] ft. These are used to compute the output bounds for the layer, which form the input for the subsequent network layer. Without the determination of precise node bounds at each DNN layer, the node values could lie anywhere in the real domain, making the DNN analysis intractable. QuanDA leverages the laws of interval arithmetic [12] to propagate node bounds, starting at the input nodes, to obtain the exact bounds of the output for each node of the trained DNN. This is a useful step in the DNN analysis since it provides the boundaries within which all node values of interest lie.
Consider DNN nodes $x_1$ and $x_2$, with bounds $[\underline{x}_1, \overline{x}_1]$ and $[\underline{x}_2, \overline{x}_2]$, respectively. The affine transformation (discussed earlier in Equation (2)) involves the multiplication of the node values with a scalar (weight $w$). Depending on whether the scalar is positive or negative, the bounds of the resulting output $w \cdot [\underline{x}_1, \overline{x}_1]$ can be obtained as follows:

$w \cdot [\underline{x}_1, \overline{x}_1] = \begin{cases} [w\underline{x}_1,\; w\overline{x}_1] & \text{if } w \ge 0 \\ [w\overline{x}_1,\; w\underline{x}_1] & \text{if } w < 0 \end{cases}$

The summation of the bounds of the resulting products involves the addition of the individual lower and upper bounds:

$[\underline{x}_1, \overline{x}_1] + [\underline{x}_2, \overline{x}_2] = [\underline{x}_1 + \underline{x}_2,\; \overline{x}_1 + \overline{x}_2]$

The addition of the bias value (scalar) $b$ then translates the entire interval by the value $b$:

$[\underline{x}, \overline{x}] + b = [\underline{x} + b,\; \overline{x} + b]$

As stated in Equation (3), the ReLU activation only alters negative input values while providing an identity mapping for the positive inputs. Hence, the resulting bounds after ReLU activation can be calculated as follows:

$ReLU([\underline{x}, \overline{x}]) = [\max(0, \underline{x}),\; \max(0, \overline{x})]$

The bounds computed above are precise and exact, and correspond to the valid output domain for each of the network nodes. (Note that this does not hold for non-piecewise-linear activations, the application of linear arithmetic to which would require an approximation of the function [29]. This would lead to imprecise, but not inaccurate, bounds.)
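The interval-arithmetic rules above translate directly into code. The following is a minimal sketch (NumPy, illustrative names; QuanDA's own bound computation additionally leverages the GPU) of bound propagation through one affine layer followed by ReLU:

```python
import numpy as np

def affine_bounds(lo, hi, W, b):
    # Positive weights preserve the ordering of the bounds,
    # negative weights swap the lower and upper bounds.
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    out_lo = W_pos @ lo + W_neg @ hi + b
    out_hi = W_pos @ hi + W_neg @ lo + b
    return out_lo, out_hi

def relu_bounds(lo, hi):
    # ReLU maps all negative values to zero, so both bounds clip at 0.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)
```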

Stratified Sampling
The bounds computed in the previous step are used to form S non-overlapping strata, to enable sampling. For instance, consider again the node $x_1$ with bounds $[\underline{x}_1, \overline{x}_1]$. The bounds of the stratum $s \in \{1, ..., S\}$ for the node $x_1$ are as follows:

$\left[\, \underline{x}_1 + (s-1)\,\frac{\overline{x}_1 - \underline{x}_1}{S},\;\; \underline{x}_1 + s\,\frac{\overline{x}_1 - \underline{x}_1}{S} \,\right]$

The combination of samples taken from all unique combinations of strata of the different input nodes is, in turn, used to compute the outputs of the nodes in the following layer. This approach of selecting random samples from each individual stratum, instead of from the entire input node's bounds, is known as stratified sampling; it ensures that the response of the output to a wide range of input samples is considered in the analysis. Hence, the sampling provides a wider coverage of the input bounds and, in turn, better precision in the calculated outputs. The result is the joint score of the input-output node strata, which is elaborated further later in this section.
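A minimal sketch of the strata formation and per-stratum sampling (assuming equal-width strata, as in the formula above; names are illustrative):

```python
import numpy as np

def strata(lo, hi, S):
    # Split the node bounds [lo, hi] into S equal, non-overlapping strata.
    edges = np.linspace(lo, hi, S + 1)
    return list(zip(edges[:-1], edges[1:]))

def sample_stratum(s_lo, s_hi, n, rng=None):
    # Draw n random samples from a single stratum.
    rng = rng or np.random.default_rng()
    return rng.uniform(s_lo, s_hi, size=n)
```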
However, considering the combination of inputs from all strata belonging to all input nodes entails a time complexity of $O(S^N)$, where S is the number of strata for each of the N nodes in a DNN layer. Therefore, QuanDA uses an apt group formation to reduce this time complexity.
Group Formation. For DNN layers with a large number of nodes (i.e., N ≥ th), the nodes are distributed into G groups comprising $N_G$ nodes each. Naturally, this calls for the recomputation of the bounds of the output nodes, i.e., the nodes of the subsequent layer, with respect to the input nodes in each group ($\in N_G$). The affine transformation is hence split into two stages. The first stage involves only the multiplication of the nodes in the individual groups with their respective weights (i.e., $x_j^* = \sum_{k \in N_G} w_{kj}\, x_k$). The second stage adds the output results from all groups together, along with the corresponding bias value, to obtain the final output for the node (i.e., $x_j = b_j + \sum_{G} x_j^*$). This reduces the time complexity of the sampling to $O(G \cdot S^{N_G})$.

The sampling approach described so far entails that the samples selected from each stratum have an equal probability, hence indicating a uniform distribution. To overrule such an assumption, QuanDA takes inspiration from the weighted sum model (WSM) to incorporate the precise probability of reaching each stratum into the analysis, ensuring realistic probability estimates. In general, the WSM [1] is a statistical approach to determine the likelihood of individual responses (i.e., outputs) to various criteria (i.e., inputs). Each criterion, in turn, has a certain weight of occurrence associated with it.
In QuanDA, we take inspiration from WSMs to obtain the weighted score $WSc$ for each stratum of an output node, given the probabilities of the strata of the input node. For instance, let $N_x^G$ and $N_I^G$ be the output and input nodes, respectively, for group G of a DNN layer. The $WSc$ for stratum s of $N_x^G$, given $N_I^G$, can then be determined as

$WSc_s = \sum_{s_I=1}^{S} P(s_I) \cdot Sc(s, s_I)$

where $Sc(s, s_I)$ is the joint score of reaching output stratum s from input stratum $s_I$, and $P(s_I)$ is the probability of the input stratum. This is also depicted pictorially, using the WSM matrix, in Figure 4. Each cell of the WSM matrix gives the score for reaching a specific output stratum given an individual input stratum, i.e., the joint score of the input-output node strata. The probabilities of the individual input node strata are used as weights for computing the $WSc$ of each stratum of the output node, as shown in the last column of the WSM matrix (in Figure 4). This computation of $WSc$ (and the subsequent probability estimation described in the next subsection) is carried out layer-wise, with the probabilities of the nodes from each layer used as weights to compute the $WSc$ at the next layer.
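A minimal sketch of the weighted-score computation, under the assumption that the joint score Sc[s, s_I] records a normalized count of the samples from input stratum s_I that reach output stratum s (as depicted in the WSM matrix of Figure 4):

```python
import numpy as np

def weighted_scores(Sc, p_in):
    # Sc:   (S_out, S_in) joint-score matrix of one input-output node pair
    # p_in: (S_in,) probabilities of the input node's strata (the weights)
    # Returns the (S_out,) weighted scores WSc, i.e., the last column
    # of the WSM matrix in Figure 4.
    return Sc @ p_in
```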
Parallelization. It must be noted that the sampling and $WSc$ computation for an output node are independent of the similar computations for all other output nodes in the same network layer. QuanDA leverages this independence by computing the results for all output nodes in a layer in parallel. This parallelism is greatly aided by the parallel processing capabilities of the GPU, hence accelerating the DNN analysis.

Probability Estimation
Depending on whether or not the nodes in a layer were distributed over G groups, each DNN layer computes one or more WSM matrices for each input-output node pair. The probability of stratum s of the output node $N_x$ can then be estimated (also shown in Figure 4) as

$P(s) = \frac{1}{I} \sum_{i=1}^{I} WSc_s^{\,i}$

where I is the total number of input nodes contributing to the computation of $N_x$.
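As a sketch of this aggregation step, and under the assumption that the estimate averages the weighted scores over the I contributing input nodes and renormalizes them (the exact aggregation is depicted in Figure 4; names are illustrative):

```python
import numpy as np

def estimate_probabilities(wsc_per_input):
    # wsc_per_input: (I, S) weighted scores, one row per contributing
    # input node. Average over the I nodes, then renormalize so that the
    # strata probabilities of the node sum to 1 (the correctness criterion).
    p = wsc_per_input.mean(axis=0)
    return p / p.sum()
```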
Correctness Criteria. The correct computation of the probability estimates, as described above, entails that the probabilities of the strata of each DNN node must sum to 1:

$\sum_{s=1}^{S} P(s) = 1$

Definition 1 (Reachability). Given a trained network ℵ : χ → Y, the subset $Y_s \subseteq Y$ of the output is said to be reachable iff there exists an input region $[\underline{x}, \overline{x}] \in \chi$ that leads to the outputs $y \in Y_s$.

Probabilistically, reachability can be expressed as the probability of reaching a subset (or stratum) $Y_s$ of the output, given the input $[\underline{x}, \overline{x}] \in \chi$, i.e., $P(Y_s | [\underline{x}, \overline{x}])$. Hence, the probability estimation approach explained earlier provides the probabilistic reachability estimates without any extra computations. Additionally, the layer-wise computation in QuanDA provides reachability results not only at the output layer but also for all nodes in the hidden layer(s).

Definition 2 (Safety). Given a trained network ℵ : χ → Y and a desired (safe) output region $Y_s \subseteq Y$, the network is said to be safe iff all inputs $[\underline{x}, \overline{x}] \in \chi$ lead only to outputs $y \in Y_s$.
QuanDA obtains the probability estimate by modifying the WSM matrices at the DNN's output layer such that each output node comprises two strata: one corresponding to the desired (safe) output region and the other corresponding to the unsafe region. The rest of the probability estimation proceeds as in the previously explained procedure. The final result is the probability estimate $P(Y_s | [\underline{x}, \overline{x}])$, which is the precise estimate for the output to lie in the desired (safe) output region given the input bounds $[\underline{x}, \overline{x}]$.
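For instance, a safe region given by a scalar threshold on an output node (as in property φ1 of Section 4, where $o_1 \le 1500$) reduces to a two-strata split. A minimal, illustrative sketch (using plain sample frequencies rather than QuanDA's weighted scores):

```python
import numpy as np

def safety_split(samples, threshold, safe_below=True):
    # Two output strata: safe vs. unsafe, split at an illustrative threshold.
    samples = np.asarray(samples)
    safe = samples <= threshold if safe_below else samples >= threshold
    p_safe = safe.mean()  # frequency-based estimate of P(Y_s | input bounds)
    return p_safe, 1.0 - p_safe
```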

Bounds on Estimate
Similar to all available quantitative analysis frameworks for DNNs [7,36], QuanDA provides only an estimate of the actual probability of the properties holding true for the trained DNNs. However, unlike the prior efforts, QuanDA tries to ensure that the probability estimates obtained lie within certain predefined (acceptable) bounds. This allows more precise and accurate probability estimation. Towards this end, QuanDA leverages Hoeffding's inequality [13] to identify the minimum number of iterations (i.e., n) for which the probability estimation needs to be repeated to ensure that the estimated probabilities have converged towards the exact probabilities. The estimates from these n iterations are averaged to obtain the calculated mean $M_n$, while $E(M_n)$ represents the (exact) expected mean of the probabilities:

$P\left(\,|M_n - E(M_n)| \ge t\,\right) \le 2\exp\!\left(\frac{-2\,n^2 t^2}{\sum_{i=1}^{n}(b_i - a_i)^2}\right)$

Based on the above inequality, the maximum deviation between the estimated and exact mean of the probabilities remains below t. Here, δ represents the significance level, while 1 − δ is the confidence interval ensuring that the deviation between the exact and estimated probabilities stays below t. $b_i$ and $a_i$ represent the upper and lower bounds of the estimates, respectively. Since QuanDA deals with probability estimates, these bounds can be set to 1 and 0, respectively, reducing the above inequality to

$P\left(\,|M_n - E(M_n)| \ge t\,\right) \le 2\exp\!\left(-2\,n\,t^2\right) = \delta$

As indicated earlier, the parameters t and δ are user-defined, and can be modified to vary the accuracy and precision of the probability estimates.
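Solving the reduced inequality for n gives the minimum number of repetitions. A minimal sketch (mirroring what the hoef_size.py procedure of Appendix A computes):

```python
import math

def hoeffding_iterations(t, delta):
    # Minimum n such that 2*exp(-2*n*t^2) <= delta, i.e., the estimated
    # mean deviates from the exact mean by at most t with confidence 1 - delta.
    return math.ceil(math.log(2.0 / delta) / (2.0 * t ** 2))

# For the settings used in Section 4 (t = 0.05, 1 - delta = 87%):
print(hoeffding_iterations(0.05, 0.13))  # 547, in line with the 550 iterations run
```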

EXPERIMENTAL EVALUATION
We implemented QuanDA on AMD Ryzen Threadripper 2990WX CPUs and NVIDIA GeForce RTX 2080 Ti GPUs. Each CPU has 32 cores with 64 threads, with a maximum boost clock rate of 4.2 GHz. Each GPU, on the other hand, hosts 4352 CUDA cores at a 1635 MHz GPU boost clock rate, providing an overall 14.2 TFLOPS, i.e., the GPU is capable of handling 14.2 trillion floating-point computations per second. The GPU hosts 11 GB of GDDR6 device memory with 616 GB/s peak memory bandwidth. The systems use the CUDA 11.6 toolkit and run on Ubuntu 18.04 LTS.
The framework is written in Python and uses Numba [18] for CUDA GPU programming support. The current version of QuanDA accepts trained DNNs and input bounds in the .nnet and .npy formats, respectively. The probability estimates at the output are stored in the .npy format as well. Figure 5 summarizes the various operations (described in Section 3) and the flow of data throughout QuanDA, distinguishing the operations carried out by the CPU and the GPU.
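Since both the input bounds and the resulting estimates use the .npy format, they can be inspected directly with NumPy; the file name below is hypothetical:

```python
import numpy as np

estimates = np.load("reachability_estimates.npy")  # hypothetical output file name
print(estimates.shape)  # strata probabilities per node, each summing to ~1
```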

Experimental Setup
We use the aircraft collision avoidance system ACAS Xu neural networks [23] to provide the quantitative analysis using QuanDA. This is a well-known benchmark adopted for both qualitative and quantitative DNN analysis [7,17]. The benchmark comprises 45 fully connected feed-forward DNNs with 6 hidden layers, each consisting of 50 nodes. The networks use ReLU activation for all hidden layers. The output layers instead use the minpool activation, i.e., the output node with the minimal value is chosen as the DNN's decision. Each network accepts five inputs ($i_{1-5}$), i.e., the distance between the ownship and the intruder, and the directions and speeds of the aircraft. Likewise, the output layer provides five possible maneuvering decisions ($o_{1-5}$) by the networks: clear-of-conflict (COC), weak left/right, and strong left/right.
The inputs are normalized prior to being fed to the DNNs, and inverse normalization is applied to the output prior to decision-making. Additionally, we take all input values to be equally likely. However, as stated earlier, we make no assumption about the probability distribution of the node values in any of the following DNN layers. In all our experiments, we use 5 strata to distribute the bounds of each node. We also show the potential application of QuanDA to estimate the probability of the following safety properties:

Property 1 (φ1). If the intruder is far away and flying slower than the ownship (i.e., $i_1 \ge 55947.691 \wedge i_4 \ge 1145 \wedge i_5 \le 60$), the COC output stays within a certain threshold (i.e., $o_1 \le 1500$).

Property 2 (φ2). If the intruder is far away and flying slower than the ownship (i.e., $i_1 \ge 55947.691 \wedge i_4 \ge 1145 \wedge i_5 \le 60$), the COC output is not maximal (i.e., $\max_{i \in \mathbb{Z}^+ : i \le 5}(o_i) \ne o_1$).

Property 3 (φ 3 ). If the intruder is directly ahead and flying slower than the ownship (i.e., 1500 ≤
We use a maximum deviation between the exact and estimated probabilities of t = 0.05, and a confidence interval (1 − δ) of 87%, for all our experiments.

Results and Discussion
A single iteration of the probability estimation for the safety properties (checked at the output nodes) takes ∼4.5 min; the reachability probability estimates of the hidden layer nodes naturally incur a smaller timing overhead. Compared with a complete CPU implementation (i.e., single-threaded execution without using the GPU), this amounts to roughly a ∼3600-times speedup.

Figure 6 shows box and whisker plots of the outputs of randomly selected DNN nodes from the first hidden layer of two of the ACAS Xu networks. The outputs of the nodes under the different properties (and, hence, different input bounds) are shown. The graphs provide the reachability probability estimates $P(Y_s | [\underline{x}, \overline{x}])$ for the randomly selected nodes; the reachability results for the remaining nodes show a similar diversity of probability estimates over the different node strata. It is interesting to note that despite all inputs being equally likely (i.e., uniformly distributed inputs), the DNN transformations of even a single layer cause the nodes in the hidden layer to lose the uniformity of the input distribution. In fact, the output nodes do not follow any univariate probability distribution. This can be partially attributed to the affine transformation, in which the (scaled) discrete uniform distributions of multiple input nodes, spanning varying input bounds, combine into a multivariate distribution. Additionally, the subsequent application of the ReLU activation maps all negative results to zero. The effect of this can be observed clearly for nodes 1, 21, and 41 of network 5_9 (in the bottom row of graphs in Figure 6), in which the strata containing the output zero have probabilities of ∼1.0.
Our observations from the reachability analysis hence confirm that, even for applications in which the input is uniformly distributed, it is very unlikely for the nodes in subsequent layers to follow a uniform distribution. Realistic probability estimation of DNN properties hence requires careful consideration and propagation of the probability distributions of the hidden DNN nodes.
We additionally obtain the probability estimates for the ACAS Xu safety properties φ1−3 using QuanDA. The probability estimates for the aforementioned properties to hold for the different ACAS Xu networks are indicated in Table 1. Additional results on the convergence of the probability estimates with increasing confidence are provided in Appendix B. Even though bounds on the estimates are not available for the final DNN output (the bounds provided by Hoeffding's inequality hold for individual layers), QuanDA still provides quantitative probability estimates. This highlights the potential of the framework for estimating the probability of a diverse range of DNN properties.

RELATED WORK
The formal analysis of trained DNNs has gained popularity, particularly in the past two decades, with the research often focusing on providing qualitative results on the robustness and safety of these networks. Among these works are the pure satisfiability-based approaches analyzing binarized neural networks (BNNs), i.e., DNNs with only ±1 as parameter values, and DNNs with piece-wise linear activation functions [6,15,21,25,27]. The more sophisticated approaches [9,16,17] additionally leverage rules from linear programming to obtain tighter node bounds and converge faster to a satisfiable solution, at the cost of limited completeness of the results [25]. Another significant branch of qualitative DNN analysis makes use of linear programming to transform the response of the trained network and its properties into an optimization problem [2,3,5,8,20,33], while sometimes trading off some completeness of the analysis. This is because this branch of analysis is able to deal only with linear problems; the non-linearity of the activation functions poses a challenge to the analysis. To deal with this challenge, the activation functions are sometimes approximated by linear functions [29,35].
More recently, model checking has been used for DNN analysis [22,28,30]. Here, the trained DNN and its properties are first translated into the appropriate syntax of the model checkers. The automated model checking tools are then used to check whether the properties hold for the DNN.
Despite achieving remarkable feats, the qualitative analysis efforts indicated earlier are all bounded by the inherently binary nature of their results. This means that these works are able to indicate only whether or not the properties hold for the networks. As known from the literature (particularly that pertaining to adversarial examples [32]), DNNs are often prone to making incorrect classifications/decisions in the presence of even small noise in the input. Hence, it is often unsurprising to find that a property does not hold for a DNN.
To provide more informative results, there have recently been some efforts to analyze DNN properties quantitatively [7,36]. The idea here is to provide the degree or probability to which the properties hold for the trained networks. However, the notion of probability used in these works is rather simplistic. These works consider the probability for a DNN to reach an output $Y_s$ to be the ratio of the volume covered by the output, i.e., $R_{Y_s}$, over the entire reachable output domain $R_Y$:

$P(Y_s) = \frac{vol(R_{Y_s})}{vol(R_Y)}$

This implicitly implies that all outputs in the region Y are equally likely, i.e., that the output is uniformly distributed. Naturally, this is an unrealistic assumption, since even the mere addition of inputs with uniform distributions leads to a non-uniform output [4]. DNNs involve much more complicated computations (as elaborated in Section 2) than mere addition and, hence, cannot be expected to maintain a uniform distribution. Additionally, as already discussed in Section 3 and observed in Figure 6, the various strata of the node outputs are likely to have different probabilities of occurrence. Hence, without considering the probabilities of the individual strata of the output nodes, the probability estimates are likely to be imprecise and inaccurate. Moreover, these works analyze the robustness of 5-25 randomly generated inputs for a few (1-3) of the networks from the ACAS Xu benchmark, which generally explore only a subset of the valid input domain (significantly smaller than that explored for the reachability and safety properties using QuanDA).
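The non-uniformity of a sum of uniform variables is easy to verify numerically (a quick illustration of the claim, not taken from [4]):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 1.0, size=(2, 100_000))
density, _ = np.histogram(u.sum(axis=0), bins=10, range=(0.0, 2.0), density=True)
print(density)  # triangular shape peaking near 1.0, not a flat (uniform) density
```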

CONCLUSION AND FUTURE WORK
Deep neural networks (DNNs) form an integral part of numerous real-world systems, including safety-critical applications like autonomous driving and avionics. This makes it essential to ensure that these DNNs operate correctly and do not make incorrect decisions after deployment. This has motivated various qualitative formal analysis research efforts to ensure that the desired properties of DNNs hold under real-world input scenarios. However, given the vulnerability of trained DNNs to environmental variations, such as small input noise, it is essential (and more meaningful) to identify the degree to which the properties hold for the DNNs. Recent quantitative analysis approaches deal with this challenge by identifying the reachable volume for the trained DNN. The probability is then estimated as the ratio of the reachable and total output volumes. However, this entails the assumption that the entire output volume has an equal (uniform) probability of occurrence.
This article introduced our novel quantitative neural network analysis framework QuanDA, which estimates precise probabilities for the outputs of DNN nodes without relying on any assumption about the probability distribution of the network nodes. It makes use of efficient CPU and GPU processing to compute the probability estimates layer-wise, with a user-defined deviation and confidence interval, and efficiently parallelizes the computations to ensure fast analysis. The framework is used to provide reachability probability estimates for the nodes of the ACAS Xu benchmark networks. We also show the potential application of QuanDA for three safety properties of the indicated networks.
It must be noted that the current experiments use the fully connected ACAS Xu networks since they are a known benchmark for DNN analysis. However, the framework is written in Python, the most popular programming language for DNNs (for instance, used in platforms such as TensorFlow), and, hence, can be extended to other DNN structures such as convolutional neural networks and recurrent neural networks. Additionally, Hoeffding's inequality allows a trade-off between the confidence/deviation between exact and estimated probabilities and the number of repeated experiments. This could be leveraged to improve the scalability of the framework under restrictive timing constraints. Presently, the bounds on the probability estimates are limited to individual layers, i.e., given the exact probabilities of one layer, the output of the next layer can be estimated with certain bounds (confidence). The determination of bounds on the probability of the output when only estimates of the probabilities are available at the input is an interesting future direction for the research.

APPENDICES
A COMPUTATIONAL OVERHEAD OF QUANDA

This section goes over the computational overhead of all the procedures involved in our Python implementation of QuanDA. Figure 7 summarizes the interconnection of the procedures pictorially, highlighting the procedures leveraging CPU and GPU computations. Note that the current version of QuanDA supports DNN input in the .nnet format. Nevertheless, this does not limit the extension of the framework to non-MLP networks, since the procedure call for network-specific computations, i.e., affine.py (shown in Figure 7), could be extended with the computations (such as a convolution operation) involved in non-MLP and ResNet architectures.
affine.py - The computations essential for computing the DNN output given a normalized input are performed here. The current version of QuanDA supports addition, multiplication, and ReLU operations. Since these are basic operations (i.e., O(1)) that are performed sequentially (for all feed-forward networks), and QuanDA leverages the parallel GPU computations possible for each layer, the overall procedure has a constant time complexity. A prospective extension of QuanDA to include the operations of non-MLP (including ResNet) networks may increase the complexity of affine.py to that of the DNN operation with the worst-case time complexity.
However, it must be noted that affine.py is called $\sum_{x \in L} G_x$ times, where L is the set of layers in the network and $G_x \ge 1$ is the number of groups in layer x. Hence, the overall time complexity of the procedure for the entire experiment is quadratic.

cleanup_nnet.py - This is the last procedure call in QuanDA, which compiles and writes the probability results and removes the temporary files generated during the analysis. Result compilation and writing involve looping over the network nodes, resulting in a linear time complexity for the procedure.
constrained_nnet.py - This is the main procedure executing stratified sampling. Random samples are picked from the computed strata. Since each random input selection is independent, the sample generation can be performed in parallel.
counter.py - The weighted score is computed with this procedure call. Again, the matching of outputs to their corresponding output strata is independent, hence enabling parallelism. This leads to a quadratic time complexity (with the current version of QuanDA).
hoef_size.py - The minimum number of experiments required to reach the desired confidence in the probability estimate is computed here (as per the reduced Hoeffding inequality in Section 3.4). Since the computation involves only basic operations, the complexity of the procedure is constant.

input_parameters_nnet.py - Here, the input parameters (mean, standard deviation, and lower and upper bounds) are extracted from the input DNN (.nnet) and input bound files. Since the current version of QuanDA parses each character of every line of the file, the procedure has a quadratic time complexity.

normalize.py - Normalization of the inputs and inverse normalization of the network outputs are performed with this procedure call. The operations involve only basic operations, while the parallelism provided by the GPU allows the normalization/inverse normalization of all nodes to be done simultaneously. Hence, the procedure call has a constant time complexity.

output_parameters_nnet.py - Again, similar to the input and network parameter extraction indicated earlier, the output parameters, i.e., the output mean and standard deviation, are obtained by parsing the input DNN file. This ultimately leads to a similar, i.e., quadratic, time complexity for the procedure.
Note that even though all input, network, and output parameters could be extracted in a single procedure call, we opt to ensure the modularity of the framework's procedure calls. In terms of time complexity, this does not have a negative impact on the overall complexity of QuanDA, since the complexity of sequential calls of quadratic procedures is still quadratic.

probab_estimate_*_nnet.py - Here, the reachability probability is estimated using the weighted scores. In the worst case, i.e., when the nodes of a layer are split into groups, the groups' bounds also need to be computed. The worst-case complexity of the procedure is, hence, in the order of the group-wise sampling complexity $O(G \cdot S^{N_G})$ given in Section 3.2.
prop_nnet.py - The current version of QuanDA deals with safety (in addition to reachability) probability estimation, which is carried out with this procedure call. The procedure is called only for the final DNN layer and has a linear time complexity.
In addition to the time complexities of the procedure calls stated above, the overall complexity of the framework increases by a degree of 3 on account of (i) the loops catering to groups (for layers with a large number of nodes), (ii) the layers in the DNN, and (iii) the repetition of experiments to ensure confidence in the estimates. However, the computations involved do not lead to an exponential time complexity for QuanDA for any size or architecture of the analyzed DNN.
As highlighted in Section 4, all our experiments were performed using NVIDIA GeForce RTX 2080 Ti GPUs (and AMD Ryzen Threadripper 2990WX CPUs), where each GPU has 4352 CUDA cores. Hence, operations optimally leveraging the GPU are likely to compute faster than a single-threaded CPU implementation by a factor in the thousands. For operations involving a large memory overhead, the GPU may need to split the tasks for optimal memory management. Yet, the GPU implementation is still likely to have a significantly smaller timing overhead than the CPU implementation. As already indicated in Section 4.2, the CPU implementation was infeasible due to its large timing overhead even while analyzing only a single DNN layer.

B PROBABILITY RESULTS OVER n ITERATIONS
Computing probability estimates for the outputs of DNN nodes is a complex problem; the statistical methods intended to aid the computation inadvertently face challenges such as limited precision and sampling errors. This can be observed in the variance of the probability estimates for properties φ1−3 for the ACAS Xu networks over 550 experiments (i.e., probability estimate computations) in Figures 8-10.
Hoeffding's inequality provides statistical guarantees that the probability estimates do not deviate by more than t from the exact probability values. More precisely, these concentration inequalities provide the minimum number n of times to repeat the experiments to ensure more accurate results. This suggests that repeating the experiments allows the probability estimates to converge to the exact output probabilities. This can in fact be observed in Figures 11-13, in which the running averages of the probability estimates from the individual experiments converge with the increasing number of experiment iterations for the ACAS Xu networks. The figures also indicate the deviation of the estimates from the exact values (t) and the confidence level (1 − δ) at different iterations.

C PROBABILITY ESTIMATES FOR REACHABILITY OF NODES IN ACAS XU NETWORKS
We ran all experiments for 550 iterations.The reachability estimates for selected hidden nodes of one of the ACAS Xu networks, for input bounds determined by all properties provided in Section 4, are provided in Table 2.The detailed results for all 300 hidden nodes from all 45 ACAS Xu networks are available at https://owncloud.tuwien.ac.at/index.php/s/hthDlBp1Rm5ldFc.

Fig. 1. Overview of QuanDA, along with the novel contributions of this work (highlighted in yellow).

Fig. 3. Detailed overview of QuanDA: the probability estimates (reachability and safety estimates) for the trained network are determined quantitatively.

Fig. 4. Top: WSM matrix between nodes $N_I$ and $N_x$. Bottom: computation of the probability estimate, using the WSM matrices of all groups in the DNN layer.

Fig. 5. Overview of the Python-based QuanDA toolchain, which uses a trained DNN and input bounds to generate the probability estimates for the desired DNN property.

Fig. 6. The probability estimates for 5 randomly selected DNN nodes from the first hidden layer of different DNNs, using input bounds from different properties. As expected, the nodes do not follow any univariate probability distribution.


Fig. 7. Summary of the interconnections between the various procedures run under QuanDA.

Fig. 8. Box plot indicating the spread of the probability estimates of the ACAS Xu networks, over multiple experiments, for φ1. The plots for the remaining networks follow a similar pattern.

Fig. 9. Box plot indicating the spread of the probability estimates of the ACAS Xu networks, over multiple experiments, for φ2. The plots for the remaining networks follow a similar pattern.

Fig. 10. Box plot indicating the spread of the probability estimates of the ACAS Xu networks, over multiple experiments, for φ3. The plots for the remaining networks follow a similar pattern.

Fig. 11. Running average of the estimated probability for the ACAS Xu networks, for φ1. The estimates converge to the exact probability values as the number of experiments increases (as dictated by Hoeffding's inequality). The plots for the remaining networks follow a similar pattern.

Fig. 12. Running average of the estimated probability for the ACAS Xu networks, for φ2. The estimates converge to the exact probability values as the number of experiments increases (as dictated by Hoeffding's inequality). The plots for the remaining networks follow a similar pattern.

Fig. 13. Running average of the estimated probability for the ACAS Xu networks, for φ3. The estimates converge to the exact probability values as the number of experiments increases (as dictated by Hoeffding's inequality). The plots for the remaining networks follow a similar pattern.

Table 1. Probability Estimates for the Safety Properties φ1−3, using Deviation t = 0.05 and Confidence 1 − δ = 87%

Table 2. Probability Estimates for Randomly Selected Nodes of ACAS Xu Network 3_5, Hidden Layer 3