PolyARBerNN: A Neural Network Guided Solver and Optimizer for Bounded Polynomial Inequalities

Constraints solvers play a significant role in the analysis, synthesis, and formal verification of complex embedded and cyber-physical systems. In this paper, we study the problem of designing a scalable constraints solver for an important class of constraints named polynomial constraint inequalities (also known as non-linear real arithmetic theory). In this paper, we introduce a solver named PolyARBerNN that uses convex polynomials as abstractions for highly nonlinear polynomials. Such abstractions were previously shown to be powerful to prune the search space and restrict the usage of sound and complete solvers to small search spaces. Compared with the previous efforts on using convex abstractions, PolyARBerNN provides three main contributions namely (i) a neural network guided abstraction refinement procedure that helps selecting the right abstraction out of a set of pre-defined abstractions, (ii) a Bernstein polynomial-based search space pruning mechanism that can be used to compute tight estimates of the polynomial maximum and minimum values which can be used as an additional abstraction of the polynomials, and (iii) an optimizer that transforms polynomial objective functions into polynomial constraints (on the gradient of the objective function) whose solutions are guaranteed to be close to the global optima. These enhancements together allowed the PolyARBerNN solver to solve complex instances and scales more favorably compared to the state-of-art non-linear real arithmetic solvers while maintaining the soundness and completeness of the resulting solver. In particular, our test benches show that PolyARBerNN achieved 100X speedup compared with Z3 8.9, Yices 2.6, and NASALib (a solver that uses Bernstein expansion to solve multivariate polynomial constraints) on a variety of standard test benches.

In this paper, we focus on the class of general multivariate polynomial constraints (also known as nonlinear real arithmetic). Multivariate polynomial constraints appear naturally in the design, synthesis, and verification of embedded and cyber-physical systems. It is then not surprising how much attention this problem has received in the last decade, as evidenced by the number of off-the-shelf solvers designed to solve feasibility and optimization problems over general multivariate polynomial constraints, including Z3 [12], Coq [33], Yices [15], PVS [37], Cplex [34], CVXOPT [50], and Quadprog [26].
Regardless of their prevalence in several synthesis and verification problems, well-known algorithms that are capable of solving a set of polynomial constraints are shown to be doubly exponential [17], posing a significant challenge to the design of efficient solvers for such problems. Meanwhile, neural networks (NNs) have recently shown impressive empirical performance in related reasoning tasks, including first-order theorem proving [28], higher-order theorem proving [29], and Boolean satisfiability (SAT) problems [45].
While several of these NN-based approaches sacrifice either soundness or completeness guarantees, in this paper we are interested in using such empirically powerful NNs to design a sound and complete solver for nonlinear real arithmetic.
In addition to NNs, polynomials constitute a rich class of functions for which several approximators have been studied. Two of the most famous approximators for polynomials are Taylor approximations and Bernstein polynomials. These two approximators have been successfully used in solvers like Coq and PVS [7,37]. This opens the question of how to combine all these approximation techniques, i.e., NNs, Taylor, and Bernstein approximations, to arrive at a scalable solver that can reason about general multivariate polynomial constraints. We introduce PolyARBerNN, a novel sound and complete solver for polynomial constraints that combines these three function approximators (NNs, Taylor, and Bernstein) to prune the search space and produce instances small enough that existing sound and complete solvers (based on the well-known Cylindrical Algebraic Decomposition algorithm) can easily reason about them. In general, we provide the following contributions:
• We introduce a novel NN-guided abstraction refinement process in which a NN is used to guide the use of Taylor approximations to find a solution or prune the search space. We analyze the theoretical characteristics of such a NN and provide empirical evidence of the generalizability of the trained NN in terms of its ability to guide the abstraction refinement process for unseen polynomials with various numbers of variables and orders.
• We complement the NN-guided abstraction refinement with a state-space pruning phase using Bernstein approximations that accelerates the process of removing portions of the state space in which the sign of the polynomial does not change.
• We validated our approach by first comparing the scalability of the proposed PolyARBerNN solver against NASALib, a PVS library that uses Bernstein expansion to solve polynomial constraints. Second, we compared the execution times of the proposed tool with the latest versions of state-of-the-art nonlinear arithmetic solvers, such as Z3 8.9 and Yices 2.6, by varying the order, the number of variables, and the number of polynomial constraints for instances in which a solution exists and in which no solution exists. We also compared the scalability of the solver against Z3 8.9 and Yices 2.6 on the problem of synthesizing a controller for a cyber-physical system.
• We propose PolyAROpt, an optimizer that uses PolyARBerNN to solve constrained multivariate polynomial optimization problems. Our theoretical analysis shows that PolyAROpt is capable of providing solutions that are $\epsilon$-close to the global optima (for any $\epsilon > 0$ chosen by the user). Numerical results show that PolyAROpt solves high-dimensional and high-order optimization problems at high speed compared to the built-in optimizer in the Z3 8.9 solver. We also validated the effectiveness of PolyAROpt on the problem of computing the reachable sets of polynomial dynamical systems.
Related work: Cylindrical algebraic decomposition (CAD) was introduced by Collins [11] in 1975 and is considered to be the first algorithm to effectively solve general polynomial inequality constraints. Several improvements were introduced over the years to reduce the high time complexity of the CAD algorithm [8,27,36]. Although the CAD algorithm is sound and complete, it scales poorly with the number of polynomial constraints and their order. Other techniques for solving general polynomial inequality constraints include the use of transformations and approximations to scale the computations. For instance, the authors in [37] incorporated Bernstein polynomials into the Prototype Verification System (PVS) theorem prover; these developments are publicly available in the NASA PVS Library. The library uses the range enclosure property of Bernstein polynomials to solve quantified polynomial inequalities. However, the library is not complete for non-strict inequalities [37] and is not practical for higher-dimensional polynomials. Another related line of work is the use of machine learning to solve combinatorial problems [25,45]. In particular, the authors in [25] proposed a graph convolutional neural network (GCNN) to learn heuristics that can accelerate mixed-integer linear programming (MILP) solvers. Similarly, the NeuroSAT solver [45] uses a message-passing neural network (MPNN) to solve Boolean SAT problems. The authors of [45] showed that NeuroSAT generalizes to novel distributions after training only on random SAT problems. Nevertheless, NeuroSAT is not competitive with state-of-the-art SAT solvers and does not provide a correctness guarantee.

PROBLEM FORMULATION
Notation: We use the symbols $\mathbb{N}$ and $\mathbb{R}$ to denote the set of natural and real numbers, respectively. We denote by $I(l, u) = \{x \in \mathbb{R}^n \mid l \le x \le u\}$ the closed hyperrectangle whose lower and upper bounds are $l \in \mathbb{R}^n$ and $u \in \mathbb{R}^n$, respectively. For a real-valued vector $x = (x_1, x_2, \cdots, x_n) \in \mathbb{R}^n$ and an index vector $K = (k_1, \ldots, k_n) \in \mathbb{N}^n$, we use the notation $x^K = x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}$ throughout this paper. A real-valued multivariate polynomial $p : \mathbb{R}^n \to \mathbb{R}$ is defined as $p(x) = \sum_{k_1=0}^{d_1} \cdots \sum_{k_n=0}^{d_n} a_{(k_1, \ldots, k_n)}\, x^{(k_1, \ldots, k_n)}$, where $d = (d_1, d_2, \ldots, d_n)$ collects the maximum degree $d_i$ of $x_i$ for all $i = 1, \ldots, n$. We denote by $a_p = (a_{(0,0,\ldots,0)}, \ldots, a_{(d_1, d_2, \ldots, d_n)})$ the vector of all the coefficients of the polynomial $p$, and by $\mathbb{R}[x]$ the space of multivariate polynomials with real coefficients. Given a real-valued function $g : \mathbb{R}^n \to \mathbb{R}$, we denote by $\mathcal{L}^-_0(g)$ and $\mathcal{L}^+_0(g)$ the zero sublevel and zero superlevel sets of $g$, i.e., $\mathcal{L}^-_0(g) = \{x \mid g(x) < 0\}$ and $\mathcal{L}^+_0(g) = \{x \mid g(x) > 0\}$. Finally, a function $g : \mathbb{R}^n \to \mathbb{R}^m$ is called Lipschitz continuous if there exists a positive real constant $\lambda_g \ge 0$ such that, for all $x_1 \in \mathbb{R}^n$ and $x_2 \in \mathbb{R}^n$, the following holds: $\|g(x_1) - g(x_2)\| \le \lambda_g \|x_1 - x_2\|$.

Main Problem: In this paper, we focus on two problems, namely (Problem 1) the feasibility problem, which involves multiple polynomial inequality constraints with input ranges confined within closed hyperrectangles, and (Problem 2) the constrained optimization problem, which aims to maximize (or minimize) a polynomial objective function subject to other polynomial inequality constraints and input range constraints.
$\exists x \in I(l, u)$ such that: $p_i(x) < 0$, $i = 1, \ldots, m$. Constraints of the form $p_i(x) \ge 0$ and $p_i(x) = 0$ can be encoded using the constraints above. Similarly, Problem 2 asks, given a polynomial objective function, for its minimizer (or maximizer) subject to constraints of the same form.

CONVEX ABSTRACTION REFINEMENT: BENEFITS AND DRAWBACKS
In this section, we review the convex abstraction refinement framework previously introduced in [53], along with some drawbacks that motivate the need for the proposed framework.

Overview of Convex Abstraction Refinement
Sound and complete algorithms that solve Problem 1 are known to be doubly exponential in the number of variables $n$, with a total running time bounded by $d^{2^{O(n)}}$ [17], where $d$ is the maximum degree among the polynomials $p_1, \ldots, p_m$. Since the complexity of the problem grows exponentially, it is useful to remove (or prune) subsets of the search space in which a solution is guaranteed not to exist. Since Problem 1 asks for an $x \in \mathbb{R}^n$ for which all the polynomials are negative, a solution does not exist in subsets of $\mathbb{R}^n$ on which one of the polynomials $p_i$ is always positive (i.e., subsets of $\mathcal{L}^+_0(p_i)$). In the same way, finding regions of the input space in which some of the polynomials are negative (subsets of $\mathcal{L}^-_0(p_i)$) helps find a solution faster.
To find subsets of $\mathcal{L}^+_0(p_i)$ and $\mathcal{L}^-_0(p_i)$ efficiently, the use of "convex abstractions" of the polynomials was previously proposed by the authors in [53]. Starting from a polynomial $p_i(x) \in \mathbb{R}[x]$ and a hyperrectangle $I \subset \mathbb{R}^n$, the framework in [53] computes a convex over-approximation $\overline{p}_i(x)$ and a convex under-approximation $\underline{p}_i(x)$ of $p_i(x)$, whose zero level sets can be computed efficiently using convex programming tools. By iteratively refining these upper and lower convex approximations, the framework in [53] was able to rapidly prune the search space until regions with relatively small volumes are identified, at which point sound and complete tools such as Z3 8.9 and Yices 2.6 (which are based on the Cylindrical Algebraic Decomposition algorithm) are used to efficiently search these small regions for a solution. It is important to note that these solvers (especially Yices) are optimized for the case when the search space is a bounded hyperrectangle.

Fig. 1. Exemplary cases where abstracting a higher-order polynomial (black curves) using convex approximations fails to provide helpful information. Top-Left: the under-approximation (green curve) is entirely negative and hence fails to identify any subsets of $\mathcal{L}^+_0(p)$. Top-Right: the over-approximation (red curve) is entirely positive and hence fails to identify subsets of $\mathcal{L}^-_0(p)$. Bottom: under-/over-approximations fail to identify polynomials that are consistently positive (left) or negative (right).

Drawbacks of Convex Abstraction Refinement
Although the prescribed convex abstraction refinement process was shown to provide several orders of magnitude speedup compared to the state of the art [53], it adds unnecessary overhead in certain situations. In particular, and as shown in Figure 1, the quadratic abstractions $\overline{p}_i(x)$ and $\underline{p}_i(x)$ may fail to identify meaningful subsets of $\mathcal{L}^-_0(p_i)$ and $\mathcal{L}^+_0(p_i)$. In such cases, one needs to split the input region to tighten the over-/under-approximation. Indeed, applying the convex abstraction refinement process above may lead to several unnecessary over-approximations or under-approximations until a tight one that prunes the search space is found. These drawbacks call for a methodology that is capable of: (1) Guiding the abstraction refinement process: To reduce the number of unnecessary computations of over/under approximations, one needs a heuristic that guides the convex abstraction refinement process. In particular, such a heuristic needs to consider the properties of the polynomials and the input region to estimate the volume of the sets that the convex under/over-approximation will identify.
(2) Alternative Abstraction: As shown in Figure 1 (bottom), abstracting high-order polynomials using convex ones may fail to identify easy cases in which the polynomial is strictly positive or negative. Therefore, it is beneficial to use alternative ways of abstracting high-order polynomials that can augment the convex abstractions.
Designing a strategy that addresses the two requirements above is the main topic of the following two sections.

NEURAL NETWORK GUIDED CONVEX ABSTRACTION REFINEMENT
In this section, we are interested in designing and training a Neural Network (NN) that can be used to guide the abstraction refinement process. Such a NN can be used as an oracle by the solver to estimate the volume of the zero super/sub-level sets (for each polynomial) within a given region $I(l, u)$ and to select the best approximation strategy out of three possibilities, namely: (i) apply a convex under-approximation, (ii) apply a convex over-approximation, or (iii) split the region to allow for finer approximations in subsequent iterations of the solver. In this section, we aim to develop a scientific methodology that can guide the design of such a NN.
4.1 On the relation between the NN architecture and the characteristics of the polynomials: In this subsection, we aim to understand how the properties of the polynomials affect the design of the NN. We start by reviewing the following result from the machine learning literature:

Theorem 1 (Theorem 1.1 [46]). There exists a Rectifier Linear Unit (ReLU)-based neural network $\Phi$ that can estimate a continuous function $f$ such that the estimation error is bounded by $\|f - \Phi\|_\infty \le \mathcal{O}\big(\lambda_f \sqrt{d}\; W^{-2/d} L^{-2/d}\big)$, where $L$, $W$, and $d$ are the neural network depth, the neural network width, and the number of neural network inputs, respectively, and $\lambda_f$ is the Lipschitz constant of the function $f$. Moreover, this bound is nearly tight.
The above result can be interpreted as follows. The required depth $L$ and width $W$ of a neural network depend on the rate of change of the underlying function (captured by its Lipschitz constant $\lambda_f$). That is, if we use a NN to estimate a function with a high $\lambda_f$, then one needs to increase the depth $L$ and width $W$ of the NN to achieve an acceptable estimation error.
Now we aim to connect the result above with the characteristics of the polynomials. To that end, we recall the definition of "condition numbers" of a polynomial [18]: given a polynomial $p$ with coefficients $a_p$ and a root $x_0$ of $p$, the quantity $C_p(x_0)$ is called the condition number for the root $x_0$. The condition number characterizes the sensitivity of the root $x_0$ to a perturbation of the coefficients $a_p$. That is, if we allow a random perturbation of a fixed relative magnitude $\epsilon = |\Delta a_i / a_i|$ in each coefficient $a_i$ in $a_p$, then the magnitude of the maximum displacement $\Delta x_0$ of a root $x_0$ is bounded as: $|\Delta x_0| \le C_p(x_0)\, \epsilon$. For a polynomial with multiple roots, we define the condition number of the polynomial, $C_p$, as the largest $C_p(x_0)$ among all roots, i.e., $C_p = \max_{x_0 : p(x_0) = 0} C_p(x_0)$.

We are now ready to present our first theoretical result, which connects the condition number of polynomials to the NN architecture. As stated before, we are interested in designing a NN that can estimate the volume of the zero sub/super level sets within a given region. We show that the larger the condition number, the larger the required neural network depth and width, as captured by the following result.

Theorem 2. Given a polynomial $p$ with coefficients $a_p$ and a region $I(l, u)$, there exists a neural network $NN(a_p, I)$ that estimates the volume of the zero sub/super level sets from the polynomial coefficients $a_p$. The Lipschitz constant of this $NN(a_p, I)$, denoted by $\lambda_{NN}$, is bounded by $\mathcal{O}(n_r\, C_p\, \delta_I)$, where $C_p$ is the condition number of the polynomial $p$, $\delta_I = \delta(l, u)$ is the diameter of the region $I$, and $n_r$ is the number of roots of the polynomial $p$.
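For intuition, the root condition number discussed above can be computed explicitly for univariate polynomials. The sketch below uses the standard first-order sensitivity formula $C_p(x_0) = \sum_i |a_i||x_0|^i \,/\, (|x_0|\,|p'(x_0)|)$, which is consistent with the perturbation bound above; the function names are ours, not the paper's:

```python
import numpy as np

def root_condition_number(coeffs, x0):
    """Condition number of root x0 of p(x) = sum_i coeffs[i] * x**i.

    C(x0) = (sum_i |a_i| |x0|**i) / (|x0| * |p'(x0)|): larger values mean the
    root moves more under small relative perturbations of the coefficients.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    powers = np.arange(len(coeffs))
    numerator = np.sum(np.abs(coeffs) * np.abs(x0) ** powers)
    # numpy's poly routines expect high-to-low coefficient order
    dp = np.polyval(np.polyder(coeffs[::-1]), x0)
    return numerator / (abs(x0) * abs(dp))

def polynomial_condition_number(coeffs, roots):
    """Largest root condition number over all given roots."""
    return max(root_condition_number(coeffs, r) for r in roots)

# p(x) = (x - 1)(x - 2) = x^2 - 3x + 2, with roots 1 and 2
c = [2.0, -3.0, 1.0]
print(polynomial_condition_number(c, [1.0, 2.0]))  # -> 6.0
```

Well-conditioned polynomials (small $C_p$) would, per Theorem 2, admit a smaller guiding network; ill-conditioned ones (e.g., Wilkinson-type polynomials with clustered roots) would not.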
To prove the result, we proceed with an existential argument: we show that a NN matching the properties above exists without explicitly constructing it. As shown in Figure 2, the neural network $NN(a_p, I)$ consists of multiple sub-neural networks. In particular, the first sub-neural network $NN_{a_p \to x_0}$ computes all the roots $x_0 = (x^1_0, \ldots, x^{n_r}_0)$ of the polynomial (where $n_r$ is the number of roots) from the coefficients $a_p$, i.e., equation (2). Note that $NN_{a_p \to x_0}$ does not depend on the region $I$, and hence the roots $x_0$ may not lie inside the region $I$. Moreover, Theorem 2 asks for a NN that estimates the volume of the zero sub/super level sets and not the location of the roots. To that end, our strategy is to split the region $I$ into sub-regions of fixed volume and check if a root lies within each of these sub-regions. If a sub-region does not contain a root (i.e., there is no zero crossing inside this sub-region) and the evaluation of the polynomial at any point in this sub-region turns out to be positive, then this sub-region belongs to the superlevel set of $p$; similarly for the sublevel set of $p$. By counting the number of sub-regions with no zero crossings and multiplying this count by the volume of these sub-regions, we can provide an estimate of the volumes of the sub/super level sets.
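In one dimension, this coefficients-to-roots-to-indicators-to-volumes strategy can be mimicked procedurally (a hypothetical sketch of the counting argument, not the paper's sub-networks; all names are ours):

```python
import numpy as np

def level_set_volumes(coeffs, lo, hi, n_sub=1000):
    """Estimate the volumes of the zero super/sub level sets of a univariate
    polynomial p(x) = sum_i coeffs[i] * x**i over [lo, hi].

    Mirrors the existential construction: compute the roots from the
    coefficients, split the region into n_sub equal pieces, mark pieces with
    no zero crossing, and accumulate their volumes by the polynomial's sign.
    """
    roots = np.roots(coeffs[::-1])                 # numpy wants high-to-low order
    real_roots = roots[np.abs(roots.imag) < 1e-9].real
    edges = np.linspace(lo, hi, n_sub + 1)
    width = (hi - lo) / n_sub
    vol_pos = vol_neg = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        if np.any((real_roots > a) & (real_roots < b)):
            continue                               # zero crossing: ambiguous piece
        mid = 0.5 * (a + b)
        if np.polyval(coeffs[::-1], mid) > 0:
            vol_pos += width                       # piece lies in the superlevel set
        else:
            vol_neg += width                       # piece lies in the sublevel set
    return vol_pos, vol_neg

# p(x) = x^2 - 1 on [-2, 2]: positive on [-2,-1) and (1,2], negative on (-1,1)
vp, vn = level_set_volumes([-1.0, 0.0, 1.0], -2.0, 2.0)
```

Here `vp` and `vn` both come out close to 2, matching the true volumes; the estimation error shrinks as `n_sub` grows, which is the quantity the Lipschitz analysis of Theorem 2 tracks.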
Such a process can be performed using the following three sub-neural networks: • The sub-neural network $NN_{I \to I^k}$ splits the region $I$ into $K$ sub-regions $I^1, \ldots, I^K$ and returns the bounds of the $k$th sub-region, i.e., equation (3). • The sub-neural network $NN_{x_0 \to s}$ checks the location of the roots $(x^1_0, \ldots, x^{n_r}_0)$ and returns a binary indicator variable $s_k$ that indicates whether a zero crossing takes place within the $k$th sub-region or whether the polynomial is always positive/negative within the $k$th sub-region, i.e., equation (4). • The final output $NN(a_p, I)$ is computed using the sub-neural network $NN_{s \to \mathcal{L}^{+/-}}$, which counts the number of sub-regions with no zero crossing (using the indicators $s_1, \ldots, s_K$) and computes the estimate of the zero sub/super level set volumes, i.e., equation (5). The Lipschitz constants of these sub-neural networks are captured by four propositions whose proofs can be found in the appendix.

Proof of Theorem 2. Consider the NN shown in Figure 2 and defined using equations (2)-(5). To bound the Lipschitz constant of $NN(a_p, I)$, we consider two sets of inputs $(a_p, I)$ and $(a'_p, I')$ and bound the resulting change in the output by chaining the Lipschitz constants of the sub-networks, where the intermediate bounds follow from Propositions 1-3 along with the definition of $\delta_I$.
Substituting (12) in (8) and noticing that $K$ is a constant that does not depend on $a_p$ yields the claimed bound on $\lambda_{NN}$. □

The Bernstein representation is known to be the most robust representation of polynomials, which is captured by the next result [18].
Theorems 1-3 point to the optimal way of designing the targeted neural network: it should take as input the Bernstein coefficients $b_p$ instead of the power basis coefficients $a_p$. To validate this conclusion, we report empirical evidence in Table 1. In this numerical experiment, we trained two neural networks with the exact same architecture, using the exact same number of data points; both networks have the same number of inputs.
Both neural networks are trained to estimate whether a zero crossing occurs in a region (recall from our analysis in Theorem 2 that the Lipschitz constant of this NN scales with the condition number of the polynomial). The only difference is that one neural network is trained using power basis coefficients $a_p$ (column 3 of Table 1), while the second is trained using Bernstein basis coefficients $b_p$ (column 5 of Table 1). The coefficients are randomly generated via a uniform distribution between −0.1 and 0.1, i.e., $\mathcal{U}(-0.1, 0.1)$. We generated 40000 training samples and 10000 validation samples for both bases. We evaluated the trained NNs on three different benchmarks for each of the two bases; each evaluation benchmark has 10000 samples. The results are summarized in Table 1. As can be seen from Table 1, the NN trained with Bernstein coefficients generalizes better than the NN trained with power basis coefficients, as reflected by the empirical "Accuracy" during evaluation. This empirical evidence matches our analysis in Theorem 2 along with the insights of Theorems 1 and 3.

TAMING THE COMPLEXITY OF COMPUTING BERNSTEIN COEFFICIENTS
In Section 4, we concluded that the Bernstein representation has a smaller condition number compared to other representations, which helps build a more efficient NN. Nevertheless, computing this representation adds a significant overhead, even using the most efficient algorithms for calculating these coefficients [42,48]. For example, computing all the Bernstein coefficients of a 6-dimensional polynomial of order 7 using the Matrix method and Garloff's method [42,48] requires on the order of $1 \times 10^7$ and $7 \times 10^6$ summation and multiplication operations, respectively [42]. To exacerbate the problem, the Bernstein coefficients depend on the region $I$ and need to be recomputed in every iteration of the abstraction refinement process. Reducing this overhead is the main focus of this section.

Range Enclosure Property of Bernstein polynomials
Given a multivariate polynomial $p(x)$ defined over the $n$-dimensional box $I(l, u)$, we can bound the range of $p(x)$ over $I(l, u)$ using the range enclosure property of Bernstein polynomials as follows: Theorem 4 (Theorem 2 [22]). Let $p$ be a multivariate polynomial of degree $d$ over the $n$-dimensional box $I(l, u)$ with Bernstein coefficients $b_{K,d}$, $0 \le K \le d$. Then, for all $x \in I$, the following inequality holds: $\min_{0 \le K \le d} b_{K,d} \le p(x) \le \max_{0 \le K \le d} b_{K,d}$. The traditional approach to computing the range enclosure of $p$ is to compute all the Bernstein coefficients of $p$ and determine their minimum and maximum [23,24,54]. However, computing all the coefficients has a complexity of $O((\hat{d} + 1)^n)$, where $\hat{d} = \max_{1 \le i \le n} d_i$, which grows exponentially with the dimension $n$.
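For a univariate polynomial in the power basis, the enclosure of Theorem 4 can be sketched on the unit interval using the standard basis-conversion formula $b_j = \sum_{i \le j} \binom{j}{i}/\binom{d}{i}\, a_i$; the function names below are ours, and the sketch assumes the domain $[0,1]$:

```python
from math import comb

def bernstein_coeffs(a):
    """Bernstein coefficients on [0, 1] of p(x) = sum_i a[i] * x**i,
    via b_j = sum_{i<=j} (C(j,i)/C(d,i)) * a[i]."""
    d = len(a) - 1
    return [sum(comb(j, i) / comb(d, i) * a[i] for i in range(j + 1))
            for j in range(d + 1)]

def range_enclosure(a):
    """min/max Bernstein coefficients bound p over [0, 1] (Theorem 4)."""
    b = bernstein_coeffs(a)
    return min(b), max(b)

# p(x) = x^2 - 0.5 on [0, 1]: Bernstein coefficients are [-0.5, -0.5, 0.5],
# so the enclosure [-0.5, 0.5] matches the true range here
lo, hi = range_enclosure([-0.5, 0.0, 1.0])
```

Note that the first and last Bernstein coefficients interpolate the endpoint values $p(0)$ and $p(1)$, which is why the enclosure is exact in this example; in general it is conservative and tightens under degree elevation or subdivision.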

Zero Crossing Estimation using only a few Bernstein Coefficients
Now we discuss how to use the range enclosure property above to reduce the number of computed Bernstein coefficients.
First, we note that the zero crossing of a polynomial $p$ in a given input region $I$ depends on its estimated range, given by $b_{min}$ and $b_{max}$. More specifically, if $b_{min} > 0$ (respectively, $b_{max} < 0$), then the entire polynomial is positive (negative), which means that there is no zero crossing. If $b_{min}$ and $b_{max}$ have different signs, then, because of the estimation error of these bounds, the polynomial $p$ may still be positive, negative, or have a zero crossing in the region. In this case, we need additional information, such as the bounds on the gradient of the polynomial $p$ within the input region, given by $b_{\nabla,min}$ and $b_{\nabla,max}$ (which can be computed efficiently thanks to the fact that gradients of polynomials are polynomials themselves).
Such additional information about the worst-case gradient of the polynomial leads to a natural estimate of whether a zero crossing occurs in a region.
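The sign-decision rule described above can be sketched as a small classifier over the four features. This is an illustrative stand-in for the trained 4-input network, not the paper's model; in particular, the width-based tie-break in the straddling case is our own assumption:

```python
def classify_region(b_min, b_max, g_min, g_max, width):
    """Classify a polynomial's sign behaviour over a region from the
    Bernstein range bounds (b_min, b_max) and gradient bounds (g_min, g_max).

    Hypothetical heuristic: when the range bounds straddle zero, a flat
    polynomial (small worst-case slope over a region of the given width)
    cannot vary enough to make a crossing likely.
    """
    if b_min > 0:
        return "always positive"        # region can be pruned
    if b_max < 0:
        return "always negative"        # solution candidates live here
    max_slope = max(abs(g_min), abs(g_max))
    if max_slope * width < max(-b_min, b_max):
        return "likely no crossing"
    return "ambiguous"

print(classify_region(0.2, 1.5, -3.0, 3.0, 0.1))   # -> always positive
```

The first two branches are sound (they follow directly from Theorem 4); only the last two are heuristic, which matches the paper's use of a learned estimator there.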
Due to space constraints, we omit the analysis bounding the estimation error introduced by relying only on the maximum and minimum of the polynomial, $b_{min}$ and $b_{max}$, along with the maximum and minimum of the gradient, $b_{\nabla,min}$ and $b_{\nabla,max}$. Instead, we support our claim using the empirical evidence shown in Table 1. Using the same benchmarks used in Section 4.2, we train a third neural network that takes as input only the four values $b_{min}$, $b_{max}$, $b_{\nabla,min}$, $b_{\nabla,max}$ and compare its generalization performance (column 7 of Table 1). As shown in the table, the third neural network sacrifices some accuracy compared to the ones that use all Bernstein coefficients. On the other hand, it reduces the overhead of computing the Bernstein coefficients by an order of magnitude, as can be seen by comparing the "execution overhead" reported in columns 4, 6, and 8 for the power basis, the Bernstein basis, and the reduced Bernstein basis, respectively.

Search Space pruning using Bernstein Coefficients
The range enclosure property and the discussion above open the door to a natural solution of the "alternative abstraction" problem mentioned in Section 3.2. The maximum and minimum Bernstein coefficients can be used as an abstraction (in addition to the convex upper and lower bounds) of high-order polynomials. Such abstractions can be refined in every iteration of the solver. They can be used to identify portions of the search space in which one of the polynomials is guaranteed to be positive (and hence a solution does not exist). More details about integrating this abstraction with the convex abstraction are given in the implementation section below.

ALGORITHM ARCHITECTURE AND IMPLEMENTATION DETAILS
In this section, we describe the implementation details of our solver, PolyARBerNN. As a pre-processing step, the tool divides the input region $I$ into several regions such that each one is an orthant. This allows the tool to process each orthant in parallel or sequentially. The tool keeps track of all regions in which the sign of a polynomial is not fixed.
These regions are called ambiguous regions, and they are stored in a dedicated list. As long as the total volume of the regions in this list is larger than a user-defined threshold $\epsilon$, the tool continues to use abstractions to identify portions in which one of the polynomials is always positive (and hence can be removed from the search space) or always negative (and hence the tool gives higher priority to this region). The abstraction refinement is iteratively applied in Lines 5-17 of Algorithm 1. In each abstraction refinement step, the tool picks a polynomial $p$ and a region $I$ based on several heuristics (Lines 6-7). In Lines 8-14, we compute the maximum/minimum Bernstein coefficients, followed by checking the sign of the polynomial within this region. If the Bernstein coefficients indicate that the polynomial is always positive in the region, this provides a guarantee that a solution does not exist in this region (recall that Problem 1 searches for a point where all polynomials are negative). Similarly, if the polynomial is always negative, the region is added to the list of negative regions. For those polynomials for which the Bernstein abstraction fails to identify a sign, we query the trained neural network to estimate the best convex abstraction possible (Lines 15-16).
Based on the neural network's suggestion, we use the PolyAR tool [53] to compute the convex abstraction (Line 17), which returns the portions of the region that are guaranteed to belong to the zero sublevel set $\mathcal{L}^-_0(p)$, those that belong to the zero superlevel set $\mathcal{L}^+_0(p)$, and those that remain ambiguous, $\mathcal{L}^{+/-}_0(p)$. The process of applying the Bernstein abstraction and the convex abstraction (guided by the trained neural network) continues until all remaining ambiguous regions are smaller than a user-defined threshold $\epsilon$, at which point they are processed in parallel using a sound and complete tool that implements Cylindrical Algebraic Decomposition (CAD), such as Z3 and Yices (Line 28 in Algorithm 1).
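The outer prune-and-split loop of Algorithm 1 can be sketched in one dimension as follows. This is a simplified sketch, not the tool itself: a generic sound range bound stands in for the Bernstein/convex abstractions, dense sampling stands in for the CAD-based fallback, and all names are hypothetical:

```python
def find_negative_point(p, p_range, lo, hi, eps=1e-3):
    """Search for x in [lo, hi] with p(x) < 0 by abstraction refinement.

    p_range(a, b) must return sound bounds (m, M) with m <= p(x) <= M on
    [a, b] (e.g. a Bernstein enclosure).  Regions proven positive are pruned;
    proven-negative regions yield a witness immediately; ambiguous regions
    are bisected until smaller than eps, then sampled densely as a stand-in
    for the sound and complete CAD solver.
    """
    stack = [(lo, hi)]
    while stack:
        a, b = stack.pop()
        m, M = p_range(a, b)
        if m > 0:
            continue                      # always positive: prune region
        if M < 0:
            return 0.5 * (a + b)          # always negative: witness found
        if b - a < eps:
            for k in range(21):           # fallback on tiny ambiguous regions
                x = a + (b - a) * k / 20
                if p(x) < 0:
                    return x
            continue
        mid = 0.5 * (a + b)
        stack += [(a, mid), (mid, b)]     # refine: split ambiguous region
    return None

# example: p(x) = x^2 - 0.25 with a naive but sound interval range bound
p = lambda x: x * x - 0.25
def p_range(a, b):
    lo2 = 0.0 if a <= 0.0 <= b else min(a * a, b * b)
    return lo2 - 0.25, max(a * a, b * b) - 0.25

witness = find_negative_point(p, p_range, -2.0, 2.0)
```

In the actual tool the "which abstraction next" decision inside this loop is made by the trained NN, and multiple polynomials and regions are tracked at once.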
The neural network itself is trained using randomly generated quadratic two-dimensional polynomials whose coefficients follow a uniform distribution between −1 and 1. For each randomly generated polynomial, we used PolyAR to compute the volumes of the $\mathcal{L}^+_0(p)$, $\mathcal{L}^-_0(p)$, and $\mathcal{L}^{+/-}_0(p)$ regions. We use a fully connected NN that contains an input layer, three hidden layers, and one output layer (shown in Figure 3). The input layer has four neurons, the hidden layers have 40 neurons each, and the output layer has three neurons. We use dropout with probability 0.5 in the first and second hidden layers to avoid overfitting. We use the ReLU activation function for all the hidden layers and the Softmax activation function for the output layer. We use Adam as the optimizer and cross-entropy as the loss function. Although the neural network is trained on simple quadratic two-dimensional polynomials, we observed that it generalizes well to higher-order polynomials with several variables. This will become apparent during the numerical evaluation, in which polynomials of different orders and with several variables are used to evaluate the tool.
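The inference pass of the described architecture (4 inputs, three hidden layers of 40 ReLU units, 3-way softmax output) can be sketched in plain numpy; the weights below are random placeholders, not the trained model, and dropout is omitted because it is active only during training:

```python
import numpy as np

rng = np.random.default_rng(0)

# layer sizes as described: 4 inputs, three hidden layers of 40, 3 outputs
sizes = [4, 40, 40, 40, 3]
params = [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x):
    """Inference pass: ReLU hidden layers, softmax over the three actions
    (under-approximate, over-approximate, split)."""
    h = np.asarray(x, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)          # ReLU hidden layers
    W, b = params[-1]
    logits = h @ W + b
    e = np.exp(logits - logits.max())
    return e / e.sum()                          # softmax output

# features: (b_min, b_max, b_grad_min, b_grad_max), normalized
probs = forward([0.1, -0.3, 0.7, 0.2])
```

The solver would then pick the action with the largest probability, i.e. `probs.argmax()`.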
Correctness Guarantees: We conclude our discussion with the following result, which captures the correctness guarantees of the proposed tool: Theorem 5. The PolyARBerNN solver is sound and complete.
Proof. This result follows from the fact that the search space is pruned using sound abstractions (convex upper bounds or Bernstein-based bounds). The neural network and the convex lower-bound polynomials are used only as heuristics to guide the refinement process. Finally, CAD-based algorithms (which are sound and complete) are used to process the portions of the search space that are not pruned by the abstraction refinement. □

GENERALIZATION TO POLYNOMIAL OPTIMIZATION PROBLEMS:
In this section, we focus on providing a solution to Problem 2. Our approach is to turn the optimization problem (Problem 2) into a feasibility problem (Problem 1). First, we recall that the gradient of $f$, $\nabla f = [\partial f / \partial x_1, \ldots, \partial f / \partial x_n]$, where $\partial f / \partial x_i$ is the partial derivative of $f$ with respect to $x_i$, is a vector of $n$ polynomials. The optimal value of $f$ occurs either (i) where the partial derivatives are all equal to zero or (ii) at the boundaries of the input region.
To find the critical points $x^*$ of $f$ where $\nabla f(x^*) = 0$, we add the $2n$ polynomial constraints $\partial f / \partial x_i \le 0$, $1 \le i \le n$, and $-\partial f / \partial x_i \le 0$, $1 \le i \le n$, to the constraints of the optimization problem. Next, we modify the PolyARBerNN solver to output all regions in which all the constraints are satisfied. This can be easily computed by taking the intersections of the regions stored in the corresponding data structure in Algorithm 1. These regions enjoy the property that all points in them are critical points of $f$. In addition, we modify PolyARBerNN to output all remaining ambiguous regions whose volumes are smaller than the user-specified threshold $\epsilon$ and for which the CAD-based solvers returned a solution. These regions enjoy the property that there exists a point inside them which is a critical point. These modifications are captured in Line 2 of Algorithm 2.
Since the minimum/maximum of $f$ may also occur at the boundaries of the region $I(l, u)$, our solver samples from the boundaries of the region (Line 4 in Algorithm 2). The solver uses a sampling distance derived from $\epsilon$ between two successive boundary samples; recall that $\epsilon$ is a user-defined parameter and was used in Algorithm 1 as a threshold on the refinement process. Finally, we evaluate the polynomial $f$ at the obtained samples and take the minimum and maximum over the obtained values (Line 7 in Algorithm 2). All the details can be found in Algorithm 2.
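The boundary-sampling step can be sketched as follows: enumerate the $2n$ faces of the hyperrectangle, lay a grid on each face, and track the extrema of $f$ over the samples. This is a hypothetical sketch with the grid spacing `h` left as a free parameter (in Algorithm 2 it is derived from $\epsilon$); all names are ours:

```python
import itertools
import numpy as np

def boundary_extrema(f, lo, hi, h):
    """Sample the boundary faces of the box [lo, hi] with grid spacing ~h
    and return the min/max of f over the samples."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    n = len(lo)
    axes = [np.linspace(lo[i], hi[i],
                        max(2, int(np.ceil((hi[i] - lo[i]) / h)) + 1))
            for i in range(n)]
    best_min, best_max = np.inf, -np.inf
    for i in range(n):                       # each pair of faces x_i = lo_i / hi_i
        other = [axes[j] for j in range(n) if j != i]
        for face_val in (lo[i], hi[i]):
            for pt in itertools.product(*other):
                x = list(pt)
                x.insert(i, face_val)        # fix coordinate i on the face
                v = f(np.array(x))
                best_min = min(best_min, v)
                best_max = max(best_max, v)
    return best_min, best_max

# f(x, y) = x^2 + y^2 on the boundary of [-1, 1]^2:
# min is 1 (edge midpoints), max is 2 (corners)
m, M = boundary_extrema(lambda x: x[0]**2 + x[1]**2, [-1, -1], [1, 1], 0.1)
```

As the spacing shrinks, the sampled extrema converge to the true boundary extrema at a rate governed by the Lipschitz constant of $f$, which is exactly the structure of the error bound in Theorem 6.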
We conclude our discussion with the following result, which captures the error between the solutions provided by PolyAROpt and the global optima. Theorem 6. Let $x^*_{min}$ and $x^*_{max}$ be the global optimal points for the solution of Problem 2. The solutions obtained by Algorithm 2, denoted by $\hat{x}_{min}$ and $\hat{x}_{max}$, satisfy: $|f(\hat{x}_{min}) - f(x^*_{min})| \le \lambda_f\, \epsilon$ and $|f(\hat{x}_{max}) - f(x^*_{max})| \le \lambda_f\, \epsilon$, where $\lambda_f$ is the Lipschitz constant of the polynomial $f$ and $\epsilon > 0$ is a user-defined error.
where the coefficients $c_1, \ldots, c_6$ follow a uniform distribution between −1 and 1. The randomly generated polynomials are defined over a fixed two-dimensional input domain. For each randomly generated polynomial, we perform the abstraction refinement on this domain iteratively. In every iteration, we perform an under-approximation and an over-approximation of the original polynomial over a selected ambiguous region, as well as a split of the ambiguous region. Next, we compute the volume of the remaining ambiguous region after each action is applied. The labels are one-hot vectors of dimension three, where each component represents the action that leads to the maximum reduction in the volume of the ambiguous region: under-approximate, over-approximate, or split the region into two regions. We ran the abstraction refinement process on all the generated polynomials to collect the data samples $(b^i_{min}, b^i_{max}, b^i_{\nabla,min}, b^i_{\nabla,max})$, where $i$ denotes the index of the sample. We generated 50000 samples for training, 10000 samples for validation, and 10000 for testing.

Data Normalization.
In the NN literature [31], it is important to normalize the data when it varies across a wide range of values. This normalization leads to faster training and improves the generalization performance of the NN [31]. Therefore, we normalize all the input data to zero mean and unit variance using the simple affine transformation data_sample ← (data_sample − μ)/σ, where μ and σ are the mean of the data and its standard deviation, respectively. The μ and σ parameters are set to the empirical mean and standard deviation of the dataset and are computed offline before training.
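A minimal sketch of this normalization step (the synthetic data below stands in for the Bernstein coefficient features; the shapes and offsets are our own example):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.uniform(-10.0, 10.0, size=(50000, 4)) + 3.0  # raw feature vectors

# Empirical statistics, computed offline before training.
mu = data.mean(axis=0)
sigma = data.std(axis=0)

normalized = (data - mu) / sigma  # zero mean, unit variance per feature
```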

NN's evaluation
We evaluated the NN on 6 different benchmarks as follows: • In the first and second benchmarks, we generate the same random quadratic polynomials but over a different domain, [−4, 4]. This choice is made to test the generalization of the NN outside the data domains that were used in its training. This is important since the Bernstein coefficients of a polynomial (the input to the NN) depend on the input region I(a, b). • In the third and fourth benchmarks, we generated random polynomials with degrees 4 and 10 over the training domain. These benchmarks are used to validate the generalization of the NN to polynomials of orders higher than the ones used in its training.
• Finally, in the fifth and sixth benchmarks, we generated random polynomials with higher dimensions, i.e., n = 4 and n = 7.
In summary, these benchmarks will help us answer the following question: can the trained NN generalize to new data with different domains (benchmarks 1 and 2), higher orders (benchmarks 3 and 4), and higher dimensions (benchmarks 5 and 6)? More details about the different benchmarks are shown in Table 2.
Figure 4 shows the performance of the trained NN over 20 random samples of each of the six benchmarks. For each sample, we used the framework in [53] to compute the ground-truth percentage reduction in the volume of ambiguous regions after applying each action: under-approximation, over-approximation, or split. We then evaluated the NN on each sample and reported in Figure 4 both the ground-truth reduction of the ambiguous regions (as bars) and the index of the action suggested by the NN (as the text above the bars). As can be seen from Figure 4, except for the second sample of the first benchmark, the NN outputs represent the actions that lead to the maximum reduction of the ambiguous region's volume.
Finally, we ran the same experiment for 1000 samples and report the percentage of samples for which the NN was able to predict the action that leads to the maximum reduction in the ambiguous region's volume. As can be seen from Table 2, the trained NN is able to generalize on the different benchmarks. For instance, evaluating the NN on different domains results in a lowest accuracy of 93%. Furthermore, evaluating the NN on polynomials with higher orders results in an accuracy of 87%. Finally, the NN achieves 80% on the higher-dimension benchmarks.

NUMERICAL RESULTS - SCALABILITY
In this section, we study the scalability of PolyARBerNN in terms of execution time by varying the order, the number of variables, and the number of polynomial constraints, for instances where a solution exists (the problem is Satisfiable, or SAT for short) and where no solution exists (UNSAT for short). We perform this study in comparison with state-of-the-art solvers, including our previous solver PolyAR, Z3 8.9, and Yices 2.6. Next, we compare the performance of PolyARBerNN against a theorem prover named PVS, which implements a Bernstein library to solve multivariate polynomial constraints [37]. Finally, we compare the scalability of the PolyAROpt optimizer against the built-in optimization library in Z3 8.9 for solving an unconstrained multivariate polynomial optimization problem with varying order and number of variables. We consider two instances of Problem 1: an UNSAT and a SAT problem. For each instance, we consider three scenarios with 1, 5, and 10 polynomial constraints. First, we vary the order of the polynomials from 0 to 1000 while fixing the number of variables (and hence the dimension of the search space) to two. Alternatively, we fix the order of the polynomials to 30 while varying the number of variables from 1 to 200. We set the timeout of the simulations to 1 hour.

Scalability of PolyARBerNN versus PVS
In this experiment, we investigate the effect of the NN on the overall performance of the system. We compare PolyARBerNN against the PVS theorem prover, which also uses Bernstein coefficients to reason about polynomial constraints. The polynomials and the domains of the associated variables are given in [37] and were originally used to assess the scalability of the PVS theorem prover. As evinced by the results in Table 3, combining Bernstein coefficients with NN-guided convex abstractions (in PolyARBerNN) leads to additional savings in execution time, with a speedup of 483× in the Heart dipole example.

Scalability of PolyAROpt against other solvers
We compare the scalability of PolyAROpt with the Z3 solver, since Z3 has a built-in optimization library; unfortunately, Yices does not have such an optimizer. We set the timeout of the simulation to 1 hour. In this subsection, we assess the scalability of the PolyARBerNN solver compared to state-of-the-art solvers for synthesizing a non-parametric controller for a Duffing oscillator reported by [21]. All the details of the dynamics of the oscillator and how we generated the polynomial constraints can be found in [53]. We denote by n the dimension of the Duffing oscillator. We consider two instances of the controller synthesis problem for the Duffing oscillator. We feed the resultant polynomial inequality constraint to PolyARBerNN, Yices, and Z3, and solve the feasibility problem for n = 3 and n = 4. We set the timeout to 60 s. Figure 7 (left) shows the state-space evolution of the controlled Duffing oscillator for the different solvers for n = 3 and n = 4. Figure 7 (right) shows the evolution of the execution time of the solvers during the 12 simulated seconds. As can be seen from Figure 7, our solver PolyARBerNN succeeded in finding a control input that regulates the state to the origin in both instances. However, the off-the-shelf solvers are incapable of solving either instance and time out after 60 s, well before the end of the 12 s simulation. This shows the scalability of the proposed approach.

Use Case 2: Reachability analysis of discrete polynomial dynamical systems
In this section, we show how to use PolyAROpt to compute the reachable set of states for discrete-time polynomial systems. We consider a discrete polynomial dynamical system of the form x_{k+1} = f(x_k), where f : R^n → R^n is a multivariate polynomial map of maximum degree d = (d_1, ..., d_n), and the initial set X_0 is a bounded polyhedron in R^n. In this subsection, we consider bounded-time reachability analysis of the system in (19). Computing the exact reachable sets of this type of dynamical system is hard. Therefore, we overapproximate the exact set with a simpler set such as a bounded polyhedron.
where Λ_{k+1,i} and c_{k+1,i} are the i-th row and component of the templates Λ_{k+1} and c_{k+1}, respectively. In every step k ∈ N, we compute an upper bound for −c_{k+1,i} by solving the optimization problem (20) using PolyAROpt, after which an overapproximation of the reachable set X_{k+1} is computed. In order to use PolyAROpt to solve (20), we need to overapproximate the polyhedron X_k with a hyperrectangle I_k; the optimization problem (20) is modified accordingly. We implemented the reachability computation method and tested it on three dynamical systems: • FitzHugh-Nagumo Neuron: a polynomial dynamical system that models the electrical activity of a neuron [20].
All the details of the dynamics of the three dynamical systems and how we generated the polynomial constraints can be found in [5, 9, 20, 53]. We computed the reachable sets for these dynamical systems and compared our results with Sapo [14], a tool proposed for the reachability analysis of discrete-time polynomial dynamical systems. Sapo linearizes the optimization problem (20) using the Bernstein form of the polynomial and was shown to outperform state-of-the-art reachability analysis tools like Flow* [10]. Figure 8 shows the reachable sets computed by PolyAROpt compared to Sapo for the FitzHugh-Nagumo Neuron model. Inspecting the results in Figure 8 qualitatively shows that PolyAROpt is capable of computing tighter sets. To compare the results of PolyAROpt and Sapo quantitatively, we compute the volume of each reachable set (for different time steps). Figure 9 shows these volumes for all three dynamical systems mentioned above. As evident from the results in Figure 9, PolyAROpt produces reachable sets that are tighter than those obtained from Sapo, thanks to PolyAROpt's ability to solve the polynomial optimization problem without any relaxations. This ability to avoid relaxation yields several orders of magnitude reduction in the volume of the reachable sets compared to Sapo, a significant improvement in the analysis of such dynamical systems.
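One step of the template-based reachability computation can be sketched schematically as follows. The toy quadratic map, the box template, and the grid-based bound standing in for PolyAROpt are all our own simplifications for illustration:

```python
import itertools
import numpy as np

def step_bounds(f, lo, hi, directions, samples=51):
    """One schematic step of template-based reachability: upper-bound
    max over x in [lo, hi] of  lam . f(x)  for each template direction lam.
    Here the bound is a dense grid maximum; PolyAROpt instead solves the
    polynomial optimization problem with a guaranteed error tolerance."""
    axes = [np.linspace(l, h, samples) for l, h in zip(lo, hi)]
    pts = np.array(list(itertools.product(*axes)))
    images = np.array([f(x) for x in pts])
    return [float(np.max(images @ lam)) for lam in directions]

# Toy quadratic map (our own example, not one of the paper's benchmarks).
f = lambda x: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * x[0] ** 2])
# Box template: directions +/- e_i bound the image by a hyperrectangle.
dirs = [np.array(d, float) for d in [(1, 0), (-1, 0), (0, 1), (0, -1)]]
bounds = step_bounds(f, [-1.0, -1.0], [1.0, 1.0], dirs)
# The next-step set is  {x : dirs[i] . x <= bounds[i] for all i}.
```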
Conclusions. In this paper, we proposed PolyARBerNN, a solver for polynomial inequality constraints. We proposed a systematic methodology for designing neural networks that guide the abstraction refinement process by bridging the gap between neural network properties and the properties of polynomial representations. We showed that the use of Bernstein coefficients leads the way to designing better neural network guides and provides an additional abstraction that can be used to accelerate the solver. We also generalized the solver to reason about optimization problems.
We demonstrated that the proposed solver outperforms state-of-the-art tools by several orders of magnitude.
The indicator variable should be set to zero whenever all the roots lie outside the hyperrectangle I(a, b).
Before we compute the Lipschitz constant of the map in Equation (28), we recall the following identities. Consider two functions f(x) and g(x) with Lipschitz constants L_f and L_g, respectively. Then: • The Lipschitz constant of max(f(x), g(x)) is bounded by max(L_f, L_g).
• The Lipschitz constant of f(x) + g(x) is bounded by L_f + L_g. Applying these identities, the Lipschitz constant of the map in Equation (28) is O(L). We conclude our proof by noticing that all the operators in Equation (28), namely the absolute value, the max operator, summation, and checking the final value against a constant, can be implemented exactly using ReLU neural networks [19], and hence the resulting neural network will also have a Lipschitz constant of O(L). □
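These two identities can be checked numerically on a one-dimensional example; the functions below and the grid-based Lipschitz estimate are our own illustration:

```python
import numpy as np

def lipschitz_estimate(h, xs):
    """Largest slope between consecutive grid points: a lower estimate
    of the Lipschitz constant of h on the sampled interval."""
    vals = h(xs)
    return float(np.max(np.abs(np.diff(vals)) / np.diff(xs)))

xs = np.linspace(-2.0, 2.0, 4001)       # uniform grid, spacing 1e-3
f = lambda x: 3.0 * x                   # L_f = 3
g = lambda x: np.abs(x)                 # L_g = 1

L_max = lipschitz_estimate(lambda x: np.maximum(f(x), g(x)), xs)  # <= max(3, 1)
L_sum = lipschitz_estimate(lambda x: f(x) + g(x), xs)             # <= 3 + 1
```

Both estimates respect the stated bounds: the pointwise maximum of the two functions never has a slope exceeding max(L_f, L_g), while the sum can have slopes up to L_f + L_g.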

D PROOF OF PROPOSITION 4
Proof. This result follows directly by noticing that the map in question can be computed as a linear function whose slope is a constant c that depends on the volume of the hyperrectangle I(a, b). Since c is a constant, we conclude the result. □

Fig. 2 .
Fig. 2. The architecture of the neural network used to prove Theorem 2.

Fig. 3 .
Fig. 3. The architecture of the trained NN that is used to guide the abstraction refinement process within PolyARBerNN. We used a fully connected NN that contains an input layer with 4 neurons, three hidden layers with 40 neurons each, and one output layer with three neurons. All neurons are ReLU-based except for the output neurons, which use a SoftMax non-linearity.

Fig. 4 .
Fig. 4. Percentage reduction of the volume of ambiguous regions, along with the NN output number (shown at the top of the histograms), for 20 samples from each of the 6 evaluation benchmarks described in Table 2.

Fig. 5 .
Fig. 5. Scalability results of PolyARBerNN for 1, 5, and 10 constraints: (left) evolution of the execution time in seconds as a function of the order of the polynomials; (right) evolution of the execution time in seconds as a function of the number of variables. The timeout is equal to 1 hour.
Figure 5 reports the execution times for all the experiments, both UNSAT and SAT. As evidenced by the figures, PolyARBerNN succeeded in solving the instances of Problem 1 for all orders and numbers of variables in a few seconds. For instance, solving 10 polynomial constraints with 200 variables and a maximum order of 30 took around 20 s, a speedup of 200× compared to Z3 and Yices. The other solvers, in contrast, are incapable of solving the polynomial constraints for all orders or numbers of variables and time out after one hour. These results show the scalability of the proposed approach, which uses Bernstein coefficients to prune the search space and a NN to guide the abstraction refinement.

Fig. 6 .
Fig. 6. Scalability results of PolyAROpt for unconstrained optimization: (left) evolution of the execution time in seconds as a function of the order of the polynomial; (right) evolution of the execution time in seconds as a function of the number of variables. The timeout is equal to 1 hour.

Figure 6 reports the execution times of the two experiments of computing the minimum and maximum in unconstrained optimization. As evidenced by the two figures, PolyAROpt succeeded in solving the unconstrained optimization problem for all orders and numbers of variables. For instance, solving the unconstrained optimization problem with 70 variables and a maximum order of 3 took around 50 s. The Z3 solver, on the other hand, is incapable of solving the unconstrained optimization problem for all orders or numbers of variables and times out.

Fig. 7 .
Fig. 7. Results of controlling the Duffing oscillator with different n: (left) evolution of the states x_1(t) and x_2(t) for the solvers in the state space; (right) evolution of the execution time of the solvers during the 12 seconds. The timeout is equal to 60 s. Trajectories are truncated once the solver exceeds the timeout limit.

A root x_{j0} lies outside I(a, b) if and only if some element x_{j0,l} lies outside the interval [a_l, b_l], where x_{j0,l}, a_l, and b_l are the l-th elements of the vectors x_{j0}, a, and b, respectively, and l ∈ {1, ..., n}. Hence, the indicator variable should be set to zero whenever this condition holds.
is bounded by O(L), which follows from Proposition 4. Now, we upper bound the 2-norm difference between the Bernstein coefficient vectors over the regions I(a', b') and I(a, b); from this bound we conclude that the Lipschitz constant of the Bernstein coefficient map is of the order O(L), which concludes the proof of Theorem 2. □
It follows from Theorem 1 and Theorem 2 that the higher the condition number of a polynomial, the larger the network width and depth needed to estimate the volume of zero sub/super level sets with high accuracy. Unfortunately, the power basis representation of polynomials (i.e., representing the polynomial as a sum of monomials with power-basis coefficients) is known to be an unstable representation with extremely large condition numbers [18], which may necessitate neural networks with substantial architectures.
4.2 Bernstein Polynomials: A Robust Representation of Polynomials
Motivated by the challenge above, we seek a representation of polynomials that is more robust to changes in coefficients, i.e., a representation in which the roots of the polynomial change slowly with changes in the coefficients (and hence smaller condition numbers and a smaller NN to estimate the volume of the sub/super level sets). Let p ∈ R[x_1, ..., x_n] be a multivariate polynomial over a hyperrectangle I(a, b) of maximal degree d = (d_1, ..., d_n) ∈ N^n. The polynomial B_p(x) = Σ_{k ≤ d} b_{k,d} Ber_{k,d}(x) is called the Bernstein polynomial of p, where Ber_{k,d}(x) and b_{k,d} are called the Bernstein basis and the Bernstein coefficients of p, respectively.
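The range-enclosure property of Bernstein coefficients can be illustrated in the univariate case on [0, 1]. The conversion below is the standard power-to-Bernstein change of basis; the example polynomial is our own:

```python
from math import comb

def bernstein_coeffs(a):
    """Bernstein coefficients on [0, 1] of p(x) = sum_j a[j] * x**j,
    via the standard change of basis
    b_i = sum_{j <= i} (C(i, j) / C(n, j)) * a[j],  with n = len(a) - 1."""
    n = len(a) - 1
    return [sum(comb(i, j) / comb(n, j) * a[j] for j in range(i + 1))
            for i in range(n + 1)]

# p(x) = 2x^2 - 2x + 0.5 has range [0, 0.5] on [0, 1].
b = bernstein_coeffs([0.5, -2.0, 2.0])      # -> [0.5, -0.5, 0.5]
# Range enclosure: min(b) <= p(x) <= max(b) for all x in [0, 1],
# which is what lets the solver use the coefficients as an abstraction.
```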

Table 1 .
Evaluation of three trained neural networks on three different benchmarks for the different polynomial bases. Each benchmark has 10000 samples. The coefficients of the polynomial within each basis are generated following the uniform distribution given in the table.

Table 3 .
Scalability results for PolyARBerNN against PVS.In this experiment, we compare the execution times of PolyARBerNN against the PolyAR tool [53], Z3 8.9, and Yices 2.6.