VCFloat2: Floating-Point Error Analysis in Coq

The development of sound and efficient tools that automatically perform floating-point round-off error analysis is an active area of research with applications to embedded systems and scientific computing. In this paper we describe VCFloat2, a novel extension to the VCFloat tool for verifying floating-point C programs in Coq. Like VCFloat1, VCFloat2 soundly and automatically computes round-off error bounds on floating-point expressions, but does so with higher accuracy; with better performance; with more generality for nonstandard number formats; with the ability to reason about external (user-defined or library) functions; and with improved modularity for interfacing with other program verification tools in Coq. We evaluate the performance of VCFloat2 using common benchmarks; compared to other state-of-the-art tools, VCFloat2 computes competitive error bounds and transparent certificates that require less time for verification.


Introduction
We describe VCFloat2, an open-source tool for automated floating-point round-off error analysis, foundationally verified sound in Coq, with a new functional modeling language for better integration with extended formal analyses of numerical programs, a new core representation of number formats that allows users to derive error bounds for formats not currently defined by the IEEE 754 standard, a new efficient optimizer for multinomial floating-point expressions, and a new analysis for obtaining tighter error bounds. VCFloat2 is available at github.com/VeriNum/vcfloat, or by opam in the coq-released repository as coq-vcfloat.
Developing machine-checked proofs that programs correctly and accurately approximate the solution to continuous mathematical problems is challenging. First, there is the challenge of developing machine-checked proofs of accuracy: while high-level mathematical specifications use real numbers and assume operations are exact, computer programs operate on floating-point numbers and perform inexact operations that introduce round-off error. Practically useful proofs of accuracy provide tight bounds on this round-off error without compromising soundness. Second, there is the challenge of composing proofs of correctness and accuracy, without introducing logical gaps, inside of a single mechanized logical framework. Using VCFloat2, we address these challenges by decomposing the problem as follows.
• Write a floating-point functional model, a floating-point-valued specification of the program, using the functional-modeling language provided by VCFloat2.
• Prove that a program written in a low-level language such as C correctly implements the floating-point functional model using a program logic, specified in Coq, for the low-level language. To help automate this proof, use a program logic verification tool, like the Verified Software Toolchain (VST) [2].
• Prove that the floating-point functional model approximates a real-valued functional model, to within a certain accuracy; that is, perform a round-off error analysis, using VCFloat2 for automation.
• Prove that the real-valued functional model finds a solution to the mathematical problem of interest, with a given accuracy bound. For this, use a Coq library for real analysis, such as Coquelicot [11] or Mathematical Components [33].
Using VCFloat2 as both a specification language and a tool for automating floating-point round-off error analysis, the proofs outlined above can be composed into a single correctness-and-accuracy theorem in Coq, with no logical gaps compromising soundness. While there are several tools that perform floating-point round-off error analysis, not all of them produce machine-checkable proofs that can be so seamlessly composed with other theorems about program correctness and real analysis; a detailed review of related work is provided in Section 15.
VCFloat2 is an extension of the original VCFloat tool, developed by Ramananandro et al. [40], with the following improvements:
Functional-modeling language: VCFloat1 took as input expressions from C programs parsed by the front end of the CompCert C compiler [10, 30]. In VCFloat2, we have added a new front end that can reify, that is, turn a formula into an abstract syntax tree for symbolic analysis, directly from functional models written in a general and natural style entirely independent of C or CompCert (see §5). In effect, this is a functional modeling language shallow-embedded in Coq for defining floating-point algorithms (§3).
Any-precision literals: To support VCFloat2's functional models, we added new floating-point literal notations within Coq that work at any floating-point precision: single (binary32), double (binary64), half, quad, or arbitrary user-specified (§3).
Modeling library: VCFloat2 provides a library containing definitions, lemmas, and congruences for reasoning about functional models and their equivalences.
User-defined operators: While VCFloat1 could only handle a specific set of built-in operators (+, −, ×, ÷, √), VCFloat2 supports user-defined operators of any arity and precision, requiring only user-supplied proofs (or axioms) about their rounding behavior.
Non-IEEE formats: While VCFloat1 could handle any IEEE 754 format (with arbitrary mantissa and exponent sizes), VCFloat2 additionally supports user-supplied number formats, so long as the operations on them have defined (possibly zero) relative and absolute error bounds. We provide an example of this functionality using binary64 double-words ("double-doubles" [25]) where the operation under consideration sums a double-double number and a binary64 number.
Exact division: While VCFloat1 recognized that floating-point multiplication by nonnegative powers of 2 is exact in the absence of overflow, VCFloat2 obtains tighter error bounds by recognizing that floating-point division by powers of 2 (or multiplication by powers of 1/2) can also be exact, or in the case of underflow, highly accurate (§7).
Efficient simplifier: The VCFloat1 workflow generated verification conditions which were discharged by Coq's standard solvers for polynomial and rational equations (e.g., the field_simplify tactic) followed by a call to Coq's Interval package [13, §4.2]. Even on standard floating-point benchmarks, these standard solvers generated exponentially large terms that caused Coq to run out of memory. For VCFloat2, we built an efficient, accurate, and proved-correct (§9) interval-goal optimizer in Coq's term language (§8).
Decomposition tactic: To obtain tight error bounds using Coq's Interval package on expressions generated by VCFloat, we implemented a tactic that decomposes subexpressions in order to mitigate the dependency effect in the interval analysis. With this decomposition tactic, VCFloat2 can provide tight bounds on benchmarks in instances where VCFloat1 failed to produce a useful bound.
Taken together, the contributions outlined above make VCFloat2 a flexible tool for building end-to-end machine-checked proofs of accuracy and correctness for floating-point programs. This is illustrated by several verification projects where VCFloat2 has played an integral role.
Applications. VCFloat2 has been used in several applications, demonstrating its effectiveness as a tool for building end-to-end proofs of accuracy and correctness.
Differential equations: Kellison and Appel [26] proved that a C program correctly and accurately integrates an ordinary differential equation (ODE). Using VST, the authors proved that a C program correctly implements a floating-point functional model. VCFloat2 was used to prove that the functional model accurately approximates a discrete-time-step real-valued functional model. Finally, Coquelicot was used to prove that the real-valued functional model finds an accurate solution to the equation with bounded error after a given number of time-steps; all proofs and accuracy bounds were composed into a single Coq theorem.
Matrix-vector operations: The LAProof library [27] contains formal proofs of the accuracy of linear algebra operations described by the basic linear algebra subprograms (BLAS) specification, and provides a proof of correctness of a C implementation of sparse matrix-vector multiplication. These Coq theorems use VCFloat2's functional-modeling language and modeling library, but do not use its automated error analysis.
Jacobi method: Tekriwal et al. [43] proved correctness, accuracy, and convergence of a stationary iterative solver for linear systems of equations. A sparse-matrix C program implementing the Jacobi method was proved correct using VST and VCFloat2's modeling library; proofs of accuracy and convergence used the Mathematical Components library and VCFloat2's modeling library.
VSTlib: C's standard math library is now axiomatized as a VST library using VCFloat2's functional-modeling language, with the library functions (sin, cos, etc.) treated as VCFloat2 user-defined operators [3]; VCFloat2 therefore supports automatic round-off-error analysis for computations using those functions.
In the remainder of this paper, VCFloat2 is described in detail; the significance of VCFloat2 with respect to related work is described in Section 15. In the following section, we provide a review of round-off error analysis in order to set the stage.

Review of round-off error analysis
We briefly review the fundamentals of round-off error analysis and floating-point arithmetic, and point readers to more detailed expositions by Overton [38] and Muller et al. [35] for further reference.
A floating-point number may be 0, normal finite, subnormal finite, +∞, −∞, or NaN (not a number). A normal number is representable as ±1.m · 2^e, where m represents a string of binary digits of length p (the mantissa size) and e_min ≤ e ≤ e_max is the exponent. A subnormal number is representable as ±0.m · 2^(e_min), where m may have several leading zeros. A finite number is either ±0, normal, or subnormal.
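These ranges can be observed concretely in binary64 arithmetic. The following Python sketch (assuming Python's float is IEEE binary64; the variable names are ours) shows the boundary between normal and subnormal values:

```python
import math
import sys

# binary64: p = 53 mantissa bits, e_min = -1022.
smallest_normal = math.ldexp(1.0, -1022)      # +1.00...0 * 2^e_min
smallest_subnormal = math.ldexp(1.0, -1074)   # +0.00...1 * 2^e_min
largest_subnormal = smallest_normal - smallest_subnormal  # +0.11...1 * 2^e_min

assert smallest_normal == sys.float_info.min
assert 0.0 < smallest_subnormal < largest_subnormal < smallest_normal
assert smallest_subnormal / 2 == 0.0          # below this, results round to zero
```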
Every finite floating-point number x exactly represents a real number R(x). But the floating-point calculation x + y or x · y is usually not exact: often R(x) + R(y) ≠ R(x + y). That is, the result of a floating-point operation may have more mantissa bits than fit into the representation, so it will be rounded off. In general, if the result of evaluating x op y for op ∈ {+, −, ×, ÷} is a normal number, then we know that R(x op y) = (R(x) op R(y))(1 + ε) for some ε such that |ε| ≤ 2^(−p); i.e., there is a relative error bound.
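This relative error bound can be checked empirically with exact rational arithmetic. A Python sketch (assuming binary64, so p = 53; `rel_err` is an illustrative name) compares each rounded result against the exact real result:

```python
import operator
from fractions import Fraction

p = 53                    # binary64 mantissa size, including the hidden bit
u = Fraction(1, 2 ** p)   # relative error bound 2^(-p)

def rel_err(op, x, y):
    # Fraction(x) is the exact real value R(x) of the float x, so
    # `exact` is the real result and op(x, y) is the rounded float result.
    exact = op(Fraction(x), Fraction(y))
    return abs(Fraction(op(x, y)) - exact) / abs(exact)

for op in (operator.add, operator.sub, operator.mul, operator.truediv):
    assert rel_err(op, 0.1, 0.3) <= u
```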
There are some special cases: if x and y have the same binary order of magnitude (each within a factor of two of the other), then x − y is computed exactly, by Sterbenz's lemma. Finally, if x and y are subnormal, then R(x + y) = R(x) + R(y) exactly; but currently VCFloat2 does not handle this special case, approximating it as R(x + y) = R(x) + R(y) + δ.
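Both special cases are easy to observe in Python (assuming binary64; the particular values are arbitrary):

```python
import math
from fractions import Fraction

# Sterbenz: if y/2 <= x <= 2*y, then x - y is computed exactly.
x, y = 3.1, 2.7
assert y / 2 <= x <= 2 * y
assert Fraction(x - y) == Fraction(x) - Fraction(y)

# Addition of two subnormal numbers is also exact (no bits are lost).
a = 3 * math.ldexp(1.0, -1074)
b = 5 * math.ldexp(1.0, -1074)
assert Fraction(a + b) == Fraction(a) + Fraction(b)
```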

Overview of VCFloat2
We will use as a running example a formula that arises in the simulation of a harmonic oscillator by leapfrog integration with time-step h = 1/32. Assume 2 ≤ x ≤ 4 and −2 ≤ v ≤ +2, and we compute this in single-precision floating point.
VCFloat's job is to soundly derive an absolute round-off error bound for the formula, by inserting deltas and epsilons as described in Section 2, taking into account all the special cases (known-normal numbers, known-subnormal numbers, multiplication by powers of two, Sterbenz subtraction, etc.).
We will illustrate the user's workflow using VCFloat2's new annotation system. The user writes the functional model in Coq. Our functional-modeling language is just Coq formulas and Coq functions over variables of type ftype(t), where t : type is any floating-point precision. (Don't confuse type, meaning a floating-point format, with Type, which is the notion of type in Coq's logic.) We predefine Tsingle : type and Tdouble : type, but the user can define any binary IEEE 754 floating-point format (e.g., half precision or quad precision). So: Tsingle is a floating-point precision description, and the inhabitants of ftype Tsingle are single-precision floats.
VCFloat2 comes with these exciting new functions: Norm, Denorm, and Sterbenz. But these are hardly exciting at all: each one is just the identity function. Actually these are annotations for the analysis. Norm(x+y) suggests the sum x + y is going to be a normal (not a subnormal) number. Denorm suggests a subnormal number. Sterbenz(x−y) suggests that x and y will have the same binary order of magnitude. These suggestions will have to be proved, at the point marked (*B*) below, but then they will lead to a better round-off error analysis. If these verification conditions can't be proved, then the whole round-off error theorem is unproved; one might need to remove the corresponding annotations.
Our running example is annotated with Norm and Sterbenz at specific places where it is expected that these conditions will be met. In future work we expect to automatically calculate many of these annotations, so users can write annotations only where they want to insist on the condition.
In general, an error analysis is valid only for a particular range of input values: the assumptions 2 ≤ x ≤ 4 and −2 ≤ v ≤ 2 are important. The VCFloat2 user encodes these into a boundsmap that maps identifiers (denoted here as _x and _v) for the free variables of the expression (x and v in our example) to user-specified lower and upper bounds. The theorem the user wants to prove: the round-off error of the step function is the maximum difference, for all x and v in bounds, between the floating-point evaluation of step x v and the evaluation of the same formula on the real numbers. To state this theorem, we use the notion of a valmap, a computable mapping from variable-identifiers to floating-point numbers. For example, given floats x and v we can build a valmap representing {_x ↦ x, _v ↦ v}.
We will prove that the round-off error is less than one in four million, for any valmap. To prepare a functional-model formula like step for analysis, one reifies it (using some Coq boilerplate to invoke the reifier within a definition). Proving this theorem for the user is the main purpose of the tool. To illustrate what VCFloat2 does and how it differs from VCFloat1, we will show the proof state at points A, B, C, D.
Point A. By now we have already reified the formula step into a deep-embedded tree step'. Unlike VCFloat1, we reified from a functional-model formula such as step, instead of from a C program; see Section 5.
Point B. By this point, VCFloat's core algorithm has calculated several verification conditions. In this case there are 5 conditions. Number 3 of 5, for example, arises from the claim that (x + h · v) is a normal number. Here is this verification condition (cleaned up a bit); all operations are in the reals.
Point C. By this point VCFloat has inserted deltas and epsilons using the improved analysis of VCFloat2 as described in Section 7, and now one must prove that the resulting difference formula is within the error bound. In general this is a difficult formula to analyze, because of nested epsilons and deltas and the implicit subtraction of x − x. If we give this directly to the Interval package, it will derive a very weak bound, much worse than the desired one. Here we use our new special-purpose simplifier, which (efficiently and soundly) transforms the goal at Point C into the proof goal at Point D.
On some "Point C" goals we also use a (new) tactic that recursively decomposes the expression for absolute floatingpoint error into smaller subexpressions of related terms that are easier for our simplifier (and Coq Interval package); see §10.The Gappa tool uses a similar technique.
That is, fprec is the number of bits in the mantissa, and femax is the maximum exponent value. The proof prec_range enforces (via dependent types) that the precision (number of mantissa bits) must be less than the maximum exponent. This was sufficient to describe any IEEE 754 format (including half precision, double precision, quad precision, etc.). But in VCFloat2 we support exotic floating-point types that don't precisely fit the IEEE 754 format. In particular, each value of the synthetic type double-double [20] has an fprec of at least 106 bits, and femax = 1024. Double-double generally respects the round-off bounds for ⟨106, 1024⟩ but is not exactly described by any VCFloat1 "type." Therefore in VCFloat2 we define:
Record type : Type := GTYPE {
  fprec : Z;
  femax : Z;
  prec_range : 1 < fprec < femax;
  nonstd : option (nonstdtype fprec femax prec_range) }.
Definition TYPE fprec femax prec_range :=
  GTYPE fprec femax prec_range None.
When nonstd=None, this is a standard IEEE 754 format; when nonstd=Some(nt), then nt is a record describing an abstract data type and its interpretation.
For the underlying semantics of floating-point numbers and operations, we rely on Coq's Flocq floating-point library [12, 13]. For example, Flocq's subtraction operator Bminus operates on binary IEEE 754 floating-point numbers at any precision. The rounding mode may be round-to-nearest-even, round-toward-zero, round-down, etc. The last two arguments and the result are floats of the given precision. Given any format-description (ty : type), we can define ftype(ty), the Coq type of floating-point values belonging to that format. That is, if ty is a standard type, then the Coq type of its values is simply Flocq's binary_float type (of the appropriate precision). But if it's a nonstandard type, then we use the user-supplied representation type from the nt package.
When ty is a standard type (nonstd(ty) = None), we can use dependent types to convert between binary_float (fprec ty) (femax ty) and ftype(ty). Now, for any Flocq-standard arithmetic operator such as Bplus, Bminus, Bmult, etc. that is parametrized by prec and emax, we can define a corresponding operator on ftype(ty); the instantiation of Bplus at Tsingle, for example, is the single-precision floating-point add operator.
Notation parser/pretty-printer for literals. Coq has 64-bit floats built in, with notation parsers and printers for the usual notation (e.g., 1.36e+7). But we want to parse and pretty-print floats in any precision, and not into built-in floats but into Flocq's deep-embedded description of floats. So we implemented an entire scientific-notation parser and pretty-printer in Coq, and plugged it in using Coq's customizable Number Notation feature. It is well known that printing floating-point numbers accurately and concisely is nontrivial [14]. There are four kinds of solutions:
1. Incorrect (in some cases, print a number that does not read back as the same number).
2. Correct but unrounded, i.e., print 0.15 as 1.49999996e-1, which reads back correctly but does not look nice.
3. Correct and concise by validation, i.e., print 0.15 as 0.15 or 1.5e-1 by trying twice (rounding down, rounding up), and then checking in which case the rounding was done correctly. This is the method we use.
4. Correct and concise by construction, i.e., sophisticated algorithms that get it right without needing to validate.
In programming languages without arbitrary-precision integers, all of this is more difficult, but in Coq we have the Z type that simplifies some issues.
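The print-then-validate idea can be sketched in a few lines of Python. This is only a loose analogue of our Coq implementation: instead of trying the two rounding directions at a fixed length, it tries successively longer decimal forms and validates each candidate by checking that it reads back as the same float (`concise_repr` is an illustrative name):

```python
def concise_repr(x: float) -> str:
    # Candidate decimal forms of increasing length; validate each by
    # checking that parsing it back yields exactly the same float.
    for digits in range(1, 18):
        s = f"{x:.{digits}e}"
        if float(s) == x:
            return s
    return f"{x:.17e}"  # 17 significant digits always round-trips (binary64)

assert concise_repr(0.15) == "1.5e-01"       # concise, and reads back as 0.15
assert float(concise_repr(1 / 3)) == 1 / 3   # always correct by construction
```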

Reification
To represent in a logic a function analyzing logical formulas of type T, one cannot write a function with type T → Prop; one must operate on syntactic representations of formulas, such as our expr type. One can then define in the logic a reflect function of type expr → T. The opposite process, reification, cannot be done within the logic. But we can (and do) program a reify function from T to expr in the tactic language of the Coq proof assistant. One cannot prove reify correct, but we obtain a per-instance guarantee for each t : T by checking that reflect(reify(t)) = t. This is proof by reflection [8, Ch. 16].
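The reify/reflect pattern and its per-instance guarantee can be sketched in Python. Our real reifier is written in Coq's tactic language; here Python's ast module stands in for syntactic access to the formula, and all names are illustrative:

```python
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def reify(src):
    """Turn a formula (here, given as source text) into an expression tree."""
    def go(n):
        if isinstance(n, ast.BinOp):
            return (type(n.op), go(n.left), go(n.right))
        if isinstance(n, ast.Name):
            return ('var', n.id)
        if isinstance(n, ast.Constant):
            return ('const', n.value)
        raise ValueError(ast.dump(n))
    return go(ast.parse(src, mode='eval').body)

def reflect(e, env):
    """Evaluate an expression tree back to a value."""
    tag = e[0]
    if tag == 'var':
        return env[e[1]]
    if tag == 'const':
        return e[1]
    return OPS[tag](reflect(e[1], env), reflect(e[2], env))

# Per-instance guarantee: reflect(reify(t)) equals t on this input.
src, env = "x + 0.5 * v", {'x': 3.0, 'v': -2.0}
assert reflect(reify(src), env) == eval(src, env)
```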
Reification is not a new concept, nor is the use of a tactic-based program to implement it. Where we innovate, compared to previous reifiers and compared to VCFloat1, is in the handling of annotations that seem to make no semantic difference, when reflected in the standard way, but become embodied in the reified term (abstract-syntax tree) so as to guide the proof of a theorem.
VCFloat1's inner workings are explained in Sections 3 and 4 of Ramananandro et al. [40]. First the term is reified into syntax trees. Our Listing 1 is similar to Ramananandro et al.'s [40] Figure 1, except that: the InvShift form of rounded_unop is new in VCFloat2 (see §7); the Func form of expr is new (see §12); and the notion of collection and the predicate is_standard are new (see §13).
VCFloat1 did not reify from Coq formulas; instead it translated C statements into expr terms by first parsing the C using CompCert's front end, then translating CompCert ASTs [30] into exprs. In VCFloat2 we reify from Coq formulas, not directly from C programs, even though we too are sometimes interested in proving the correctness of C programs. We reify from a functional model (such as the step function shown earlier), for several reasons:
Listing 1. Syntax of reified expressions.
• The functional model is an important artifact in its own right. It will be the subject of significant analysis, not only for floating-point round-off but for the function it calculates on the real numbers. We don't only want to prove that the C program accurately approximates some real-valued discrete algorithm; we want to prove that the real-valued algorithm accurately approximates the high-level goal, some real-valued function or relation. For that, we want a stable, cleanly written, human-readable functional model, not something automatically reified from a C program.
• The user of VCFloat might not be programming in C.
• Our functional modeling language (and VCFloat) can work at any floating-point precision, but CompCert C only defines 32-bit single precision and 64-bit double precision.
Our reifier is written in Coq's tactic language. Except for its treatment of annotations, it is fairly conventional. A few clauses are illustrated in Listing 2.
These four clauses reify differently annotated subtractions. Since Norm, Denorm, and Sterbenz are all identity functions, a program in Coq logic's core calculus could not distinguish them. But the tactic language can. In the Binop tree-node that it builds, different "rounding knowledge" is encoded into the syntax tree. In VCFloat2 we repackage it into a more user-friendly form, wrapping it with appropriate corollaries and adding automation tactics to help discharge the verification conditions. Suppose the user's formula is reified into an expression e : expr (in the datatype of reified expressions, Listing 1). Then rndval_with_cond(e) produces results (r, s, vcs):
r : rexpr is a (reified) expression containing epsilons and deltas indexed by natural numbers, such as appears in the left-hand side below the line at Point C (although there it appears in its reflected, not reified, form).
s : shiftmap is a map from those natural numbers to bounds-descriptors, sufficient to describe the bounds for the ε_i and δ_i above the line at Point C.
vcs : list(environ → Prop) is a list of verification conditions, such as the one that appears (reflected, below the line) at Point B.
VCFloat's soundness theorem, rndval_with_cond_correct, is a machine-checked proof in Coq. It is presented as Theorem 3 by Ramananandro et al. [40]. Basically, it says that
• for any valmap m mapping reified variables (such as _x and _v in our example) to floating-point numbers,
• if each of the verification conditions vcs holds in the environment derived from m,
• then there exists an error-map from ε and δ indexes (natural numbers) to R,
• such that every ε and δ (interpreted through the error-map) respects the bound in s,
• and the floating-point evaluation (in m) of e is finite (not an infinity or NaN),
• and the real-number interpretation of r (using the error-map and m) is exactly equal to the floating-point evaluation of e (in m).
VCFloat2's prove_roundoff_bound tactic (at Point A) extends this soundness theorem to handle user-supplied functions and nonstandard float types. VCFloat2's automation then applies the theorem as part of an end-to-end machine-checked proof in Coq about the floating-point round-off error of the given formula.

Optimizations and inverse shifts
VCFloat1 included some theorems about transformations on (reified) terms that would improve the analysis (for better error bounds). In VCFloat2 we have integrated these transformations so that they're automatically applied; to do so, we proved the necessary soundness corollaries. This is built in to the prove_roundoff_bound tactic used at Point A.
For example, multiplication by a power of 2 is exact in floating-point arithmetic, if the result stays finite. VCFloat's reified trees represent 64.0 · x as Binop (Rounded2 MULT) 64 x, and represent 2^6 · x as Unop (Exact1 (Shift 6)) x. Both of these "reflect" back to the same floating-point formula, but they are treated differently in the analysis: MULT introduces δ and ε, but Shift does not. We also use other optimizing transformations such as constant-folding. The purpose is to optimize the analysis, not to optimize the program that runs. The choice of what computation to actually run is specified by the user, in writing the functional model.
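The exactness of power-of-2 scaling is easy to confirm with exact rational arithmetic; a Python sketch (binary64 assumed):

```python
import math
from fractions import Fraction

# Multiplying by a power of 2 only changes the exponent, so (if the
# result stays finite) no mantissa bits are lost and no rounding occurs.
for x in (0.1, 1 / 3, 2.7, 1e300):
    y = 64.0 * x
    assert math.isfinite(y)
    assert Fraction(y) == 64 * Fraction(x)   # exact: zero rounding error

# Ordinary multiplication, by contrast, does round:
assert Fraction(0.1 * 0.3) != Fraction(0.1) * Fraction(0.3)
```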
InvShift. We also implemented a new optimizing transformation: recognize division by powers of 2 (or multiplication by powers of 1/2). We exploit this in VCFloat2's reified tree language with the InvShift operator. For example, our optimizer replaces Binop (Rounded2 DIV) x 64 with Unop (Rounded1 (InvShift 6)) x, so that rndval_with_cond introduces the appropriate ε with its bound.

An efficient simplifier for interval goals
The Interval package [13, §4.2] is a procedure in Coq for proving goals of the form l_0 ≤ e ≤ h_0 under hypotheses l_i ≤ x_i ≤ h_i, where all the l_i and h_i are constants, the x_i are real-valued variables, and e is a real-valued expression over the variables. Any of the inequalities may be strict (<) rather than nonstrict (≤), some of the inequalities may be missing, there may be several redundant constraints over any given x_i, and any of the inequalities may be expressed as |x_i| ≤ h_i. The l_0 and h_0 may be left unspecified, in which case Interval reports the best bounds that it can prove.
Interval uses floating-point interval arithmetic, being careful with floating-point rounding modes (round down on one side, up on the other). But that alone would provide very weak bounds, so Interval also uses higher-precision (synthetic) floating point, interval bisection, Taylor expansions, and automatic differentiation.
At Point C, just after prove_roundoff_bound2, the proof goal is in the form accepted by the Interval package. Unfortunately, Interval doesn't do a very good job on that goal. Multivariate Taylor expansion would work quite well [42], but Interval uses only univariate Taylor series.
The Interval mode that can work on our problem is (repeated) bisection of the interval. But even then, the nested expressions with deltas and epsilons are obstacles to good approximations. In particular, at Point C in Section 3 there is the formula (x + …)(1 + ε_2) − (x + …), and subtracting x − x using interval arithmetic leads to a severe over-estimation.
We found that it helps to use Coq's field_simplify tactic before calling Interval; this turns the (Point C) goal into a flattened form in which the two terms x and −x have been symbolically canceled, which reduces the dependency effect in a subsequent interval analysis. On this new goal, the Interval tactic computes an excellent bound.
Repeatedly applying the distributive law to this multinomial has caused an exponential blow-up in the number of terms. For this small expression, "exponential" means only 17 terms, but if we apply VCFloat1 to more substantial examples, Coq runs out of memory.
The dependency effect. In interval arithmetic, expressions containing multiple occurrences of the same variable suffer from the dependency effect: rather than taking on a single value, each occurrence of the same variable represents a range of values, and the basic arithmetic operations assume the ranges of operands are independent [37, §1.3.5]. When interval arithmetic is used naively to bound the round-off error in floating-point computations, the dependency effect leads to a substantial over-estimation of the error. The simple example of |x − x| suffices as a demonstration of the effect: if x lies in the interval [0, 2], then evaluating the expression in interval arithmetic without symbolic cancellation yields a worst-case error bound of 2.
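A minimal sketch of the effect, with a hypothetical one-operation interval arithmetic in Python:

```python
# Interval subtraction assumes the two operand ranges are independent,
# even when both operands are occurrences of the same variable.
def isub(a, b):
    return (a[0] - b[1], a[1] - b[0])

X = (0.0, 2.0)                      # x lies in [0, 2]
assert isub(X, X) == (-2.0, 2.0)    # naive bound on |x - x| is 2, not 0
# After symbolic cancellation, x - x is literally 0, with bound 0.
```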
A fast ring simplifier. We implemented a high-performance special-purpose simplifier, to clean up queries before asking Interval to solve them. It creates much smaller formulas than does field_simplify. It works well for formulas that can (mostly) be described as multinomials. We expand the multinomial into sum-of-products form, then soundly and efficiently cancel terms while reducing the exponential blow-up in the number of terms.
A real-life numerical analyst might perhaps discard the higher-order terms, those in which more than one δ or ε are multiplied together. Another real-life alternative is to ignore the issue of underflow (denormalized numbers), in which case all the δ are discarded. Both of those methods work well most of the time. But neither one is sound; and we want proofs of our bounds! Therefore, we implemented (and proved correct in Coq) an algorithm to efficiently and soundly simplify Interval goals:
Step 1: Apply limited ring simplification: the distributive law, multiplication by 1 and by 0, division by 1, multiplication of constants together, and limited simplification of rational constants.
Step 2: Discard and bound the total of insignificant terms.
Step 3: Cancel terms using an efficient balanced-binary-tree data structure.
The entire algorithm is implemented in Coq logic's functional programming language, which Coq can compile to byte-code or machine-code.
Reification. A program in Coq's logic must be applied to a reified term, an abstract-syntax tree, not to a "native" proof goal. For this component, we chose to use the reified-tree syntax from the Interval package, rather than VCFloat's own. This is because (1) our interval-goal simplifier should be usable by any user of the Interval package, not only in connection with VCFloat; and (2) we have no need of the "rounding knowledge" of VCFloat's tree syntax.

The distributive law.
Step 1 of the algorithm doesn't need much explanation: it works in one pass over the tree with a recursive function.

Discarding negligible terms.
Step 2, soundly discarding insignificant terms, works as follows. One might think, "let's discard any higher-order term, i.e., one containing the product of two or more δ and ε." But some of those terms might be multiplied by very large coefficients or user-variables; and on the other hand, some terms containing only a single δ or ε might be multiplied by tiny numbers and therefore be insignificant.
We will take advantage of the fact that bounds are known for every variable, both the original variables (x, v) and the δ and ε variables. So in a sum-of-products expression (resulting from limited ring simplification), we can bound every term. (Terms containing functions that we cannot bound in closed form, we handle as described below.) The user supplies a cutoff such as 2^−30 or 2^−60 or whatever is appropriate. We preprocess all of the (already reified) bounds hypotheses (for variables x, v, ε_1, δ_2, etc.) into a single absolute-value bound for each variable. Then for each term we can look up the (reified) bound hypotheses to find a bound on the absolute value of the term: the product of the absolute value of its coefficient and the bounds of its variables. To "delete" a term whose bound falls below the cutoff, we replace it by that constant bound. We accumulate all those constants to produce the transformed goal, whose error bound is 1/4,000,000 − 4.0745386 · 10^−10; here, the term 4.0745386 · 10^−10 is the sum of bounds of the deleted terms. This goal implies the original goal, from Point C in Section 3.
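Step 2 can be sketched in Python (all names, and the particular bounds and terms, are illustrative, not taken from our Coq implementation):

```python
from fractions import Fraction

# Each term is (coefficient, [variable names]); `bounds` maps each
# variable to a bound on its absolute value.
bounds = {'x': 4, 'v': 2,
          'e1': Fraction(1, 2 ** 24), 'd1': Fraction(1, 2 ** 126)}

def term_bound(coeff, vs):
    b = abs(Fraction(coeff))
    for v in vs:
        b *= bounds[v]
    return b

def discard(terms, cutoff):
    kept, slack = [], Fraction(0)
    for coeff, vs in terms:
        b = term_bound(coeff, vs)
        if b < cutoff:
            slack += b          # "delete" the term: replace it by its bound
        else:
            kept.append((coeff, vs))
    return kept, slack

terms = [(1, ['x']), (1, ['x', 'e1']), (1, ['x', 'e1', 'e1']), (1, ['d1'])]
kept, slack = discard(terms, Fraction(1, 2 ** 30))
assert kept == [(1, ['x']), (1, ['x', 'e1'])]
assert slack == 4 * Fraction(1, 2 ** 48) + Fraction(1, 2 ** 126)
```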
The algorithm for deleting insignificant terms might encounter some terms whose bounds it cannot analyze, because the terms are not simply products of variables and constants. Such "residual" terms it leaves unchanged.

Gathering similar terms and cancelling subtractions.
At this point some terms could cancel (by subtraction). Step 3 cancels terms, efficiently, in the already-reified trees. We have a tree of additions of terms. Each term is either a product of constants and variables, or contains other operators; in the latter case we leave that term alone (as a residual term) and don't attempt to cancel it. An example of a (potentially) cancelable term is c_1 · c_2 · x_1 · x_2 · ε_1 · δ_2, where c_1, c_2 are rational constants, and x_1, x_2, ε_1, δ_2 are variables.
In our reified tree syntax, all variables are represented by natural numbers. So we can represent any product of (nonnegative integer) powers of these variables as a canonical key: the sorted list of variable indices paired with their exponents. The product of all the constants can be represented as a canonical-form rational number. For efficiency, we factor out all the powers of 2 from the numerator and denominator into a separate factor 2^k, where k may be positive, negative, or zero. That is, the canonical form of a (nonresidual) term is a coefficient (±p/q)·2^k at a key v. In a key-value map from keys to coefficient lists: if we find the negation of (±p/q)·2^k we delete it from the list, otherwise we cons (±p/q)·2^k to the front of the list; then reinsert at key v. Meanwhile, we replace that term in the expression-tree with 0. What remains is: • an expression-tree with residual terms that could not be represented in canonical form, and zeros where terms have been removed from the tree and added to the key-value map; • a key-value map: for each key v a list of coefficients, each of the form (±p/q)·2^k. After the key-value map is built, for each key mapped to a list of coefficients, we add all the coefficients together, as follows: we normalize all the elements to have the same exponent k (so that it is no longer true that every p and q is odd), then add all the rational numbers, to collapse the list into a single coefficient.
We convert each key-value pair back into an expression (a coefficient times a product of variables), and build a final expression tree with their sum, plus the residual terms previously accumulated. The result is shown at Point D.
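The cancellation step can be sketched in Python over exact rationals (an illustration only; the 2^k factoring and reified-tree bookkeeping of the Coq version are elided, and the monomial encoding here is hypothetical):

```python
from fractions import Fraction
from collections import defaultdict

def cancel(terms):
    """terms: list of (Fraction coefficient, monomial), where a monomial is
    a sorted tuple of (variable index, exponent) pairs -- the canonical key.
    A coefficient that exactly negates one already stored under the same key
    cancels it; surviving coefficients are summed per key."""
    table = defaultdict(list)
    for coeff, key in terms:
        if -coeff in table[key]:
            table[key].remove(-coeff)   # found the negation: delete it
        else:
            table[key].append(coeff)    # otherwise cons it onto the list
    # collapse each surviving list into a single coefficient
    out = {}
    for key, coeffs in table.items():
        total = sum(coeffs, Fraction(0))
        if total != 0:
            out[key] = total
    return out

terms = [(Fraction(5), ((0, 1),)),     #  5*x
         (Fraction(-5), ((0, 1),)),    # -5*x : cancels exactly
         (Fraction(1, 3), ((1, 2),)),  #  (1/3)*y^2
         (Fraction(1, 6), ((1, 2),))]  #  (1/6)*y^2
result = cancel(terms)
```

The surviving map, one coefficient per monomial key, is what gets converted back into the final expression tree.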

Efficiency of the algorithm.
Step 1 of the algorithm (distributive law) takes exponential time. Step 2 takes linear time and space (in the exponentially sized term produced by step 1). Step 3 takes n log n time and linear space (and tends to reduce the term back to small size). We tested this algorithm on a large expression that resulted from calculating position-change and momentum-change of a harmonic oscillator and then taking the sum of squares of position and momentum.
On this large expression with cutoff 2^-30, the entire algorithm runs in Coq in 1.8 seconds. One might think, "simplifying 242 nodes into 219 nodes is not much of an accomplishment." But it is quite significant: the final expression has canceled the nearly-equal subtractions that caused Interval to give horribly loose bounds.
A more efficient algorithm. It should be possible to discard negligible terms interleaved with the distributive law, so that the exponential blow-up in step 1 never occurs. To do so, one would first walk over the tree, bounding the absolute value of every subexpression (using the bounds hypotheses). A second pass would walk the tree, such that in a context a * b, while processing b one would adjust the cutoff by the bound for a. We may implement this algorithm in the future. For now, the algorithm we have seems fast enough.
Floating-point interval arithmetic. In this section we have described analyses on real-valued formulas that contain integer and floating-point constants. Since one cannot efficiently compute on the real numbers, we perform our analyses in floating-point. Because the analyses must be sound even in the presence of round-off error, we compute in floating-point interval arithmetic, as provided by the Coq Interval package.
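The idea of sound floating-point interval arithmetic can be sketched in Python (a crude one-ulp outward widening, much less refined than the directed rounding used by the Coq Interval package; the function name is ours):

```python
import math

# A sound interval product: compute each corner in double precision, then
# widen the result one ulp outward to absorb the rounding of the multiplies.
def mul(a, b):
    corners = [a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1]]
    lo, hi = min(corners), max(corners)
    return (math.nextafter(lo, -math.inf), math.nextafter(hi, math.inf))

x = (1.0, 2.0)
y = (0.1, 0.3)
lo, hi = mul(x, y)   # guaranteed to contain the real product of the intervals
```

Even though 0.1 and 0.3 are not exactly representable, the widened endpoints soundly enclose the true real-number range.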

Soundness theorem for simplification
The algorithm described in the last section is proved correct in Coq, so applications of it are correct by reflection.
We use the Interval package's reify function to turn the user's functional model into a tree-term e of type expr. As usual in Coq, reify is written as a tactic program in the Ltac language, and we validate each reification by reflecting (see §5.1).
The user chooses a (small, floating-point) cutoff value, and VCFloat2's tactic applies the following function: The function simplify_and_prune embodies the three-part algorithm described in Section 8. Before stating the correctness theorem for simplify_and_prune, we must define the notion of equivalently evaluating expressions. The correctness theorems for ring_simp and cancel_terms are that they exactly preserve evaluation, in any environment. Listing 3. The specification of simplify_and_prune.
The theorem says: suppose you want to prove |e| ≤ b, with bounds-hypotheses hyps and variables vars that satisfy hyps. Suppose also that simplify_and_prune gives you a simplified expression e1, and that the total of all the deleted terms is bounded by s. Then it suffices to prove |e1| ≤ b − s, since |e| ≤ |e1| + s.
For example, if there are ≤ 10^10 terms to delete and a cutoff of 10^-8 is specified, then it is inconceivable that s will overflow, since 10^(10−8) = 100 is representable in double precision. But just in case, the hypothesis F.real s = true tests for overflow.

Decomposition tactic
VCFloat2 produces an optimized expression for the absolute error |ỹ − y| between the value ỹ resulting from the floating-point evaluation of an expression and the value y resulting from the evaluation of the same expression over the real numbers. For polynomial expressions, the simplifier described in Section 8 transforms the top-level interval goal |ỹ − y| ≤ bound into a goal on which the Interval tactic is more likely to prove a tight bound. More generally, for rational expressions, the simplifier can be applied effectively to smaller subgoals that are generated by decomposing the top-level interval goal using heuristics that reduce the dependency effect in the interval analysis.
The decomposition is packaged into a tactic in VCFloat2 called error_rewrites. It works by recursively approximating the left-hand side of interval goals for the absolute error. First, distributive and associative laws are applied at the top level to the δ and ε error variables inserted by VCFloat as described in Section 2; for example, goals with left-hand sides of the form |(ũ/ṽ)(1 + δ) + ε − u/v| are rewritten as |(ũ(1 + δ))/ṽ − u/v + ε|. Then, the resulting left-hand side is recursively approximated using the triangle inequality (for ε terms) and the following inequalities, the first of which is particularly important for rational expressions.
Each of the above inequalities corresponds to a transformation used in Gappa, as described by Boldo and Melquiond [13, §4.3.2.2]. The decomposition tactic in VCFloat2 treats each subexpression in the right-hand side of the above inequalities as a separate interval goal, and each of these smaller interval goals generally contains fewer terms that suffer from the dependency effect than the original.
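The triangle-inequality split can be checked numerically over exact rationals (an illustration with hypothetical operand values, not the tactic itself): the error of the rewritten left-hand side is bounded by a sum in which each summand becomes its own, smaller interval goal.

```python
from fractions import Fraction as F

# |(u~(1+d))/v~ - u/v + e|  <=  |u~/v~ - u/v| + |u~/v~|*|d| + |e|
u, v = F(1), F(3)                                       # real-valued operands
ut = F(10000001, 10000000)                              # perturbed numerator
vt = F(29999999, 10000000)                              # perturbed denominator
d, e = -F(1, 2**53), F(1, 2**1074)                      # error variables

lhs = abs((ut * (1 + d)) / vt - u / v + e)
rhs = abs(ut / vt - u / v) + abs(ut / vt) * abs(d) + abs(e)
assert lhs <= rhs   # triangle inequality: each rhs summand is a subgoal
```

Each of the three right-hand-side summands involves fewer correlated occurrences of ũ and ṽ than the original expression, which is exactly what reduces the dependency effect.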

Proving C programs
The Verified Software Toolchain (VST) is a Coq library for proving the correctness of C programs with respect to specifications written in Coq's logic. Its specification language is higher-order separation logic with propositions that can use all of Coq's logic. Therefore its specification language and proof system are much more expressive than those of such systems as Dafny [29], Verifast [24], and Frama-C [28]. Furthermore, because it is embedded in Coq, one can compose, entirely within Coq, a VST proof that a C program correctly implements a functional model with a Coq proof that the functional model correctly implements some high-level specification [26].
VST includes a full treatment of C-language floating point, using the Flocq model of the IEEE 754 standard. In VST one can (fairly easily) prove that a C program implements a floating-point functional model [4,26,27,43]. But VST provides no help in reasoning about the functional model. That is what VCFloat2 can do.
VST describes float operations using CompCert's thin layer of definitions over Flocq's description of IEEE 754 single-precision and double-precision. VCFloat2's functional modeling language uses a different thin layer over the same Flocq types. VCFloat2's compatibility library FPCompCert bridges the small gap.
Fast-math, or not? Some compilers do "fast math" optimizations that change the semantics of floating-point operations; for example, combining x × y + z into a fused multiply-add (fma), which omits an internal rounding step. CompCert does not do any transformations that alter floating-point semantics. The CakeML verified ML compiler does [6], and its authors "argue that any compiler correctness result for fast-math optimizations should appeal to a real-valued semantics rather than the rigid IEEE 754 floating-point numbers." That argument is sound, but should a compiler perform such transformations? We suggest no, because it greatly complicates the specification to be proved about the low-level program. Our approach, as explained in this paper, is to prove (with VST or similar tools) that the low-level program exactly implements a floating-point functional model. That proof treats floating-point operations as uninterpreted functions, and does not even mention the real numbers. When we want fused multiply-add, we write it that way in the functional model and in the C program, using VCFloat2's new mechanism that we will describe in the next section.
12 User-supplied functions
VCFloat2 allows the user to provide and use arbitrary functions (e.g., sin, cos). The user must provide proofs (or axioms) of their rounding behavior. Then VCFloat2 will compute and prove round-off error bounds for formulas containing calls to these functions (along with the standard operators +, −, ×, ÷, etc.). This is done by extending the reified expr syntax and extending the rndval_with_cond theorem to accommodate user-supplied functions.
Any function that takes 0 or more float arguments (not necessarily all of the same type) and produces a float result can be used in VCFloat2. The user provides a floatfunc_package to describe its characteristics; for example, the type of a two-argument function taking a double and a single and returning a single is convertible to ftype Tdouble → ftype Tsingle → ftype Tsingle.
The specification of this function is as follows: there is a real-valued function ff_realfunc(f): R → R → R; and ff_acc is the theorem that f approximates this function within relative error ff_rel f and absolute error ff_abs f, provided that the arguments to f are within the bounds specified by ff_precond.
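The shape of such an accuracy contract can be sketched in Python (an illustration only, not a floatfunc_package; the function pair, the error parameters REL and ABS, and the sample points are our own choices — here IEEE double multiplication against exact rational multiplication):

```python
from fractions import Fraction as F

# The real-valued model (ff_realfunc): exact multiplication on rationals.
def realfunc(x, y):
    return F(x) * F(y)

# The packaged float function: IEEE binary64 multiplication.
def floatfunc(x, y):
    return x * y

# Accuracy parameters in the style of ff_rel / ff_abs (assumed values:
# unit roundoff 2^-53 and the smallest subnormal 2^-1074).
REL, ABS = F(1, 2**53), F(1, 2**1074)

def acc_holds(x, y):
    """The ff_acc-style obligation: |f(x,y) - r(x,y)| <= REL*|r| + ABS."""
    err = abs(F(floatfunc(x, y)) - realfunc(x, y))
    return err <= REL * abs(realfunc(x, y)) + ABS

samples = [(0.1, 0.3), (1.5, 2.25), (1e10, 3.7)]
ok = all(acc_holds(x, y) for x, y in samples)
```

For correctly rounded double multiplication (no overflow or underflow), this relative-plus-absolute bound holds for every input pair, which is exactly the form of theorem a user supplies as ff_acc.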
Calls to such functions are reified into the Func constructor of the expr syntax (see §5). Unlike VCFloat1's expr type, which was monotyped (in Coq) but in which each constructor labeled the intended type of its float values, VCFloat2's expr is dependently typed: expr(t) is the type of reified expressions that would evaluate to float-values of type ftype(t).
Func is especially dependently typed, in that klist forms a list of argument subexpressions, each with a (potentially) different type, according to the list of types ff_args(f). The klist type constructor is for heterogeneous lists [16, §9.2].
From any given floatfunc_package (with its ff_acc theorem), VCFloat2 derives both the evaluation semantics and the (provable) rounding error of calls to these external functions.
VSTlib [3] is a Coq library for use in VST-verified C programs, with proofs or axiomatizations of the specifications of C libraries such as malloc/free, threads, and locks. VSTlib also provides an axiomatization of the GNU math library, including 58 standard Posix math functions such as sin, cos, fma (fused multiply-add), and so on. Each of these is specified to provide a certain level of floating-point accuracy on each specific target architecture, as documented in the GNU C library manual [22]. Different architecture-specific implementations of the math library have been measured to have different accuracy guarantees. So if you install VST with target architecture AArch64, then you'll get a single-precision arctangent function accurate within 1.5 ulp (units in the last place), but on VST configured for x86-32 it will be specified as accurate to 0.5 ulp. Accuracy specifications for floating-point functions are written using VCFloat2's floatfunc_package framework, as described earlier in this section.
13 Nonstandard floating-point-like types
Users may implement in software floating-point types that do not correspond exactly to any IEEE 754 format but that can be shown to respect round-off error bounds. VCFloat2 supports such user-defined types.
Consider the example of double-double [20]. A double-double number can be used to represent a floating-point number with at least 106 bits of mantissa, using two double-precision floats whose 53-bit mantissas do not overlap (because their exponents differ by at least 53). One can add or multiply numbers in this data type using just a few ordinary double-precision operations, especially on machines that support fused multiply-add (fma).
Rounding to double-double can be shown to follow a model with relative error |δ| ≤ 2^-106 and absolute error |ε| ≤ 2^-1022. If the user can prove such a property of an abstract number type (of which double-double is just one example), then they can build a nonstdtype record in VCFloat2. Here, nonstd_bounds is the user-supplied proof of such a rounding theorem, nonstd_to_F is the user-supplied definition of a function that maps the nonstandard type to a floating-point representation, and floatopt_to_real is a function provided by VCFloat2 that maps a floating-point representation to a real number.
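A tiny Python illustration of why the pair-of-doubles representation gains precision (Python floats are IEEE binary64; the example value is our own): 2^60 + 1 needs 61 bits of mantissa, so a single double cannot hold it, but an unevaluated sum of two non-overlapping doubles can.

```python
# A double-double value is an unevaluated sum hi + lo of two doubles whose
# mantissas do not overlap.  2^60 + 1 is not representable in one double:
hi, lo = float(2**60), 1.0

single = float(2**60 + 1)        # a lone double rounds the +1 away
pair_value = int(hi) + int(lo)   # the pair represents the value exactly
```

The `lo` component carries exactly the bits that the `hi` component had to round off, which is how the format reaches (at least) 106 significant bits.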
After building a nonstdtype record, the GTYPE constructor is used to build a new VCFloat2 type. Expressions in which some functions return values of this type and other functions take values of this type can now be reified, and VCFloat2 will automatically calculate and prove round-off error bounds.
In order to write such expressions, users of VCFloat2 must first define operations on the newly defined type. This is done by constructing a floatfunc record (whose type was presented in Section 12).
Standard vs. nonstandard types. One might wonder what is special about a "standard" type, that it cannot be just an instance of the general notion of nonstandard type. A standard type must support all the operations listed in Listing 1; nonstandard types support only whatever functions someone has supplied. The notion of an arbitrary "cast" from any IEEE precision to any other cannot be generically supported, and (for example) some nonstandard types do not support Sterbenz subtraction, or constant literals.

Double-doubles in VCFloat2
To demonstrate nonstandard types and the definition of a floatfunc over a nonstandard type, we use double-double. We instantiate a nonstdtype whose nonstd_rep is { a: ftype Tdouble * ftype Tdouble | dd_pred a } where dd_pred is a predicate describing a well-formed double-double. Based on this representation, we prove the required properties of a nonstdtype, such as nonstd_bounds.
As an example of a nonstandard operation on double-doubles constructed using a floatfunc, we use the operation DWPlusFP, shown in Listing 4, summing a double-double number and a double-precision IEEE 754 floating-point number. That is:
• we state the real-valued functional model as real-number addition;
• we define the floating-point valued functional model as the DWPlusFP algorithm;
• we state its accuracy parameters (relative and absolute error corresponding to a hypothetical IEEE 754 type with 106-bit mantissa);
• and we prove the accprop, the accuracy property that relates the two models up to the stated accuracy.
We adapt Muller and Rideau's Coq formalization of double-word arithmetic [36], extended to consider overflow and underflow. We implemented the function in C, shown in Listing 5, and used VST to prove that it implements the floating-point functional model.
Table 1. Round-off error bounds for Gappa, PRECiSA, FPTaylor, and VCFloat2. The column labeled "Ratio" compares VCFloat2's error bound to the best performer (smaller is better). "Time" is shown as the sum x+y of the execution times (in seconds on a MacBook Pro M2) for Coq to (x) calculate and prove the error bound and (y) check the proof using the Qed command. "Vars" and "Ops" are the number of variables and operations in the formulas. Remarks: (a) uses our fast ring simplifier discarding negligible terms (b) uses our decomposition tactic (c) uses field_simplify (d) would benefit from let-bindings (future work) (-) no special preparation of the interval goal.

Testing Type equality during reification. In order to reify such an expression, it's necessary to test whether the Type of a subexpression is equal to the underlying representation Type of a nonstdtype (we capitalize Type to emphasize that these are Coq types, not our type data structure). But there is no decidable equality on Type. Since this test is being done by the reifier, which is implemented in the tactic language, we can get by with the ability of tactics to test exact identity (a stronger property than equality). But to do this, the reifier needs a list of candidate Types to test against. This we call a collection, and (before reifying anything) the user must "register" any nonstandard type by including it in a typeclass instance of class collection.

Related work and performance evaluation
Several tools perform round-off error analysis and generate machine-checkable proofs: Gappa [9] is implemented in C++ and generates Coq proof scripts; PRECiSA [34] is implemented in Haskell and C and generates proofs in PVS; FPTaylor [42] is implemented in OCaml and can generate proofs in HOL Light; Real2Float [32] is built on top of the NLCertify [31] verification system and generates proof certificates that can be checked in Coq; and Daisy [18] is implemented in Scala and generates proofs in Coq and HOL4 [7].
In comparison to these other tools, VCFloat2's formula language consists of shallow-embedded expressions in the proof assistant's own logic, so not only can VCFloat2 generate proofs (as Gappa, PRECiSA, and FPTaylor can), but it can operate directly on inputs "from the logic." Thus, VCFloat2 fits better into an integrated error analysis and correctness verification of a numerical program, of which floating-point round-off is only one component. We want to connect to other proofs (about program correctness, about discretization error, etc.) done in a proof assistant for a higher-order logic.
We assessed VCFloat2 on several benchmarks from FPBench [17] (see Table 1). We chose 17 benchmarks for which there are previously reported results for FPTaylor, Gappa, and PRECiSA. The column "Ratio" in Table 1 compares the error bounds produced by VCFloat2 to the best performer among PRECiSA, Gappa, and the verified error bounds computed by FPTaylor (FPTaylor-f) [42]. VCFloat2 finds bounds within an order of magnitude of the best performer on 16 of the 17 problems. VCFloat2, like Gappa, fails to find a useful bound on jetEngine. Notably, the error bounds computed by VCFloat2 for the doppler1-3 benchmarks are within the same order of magnitude as those obtained by FPTaylor, but took an average of 17 seconds each, compared to an average of 37 minutes each (on a slower computer) for FPTaylor. Times for verifying the certificates generated by Gappa and PRECiSA were not reported.
On any benchmark without an (a) or (b) in the "Remarks" column, VCFloat1 would have gotten the same result, because (in these cases) VCFloat2 is using only VCFloat1 functionality. On the other benchmarks VCFloat1 would generally fail, unless the user did substantial ad hoc tactical proofs equivalent to our automated tactics corresponding to remarks (a) or (b).
VCFloat2 fails on jetEngine because it is a proper rational function (not a multinomial) with a large number of operations (48 ops). VCFloat2's Prune tactic (§8) does multinomial pruning (where appropriate) and solves remaining subgoals using the Coq Interval package, which can do either multivariable first-order interval arithmetic or single-variable Taylor models; but this formula is multivariate and not a multinomial. And the decomposition tactic provided by VCFloat2 works well on multivariate multinomials with a smaller number of operations (e.g., turbine1-3).
In each of the benchmarks we considered, all inputs are assumed to be exactly representable floating-point numbers. For modular verification efforts, where the round-off errors produced by one function might propagate to the inputs of another, it is particularly useful to be able to specify that some inputs are not exact; while this is possible in PRECiSA, Gappa, and FPTaylor, it is not currently a feature of VCFloat2. Finally, modularity via the separate treatment of the propagation of input errors and the local introduction of round-off errors has enabled the development of scalable tools for rounding-error analysis such as Hugo [1] and Satire [19]; this type of analysis has not yet been adopted in tools that produce proof certificates.
For each FPBench benchmark in Table 1, the VCFloat2 functional model is a Coq function over double-precision floats (ftype Tdouble). These functional models can therefore make use of features of Coq's core language, such as let-binders for sharing common subexpressions. But the syntax of reified terms (see Listing 1) does not support let-binders; common subexpressions are expanded upon reification, and the resulting reified terms can be very large. This degrades both the efficiency and the quality of the analysis. For example, n occurrences of a subexpression that produces a single relative error term will cause VCFloat to introduce n independent δ error variables. In contrast, FPTaylor and PRECiSA maintain the dependence between round-off errors generated by common subexpressions.
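A small Python illustration of why independent error variables hurt (the numbers are ours, with an unrealistically large u so that all quantities are exactly representable): if two occurrences of a subexpression share one δ, their difference cancels; if each occurrence gets its own δ, the worst case over both does not.

```python
# Worst-case reasoning over error variables d in [-u, u].
u = 2.0 ** -20   # illustratively large "unit roundoff" (really 2^-53)
e = 1.0          # magnitude of a common subexpression (hypothetical)

# One shared delta for both occurrences: the difference cancels exactly.
shared_worst = abs((e * (1 + u)) - (e * (1 + u)))

# Independent deltas d1, d2 for the two copies: worst case over both
# (d1 = +u, d2 = -u) no longer cancels.
independent_worst = abs((e * (1 + u)) - (e * (1 - u)))
```

This is the sense in which expanding common subexpressions into n copies, each with its own δ, loosens the computed bound.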
Precision-tuning tools like FPTuner [15] and Precimonious [41] suggest, without proofs, which floating-point operations in a program can be done at double, single, or half precision. The accuracy of these tuned programs could, in principle, be proved by VCFloat2. Herbie [39] discovers transformations of straight-line expressions that improve floating-point round-off error; Becker et al. [5] used Daisy to produce sound upper bounds on the round-off error of rewrites produced by Herbie.

Conclusion
VCFloat2 permits automatic round-off error analysis to be done as part of a larger numerical analysis that treats algorithm correctness, discretization analysis, and low-level program correctness, all in the same general-purpose logic and with end-to-end composable, machine-checked proofs. For that purpose, it provides a language for writing floating-point functional models that clearly and simply relate to real-valued functional models. Below VCFloat2, VST (or another tool) can reason about C program correctness and connect to VCFloat2's functional models. Above VCFloat2, tools for proofs in Coq about the properties of real-valued functional models are an exciting area for future research.
Future work. There are several interesting directions for future work: (1) Using multivariate Taylor models, perhaps as a form of optimize_for_interval at Point C of our process described in Section 3, would allow VCFloat2 to perform error analysis similar to FPTaylor. (2) Including an entire library of double-double operations in VCFloat2; we have only included a single operation to demonstrate the capability of our nonstandard-type mechanism. (3) Allowing input variables to come with their own error bounds (instead of being assumed exact) would give VCFloat2 modularity. (4) Allowing let-binders in expressions, or automatically handling common subexpressions, would improve overall accuracy bounds and would improve efficiency by reducing the number of error variables.

Definition TwoSum (a b : ftype t) :=
  let s := a+b in let a' := s-b in let b' := s-a' in
  let da := a-a' in let db := b-b' in (a+b, da+db).
Definition Fast2Sum (a b : ftype t) :=
  let s := a+b in let z := s-a in let t := b-z in (s, t).
Definition DWPlusFP (xh xl y : ftype t) :=
  let (sh, sl) := TwoSum xh y in let v := xl+sl in Fast2Sum sh v.
Listing 4. The DWPlusFP operation used in the construction of a nonstandard operation on double-doubles in VCFloat2.

void dw_plus_fp(struct dword *st, struct dword *x, double y) {
  double v; struct dword sh;
  two_sum(&sh, x->s, y);
  v = x->t + sh.t;
  fast_2sum(st, sh.s, v);
}
Listing 5. The double-word plus a floating-point number operation implemented in C; VST proves that it implements the floating-point functional model DWPlusFP.

Now one can use DWPlusFP in VCFloat2 as a user-defined function operating on a user-defined type, and VCFloat2 can reason automatically about round-off error of programs that call this function.
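Since Python floats are IEEE binary64, the functional model of Listing 4 can be transliterated and exercised directly (an illustrative sketch, not the verified Coq development; overflow is ignored and the test values are ours):

```python
from fractions import Fraction as F

def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) with s = fl(a+b) and, absent
    overflow, a + b == s + e exactly as real numbers."""
    s = a + b
    ap = s - b
    bp = s - ap
    return s, (a - ap) + (b - bp)

def fast_two_sum(a, b):
    """Dekker's Fast2Sum; requires |a| >= |b|."""
    s = a + b
    return s, b - (s - a)

def dw_plus_fp(xh, xl, y):
    """DWPlusFP: double-double (xh, xl) plus a double y."""
    sh, sl = two_sum(xh, y)
    return fast_two_sum(sh, xl + sl)

# TwoSum is an error-free transformation: check it with exact rationals.
a, b = 1e16, 1.2345
s, e = two_sum(a, b)
exact_two_sum = (F(a) + F(b) == F(s) + F(e))

# DWPlusFP result, compared against the exact rational sum.
zh, zl = dw_plus_fp(1e16, 1.2345e-8, 3.14)
got = F(zh) + F(zl)
exact = F(1e16) + F(1.2345e-8) + F(3.14)
```

The exactness of TwoSum is what makes the double-double accuracy argument work: all the rounding error of the leading addition is captured in `e` and folded back in, so the pair (zh, zl) approximates the exact sum far more closely than one double could.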