Galápagos: Developing Verified Low-Level Cryptography on Heterogeneous Hardware

The proliferation of new hardware designs makes it difficult to produce high-performance cryptographic implementations tailored at the assembly level to each platform, let alone to prove such implementations correct. Hence we introduce Galápagos, an extensible framework designed to reduce the effort of verifying cryptographic implementations across different ISAs. In Galápagos, a developer proves their high-level implementation strategy correct once and then bundles both strategy and proof into an abstract module. The module can then be instantiated and connected to each platform-specific implementation. Galápagos facilitates this connection by generically raising the abstraction of the targeted platforms, and via a collection of new verified libraries and tool improvements to help automate the proof process. We validate Galápagos via multiple verified cryptographic implementations across three starkly different platforms: a 256-bit special-purpose accelerator, a 16-bit minimal ISA (the MSP430), and a standard 32-bit RISC-V CPU. Our case studies are derived from a real-world use case, the OpenTitan security chip, which is deploying our verified cryptographic code at scale.


INTRODUCTION
As Moore's law slows, we have seen an explosion of new, custom hardware designs that aim to increase performance and/or reduce power consumption relative to general-purpose processors [33,35,36,41]. In our IoT-entranced world, these devices are inevitably connected to the Internet, and hence require cryptographic implementations for tasks like checking firmware integrity or establishing secure connections to remote servers. Such tasks place the cryptographic implementation on the system's critical path, making high performance crucial.
Historically, cryptographic providers such as OpenSSL [47] have met these performance demands via hand-written assembly code that utilizes platform-specific optimizations (e.g., NEON [6] or AES-NI [26]), capturing performance gains missed by generic compilers. Emerging heterogeneous platforms reinforce this trend, since compilers for them (including one of our case studies) may not be developed until long after the platforms are deployed, making hand-crafted low-level code a necessity.
Unfortunately, manually writing such low-level code invites vulnerabilities; e.g., OpenSSL has reported 33 CVEs since 2021 [46], of which 29 are memory safety or function correctness bugs. Formal software verification can statically prove an implementation free of entire classes of vulnerabilities, but prior work in this area is ill suited to a world of heterogeneous hardware (§6).
When supporting heterogeneous platforms, verification cost and specialization-based performance are at odds. A large swath of work [5,9,21,53,59,65,68] verifies high-level source code and then assumes a standard compiler produces correct assembly without introducing vulnerabilities. This approach reduces verification costs, but it sacrifices specialization-based performance gains [11]; it is also infeasible for platforms that lack a compiler. Other work directly targets assembly implementations [2,3,11,12,14,24,54,55,61]. This approach retains performance but targets only specific platforms. Hence the effort to verify a cryptographic algorithm (say, ECDSA [32]) grows linearly with the number of platforms targeted.
Our Approach. We present the extensible Galápagos framework, which reconciles the need for low-cost verification with the performance gains from specialization in the multi-platform setting. Taking a cross-platform view emphasizes the importance of creating reusable abstractions across platforms, amortizing development costs. Galápagos supports such abstractions by allowing the developer to write high-level implementations and proofs that are parameterized by an abstract machine model, making them machine-independent. Galápagos also generates a common high-level interface for hardware ISAs, making it easier for the developer to connect platform-specific reasoning to the machine-independent proofs. These two forms of abstraction significantly reduce the developer's hardware-specific proof work, without compromising the run-time performance of their code.
Abstract Implementation. A Galápagos developer initially writes an abstract implementation that captures their machine-independent decisions and proves them correct. They use as many named variables as they wish (unconstrained by finite registers), interact with immutable sequences of structured data (rather than byte-level memory accesses), and can thus focus on proving the algorithm's mathematical correctness. For example, the developer might decide to implement the Cooley-Tukey (CT) algorithm (Algorithm 2) to realize the number theoretic transform (NTT). The correctness of CT is justified by the properties of polynomial rings, which can be proven independent of any specific platform.
The abstract implementation is bundled into a functor using support we added to Dafny (§3.1). A functor is a special type of module (a collection of types, functions, and proofs) that takes one or more modules as arguments and produces a new module. In our case, the abstract-implementation functor is parameterized by an abstract machine module that provides generic word-size operations, which makes the functor reusable across architectures. For instance, the classic Montgomery multiplication algorithm (Algorithm 1) is described in terms of some unspecified radix (word size), and the core operations (e.g., addition and multiplication), the various iteration counts, and even the pre-computed constants all depend on the radix. Nonetheless, the algorithm can be proven generically correct given an abstract machine model.
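To make the radix parameterization concrete, here is a Python sketch (not the paper's Dafny functor) of word-by-word Montgomery multiplication; the radix base and the limb count k are exactly the quantities that a machine module would supply, and the pre-computed constant n_prime depends on the radix:

```python
def mont_mul(a, b, n, base, k):
    """Compute a * b * base**(-k) mod n, one radix-`base` digit at a time.
    Requires n odd (so n is invertible mod base) and a, b < n < base**k."""
    n_prime = (-pow(n, -1, base)) % base    # pre-computed constant; depends on base
    t = 0
    for i in range(k):                      # iteration count depends on the radix
        a_i = (a // base**i) % base         # i-th little-endian digit of a
        t += a_i * b
        m = (t * n_prime) % base            # pick m so that t + m*n is divisible by base
        t = (t + m * n) // base             # exact division: one digit retired
    return t - n if t >= n else t           # final conditional subtraction
```

Instantiating base = 2**16, k = 4 plays the role of applying the functor to a 16-bit machine module; a 256-bit machine would instead use base = 2**256 with correspondingly fewer iterations.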
Platform-specific Instantiation. To target a new platform, as with prior work, the programmer must obtain (or write) a specification that defines the semantics of the hardware's ISA. For example, they might define a machine module with 256-bit words that supports addition and multiplication via hardware-specific instructions. They can then apply the abstract implementation's functor to this machine-specific module to instantiate a machine-specific module (containing a machine-specific algorithm and corresponding proofs). Note that this instantiated module is obtained for free, and it is now committed to the 256-bit word size.
Assembly Implementation. In the final step, the developer must show that an assembly implementation works as described by the machine-specific algorithm in the instantiated module above. The assembly can be hand-written, produced by a compiler, or any combination thereof. Regardless, the developer must prove that each assembly routine realizes an algorithmic step (typically a fairly straightforward process). Crucially, however, they do not need complex proofs showing why those algorithmic steps are correct. Those proofs come for free from the instantiated module! However, the instantiated module still operates over a high-level structured memory, whereas a hardware-level ISA typically operates over bytes. To manage this complexity, Galápagos supplies tools to automatically raise the level of abstraction for each platform. Specifically, Galápagos provides a functor-based, verified abstraction layer that translates a machine's low-level byte-oriented memory interface into a memory with a structured heap and stack.
Tooling and Library.To help with proof reuse and automation, Galápagos includes several improvements to the Dafny language as well as its first standard library.
Functors for Dafny. Creating and managing abstractions is critical for Galápagos. Hence, we introduced verified, ML-style functors to Dafny. This required adapting higher-order functional concepts to Dafny's imperative, first-order design.
Algebra Solver. Non-linear arithmetic is endemic to cryptographic algorithms. However, the state-of-the-art SMT solvers, which tools such as Dafny rely on, struggle to reliably handle non-linear reasoning [22,30]. Prior work has shown the effectiveness of algebra solvers in the Coq interactive theorem prover [61]. We added similar support to the Dafny automated theorem prover, resulting in more concise proofs.
Standard Library.We developed the first standard library for Dafny (now distributed and maintained by the Dafny engineering team at Amazon) with over 5,800 LoC, 80 definitions, and 381 lemmas providing extensive verified facilities for reasoning about collections (e.g., sequences of bytes), translations between different ways of representing large integers in word-sized chunks, and a comprehensive collection of properties about non-linear arithmetic.
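To give a flavor of the representation facilities, the core conversion between a large integer and its word-sized chunks can be sketched in Python (illustrative names; the library's actual Dafny API differs), together with the round-trip property that the library proves as a lemma:

```python
def to_nat(words, base):
    """Interpret a little-endian sequence of word-sized chunks as one integer."""
    acc = 0
    for w in reversed(words):
        acc = acc * base + w
    return acc

def from_nat(n, base, length):
    """Split an integer into `length` little-endian chunks, each in [0, base)."""
    words = []
    for _ in range(length):
        words.append(n % base)
        n //= base
    return words

# Round-trip property, checked on an example at two different radixes:
for base in (2**8, 2**32):
    n = 0x0123456789ABCDEF
    assert to_nat(from_nat(n, base, 8), base) == n
```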
Case Studies. We base our validation of Galápagos (§4) on a real-world use case: the OpenTitan security chip [49]. Designed by partners including lowRISC and Google, OpenTitan is an open source TPM-like [60] chip that can provide a hardware root of trust for a wide variety of devices and applications. At the heart of OpenTitan's security architecture is a secure boot process [25,50] that loads and executes properly signed code only. The code implementing OpenTitan's secure boot (including the cryptographic routines) is baked into the chip's ROM, meaning that any flaws must be addressed by physically recalling the flawed chips, printing a new multi-million-dollar hardware mask, and then fabricating and distributing new chips.
Further complicating the story, OpenTitan includes both a 32-bit RISC-V [57,63] main core and a custom 256-bit big-number accelerator (dubbed the OTBN), and for extra resiliency, OpenTitan aims to support secure booting with and without the OTBN enabled. Hence, in our case studies, we have used Galápagos to produce fully verified implementations of OpenTitan's existing RSA-3072 signature verification routines for both RISC-V and OTBN. Our verified code has been burnt into the mask ROM currently in use for fabricating OpenTitan chips, the first instance, to our knowledge, of formally verified cryptographic code baked into hardware at scale.
To further validate Galápagos's ability to support heterogeneous hardware, we developed (in less than a week) an implementation for yet another architecture, the MSP430, a tiny 16-bit ISA with only 27 instructions, developed by TI for low-power embedded devices. We intentionally avoided ARM and x86, since they are quite standard and well studied in prior work [2,3,11,12,14,54,55].
Our evaluation (§5) finds that Galápagos reduces the effort to define a new ISA by 30-50%, and the proof burden for target-specific implementations by 30-60%. Further, Galápagos's approach produces implementations with speed comparable to (and in some cases faster than) our unverified reference implementations.
Altogether, our case studies consist of approximately 36K lines of specification, code, and proofs, which, along with our tool improvements, are available online as open source [66].
Limitations. Galápagos still requires the developer to produce low-level implementations of their algorithms; for scenarios where compilers exist and performance is not essential, other approaches may require less developer effort. Our case studies focus on signature verification, where side channels are irrelevant, so Galápagos concentrates on functional correctness; standard extensions from prior work [11] could support reasoning about side channels. Like any verification effort, the soundness of our results depends on the correctness of our specifications (both of the cryptography and the machine semantics) and of our verification tool (Dafny).
Contributions. In summary, this research:
• Presents the Galápagos framework, which reduces developer effort for cross-platform cryptographic implementations.

BACKGROUND
Vale. Galápagos builds atop the Vale framework [11], which supports the verification of low-level, high-performance code. Figure 1 shows a sample Vale procedure that quadruples its input. The procedure's signature declares that it reads from register a0 and modifies registers a1 and a2. It also claims that if the input satisfies its precondition (the requires clause), then the output in a2 will satisfy the postcondition (the ensures clause). It makes two procedure calls, which here correspond to individual assembly instructions. Vale discharges proof obligations (e.g., that the preconditions imply the postconditions) by embedding the implementation code in a backend verifier (in our case, Dafny), which reasons about the implementation using a model of the target machine's hardware semantics. The verifier produces mathematical formulas and checks their validity with an SMT solver (in our case, Z3 [17]).
Thus, Vale proofs of correctness require a formal semantics for the underlying hardware. These may come from the hardware manufacturer (e.g., from ARM [56]), from prior academic work [7,16], or the developer can write their own. Figure 2 shows a simplified sample of such a definition. It declares that the machine's state consists of a collection of named 32-bit registers, a memory that maps integer addresses to bytes, and an ok flag that indicates whether code has executed successfully without crashing. The eval_code predicate defines the semantics, i.e., it dictates how the execution of code c causes the machine to transition from state s to state r.
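A hypothetical Python rendering of such a machine model (mirroring the shape of Figure 2, but not its actual Dafny definitions, and with an invented two-instruction ISA) might look like:

```python
from dataclasses import dataclass

@dataclass
class State:
    regs: dict          # register name -> 32-bit value
    mem: dict           # integer address -> byte
    ok: bool = True     # has execution avoided crashing so far?

def eval_ins(s, ins):
    """Transition function for a toy two-instruction machine."""
    if not s.ok:
        return s                    # a crashed machine stays crashed
    if ins[0] == "addi":            # ("addi", dst, src, imm)
        _, dst, src, imm = ins
        s.regs[dst] = (s.regs[src] + imm) % 2**32
    elif ins[0] == "lw":            # ("lw", dst, src, off): 4-byte little-endian load
        _, dst, src, off = ins
        addr = s.regs[src] + off
        if any(a not in s.mem for a in range(addr, addr + 4)):
            s.ok = False            # loading unmapped memory crashes the machine
        else:
            s.regs[dst] = sum(s.mem[addr + i] << (8 * i) for i in range(4))
    return s

def eval_code(s, code):
    """Run a straight-line code sequence, threading the state through."""
    for ins in code:
        s = eval_ins(s, ins)
    return s
```

The Dafny version defines eval_code as a relation between states s and r rather than an executable function, but the role of regs, mem, and ok is the same.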
To aid proofs about their implementation, the Vale developer typically writes and proves additional lemmas directly in the backend verifier and invokes them from Vale.
Dafny Abstract Modules. Galápagos exploits proof reuse, which standard Dafny supports (to a degree) through abstract modules. An abstract module declares an interface, which can be implemented by concrete modules. Dafny generates verification conditions that ensure the concrete module adheres to the interface.
Consider the example in Figure 3. The abstract module ring declares an elem type and functions over it. The int_ring module refines the interface by declaring that elem has type int and providing bodies for the functions. Importantly, add's body must satisfy the idempotency property specified in the abstract module.
Dafny also allows an abstract module to import other abstract modules, allowing access to their contents. Continuing with our example, suppose we want to implement a forward NTT generically over any ring. In FNTT, we can use the syntax import R : ring to use an unspecified module R that promises to implement the ring interface. Now we can use functions in R to perform more complex operations without assuming a particular implementation of R.add.
However, Dafny's basic module system falls short in a subtle but important case. Suppose now we want to implement an inverse NTT, and then use the two NTT modules to implement polynomial multiplication, all generically over some ring. The issue arises with poly_mul, where Dafny has no way to specify that the imported modules F and I are parameterized by the same underlying ring.

THE GALÁPAGOS FRAMEWORK
Galápagos is an extensible framework for developing high-performance cryptographic implementations on different platforms. As shown in Figure 4, the developer proves an abstract implementation (§3.2) correct once and then reuses it across different platforms. For each platform ISA, Galápagos automatically generates a proven-correct, higher-level interface (§3.3). The concrete assembly implementation (§3.4) can thus be written on top of this interface, allowing easier access to the proofs provided by the abstract implementation.
To support an existing crypto primitive on a new platform, the developer supplies a new ISA specification and a corresponding assembly implementation. To support a new cryptographic primitive on existing platforms, the developer adds a new cryptographic specification, along with corresponding assembly implementations. The remainder (shown in purple) comes automatically from Galápagos.
To provide the abstractions needed to achieve code reuse and amortize development costs, Galápagos relies on our introduction of functors to Dafny (§3.1). Proof automation is further aided by new solver support (§3.5) and standard libraries (§3.6) that we added.

Adding Functor Support to Dafny
Galápagos relies on abstraction to reduce developer effort. Dafny's existing module system was too limited for Galápagos (§2), so we expanded its expressivity by introducing ML-style functors [20].

[Figure 4 caption (opening truncated): … is proven to refine a crypto spec (S0). An assembly implementation (I3) is proven to refine a width-specific instance (I2) of the abstract impl (I1). The assembly implementation (I3) is written on top of an automatically generated instance (A4) of a higher-level hardware interface (A7), which is proven sound against the low-level ISA spec (S5). The ISA spec, in turn, is defined using an instance (S6) of the generic machine operations (S8). Given S0, I1 is written once; I3 and S5 are written once per-platform; and Galápagos provides I2, A4, S6, A7, and S8. Figure 12 shows how our case studies apply this workflow.]
Functors are functions from modules to modules. In our implementation, a functor is a module that takes other modules as arguments (each argument is given a type defined by an abstract module), and the code and proofs in the functor are written in terms of the module arguments. The developer can instantiate the functor by applying it to concrete modules that refine the formal arguments' types. A functor thus allows a collection of code and proofs to be reused when instantiated with different module arguments.
Using functors, we can now successfully implement the polynomial multiplication example from §2. As shown in Figure 5, FNTT is now a functor that takes a module R of type ring as an argument and returns an instantiation of the FNTT code and proofs specific to that concrete argument. Applying FNTT to a different ring module produces a different concrete instantiation. The crucial benefit of using functors (as opposed to Dafny's existing module system) is that when two functors are applied to the same argument (e.g., the ring module in poly_mul), we can successfully unify the types coming from the two different instantiated modules. Below, we expand on our functor design choices using Dreyer's terminology [20].

Applicative. Our functors are applicative, meaning that applying the same functor to the same argument(s) in two different contexts still produces the same concrete module. This is crucial for unifying types in examples like Figure 5. Our design contrasts with SML's generative functors, where each application generates a fresh copy of types, even with the same argument module(s). For example, in A = FNTT(IntRing) and B = FNTT(IntRing), A.elem and B.elem will not be of the same type with generative functors.
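The applicative behavior can be mimicked in Python by memoizing a module-producing function (a sketch of the semantics only, not of Dafny's syntax; FNTTModule's sum_all is a stand-in for the real NTT code): applying FNTT twice to the same ring yields the very same module, so the two results' types unify.

```python
import functools

class IntRing:
    """A concrete 'ring module': integers under addition."""
    elem = int
    zero = 0
    @staticmethod
    def add(a, b):
        return a + b

@functools.lru_cache(maxsize=None)  # memoization makes the functor applicative:
def FNTT(R):                        # the same argument yields the same module
    class FNTTModule:
        ring = R
        @staticmethod
        def sum_all(xs):            # stand-in for the real NTT code
            acc = R.zero
            for x in xs:
                acc = R.add(acc, x)
            return acc
    return FNTTModule

A = FNTT(IntRing)
B = FNTT(IntRing)
assert A is B                       # applicative: types from A and B unify
```

A generative functor would correspond to dropping the memoization, so each call manufactures a fresh, incompatible module.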
Second-Class, First-Order. Similar to most ML dialects, our functors are second class, meaning the module system exists in a different plane from ordinary functions and types. Specifically, a module cannot be passed to or returned from ordinary functions, nor can it be stored in datatypes. Our functors are close to being first-order, since they cannot be partially applied, but they can be parameterized by other functors, which is a higher-order property.
Proof Obligations. Unlike in most other functor-supporting languages, such as OCaml or ML, Dafny's types and methods come with verification obligations. Hence, when extending Dafny to support functors, we had to carefully ensure that the proof of a functor's correctness relies only on the properties promised by the abstract module "types" of its formal parameters, not on any details of the concrete instantiations. In exchange, we gain verification efficiency: we need only verify the abstract implementation once; i.e., no additional verification work is required when instantiating the functor with concrete module arguments, since those arguments have already been proven to refine the corresponding abstract modules.

Writing an Abstract Implementation
A key aspect of the Galápagos framework is that the developer initially writes an abstract implementation of their desired cryptographic primitive. This implementation captures their algorithmic decisions and optimizations. Since it is written against a generic, high-level machine model, proving these decisions and optimizations correct is much simpler than it would be for a concrete implementation cluttered with hardware-specific details like finite registers, byte-level memory access, etc. Once the developer instantiates the generic machine module for a concrete hardware platform, Galápagos provides a hardware-specific version of the correctness proofs. To illustrate this process, we first introduce the generic machine model and then show how the developer uses it to write their abstract implementation and prove it correct.
Generic Machine Operations. As shown in Figure 6, the Galápagos generic machine model is provided as an abstract module in Dafny. An abstract module (§2) omits implementations, so that other modules can provide those details by refining the abstract module in different ways. For instance, in the generic machine, uint represents the architecture's word size, but it is defined in terms of the upper bound BASE(), which deliberately omits a definition.
Within this module, Galápagos then provides various common hardware operations, including arithmetic operations, bit shifts, etc. These are defined in terms of uint words, without any knowledge of what the actual value of uint will be, other than the information from the ensures clauses, i.e., that BASE() will be even and larger than 1, which is convenient, for example, when defining msb.
To target a new platform, the Galápagos developer starts with a concrete module that refines the generic module above by filling in the missing definitions; for example, here is an excerpt of the definition for the 16-bit operations:

module bw16_ops refines generic_machine_ops {
  function BASE() : (v : nat) { 0x10000 }
  // addc, msb and to_nat are obtained for free!
}

Dafny checks that the refinement is valid (e.g., that the definition of BASE() is even and greater than 1 in this case) and then automatically fills in concretized versions of the abstract operations. In other words, we can now invoke bw16_ops.addc to talk about add-with-carry over 16-bit words.
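In Python terms, the refinement can be loosely pictured as subclassing (an analogy only; Dafny refinement is checked by the verifier, and the addc/msb bodies here are illustrative): the abstract module's derived operations assume nothing about BASE() beyond it being even and greater than 1.

```python
class GenericMachineOps:
    """Abstract module: a refinement supplies BASE(); the rest comes for free."""
    @staticmethod
    def BASE():
        raise NotImplementedError   # must return an even value greater than 1

    @classmethod
    def addc(cls, x, y, cin):
        """Add-with-carry over single words: (sum mod BASE, carry out)."""
        s = x + y + cin
        return s % cls.BASE(), 1 if s >= cls.BASE() else 0

    @classmethod
    def msb(cls, x):
        """Most significant bit; well defined because BASE() is even."""
        return 1 if x >= cls.BASE() // 2 else 0

class bw16_ops(GenericMachineOps):
    @staticmethod
    def BASE():
        return 0x10000              # 16-bit words; addc and msb are inherited
```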
Abstract Implementation. With Galápagos, a developer aims to capture the essence of their implementation strategy while abstracting away the complexities of a low-level executable. This makes proofs of correctness far simpler. The abstraction of implementation details takes several forms.
First, the developer can use an unlimited number of named variables, rather than worry about finite registers. Second, rather than reason about byte-level memory operations, they instead write their implementation by reading and updating immutable sequences of structured data (e.g., word-sized values). When a sequence is updated, it produces a copy of the original sequence with the corresponding element changed (similar in spirit to copy-on-write files). Hence every sequence is unique and unchanging, making reasoning far simpler, since it, among other benefits, eliminates any aliasing concerns. Finally, the developer writes their implementation using the operations from the generic machine model (Figure 6).
To illustrate this process, Figure 7 shows an example of an abstract implementation of multi-word addition. Algorithms like RSA operate over large integers that cannot fit into a single machine word and must instead be represented by a sequence of words (or "limbs") stored in memory. In the example, when we define addition (big_add) over large integers, instead of explicitly referencing the memory, xs and ys are each represented using an immutable sequence of machine words. Because sequences are ordinary values (just like integers), Dafny can trivially see that modifications to xs have no effect on ys (and vice versa), whereas a low-level implementation would have to worry about potential pointer aliasing. The implementation defines multi-word addition recursively, using variables like len and z to represent intermediate values. It also invokes the generic addc operation from the generic_machine_ops module to propagate the carry bit.
Given the abstract definition of multi-word addition, the developer can then generically prove its correctness, as shown with the big_add_correct lemma. Notice that the first ensures clause says that the result has the expected number of elements, while the second one shows that the addition is computed correctly if each sequence of words is converted into a single big-integer value.
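The content of Figure 7 can be paraphrased in Python (a sketch only; the Dafny version is recursive and the lemma is proven, not tested): big_add works limb by limb over little-endian sequences, and big_add_correct relates the word-level result to the big-integer sum.

```python
BASE = 0x10000   # the instantiated radix: 16-bit machine words

def addc(x, y, c):
    """Generic add-with-carry at radix BASE."""
    s = x + y + c
    return s % BASE, 1 if s >= BASE else 0

def big_add(xs, ys):
    """Little-endian multi-word addition; returns (limbs, final carry)."""
    assert len(xs) == len(ys)
    zs, carry = [], 0
    for x, y in zip(xs, ys):
        z, carry = addc(x, y, carry)
        zs.append(z)
    return zs, carry

def to_nat(ws):
    """Interpret a little-endian limb sequence as a single integer."""
    acc = 0
    for w in reversed(ws):
        acc = acc * BASE + w
    return acc

# The big_add_correct property, checked dynamically on one example:
xs, ys = [0xFFFF, 0x0001], [0x0001, 0x0000]
zs, c = big_add(xs, ys)
assert len(zs) == len(xs)
assert to_nat(zs) + c * BASE**len(xs) == to_nat(xs) + to_nat(ys)
```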
As shown in Figure 7, the abstract implementation is a functor parameterized by a machine module. This functor can be instantiated by applying it to a module that refines the formal argument's type. For example, generic_big_add_impl(bw16_ops) instantiates a concrete module, which has 16-bit definitions of big_add and corresponding 16-bit lemmas such as big_add_correct.

Memory Abstraction
Having written an abstract implementation (§3.2) and instantiated it to specific platforms using functors (§3.1), the Galápagos developer must use the resulting platform-specific proofs to show the correctness of their concrete, hardware-specific implementation. The concrete implementation is ultimately written using the hardware's ISA, formalized in Dafny. As discussed in §2 and shown in Figure 2, the ISA and its formalization operate at a very low level compared to the abstract implementation and proofs. One particularly challenging aspect is that an ISA typically defines a flat, byte-level memory model. For example, the RISC-V model in Figure 2 maps integer addresses to bytes; this means that a 32-bit write to address, say, 0x400, affects the four bytes at addresses 0x400, 0x401, 0x402, and 0x403. Such a model is much harder to reason about than the high-level immutable sequences used in the abstract implementation, since the developer must carefully maintain invariants about which memory regions contain which data, and carefully prove at every memory operation that they are accessing the intended data.
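A small Python model of this flat memory (hypothetical helper names) illustrates both the multi-byte footprint of a single word store and the aliasing hazard that the structured layer must rule out:

```python
def store_word_le(mem, addr, value):
    """A 32-bit little-endian store into a flat, byte-addressed memory."""
    for i in range(4):
        mem[addr + i] = (value >> (8 * i)) & 0xFF

def load_word_le(mem, addr):
    """The matching 32-bit little-endian load."""
    return sum(mem[addr + i] << (8 * i) for i in range(4))

mem = {}
store_word_le(mem, 0x400, 0x12345678)
# The single word-sized write touched four byte addresses:
assert sorted(mem) == [0x400, 0x401, 0x402, 0x403]
assert load_word_le(mem, 0x400) == 0x12345678
# An overlapping write at 0x402 silently corrupts the word at 0x400,
# exactly the kind of interference the developer must prove absent:
store_word_le(mem, 0x402, 0)
assert load_word_le(mem, 0x400) != 0x12345678
```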
To simplify this reasoning and bring the concrete implementation closer to the abstract implementation, Galápagos generalizes prior one-off memory abstraction techniques [11] by providing automatic support for abstracting an ISA's memory model. Specifically, Galápagos uses a functor to define a generic higher-level interface with a structured heap and stack.
As shown in Figure 8, the abstract heap maps an address to a sequence of uint words, whose size is specified by the developer. Similarly, the abstract stack is a sequence of frames, where each frame is also a sequence of words. The abstraction layer soundly preserves invariants showing that operations over structured memory are accurately reflected in the underlying byte-oriented memory.
As with the abstract implementation, the developer instantiates Galápagos' abstraction layer by defining the size of the memory entries they want to reason about. As we illustrate below for RISC-V, this instantiation enables a richer interface for memory instructions.
Accessing Heap Buffers. Many cryptographic implementations iterate over fixed-size buffers, e.g., while reading a plaintext message. Galápagos's memory abstraction provides an iterator interface to support such access patterns. This interface allows the programmer to reason in terms of word-sized (or larger) reads and writes made to immutable sequences of data. As a result, the developer can directly invoke the definitions and lemmas instantiated from the abstract implementation (§3.1), which is conveniently written in terms of sequences of structured data.
The main iterator type is iter_t, which abstracts over a structured heap entry. Its invariant, iter_inv, guarantees that the iterator is well formed; for example, it ensures that the heap entry exists, that the current index is within the buffer's bounds, that the buffer's view of that region of memory as a sequence of uint words is consistent with the heap's state, and that a given address, addr, is consistent with the iterator's index.
Once the generic memory layer is instantiated for a hardware platform, Galápagos wraps the iterator interface around low-level memory accesses. Figure 9 shows the Vale procedure lw_heap that corresponds to the underlying hardware's load word instruction (RV_LW) from Figure 2; its proof relies on the invariants maintained about iterator validity (shown in Figure 8). In addition to the underlying instruction's three arguments (dst, src, offset), the wrapped version takes two additional arguments, namely inc and an iter_t. As shown in the ensures clauses, the inc flag controls whether the iterator should be advanced upon return. The caller of lw_heap must show that the iterator is safe (i.e., within its buffer's bounds) and well formed (satisfies iter_inv). In exchange, the caller learns (from the first ensures clause) that the destination's value has been updated to reflect the value in the structured heap.
In other words, the caller can reason about the contents of the immutable sequences of uint words, without worrying about the underlying bytes in the flat memory model. The lw_heap procedure returns an updated iterator that is guaranteed to be well formed. This programming style also means that, despite all of the complexities in iter_inv, the full definition is irrelevant for callers of lw_heap, since lw_heap maintains the invariant "for free". Figure 10 shows this in action. The procedure buff_sum computes the sum of the contents in the buffer pointed at by a1. It does so via pointer manipulation (e.g., incrementing a1 by four on each loop iteration), but the correctness of these memory operations is maintained by the iterator iter', which lw_heap updates.

[Figure 10 caption: A Vale procedure illustrating the use of the iterator interface to ergonomically process heap buffers. The iter_inv is maintained for free due to Galápagos' abstraction-layer design. Slightly elided detail: sum is a wrapped sum rather than a mathematical sum, due to overflow.]
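The bookkeeping the iterator performs can be sketched in Python (illustrative names and a much-simplified invariant; the real iter_t and iter_inv are richer): the iterator couples a raw address with a logical index into an immutable buffer, and the wrapped load preserves the invariant on the caller's behalf.

```python
class Iter:
    """Couples a raw address with a logical index into an immutable buffer."""
    def __init__(self, base, buff, index=0):
        self.base, self.buff, self.index = base, tuple(buff), index
        self.addr = base + 4 * index          # 4 bytes per 32-bit word

    def inv(self):
        """A slice of iter_inv: the raw address tracks the logical index."""
        return (0 <= self.index <= len(self.buff)
                and self.addr == self.base + 4 * self.index)

def lw_heap(it, inc):
    """Wrapped load: the caller shows the iterator is safe and well formed,
    and gets back the word plus a still-well-formed iterator."""
    assert it.inv() and it.index < len(it.buff)
    value = it.buff[it.index]
    return value, Iter(it.base, it.buff, it.index + (1 if inc else 0))

# buff_sum in this style: walk the buffer via the iterator
it, total = Iter(0x400, [3, 5, 7]), 0
while it.index < len(it.buff):
    v, it = lw_heap(it, inc=True)
    total += v
assert total == 15 and it.inv() and it.addr == 0x400 + 12
```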
Galápagos offers a similar interface, sw_heap, that wraps RISC-V's store word instruction. Like lw_heap, it takes in and returns an iterator, guarding the heap-buffer writes and maintaining the well-formed property of the iterator.
Accessing Stack Variables. Galápagos' memory abstraction layer also provides a structured stack as a generically-proven abstraction over the byte-level memory. The stack is a sequence of frames, each containing several slots for local variables. This makes it simpler for the implementation to prove that variables spilled from registers to the stack retain their value until the next access. Variables in the current frame can be read through the procedure lw_stack, which is another wrapper around the load word instruction (RV_LW), except that the source-address register is hard-coded to be the stack pointer (SP). Stack frames can be added and removed using the procedures push_stack and pop_stack, which are wrappers around subtraction from and addition to the stack pointer.
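In the same illustrative Python style (the sw_stack helper and the Stack class are hypothetical; only push_stack, pop_stack, and lw_stack are named in the text), the frame discipline and its stack-pointer arithmetic look like:

```python
class Stack:
    """Sketch of the structured stack: frames of word slots, with the stack
    pointer (sp) kept consistent with the frame layout."""
    def __init__(self, sp_top):
        self.frames = []            # each frame: a list of word slots
        self.sp = sp_top            # byte address held by the stack pointer

    def push_stack(self, nslots):
        self.sp -= 4 * nslots       # subtraction from SP allocates a frame
        self.frames.append([0] * nslots)

    def pop_stack(self):
        frame = self.frames.pop()
        self.sp += 4 * len(frame)   # addition to SP frees the frame

    def lw_stack(self, slot):
        return self.frames[-1][slot]    # read a local in the current frame

    def sw_stack(self, slot, value):
        self.frames[-1][slot] = value   # spill a value into the current frame

st = Stack(sp_top=0x8000)
st.push_stack(2)
st.sw_stack(0, 42)                  # a spilled register value...
assert st.lw_stack(0) == 42         # ...retains its value until read back
st.pop_stack()
assert st.sp == 0x8000 and st.frames == []
```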

Assembly Implementation
With Galápagos, the developer provides, in Vale, a hardware-specific implementation of their cryptographic primitive. They can do this by transcribing the assembly output by a compiler (e.g., when run on C reference code), by handcrafting the Vale assembly to exploit optimization opportunities missed by a generic compiler, or by any mix of these strategies.
As they write their implementation, they interact with memory via the high-level, structured memory interfaces provided by Galápagos (§3.3). This makes it straightforward to invoke the definitions and proofs from the hardware-specific instantiation of the abstract implementation (§3.2). For example, because Galápagos' iterators abstract the ISA's byte-level memory into sequences of structured data, the iterators' sequences can be passed directly to the lemmas proven about the abstract implementation.

[Figure 11 caption: By writing the implementation's pre- and postconditions in terms of the abstract implementation's definitions (from Figure 7), the developer can easily invoke the corresponding generic lemma concretized to this platform.]
To illustrate this process, Figure 11 shows an excerpted version of the concrete RISC-V implementation of multi-word addition. It takes in an iterator for each of its input and output buffers. Internally, it uses the lw_heap and sw_heap procedures to interact with these buffers in terms of the immutable sequences contained in the iterators (e.g., in x_iter.buff). This allows the implementation to easily invoke the concretized proof from the abstract implementation (i.e., big_add_correct from Figure 7), since both operate over the same high-level sequences. The proof demonstrates that the assembly implementation has successfully computed a step of the abstract implementation (namely, computing the sum).
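The abstract multi-word addition being connected to here is the usual limb-by-limb carry chain. A Python sketch of that operation (big_add and to_int are illustrative names; a 32-bit radix is assumed, whereas the real implementation is parameterized over the radix):

```python
# Sketch of multi-limb addition over little-endian sequences of
# machine words, in the style of the abstract big_add.

BASE = 2 ** 32

def big_add(x, y):
    """Add two equal-length limb sequences; return (sum_limbs, carry_out)."""
    assert len(x) == len(y)
    out, carry = [], 0
    for xi, yi in zip(x, y):
        s = xi + yi + carry
        out.append(s % BASE)      # low word stays in this limb
        carry = s // BASE         # carry (0 or 1) propagates to the next limb
    return out, carry

def to_int(limbs):
    # evaluation rule for multi-limb numbers: sum of limbs[i] * BASE**i
    return sum(l * BASE ** i for i, l in enumerate(limbs))

x = [0xFFFFFFFF, 1]
y = [1, 2]
s, c = big_add(x, y)
# the limb-level loop computes the mathematically correct sum
assert to_int(s) + c * BASE ** len(x) == to_int(x) + to_int(y)
```

The final assertion is exactly the kind of fact a generic big_add_correct lemma states once, independent of the radix and the target ISA.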

Algebra Solver Support
Algebraic reasoning is a common theme in cryptographic proofs. The highly parameterized nature of Galápagos also means that many architecture-specific constants cannot be assumed, resulting in formulas with more symbolic components.
Due to the undecidable nature [42] of general non-linear problems, SMT solvers (including Z3, the solver Dafny relies on), while quite effective in many logical theories, often struggle with non-linear reasoning. However, certain sub-classes of non-linear formulas, such as congruence relations, have been shown to be decidable and robustly handled by dedicated algebra solvers [28].
Inspired by prior work [61] in the interactive theorem prover setting, we have extended Dafny to offer similar support for the Singular algebra solver [18]. A developer can provide a proof goal and relevant facts (proven in standard Dafny) and then explicitly invoke the solver via the new gbassert keyword. We provide more details on our encoding in Appendix A.

Dafny Standard Library Support
Dafny provides a basic set of language features (e.g., sequences or maps) for defining and proving the correctness of an implementation. However, any additional properties must be proven from scratch by the developer. As a result, previous Dafny projects [11,13,22,27,29,30,39,40] have each developed their own project-specific libraries. This has contributed to significant duplication of effort across projects and even across time, as these project-specific libraries are typically not maintained as Dafny actively evolves.
Early in Galápagos's development, we observed that we would need many of the same properties proven by previous projects, so rather than adding yet another project-specific collection, we have created the first Dafny standard library. The library offers a collection of definitions and lemmas, all fully verified with the latest version of Dafny. They cover data structures (e.g., maps, sequences, and sets), parameterized big integers represented as multi-limb sequences, and an extensive set of non-linear algebraic properties for dispatching problems that algebra solvers (§3.5) cannot handle.
In creating the new library, we drew upon code and proofs from past projects, but rewrote them in a uniform style (both syntactically and in proof style). We also extended them to fill in obvious gaps. The main components covered by our version are discussed below.
Data Structures. Dafny provides built-in support for sequences, maps, and sets, making them convenient for modeling a wide variety of systems. On top of these functional data structures, we added more robust support for performing and reasoning about insertion, removal, extrema, subsequencing or subsetting, conversions between data structures, and higher-order functions (fold, filter, etc.) over the data structures.
Big Integers. As discussed earlier, cryptographic algorithms often operate on large integers that cannot fit into a single machine word. We provide a parameterized library for representing such large integers as multi-limb sequences. The library includes operations such as big_add shown in Figure 7, lemmas about the results of the operations, and lemmas describing the effect of converting between large integers represented by different bases. The latter simplify the reasoning about, say, converting the representation of a number as a sequence of bits into a sequence of 32-bit words.
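The base-conversion lemmas state a value-preservation property that is easy to see concretely. A Python sketch (function names are hypothetical, not the library's): regrouping a little-endian bit sequence into 32-bit words does not change the number it represents.

```python
# Illustration of the base-conversion property from the big-integer
# library: packing bits into words preserves the evaluated value.

def eval_seq(limbs, base):
    # little-endian evaluation: sum of limbs[i] * base**i
    return sum(l * base ** i for i, l in enumerate(limbs))

def bits_to_words(bits):
    # pack each group of 32 little-endian bits into one 32-bit word
    assert len(bits) % 32 == 0
    return [eval_seq(bits[i:i + 32], 2) for i in range(0, len(bits), 32)]

bits = [1, 0, 1, 1] * 16            # an arbitrary 64-bit value
words = bits_to_words(bits)
# the lemma's claim, checked on this instance:
assert eval_seq(bits, 2) == eval_seq(words, 2 ** 32)
```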
Non-linear Arithmetic. As discussed earlier, another common theme in cryptographic proofs is algebraic reasoning. While fragments of non-linear reasoning can be decided (as we do with our newly added Singular support, §3.5), the problem as a whole is undecidable. SMT solvers rely on various heuristics to nonetheless try to solve at least some non-linear problems. Unfortunately, in our experience (and that of previous work [22,30]), such heuristics are unreliable; they can fail to solve seemingly simple problems, and even when they succeed one time, the proofs can break in response to seemingly minor perturbations, even something as simple as variable renaming. To mitigate these effects, our library proves a set of common algebraic properties from first principles and makes them available as lemmas. These lemmas are exposed with varying levels of automation built in. Users can invoke very general lemmas (e.g., exposing many properties about multiplication), which provide significant automation but may create proof performance problems. Alternatively, developers can invoke tailored lemmas that specify one property (e.g., multiplication is commutative) or even choose a version where they specify exactly which variables in an equation the property should be applied to (e.g., they can specify x and y as arguments to the lemma to show that x * y == y * x). These more specific versions require more manual developer work, but they consistently provide fast, deterministic performance.

The library has been adopted by the Dafny team at Amazon, who have added it to Dafny's continuous integration tests, which run on each commit to the main Dafny repository. The presence of a unified standard library has already encouraged additional contributions from other Dafny developers, including support for monadic operations, searches, sorts, and a Unicode library.

CASE STUDIES
As discussed in §1, Galápagos' initial case studies were motivated by the need to support the secure boot of the OpenTitan security chip [49]. OpenTitan aims to process RSA signatures on both the main RISC-V core and on the custom OTBN accelerator. Having both implementations provides a fallback in case the OTBN accelerator is later discovered to have a flaw, or if manufacturers decide to omit the OTBN to save cost and energy. The RSA signature verification routine is used to validate the firmware's integrity at the very beginning of the boot process; this code is burned into the chip's boot ROM, so it cannot be updated through software or microcode patches, only by recalling the chip, designing a new ROM mask, and manufacturing new chips. Hence, the security and correctness of the implementation is crucial.
To further test Galápagos' expressivity, we added yet another hardware platform, the MSP430. We also added a second, lattice-based cryptographic primitive, Falcon, recently standardized by the NIST post-quantum competition.
In this section we elaborate on both the hardware platforms and our verified implementations. Figure 12 illustrates how our case studies exercise the development process from Figure 4.

Case Studies: Hardware Platforms
Our case studies target three ISAs operating at different bit-widths, using different addressing modes, and supporting different arithmetic operations. We have developed formal semantics for each ISA in Dafny. These semantics are trusted, but we increase our confidence in them by running fuzz tests that compare the output of our semantics with those produced by reference simulators.
MSP430 is a microcontroller family developed by Texas Instruments [10]. It offers a minimalist 16-bit ISA with only 27 instructions (omitting, for example, multiplication). MSP430 memory is byte addressable, and its instructions have six possible addressing modes: register, indexed, absolute, indirect register, indirect auto-increment, and immediate.
RISC-V is an open standard ISA family [57,63]. For our case study, we use RV32IM, which is the 32-bit base integer ISA (47 instructions) with extensions for integer multiplication and division (8 instructions). The instruction set is quite standard, with a 32-bit address space and byte addressable memory. There are only three data addressing modes: register, immediate, and indexed. One interesting wrinkle is that, unlike most platforms (including our other two), RISC-V does not have a dedicated flags register for zero, overflow, or sign bits; instead, the developer is expected to check for such conditions using standard ALU operations.
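The flags-free style can be made concrete with a small sketch. On RISC-V, the carry-out of a 32-bit addition is typically recovered with an unsigned comparison (the sltu pattern): after s = (a + b) mod 2^32, the carry is 1 exactly when s < a. The Python below models this; it is an illustration of the pattern, not our verified semantics.

```python
# Recovering a carry without a flags register, sltu-style.

MASK = 2 ** 32 - 1

def add_with_carry_out(a, b):
    """32-bit wrapping add; carry recovered via an unsigned compare."""
    s = (a + b) & MASK
    carry = 1 if s < a else 0     # in assembly: sltu carry, s, a
    return s, carry

assert add_with_carry_out(0xFFFFFFFF, 1) == (0, 1)   # wraps, carry set
assert add_with_carry_out(2, 3) == (5, 0)            # no carry
```

The comparison is sound because b < 2^32: the wrapped sum is smaller than a if and only if the true sum overflowed.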
OTBN is a cryptographic accelerator ISA from the OpenTitan project led by lowRISC. OTBN operates on 32 control registers, each 32 bits wide, and 32 data registers, each 256 bits wide. Hence, the data registers alone can potentially hold 1KB of data without any memory accesses. OTBN is designed to accelerate cryptographic computations involving large integers, such as those used in RSA or elliptic curve cryptography. OTBN supports 57 instructions, many of which offer configurable options. For example, the BN.MULQACC instruction performs a quarter-word (64-bit) multiplication and then adds the result to a dedicated accumulation register. The instruction can be customized to choose different quarter words from each source/destination register, to shift the multiplication result before accumulating it, and to clear the accumulation register before adding the result.
For the data-memory instructions, BN.LID and BN.SID, a control register provides the index of the data register as an operand, indirectly reading and writing the wide registers. The instructions read/write 256 bits of data memory and support indirect addressing modes with auto-increment.
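To make BN.MULQACC's options concrete, the sketch below models them in Python: select a 64-bit quarter word from each 256-bit source, multiply, optionally shift the product, and add into (or first clear) the accumulator. This is our reading of the prose above, not OTBN's authoritative semantics.

```python
# Illustrative model of a quarter-word multiply-accumulate.

QW = 2 ** 64

def quarter(x, idx):
    # select quarter word idx (0..3) of a 256-bit value
    return (x >> (64 * idx)) % QW

def mulqacc(acc, src1, q1, src2, q2, shift_qw=0, zero_acc=False):
    if zero_acc:                      # the variant that clears the accumulator
        acc = 0
    prod = quarter(src1, q1) * quarter(src2, q2)
    return acc + (prod << (64 * shift_qw))

# Build a full 128-bit product a*b from quarter-word pieces, where
# a = a1*2^64 + a0 and b = b1*2^64 + b0.
a, b = 0x1234 * QW + 5, 0x9 * QW + 7
acc = mulqacc(0, a, 0, b, 0, zero_acc=True)    # a0*b0
acc = mulqacc(acc, a, 1, b, 0, shift_qw=1)     # a1*b0 << 64
acc = mulqacc(acc, a, 0, b, 1, shift_qw=1)     # a0*b1 << 64
acc = mulqacc(acc, a, 1, b, 1, shift_qw=2)     # a1*b1 << 128
assert acc == a * b
```

The four-call sequence is the schoolbook decomposition of a wide product; the verified OTBN RSA code's main proof challenge (§4.2) is showing a longer sequence of such calls computes a full 256×256-bit product.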
Memory Abstractions. Despite the differences in bit-width, memory size, and addressing modes, Galápagos' common memory abstraction applies smoothly to all of the hardware platforms.
Instantiating the Galápagos structured memory for each is simple. For each platform, the developer only needs to specify the maximum memory size, the stack size, the word size, and the types for heap entries. Given these definitions, Galápagos automatically generates the high-level memory interface (§3.3), as well as refinement proofs showing that the interface is sound with respect to the byte-level memory model in the trusted ISA semantics. The developer wraps the generated abstractions around platform-specific instructions and uses those to write the platform-specific implementations.
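The per-platform inputs to this instantiation are small enough to sketch. The Python below is purely illustrative (the field names are hypothetical, not Galápagos' actual parameter names): it shows the shape of what a developer supplies, from which the memory interface and its soundness proofs are generated.

```python
# Hypothetical sketch of per-platform instantiation parameters.

from dataclasses import dataclass

@dataclass(frozen=True)
class PlatformConfig:
    word_bits: int       # machine word size
    mem_bytes: int       # maximum memory size
    stack_words: int     # stack size, in words

    @property
    def radix(self):
        # the radix used by multi-limb arithmetic on this platform
        return 2 ** self.word_bits

msp430 = PlatformConfig(word_bits=16, mem_bytes=2 ** 16, stack_words=256)
riscv = PlatformConfig(word_bits=32, mem_bytes=2 ** 32, stack_words=1024)
assert msp430.radix == 2 ** 16 and riscv.radix == 2 ** 32
```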
We return to the lw_heap pattern in Figure 9 for an example in RISC-V. The actual RV_LW instruction (from Figure 2) only supports the register-plus-immediate addressing mode. It can be made compatible with the iterator interface by combining lw_heap with an explicit addi instruction to increment the pointer, or by simply setting inc to false.
The indirect auto-increment mode in the MSP430 uses a register operand as a pointer, and it increments the pointer after performing the load. This matches the programming pattern that moves the iterator of an array to the next entry after reading the current entry.
The story is similar for OTBN's load instruction. The full syntax of the instruction is:

BN.LID <grd>[<grd_inc>], <offset>(<grs>[<grs_inc>])
Both grd and grs are 32-bit control registers, where grd specifies the index of the wide register to use as a destination, and grs, along with the offset, specifies the source memory address. Suppose that grd is register x1, which contains the value 0x3, and grs is register x16, which contains the value 0x8000. With no offset, this instruction will load the 256-bit word at address 0x8000 into data register w3.
We note that there are options to increment the control registers, which also correspond to the lw_heap iterator pattern.

Case Studies: Cryptographic Algorithms
RSA. RSA signatures are simple to specify in terms of modular exponentiation of integer values. RSA implementations, however, are amenable to a wide variety of algorithmic and assembly-level optimizations. The algorithmic optimizations are quite complex to reason about even in isolation, let alone in the midst of a complicated assembly-level implementation. Hence Galápagos' split of these obligations between the abstract implementation and the hardware-specific implementation simplifies our correctness proofs. Abstract Implementation. Our abstract implementation, following the style of OpenTitan's unverified baselines, employs the Montgomery multiplication algorithm [43] to efficiently implement modular exponentiation. Algorithm 1 shows the pseudocode of the algorithm. Notably, the algorithm (and our abstract implementation) is parameterized both by the radix (e.g., the machine word's upper limit) and by the size of the big integers, which are represented by sequences of machine words, like the multi-limb sequences in §3.2.
Notice that Line 3 of the algorithm accumulates an intermediate result and requires several multi-limb operations (e.g., the product of a multi-limb sequence and a single machine word, which itself produces a multi-limb result). Therefore, in the abstract implementation, this line translates into a loop, which handles the element-wise products and sums.
We show our abstract Montgomery multiplication implementation correct by proving the following facts: (a) the output is congruent to x · y · R^-1 (mod m), where R is the Montgomery constant, and (b) it is bounded by the modulus m. To prove those, we need to construct appropriate loop invariants. For example, in the loop over i starting on Line 1, two invariants are A ≡ x[..i] · y · β^-i (mod m) and A < 2m, where β is the radix and A is the accumulator. While the congruence proof fits perfectly into the subset handled by the extension to Dafny (§3.5), the bound proof does not. Thus for the latter we rely on lemmas about non-linear arithmetic from our new Dafny standard library (§3.6).
Below we expand on the proof of these invariants in the main loop of Algorithm 1, starting from Line 1: the congruence relation A ≡ x[..i] · y · β^-i (mod m) and the bound A < 2m. Consider the i-th iteration of the loop. We first show that the accumulation step preserves the bound. For the congruence, we prove that the least significant word of the accumulator is zero after the correction term is added, so the division by the radix β is exact and preserves the congruence; the final step follows from the evaluation rule of multi-limb numbers. These invariants, along with the conditional subtraction at Line 6 of the algorithm, ensure correctness. Concrete Implementations. We ported the existing, unverified RSA implementations for RISC-V and OTBN into Vale. For the MSP430, we compiled a C version and transcribed the resulting assembly to Vale. For our proofs, we instantiate the abstract implementation's functor with hardware-specific modules that specify an appropriate radix for each platform (e.g., 2^16 for the MSP430). All three modules specialize RSA's integers to 3072 bits, to match OpenTitan's expectations.
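The word-level loop and its invariants can be exercised concretely. The Python sketch below implements standard word-by-word Montgomery multiplication, parameterized over the radix and limb count; the asserts inside the loop check the invariants discussed above (accumulator congruent to the partial product divided by the radix power, and bounded by twice the modulus). It is an illustration under those standard definitions, not the verified code.

```python
# Word-level Montgomery multiplication over little-endian limb sequences.

def eval_limbs(limbs, beta):
    # evaluation rule for multi-limb numbers (little-endian)
    return sum(l * beta ** i for i, l in enumerate(limbs))

def mont_mul(x, y, m_limbs, beta, n):
    """Return x * y * beta^(-n) mod m."""
    m = eval_limbs(m_limbs, beta)
    yv = eval_limbs(y, beta)
    m_prime = pow(-m, -1, beta)                # requires m odd
    A = 0
    for i in range(n):
        # choose u so the low limb of A + x[i]*y + u*m is zero
        u = ((A + x[i] * y[0]) * m_prime) % beta
        A = (A + x[i] * yv + u * m) // beta    # division is exact
        assert A < 2 * m                       # bound invariant
        assert A * pow(beta, i + 1, m) % m == eval_limbs(x[:i + 1], beta) * yv % m
    if A >= m:                                 # conditional subtraction
        A -= m
    return A

beta, n = 2 ** 16, 4                  # MSP430-style radix, toy limb count
m = [0xFFF1, 0x1234, 0x0, 0x8000]     # odd modulus with top limb set
x = [3, 1, 4, 1]
y = [2, 7, 1, 8]
mv = eval_limbs(m, beta)
R = beta ** n
expected = eval_limbs(x, beta) * eval_limbs(y, beta) * pow(R, -1, mv) % mv
assert mont_mul(x, y, m, beta, n) == expected
```

Changing beta to 2 ** 32 (with correspondingly regrouped limbs) exercises the same code at the RISC-V radix, which is exactly the parameterization the abstract implementation's functor provides.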
Given the lemmas instantiated from the abstract implementation, proving the correctness of the hardware-specific implementations was relatively straightforward, mostly boiling down to proving various hardware-specific bit-fiddling optimizations. The OTBN implementation was relatively easy, since it could fit all of the RSA integers entirely into registers. Its two sets of flag registers simplified carry propagation, and the built-in accumulator register likewise simplified the multi-word computations. The most significant proof challenge was proving that the implementation correctly used the (very complex) BN.MULQACC instruction to compute the multiplication of two 256-bit numbers.
The MSP430 and RISC-V implementations resemble one another. Compared to OTBN, both support a simpler multiplication instruction, while the RISC-V implementation was further complicated by the lack of a flags register.
Falcon. To validate that Galápagos is applicable to other algorithms, we have used it to produce verified implementations of Falcon [23], a post-quantum signature algorithm recently standardized by NIST. Falcon is based on lattices and its security reduces to the short integer solution problem [1], which differs drastically from RSA.
The spec for Falcon is relatively concise, although still more verbose than RSA's, since it depends on definitions of polynomial arithmetic. Simplifying a bit, Falcon verifies a signature s over a (hashed) message c, using public key pk, by computing s' ← c − s · pk mod q and checking that the distance between s and s' is small. The signature and the public key are treated as polynomials, so the most computationally intense operation is the polynomial multiplication (i.e., s · pk).
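A toy sketch can convey the shape of this check. The Python below works in a ring of the form Z_q[x]/(x^n + 1) with Falcon's modulus q = 12289 but a toy degree, and uses an illustrative norm bound; it shows the structure of the verification equation only and is emphatically not the Falcon specification.

```python
# Toy sketch of a lattice-style verification check: compute
# s' = c - s*pk in Z_q[x]/(x^n + 1) and test that (s, s') is short.
import random

Q, N = 12289, 8                # Falcon's q; toy degree for illustration

def poly_mul(a, b):
    # schoolbook multiplication in Z_q[x]/(x^N + 1), where x^N = -1
    res = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < N:
                res[k] = (res[k] + ai * bj) % Q
            else:
                res[k - N] = (res[k - N] - ai * bj) % Q
    return res

def verify(c, s, pk, bound):
    s_prime = [(ci - ti) % Q for ci, ti in zip(c, poly_mul(s, pk))]
    # center residues so values near q count as small negatives
    def center(v):
        return v - Q if v > Q // 2 else v
    norm2 = sum(center(v) ** 2 for v in s_prime + s)
    return norm2 <= bound

random.seed(0)
pk = [random.randrange(Q) for _ in range(N)]
s = [1, 0, Q - 1, 2, 0, 1, 0, 0]   # short "signature" (Q-1 encodes -1)
e = [1, 2, 0, 1, 0, 0, 1, 0]       # short residual
c = [(a + b) % Q for a, b in zip(poly_mul(s, pk), e)]
assert verify(c, s, pk, bound=1000)
```

Tampering with any coefficient of c makes s' a large vector, so the norm check fails; this is the short-vector distance test the text describes.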
Abstract Implementation. Naively, a polynomial multiplication takes O(n^2) time, but this can be optimized to O(n log n) using the number theoretic transform (NTT). In our abstract implementation, we employ the Cooley-Tukey (CT) butterfly algorithm [15] to compute a forward NTT operation (shown in pseudocode in Algorithm 2). The algorithm requires that n is a power of two, that q is a prime such that q ≡ 1 (mod 2n), that the input is a vector in Z_q^n (in standard order), and that Ψ is a vector in Z_q^n containing the powers of ψ, a primitive 2n-th root of unity in Z_q (in bit-reversed order); it ensures that the vector is overwritten with its NTT (in bit-reversed order). Notice that the algorithm, like our abstract implementation, is parameterized over the prime q that defines the field and the size n of the polynomials. Hence, our generic NTT implementation can be instantiated for many other lattice-based algorithms beyond Falcon.
While the pseudocode in Algorithm 2 is relatively succinct, the justifications for why each step computes the right value are surprisingly subtle and are described across multiple research papers [37,38,44,45].
We provide some brief intuition for the algorithm's correctness and refer the interested reader to [38] for more details. The NTT algorithm works with a sequence of words, where each word represents a polynomial coefficient in the ring Z_q. Hence we can think of a sequence as a polynomial and reason about the effect of evaluating it on a point. If we have a sequence a ∈ Z_q^n and a point x ∈ Z_q, then the evaluation a(x) can be written as Σ_{i=0}^{n-1} a[i] · x^i. Let ω be the primitive n-th root of unity in the ring Z_q. The NTT algorithm evaluates the polynomial a at the points ω^0, ω^1, ..., ω^{n-1}. More formally, NTT(a)[j] = Σ_{i=0}^{n-1} a[i] · ω^{ij}. The CT butterfly optimization uses the fact that polynomial evaluation can be split into the evaluation of the terms corresponding to even and odd powers. Let the corresponding coefficients be a_e and a_o; then we can rewrite a(x) as a_e(x^2) + x · a_o(x^2). This reduces the problem to evaluating the polynomials a_e and a_o on the points ω^0, ω^2, ..., ω^{2(n-1)}. Since ω is a primitive n-th root, this list only contains n/2 distinct points. Applying this recursively produces the O(n log n) running time.
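The even/odd split above can be checked executably. The Python sketch below implements the recursive CT formulation exactly as described and confirms it agrees with naive evaluation at all powers of ω, using Falcon's q = 12289 with a toy n. (It returns output in standard order; Algorithm 2's iterative in-place version additionally produces bit-reversed order.) This is an illustration of the math, not the verified abstract implementation.

```python
# Recursive CT split vs. naive polynomial evaluation, mod q = 12289.

Q = 12289                          # q - 1 = 2^12 * 3

def naive_ntt(a, omega):
    n = len(a)
    return [sum(a[i] * pow(omega, i * j, Q) for i in range(n)) % Q
            for j in range(n)]

def ct_ntt(a, omega):
    # a(x) = a_e(x^2) + x * a_o(x^2); recurse on halves
    n = len(a)
    if n == 1:
        return list(a)
    even = ct_ntt(a[0::2], pow(omega, 2, Q))
    odd = ct_ntt(a[1::2], pow(omega, 2, Q))
    out = [0] * n
    for j in range(n // 2):
        w = pow(omega, j, Q)
        out[j] = (even[j] + w * odd[j]) % Q            # a(omega^j)
        out[j + n // 2] = (even[j] - w * odd[j]) % Q   # omega^(n/2) = -1
    return out

n = 8
# find a generator g of Z_q^*, then omega = g^((q-1)/n)
g = next(v for v in range(2, Q)
         if pow(v, (Q - 1) // 2, Q) != 1 and pow(v, (Q - 1) // 3, Q) != 1)
omega = pow(g, (Q - 1) // n, Q)
assert pow(omega, n // 2, Q) == Q - 1   # omega is a primitive n-th root
a = [1, 2, 3, 4, 5, 6, 7, 8]
assert ct_ntt(a, omega) == naive_ntt(a, omega)
```

Each recursion level halves the evaluation points (as the text notes, only n/2 are distinct), which is where the O(n log n) bound comes from.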
Note, however, that for additional efficiency, Algorithm 2 is an iterative and in-place version of the CT butterfly. The outer loop starting on Line 2 tracks the size of the sub-polynomials being combined, which doubles at each level. The two inner loops combine the evaluations of the smaller polynomials.
Concrete Implementations. Having dealt with the complex mathematical reasoning in our abstract implementation, our concrete Falcon implementations focus on proving that they faithfully execute the operations dictated by the abstract implementation. Of the three implementations, the OTBN implementation is the simplest, since we were able to implement Falcon's many additions and subtractions modulo q by simply loading q into OTBN's dedicated modulus register and then invoking OTBN's modular addition and subtraction instructions. Implementing these operations on the MSP430 and RISC-V was more complex and involved some non-trivial bit manipulation. For example, on RISC-V the carry bit can be extracted through conditional branches, but Figure 13 is more efficient. Figure 14 shows another example, from the MSP430. Without using branches, the code conditionally sets R10 to 12289 (the modulus q) based on the overflow flag.
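The MSP430 trick in Figure 14 can be modeled loosely in Python (flag semantics are simplified here): a subtract-with-borrow of a register from itself turns the borrow into an all-zeros or all-ones mask, and AND-ing that mask with q = 12289 yields 0 or q with no branch.

```python
# Branchless conditional correction, in the style of Figure 14.

Q, W = 12289, 16               # Falcon's modulus; MSP430 word size

def cond_q(borrow):
    """Return 0 or Q depending on a 0/1 borrow, without branching."""
    mask = (0 - borrow) & (2 ** W - 1)   # SUBC-style: 0x0000 or 0xFFFF
    return Q & mask                      # AND with the modulus

def mod_sub(a, b):
    # a - b mod Q for 0 <= a, b < Q, using the branchless correction
    d = (a - b) & (2 ** W - 1)
    borrow = 1 if a < b else 0
    return (d + cond_q(borrow)) & (2 ** W - 1)

assert mod_sub(5, 3) == 2
assert mod_sub(3, 5) == Q - 2
```

Avoiding the branch keeps the inner NTT loop short and, as a side benefit, removes a data-dependent control-flow path.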

EVALUATION
We aim to evaluate two key questions: (1) how much developer effort Galápagos saves, and (2) whether our verified implementations match the performance of their unverified baselines.

Developer Effort
Below, we estimate how much effort is saved by applying Galápagos's abstractions, rather than writing them from scratch for each platform or algorithm. Hence we report the ratio of the generic part to the sum of the generic and platform/algorithm-specific parts.
Case Study Hardware. Figure 15 measures the lines of code developed for our three hardware platforms. The generic row contains the abstract machine model (§3.2) and the memory abstraction layer (§3.3). The other rows show the additional lines of code needed to support each ISA's specification and abstraction. OTBN requires slightly more effort due to the complexities of the ISA's design.
The generic row is a one-time cost when developing the Galápagos framework. For the simpler ISAs, it saves up to half of the code that would have been written if developed without Galápagos.
Case Study Algorithms. Figure 16 presents the lines of code developed for our cryptographic algorithms. The specification and the generic implementation are the per-algorithm one-time cost. We note that the generic implementation for RSA is much shorter than Falcon's, largely due to the Dafny standard library's support for big-integer reasoning. For the concrete implementations, the Vale code embeds the concrete assembly while the Dafny code measures the additional platform-specific lemmas needed. Notice that the generic code reduces the proof burden for RSA by ∼30% and for Falcon by more than 60% (RSA has a lower ratio due to its heavy use of our standard library).
In our initial verification efforts, we verified implementations of RSA for the OTBN and RISC-V using traditional monolithic techniques from prior work [11,24,55]. Motivated by the duplication across these implementations, we then developed the Galápagos framework and used it to refactor the code. This reduced the developer-written platform-specific code by 28% for OTBN and 29% for RISC-V. We then further leveraged the framework to both specify the MSP430 and add a custom RSA implementation, in approximately one week of developer effort.
With Falcon, we had the Galápagos framework in place, so we initially focused on the abstract proofs related to the NTT, which took 4 developer months. We then derived the platform-specific implementations in ∼1 developer month.
Standard Library. As discussed in §3.6, we introduced Dafny's first standard library. Our case studies make heavy use of it, with ∼300 calls to standard-library lemmas. Figure 17 summarizes various statistics about the new library. Notice that even though the non-linear portion only includes a handful of definitions (primarily basic recursive definitions of the various non-linear operations), it provides 249 lemmas proving properties of those definitions. Singular Support. Our case studies invoke Dafny's new Singular solver 27 times, often for properties that would have been quite painful to prove via manual lemma invocations. As evidence for this, we replaced 15 manual proofs with Singular invocations, eliminating ∼525 lines of proof code.

Performance
Hardware Setup. We execute our verified RISC-V and MSP430 code on two physical development boards and compare the cycle counts of our verified code against their unverified baselines. For RISC-V, we use SiFive's HiFive1 Rev B featuring the Freedom E310 microcontroller. We run the controller at the default 16 MHz. For MSP430, we use a Texas Instruments LaunchPad with the MSP430FR2476 microcontroller configured to run at 8 MHz.
Since OpenTitan chips are still working their way through their first production run, to measure performance of our OTBN implementations, we rely on OpenTitan's cycle-accurate simulator [48].
Baselines. For RSA, prior to our work, the OpenTitan team produced a hand-written assembly implementation for OTBN, and they used a C compiler (configured to optimize for size) to produce code for RISC-V. We similarly use a C compiler for the MSP430. These three implementations serve as unverified RSA baselines.
Falcon has a pre-existing C implementation [34] but no optimized assembly for the hardware platforms we target. Hence, we rely on a C compiler to produce unverified baselines for RISC-V and the MSP430. No unverified baseline exists for OTBN, so we wrote our verified implementation from scratch.
Results. Figure 18 shows our performance results for our various verified implementations and their unverified baselines. We find that our verified implementations typically perform within ±2% of their respective baseline implementations. This result is expected, since our verified implementations differ from the baselines only in minor ways that make the code more amenable to verification, e.g., instruction reordering. Our verified Falcon implementation for the MSP430, however, is considerably faster than its compiled baseline. We attribute this result to our hand-tuned register allocation in the verified version.

RELATED WORK
Barbosa et al. present a recent summary of computer-aided cryptography [8]. Here we focus on more closely related work on formally verified cryptographic implementations. We roughly categorize the work by target (source or assembly language) and by technique.
High-Level Languages. Several lines of work verify or produce cryptographic code in high-level languages. For example, some work [5,9,65] uses the Verified Software Toolchain [4] and yields C code, as does work on Fiat Crypto [21] and the HACL* library [53,68]. Other work [59] uses SAW [19] to produce C and Java code. Still other work [67] relies on extraction to OCaml.
All of this work trusts a compiler (often run in a maximally aggressive optimization mode) to correctly and securely produce machine code suitable for execution. Such trust may be misplaced [21,62,64]. Relying on a compiler can also be problematic for emerging hardware platforms, like OTBN, for which compilers do not yet exist. Historically, this approach has also produced code that lags hand-tuned assembly by 2× [21] to 100× [67].
Low-Level Languages. Work in Jasmin [2,3] verifies implementations written in a domain-specific language and then uses verified compilation to produce an executable. Fiat Crypto [21] also employs verified compilation from high-level elliptic curve descriptions to C-level implementations. Subsequent work suggests a path towards extending their verified pipeline to assembly [51,52]. While attractive, developing a verified compiler (or even a verified backend) is a significant upfront development effort, and it asks engineers to write proofs about compilation passes, rather than about the code they wish to execute. It may also be difficult to generically match the ingenuity that performance engineers put into their hand-crafted assembly.
In contrast to verified compilation, previous work [11,24,55] based on Vale [11] directly verifies a wide variety of cryptographic algorithms written in assembly. However, that work primarily focuses on x86-64, with a few implementations for Arm. These implementations and their proofs are standalone efforts, with little code or proof shared between architectures, even for implementations of the same algorithm.
Another line of work [14,54,61] targets implementations in an assembly-like domain-specific language (translated from platform-specific assembly via Python). The work's key insight is that the proof of correctness for the core of a cryptographic routine can often be automatically partitioned into proofs about basic mathematical operations and proofs about machine behavior (e.g., proving the absence of overflow), with the former discharged by an algebra solver (Singular [18]) and the latter discharged via an SMT solver (Z3 [17]). This work is complementary to Galápagos, which focuses on providing functor-based platform and algorithm abstractions that can be verifiably reused for multi-platform development. Similarly, their work inspired our integration of Singular into Dafny, but we have found that working in a general verifier like Dafny is critical, since it is unclear how to soundly and automatically break up and efficiently discharge the proof obligations that arise from larger implementations, which include memory operations, conditional branches, non-linear equations beyond congruence relations, and the arbitrary-length sequences needed to compute, say, RSA.
Extracting Common Algorithmic Features. Many verification projects focus on verifying elliptic curve operations, and several have extracted common algorithmic code (e.g., computing over Montgomery curves), either as libraries [67] or as compiler passes (in Fiat Crypto [21]). This generic code is then instantiated for specific curves that may have different optimal strategies for representing curve points. Galápagos also abstracts over the algorithm, but it differs in using verified functors and focusing on implementations of the same algorithm on different hardware platforms, rather than different algorithms/curves on the same platform.
Prior work has also targeted the number theoretic transform, which is the building block of many post-quantum cryptographic algorithms. Navas et al. use abstract interpretation to show that NTT implementations in C are free of algorithmic overflows [44]. Other work has produced verified NTT implementations through domain-specific languages [31,58]. These works focus on techniques to facilitate "push button" verification of individual NTT implementations, while Galápagos focuses on amortizing the verification effort across multiple implementations.

CONCLUSION
We have presented the Galápagos framework, which aims to lower the cost of developing high-performance cryptographic implementations across an increasingly heterogeneous hardware landscape. Galápagos uses functors to abstract algorithms and platforms, which can then be automatically instantiated across heterogeneous hardware. Using Galápagos to verify six cryptographic implementations of RSA and Falcon on three wildly varying platforms shows that Galápagos reduces the developer's burden without sacrificing performance. OpenTitan is deploying our verified RSA code at scale. Ultimately, we hope Galápagos helps verified cryptography to boldly go where no (verified) cryptography has gone before.

Figure 4 :
Figure 4: Galápagos Overview. An abstract implementation and proof (I1) parameterized by generic machine operations (S8) is proven to refine a crypto spec (S0). An assembly implementation (I3) is proven to refine a width-specific instance (I2) of the abstract impl (I1). The assembly implementation (I3) is written on top of an automatically generated instance (A4) of a higher-level hardware interface (A7), which is proven sound against the low-level ISA spec (S5). The ISA spec, in turn, is defined using an instance (S6) of the generic machine operations (S8). Given S0, I1 is written once; I3 and S5 are written once per platform; and Galápagos provides I2, A4, S6, A7, and S8. Figure 12 shows how our case studies apply this workflow.

Figure 6 :
Figure 6: Snippet of Galápagos Generic Machine Operations in Dafny. Operations are defined with respect to an unknown word size uint; e.g., addition with carry wraps when the sum overflows.

Figure 9 :
Figure 9: RISC-V Load from Structured Heap. The untrusted lw_heap Vale procedure offers a friendlier interface that is proven sound against the trusted ISA-level RV_LW instruction (from Figure 2). The proof relies on invariants maintained about iterator validity (shown in Figure 8).

Figure 10 :
Figure 10: Looping Over a Structured Memory Buffer. A Vale procedure illustrating the use of the iterator interface to ergonomically process heap buffers. The iter_inv is maintained for free due to Galápagos' abstraction-layer design. Slightly elided detail: sum is a wrapped sum rather than a mathematical sum, due to overflow.

Figure 11 :
Figure 11: A Concrete Vale Implementation of Multi-Word Addition. By writing the implementation's pre- and postconditions in terms of the abstract implementation's definitions (from Figure 7), the developer can easily invoke the corresponding generic lemma concretized to this platform.

Figure 12 :
Figure 12: Case Studies Overview. We include three hardware platforms (§4.1) and two algorithms (§4.2) in our case studies.

• Introduces functor support into an SMT-based automated theorem prover, and shows how to use functors to abstract algorithms and heterogeneous platforms.
• Evaluates the reuse enabled by Galápagos on six verified implementations covering classical and post-quantum cryptographic algorithms and three disparate hardware platforms.
• Contributes a new verified Dafny standard library, now upstreamed, to facilitate future verification efforts.
• Produces the first formally verified cryptographic routines baked into hardware for large-scale deployment.
// The abstract heap is a collection of disjoint buffers,
// each accessed in the map by its base address
type heap_t = map<nat, seq<uint>>
datatype frame_t = frame(fp : nat, content : seq<uint>)
// The stack is a sequence of frames
datatype stack_t = stack(sp : nat, fs : seq<frame_t>)
// Relation between the byte-level memory and the structured heap/stack
predicate mem_inv(mem : map<int, uint8>, h : heap_t, s : stack_t)
Dafny types for the structured heap, stack, and iterators over the heap's buffers, plus an invariant that connects an iterator to the contents of the heap.