Formal Mechanised Semantics of CHERI C: Capabilities, Undefined Behaviour, and Provenance

Memory safety issues are a persistent source of security vulnerabilities, with conventional architectures and the C codebase chronically prone to exploitable errors. The CHERI research project has shown how one can provide radically improved security for that existing codebase with minimal modification, using unforgeable hardware capabilities in place of machine-word pointers in CHERI dialects of C, implemented as adaptions of Clang/LLVM and GCC. CHERI was first prototyped as extensions of MIPS and RISC-V; it is currently being evaluated by Arm and others with the Arm Morello experimental architecture, processor, and platform, to explore its potential for mass-market adoption, and by Microsoft in their CHERIoT design for embedded cores. There is thus considerable practical experience with CHERI C implementation and use, but exactly what CHERI C's semantics is (or should be) remains an open question. In this paper, we present the first attempt to rigorously and comprehensively define CHERI C semantics, discuss key semantics design questions relating to capabilities, provenance, and undefined behaviour, and clarify them with semantics in multiple complementary forms: in prose, as an executable semantics adapting the Cerberus C semantics, and mechanised in Coq. This establishes a solid foundation for CHERI C, for those porting code to it, for compiler implementers, and for future semantics and verification.


Introduction
Memory safety bugs continue to be a major source of security vulnerabilities, despite much research on software bug-finding and mitigation approaches. For example, they are responsible for most of those addressed by Microsoft security updates or impacting Chromium [19,29]. They are a particular concern for the large codebases in C and C++ that comprise the infrastructure that we all depend on. Alternative memory-safe languages offer promise, but these C/C++ codebases will clearly be an ongoing challenge for the foreseeable future.
The CHERI project [44], developed by the University of Cambridge and SRI International since 2010, offers a promising hardware-based approach. CHERI extends conventional hardware Instruction-Set Architectures (ISAs) to enable support for fine-grained memory protection and for scalable software compartmentalisation, with hardware-supported capabilities. In a 64-bit CHERI ISA, instead of using simple 64-bit machine-word virtual-address pointer values to access memory, restricted only by the memory management unit (MMU), one can use 128+1-bit capabilities that encode a virtual address together with the bounds of the memory it can access. Encoding these within the capability enables a fast access-time check, faulting if there is a safety violation. The ISA design ensures that capabilities cannot be forged, i.e., that normal code execution can shrink capabilities but never grow them, and there are additional "sealed-capability" features for secure encapsulation.
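The two properties described above (a bounds check on every access, and bounds that can shrink but never grow) can be illustrated with a small software model. This is only a sketch: the type cap_model and the functions cap_set_bounds and cap_access_ok are hypothetical names introduced here, the model ignores permissions, sealing, and bounds compression, and real CHERI hardware performs these checks in the ISA rather than in C.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a capability. All names are illustrative assumptions,
   not part of any CHERI ISA or CHERI C API. */
typedef struct {
    uint64_t addr;  /* current virtual address */
    uint64_t base;  /* inclusive lower bound */
    uint64_t top;   /* exclusive upper bound */
    bool tag;       /* validity tag */
} cap_model;

/* Narrow the bounds. The request succeeds only if the new range lies
   inside the old one; otherwise the result keeps the old bounds but is
   untagged, so it can never be used to reach more memory. */
static cap_model cap_set_bounds(cap_model c, uint64_t new_base, uint64_t new_top)
{
    bool ok = c.tag && new_base <= new_top &&
              new_base >= c.base && new_top <= c.top;
    cap_model r = { c.addr, ok ? new_base : c.base, ok ? new_top : c.top, ok };
    return r;
}

/* Access-time check: the tag must be set and the footprint
   [addr, addr + len) must lie within [base, top). */
static bool cap_access_ok(cap_model c, uint64_t len)
{
    return c.tag && c.addr >= c.base && c.addr <= c.top &&
           len <= c.top - c.addr;
}
```

In this model, an attempt to widen a narrowed capability yields an untagged value, which then fails every access check.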
The initial academic work developed CHERI-MIPS and CHERI-RISC-V architecture extensions, along with FPGA processor implementations and system software (including adaptions of Clang/LLVM, linkers, debuggers, FreeRTOS, FreeBSD, and WebKit). Some initial design work on potential CHERI-x86 designs is in progress [1]. Arm, partly supported by the £190m UKRI Digital Security by Design (DSbD) programme [41], have now developed the Morello architecture, processor, and development board, extending the Armv8-A architecture and high-performance Neoverse N1 processor, to enable industrial evaluation that may support mass-market adoption in mobile or server cores [4,5]. Meanwhile, the Microsoft CHERIoT project has developed the eponymous architecture, reference hardware design, and RTOS and software stack for an extension of 32-bit RISC-V with CHERI-based protection for small embedded cores [3].
A key design goal for CHERI is to provide radically improved security for those critical existing C codebases with minimal modification. It does so with a dialect of C, implemented initially as modifications to Clang/LLVM, and now also as a GCC port by Arm. The CHERI architectural mechanisms can be used by language implementations and systems software in various ways to provide improved security, but the basic idea for fine-grained memory protection is to implement C pointer types with machine capabilities instead of machine words, so that pointer integrity and memory accesses are checked by the hardware. For simple code, recompilation of the unmodified existing code with the CHERI C compiler will do this, while more exotic code, for example code that manipulates the bit-representations of pointers, may need some source adaptation.
A 2019 analysis [29] suggested that 30-70% of the vulnerabilities reported to the Microsoft Security Response Center (MSRC) would have been deterministically mitigated by CHERI memory-safety, and porting the FreeBSD kernel and userspace to CHERI required changes only to 0.18% and 0.04% LoC respectively. Analysis of an open-source desktop stack [42] estimated a 73.8% vulnerability mitigation rate through a combination of memory protection and software compartmentalisation requiring a 0.026% LoC change.
All this raises the question that we address in this paper: what is CHERI C, exactly? This is important from several perspectives: those porting legacy C code to CHERI C, or writing new code, need to know what is permitted; those implementing CHERI C compilers (notably the Clang and GCC extensions) need a common understanding, lest those diverge from each other and from the programmer's model; all these need to understand what is common and what varies across CHERI C implementations above distinct CHERI hardware architectures, so that CHERI C code can be portable across architectures; future semantics and verification for CHERI C needs a basis for its work; and all involved need an understanding of what security properties CHERI C enforces, and what vulnerabilities it mitigates.
We make the following contributions:
• Discussion of the design issues that arise in the design of CHERI C and its semantics, including the subtle interactions between capabilities, undefined behaviour, and pointer provenance, illustrated with a test suite of examples (§3).
• An executable mechanised semantics of CHERI C, reifying the above as an extension of the Cerberus ISO C semantics [27,28] and the PNVI-ae-udi memory object model supported by the ISO C standards committee [18,28] (§4).
• The CHERI C memory object model is mechanised within Coq, with the extracted code used in the executable semantics (§4.3).
• A prose definition of CHERI C (published as a separate Technical Report [49]).
• Validation and experimental comparison (§5).
The several different versions of the semantics serve different purposes: the prose version should be widely accessible; the extension of Cerberus gives a semantics that is executable as a test oracle, to compute the allowed behaviour of small and modest-sized tests and programs, and helped us check that we have considered all interactions of CHERI and ISO C features; and the Coq formalisation of that helped nail down all the details and provides a basis for later mechanised proofs ( §7).
Without all this, CHERI C would remain merely "defined" by its implementations, leaving many important aspects unclear, and with no solid basis for future discussion.
We begin with background on CHERI hardware capabilities, C undefined behaviour, and pointer provenance (§2) and conclude with discussion of related and future work (§6,7). Our semantics and examples will be available open-source.

A sophisticated compression scheme allows a capability to include 64-bit lower and upper bounds, encoded into 87 bits in total, with 56 of those shared with the address field [5, §2.5.1], [47]. Small regions can be described precisely, with an arbitrary size in bytes, while for larger regions, only certain combinations of bounds and size are representable (though all addresses are representable for some base and size). The one-bit tag provides integrity protection: it is preserved only by legitimate operations on capabilities and cleared by any others (e.g. by overwriting individual bytes). A capability can only be used as such, e.g. for a dereference, if its tag is set. The permission bits control whether a capability can be used for loading or storing non-capability data, loading or storing capabilities, and fetching instructions, among other things. Capabilities can also be sealed, making them immutable and unusable for anything but branching to them; this allows controlled transitions between different security domains. Sealing (or unsealing) a capability requires an authority capability with the Seal (or Unseal) permission. Some variations of this are indexed by an object type otype. Global g and executive e bits restrict the locations where a capability can be stored and the banking of certain system registers.

Morello extends the Armv8-A general-purpose integer register file, and some control and status registers, from 64 bits to 128+1 bits. Memory is extended with a tag bit for each 128-bit sized and aligned unit of DRAM. The Program Counter (PC) is extended to become a Program-Counter Capability (PCC), constraining instruction fetch as well as PC-relative loads (e.g., of global variables). A new Default Data Capability (DDC) register controls memory accesses by legacy (non-capability) instructions, for legacy code using integer pointers. Morello extends Armv8-A with new instructions and modifies existing instructions to use and respect capabilities.
CHERI architectures introduce new detected errors, e.g. when an instruction tries to access memory outside the bounds of the capability used for the access, or with an untagged capability; the exact handling of these is ISA-specific. For example, in Morello such an access triggers a synchronous data abort exception. In other cases, e.g. when an instruction attempts to construct a non-representable capability, hardware will clear the tag of the resulting capability, to protect integrity.

C Undefined Behaviour
ISO C relies crucially on the notion of undefined behaviour (UB), to make it possible to define the semantics of a memory-unsafe language (in which the possible effects of a wild write are hard to bound without massively over-constraining implementations), and to enable high-performance optimising implementations on diverse platforms. In the ISO C abstract machine, out-of-bounds memory accesses have undefined behaviour, as do signed integer overflows, data races, and many other things. Any program for which there exists an abstract machine execution which has undefined behaviour is deemed to have undefined behaviour as a whole: programmers are required to avoid this, and compiler implementations are free to behave in any way for such programs. Importantly, UB of programs is not a temporal notion: while it is identified in the ISO abstract machine at specific execution points, it is not the case that a correct compiler is guaranteed to behave according to the abstract machine until such points, but rather that the compiled whole program can behave arbitrarily. This is forced by the desire to allow optimisations that move code around without a compile-time check or proof that it is UB-free: the compiler is allowed to assume that the preconditions for such optimisations hold, and the programmer is obliged to ensure that they do, otherwise the standard gives no guarantees about the whole-program behaviour.

C Memory Object Models and Pointer Provenance
In conventional C implementations, pointers are represented at run-time with simple machine-word integers, and the language exposes this representation to programmers (e.g. via pointer/integer casts, representation-byte accesses, and type punning); this expressiveness is important for low-level systems code. However, compile-time optimisations rely on alias analysis, e.g. to determine that two pointers cannot alias and hence accesses via them can be reordered. In particular, compile-time analysis commonly tracks the original allocation, or provenance, of pointer values, and two pointer values that can be statically determined to have different provenance are assumed not to alias. This has been discussed in the ISO WG14 C standard since 2004 [46], and recent work has proposed a provenance semantics "PNVI-ae-udi" that is the basis for an in-progress ISO Technical Specification [18,28]. In that semantics, the C abstract machine associates a provenance, which is either an allocation unique ID or empty, with every pointer value. Normal operations on pointers simply propagate the provenance, and memory accesses via a pointer check that the address is within the bounds of the original allocation identified by the provenance, with undefined behaviour otherwise. Of course, while this C abstract machine uses provenance at runtime, conventional implementations still do not and should not. Instead, this licenses various existing compiler optimisations that rely on compile-time provenance analysis, by deeming certain programs UB. In PNVI-ae-udi, pointers carry provenance, but integers do not. If a pointer is cast to an integer (or its representation is otherwise examined), the allocation identified by its provenance is marked exposed. On a cast from integer to pointer, if the integer is within the range of some exposed allocation, the appropriate provenance is (in the semantics) attached to the resulting pointer value.
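The exposure mechanism can be made concrete with a toy abstract-machine model. The sketch below is an illustration under simplifying assumptions, not the PNVI-ae-udi definition: the types alloc_model and ptr_model and the three functions are hypothetical names, allocation IDs are assumed non-zero, and the one-past-the-end ambiguity that the "udi" part of the real semantics handles is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROV_EMPTY 0u  /* models the empty provenance */

/* Hypothetical model types: one record per live allocation, and a
   pointer value carrying an address plus a provenance ID. */
typedef struct { unsigned id; uintptr_t base; size_t size; bool exposed; } alloc_model;
typedef struct { uintptr_t addr; unsigned prov; } ptr_model;

/* Pointer-to-integer cast: the allocation named by the pointer's
   provenance becomes exposed; the integer result carries no provenance. */
static uintptr_t model_ptr_to_int(alloc_model *allocs, size_t n, ptr_model p)
{
    for (size_t i = 0; i < n; i++)
        if (allocs[i].id == p.prov)
            allocs[i].exposed = true;
    return p.addr;
}

/* Integer-to-pointer cast: attach the provenance of an exposed
   allocation containing the address, or the empty provenance. */
static ptr_model model_int_to_ptr(const alloc_model *allocs, size_t n, uintptr_t a)
{
    ptr_model p = { a, PROV_EMPTY };
    for (size_t i = 0; i < n; i++)
        if (allocs[i].exposed && a >= allocs[i].base &&
            a - allocs[i].base < allocs[i].size)
            p.prov = allocs[i].id;
    return p;
}

/* An access is defined only if the provenance is non-empty and the
   footprint lies inside that allocation; anything else models UB. */
static bool model_access_defined(const alloc_model *allocs, size_t n,
                                 ptr_model p, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (p.prov != PROV_EMPTY && allocs[i].id == p.prov) {
            if (p.addr < allocs[i].base) return false;
            uintptr_t off = p.addr - allocs[i].base;
            return off <= allocs[i].size && len <= allocs[i].size - off;
        }
    return false;
}
```

In this model an integer that happens to equal an address inside an unexposed allocation yields a pointer with empty provenance, so any access through it is modelled as undefined.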

CHERI C Semantics Design Questions
CHERI C has been implemented as an adaption of the Clang/LLVM C compiler [23,36,45], and, in progress, as an adaption of GCC [6]. The details of these implementations are beyond the scope of this paper, but the basic idea is to represent all C source-language pointers with machine capabilities, instead of machine words. Pointer arithmetic is implemented as arithmetic over these capabilities, and thus the hardware checks that all accesses are within their bounds. For allocations of local variables whose address is taken, the compiler introduces code to construct a capability with the correctly narrowed bounds (derived from the stack-pointer capability), and for globals, thread-local variables, function pointers, and malloc'd allocations, the runtime linker and the allocator similarly construct capabilities with the appropriate bounds. The language provides new intrinsics for explicit manipulation of capabilities, e.g. to inspect their fields or further narrow their bounds, but these are not needed for porting straightforward C code. In addition to source-language pointers, CHERI C implementations also use capabilities to represent internal runtime pointers: the program counter, jump addresses, stack pointer, return addresses, and global-offset-table (GOT) machinery.
The design of CHERI C has to reconcile three major and at times conflicting objectives:
1. Existing C programmers should be able to port existing C codebases to CHERI C with little effort.
2. Existing compiler infrastructure and optimisations should require only limited changes, to maintain performance of the generated code and to make the required compiler engineering effort feasible.
3. Memory-safety errors which would lead to exploitable vulnerabilities should be deterministically mitigated wherever possible. In particular, CHERI C aims to provide a substantial level of spatial safety, ensuring that "pointers may be used only to access memory within bounds of their associated allocation" [45]. Temporal safety is not universally guaranteed by default across all existing CHERI architectures. Some already provide temporal safety guarantees [3] while for others it is a topic of active research [17].
Moreover and the address of this slot. At Line 2 the capability q has that address plus sizeof(int), but unchanged bounds. Then, at the access on Line 3, the hardware check that the virtual-address footprint of the access lies within the bounds fails, and a hardware exception and then a signal are raised: the program will fail-stop safely, preventing exploitation of the bug.
This suggests what would be a straightforward and desirable semantics for CHERI C: that any such memory-access UB is instead guaranteed to trap. With some attention to other UB cases, that could provide strong general security guarantees for race-free programs. Unfortunately, the reality is more complex. Standard C-compiler optimisations can eliminate the call to f entirely (at -O2 the current Clang/LLVM-based CHERI C compiler compiles this code to just return zero), and whether they do or not can depend in subtle and hard-to-predict ways on the rest of the code. For example, if &x is assigned to a global, then at -O2 the inlined f survives and performs the doomed write (and, in a larger example, who-knows-what after that), while at -O3 the doomed write is again eliminated.
Another instructive example is g on the right. Here, the compiler will assume the absence of UB, reason that the access a[i] must be in-bounds, and compile it to a[0], removing the potential capability exception for a[1]. It is then hard for a source-language semantics to bound the behaviour of the rest of the program, as it does not correspond to any execution path of the (CHERI) C abstract machine.
To make this range of implementation behaviour legal, our formal semantics has to retain the ISO C notion of undefined behaviour, leave implementations unconstrained for programs that are UB, and deem the first program to be UB, and likewise for any program that calls g(1) (with a terminating h).Such a semantics cannot capture the intended security properties that CHERI C aims to provide.CHERI C clearly deterministically mitigates many otherwise-exploitable security flaws, but undefined behaviour and compiler optimisations make it unclear what precise security properties it provides in general.Further work is needed to see whether the effects of those optimisations and undefined behaviours can be bounded more tightly at an acceptable performance cost.

Out-of-bounds pointer construction and representability
In ISO C, it is undefined behaviour to use pointer arithmetic to construct a pointer value that is either below or more than one byte past the footprint of the object [21, 6.5.6p8]. The one-past case has to be allowed to support the standard C idiom of iterating across an array, but in real-world C it is not uncommon for code to construct pointer values that are below or more than one-past the object [12,28], e.g. for decreasing loops, or transiently in more complex arithmetic, as in the fragment on the right (int *q = p + 100001; q = q - 100000;). In CHERI architectures, the capability compression schemes [5,47] cannot express arbitrary combinations of address, size, and bounds, but, to support porting such software to CHERI, they have been designed to allow combinations for which the address is somewhat outside the bounds. Exactly what combinations are representable is a complex property of the encoding scheme, but they have been designed to allow at least some ranges below and above the object. If a capability arithmetic operation would construct a non-representable value, the resulting address will be as expected, but the tag will be cleared and the bounds may have been changed (another possibility explored earlier in CHERI was to have the hardware trap on the attempt to construct such a capability, but that turns out to be less useful).
For CHERI C, we have to decide whether (a) to follow ISO, with UB for any pointer construction beyond one-past, (b) to allow arbitrary virtual address values within the ranges allowed by all CHERI architectures (or some safe approximation thereto), or (c) to allow whatever the underlying architecture makes representable. Moreover, for (b) and (c), we would have to decide whether to deem it UB to go outside those, or merely to make the resulting bounds unspecified and the resulting tag unspecified or cleared.
Conventional C compiler optimisations impact this in two ways. First, despite common coding practices, C compilers do sometimes reason from the fact that in ISO C array indices must be in bounds, as in function g in the earlier example. It would be hard and probably performance-reducing to remove that from implementations, and problematic to bound the resulting behaviour for programs that trigger it if they were not UB. Second, optimisation can remove transient out-of-bounds construction, e.g. by collapsing (p+100001)-100000 above to just the ISO-legal p+1, so one could not leave that as defined behaviour and deterministically clear the tag in the semantics. Moreover, importing the complexities of the architectural compression schemes into the language pointer arithmetic semantics is unappealing. These lead us to keep the stricter ISO rule also for CHERI C, option (a), even though that leaves code that exploits the architectural guarantee as UB (we would urge compiler developers to not treat that UB aggressively).
Another potential issue is, in the other direction, that a compiler might implement an ISO source-semantics-legal p+(100001-100000) as (p+100001)-100000, potentially leading to run-time non-representability. Further work is needed to check or ensure that this does not occur, but it has not been observed in running the large corpus of code ported to CHERI. To summarise: compilers can optimise away, but not introduce, code that creates non-representability.
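For code being ported, the simplest way to stay within option (a) is to fold constant offsets in the integer domain before they touch the pointer. The sketch below uses the constants from the discussion above; the function name step_within_bounds is hypothetical and introduced only for illustration.

```c
#include <stddef.h>

/* UB under the ISO rule kept for CHERI C, shown only as a comment:
 *     int *q = p + 100001;   // far beyond one-past: undefined behaviour
 *     q = q - 100000;
 * The form below never constructs an out-of-bounds pointer, because the
 * large constants cancel as plain integers first. */
static int *step_within_bounds(int *p)
{
    ptrdiff_t delta = 100001 - 100000; /* integer arithmetic, evaluates to 1 */
    return p + delta;                  /* at most one-past for a valid p */
}
```

Given int a[2], step_within_bounds(a) yields &a[1] without any transient excursion, so neither the abstract machine nor the hardware ever sees a non-representable value.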
The architectural limits on representability also mean that in some (relatively uncommon) circumstances allocators need to use additional padding and/or alignment to ensure that the required capability is representable and does not overlap other allocations.This has a small cost in wasted memory usage, but does not impact the semantics.

Pointer/Integer conversions and (u)intptr_t
Systems C code often requires bitwise or integer operations on pointers, e.g. to examine or enforce particular alignments, or to exploit the fact that some pointers are known to be aligned or bounded, to store metadata in low-order or high-order bits. C pointer types only directly support addition and subtraction of a pointer and an integer value, so this requires casting a pointer to an integer type, doing whatever arithmetic is required, and, if using the result as a pointer, casting back.
In ISO C, the types uintptr_t and intptr_t, when available, are guaranteed to support identity round-trips; in de facto C, round-trips involving limited arithmetic are widely relied on, and older code often uses (unsigned) long for the same purposes. In CHERI C, if one only needs the integer result, one should cast to the new ptraddr_t and do conventional integer computation. If one ultimately needs a pointer value, casts to normal integer types will lose the tag and other metadata, so this would not work. Instead, in CHERI C (u)intptr_t are implemented with capabilities, casts between these and pointer types are no-ops (in both directions), and arithmetic operations on them are implemented with the corresponding capability operations (which have the expected effects on the address part of the capability). This minimises porting effort for such code.
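A typical instance of this idiom is rounding a pointer up to an alignment boundary. The following is a minimal sketch in portable C, assuming a power-of-two alignment; the function names align_up and is_aligned are hypothetical. On a CHERI C implementation the first function keeps the capability because all arithmetic stays at uintptr_t, while the second only needs an address, so there one would ideally cast to ptraddr_t instead (uintptr_t is used here so that the sketch also compiles with conventional compilers, which lack ptraddr_t).

```c
#include <stddef.h>
#include <stdint.h>

/* Round p up to a power-of-two alignment. The result is used as a
   pointer again, so the arithmetic is done at uintptr_t throughout. */
static void *align_up(void *p, size_t align)
{
    uintptr_t u = (uintptr_t)p;
    u = (u + (align - 1)) & ~(uintptr_t)(align - 1);
    return (void *)u;
}

/* Integer-only query: the result is never cast back to a pointer.
   In CHERI C, ptraddr_t would be the preferred type for this cast. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

The caller remains responsible for ensuring that the rounded pointer still lies within the original allocation, for example by over-allocating by align - 1 bytes.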
There are a number of semantic issues and options to consider, balancing usability (giving more code better-defined results), optimisation at (u)intptr_t types (with performance and compiler-modification costs), portability (among CHERI architectures that may differ in capability encoding details), and complexity.
(1) The simplest option would be to follow the semantics of pointers, declaring any (u)intptr_t arithmetic resulting in values outside one-past the original allocation bounds to be UB, but that would break many common C idioms, both where one eventually casts back to a pointer and uses that for an access, and where one just uses the integer value.
(2) Alternatively, we could allow (u)intptr_t arithmetic within some larger region of representability, with UB if one goes outside. That would also invalidate some reasonable idioms, e.g. using (u)intptr_t values as indices in a hash table (though in CHERI C one should ideally use ptraddr_t there).
(3) Finally, we could allow (u)intptr_t arithmetic within some region of representability, but keep defined behaviour and the integer (address-part) value of the result defined if one goes outside.
We choose (3), but have to consider the results of casting back to a pointer, and of inspecting the tag, bounds, and permissions. If hardware capability arithmetic in compiled code goes outside the architecturally representable region, then the tag will be cleared and any access via it will trap. However, in general, optimisation of (u)intptr_t arithmetic could either introduce or eliminate a CHERI C abstract-machine construction of a non-architecturally-representable capability, e.g. rewriting i+(100001-100000) to (i+100001)-100000, or (i+100001)-100000 to i+1. The GCC section anchor optimisation with a negative offset could also introduce non-representability.
(a) At one extreme, one might allow any arithmetic transformations on (u)intptr_t where the integer (address) value of the transformed expression is the same as that of the original, but that would mean any (u)intptr_t expression could result in unspecified tags and bounds, which is not acceptable.
(b) At the other extreme, one could require that optimisation never introduces or eliminates any non-representability, requiring that the hardware execution matches some straightforward abstract-machine capability computation. This is attractively simple to specify and to use, but has some runtime cost and (perhaps more important) compiler-modification cost, to ensure that such optimisations are not done at these types. More data on these costs would be desirable, but for the time being we reject this option.
(c) The intermediate position we choose is to limit optimisations to those that do not introduce new non-representability, but allow them to eliminate excursions into non-representability. We express this precise-but-loose specification in the semantics with a ghost-state bit per capability value, recording whether abstract-machine (u)intptr_t arithmetic ever made it non-representable in abstract-machine execution. We permit casts to pointer types of (u)intptr_t capabilities with this bit set, and loads and stores of them (otherwise memcpy of such values would become UB), but make it UB to access memory via them. Then one has to consider the semantics of inspecting the bounds or tag (using intrinsics) and representation bytes (using unsigned char * pointers) of such values. We deem all these to give unspecified values (not UB).
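The ghost-state bit of option (c) can be sketched as a small abstract-machine model. This is an illustration under stated assumptions, not the mechanised semantics: the type intptr_model, the fields rep_lo and rep_hi (an inclusive representable address range supplied by the caller), and the functions model_add and model_access are all hypothetical, and the ordinary bounds check on access is omitted for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a (u)intptr_t value in the abstract machine. */
typedef struct {
    uint64_t addr;            /* address part: always well defined */
    uint64_t rep_lo, rep_hi;  /* inclusive representable range (assumed given) */
    bool ghost_nonrep;        /* sticky ghost bit: ever left the range? */
} intptr_model;

typedef enum { ACCESS_OK, ACCESS_UB } access_result;

/* Arithmetic keeps the address defined (wrapping modulo 2^64) and sets
   the ghost bit on any excursion; the bit is never cleared afterwards. */
static intptr_model model_add(intptr_model v, int64_t delta)
{
    v.addr += (uint64_t)delta;
    if (v.addr < v.rep_lo || v.addr > v.rep_hi)
        v.ghost_nonrep = true;
    return v;
}

/* Casts, loads, and stores remain permitted; only a memory access via
   a value whose ghost bit is set is undefined. */
static access_result model_access(intptr_model v)
{
    return v.ghost_nonrep ? ACCESS_UB : ACCESS_OK;
}
```

In this model (i+100001)-100000 and i+1 produce the same address, but only the former is UB on access. An optimiser may therefore rewrite the former into the latter, since any behaviour refines UB, while the reverse rewrite would introduce non-representability and is excluded.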
Finally, we have to consider what the above "region of representability" should be (for (1) this would be moot, as pointers within or one-past the original allocation are always representable). CHERI capabilities are encoded in architecture-specific sophisticated ways. For CHERI C, we could:
(i) Fix a relatively simple and portable definition expressing some conservative extent supported across different CHERI architectures and values. For 64-bit CHERI architectures, [45, §4.3.5] says pointers are guaranteed representable if within the greater of 1KiB and 1/8 of the object size below the lower bound, and the greater of 2KiB and 1/4 of the object size above the upper bound. This is reasonably simple but not portable to all CHERI architectures, in particular CHERIoT [3], which, while based on 32-bit RISC-V, uses a different capability encoding scheme from 32-bit CHERI-RISC-V and provides byte-granularity bounds for any object up to 511 bytes.
(ii) Alternatively, we can make this implementation-defined, letting implementations choose either the above or the specific underlying architectural notion of representability. This option is attractive because it allows the use of the full range of representable addresses, and is thus "future-proof". The disadvantage is that it makes it difficult to write portable CHERI C code. For the time being we choose this option.
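The conservative region of option (i) for 64-bit CHERI architectures is simple enough to compute directly. The sketch below encodes only the rule quoted above from [45, §4.3.5]; the function names are hypothetical, and it says nothing about the exact architectural representability that option (ii) permits an implementation to use instead.

```c
#include <stdint.h>

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* Guaranteed slack below the lower bound: greater of 1 KiB and size/8. */
static uint64_t slack_below(uint64_t size) { return max_u64(1024, size / 8); }

/* Guaranteed slack above the upper bound: greater of 2 KiB and size/4. */
static uint64_t slack_above(uint64_t size) { return max_u64(2048, size / 4); }

/* Is addr inside the conservative region around the object
   [base, base + size)? Written to avoid overflowing intermediates. */
static int in_conservative_region(uint64_t base, uint64_t size, uint64_t addr)
{
    if (addr < base)
        return base - addr <= slack_below(size);
    uint64_t off = addr - base;
    return off <= size || off - size <= slack_above(size);
}
```

For a 64-byte object the slack is 1KiB below and 2KiB above; for a 16KiB object it grows to 2KiB below and 4KiB above.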

Pointer/Integer type punning
An additional benefit of keeping the pointer and (u)intptr_t representations identical is that it preserves the C possibility of type punning between them via a union, as shown below.
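A minimal sketch of such punning follows. The union ptr_pun and the function pun_roundtrip are hypothetical names, and this is an illustration rather than the example from the test suite. It assumes an implementation on which int * and uintptr_t share size and representation, which holds on mainstream conventional platforms and, per the design above, in CHERI C.

```c
#include <stdint.h>

/* Both members overlay the same storage. */
typedef union {
    int *ptr;
    uintptr_t iptr;
} ptr_pun;

/* Write the pointer member, read and rewrite the integer member, then
   read the pointer member again. With identical representations the
   value (and, in CHERI C, its capability metadata) survives. */
static int *pun_roundtrip(int *p)
{
    ptr_pun u;
    u.ptr = p;
    uintptr_t i = u.iptr;  /* pun: pointer read as integer */
    u.iptr = i;            /* whole-object store at uintptr_t */
    return u.ptr;          /* pun: integer read as pointer */
}
```

Had uintptr_t been a plain address-sized integer, the store through iptr could not have carried the tag and bounds.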

Accesses to capability representations
An essential aspect of CHERI architecture design is that capabilities are unforgeable: attempts to manipulate their representations, e.g. writing their bytes directly rather than with a capability instruction, are guaranteed to clear the tag. ISO C permits bytewise access to pointer-containing data, e.g. to support a bytewise memcpy, so we have to consider the extent to which this CHERI architectural property should be reflected in the CHERI C semantics.
We want to guarantee that the tag of a capability will be cleared when its representation is modified directly. Again, optimisations make this challenging. In this example, CHERI hardware execution of an unoptimised compilation will clear the tag of *px on the byte-write of Line 6, leading to a capability access fault at Line 7, but an optimising compiler may remove the identity byte-write entirely.
To allow optimisations which preserve the address, but do not necessarily preserve tag clearing, we use the ghost state, similarly to how we did in Section 3.3, to enforce that following any non-capability write to a capability, it is UB to use it for an access. (A more extreme semantics would be to deem any non-capability write to a capability to be UB, but that would prevent zeroing or memcpy'ing over some struct in a malloc'd region to re-use it, which should be permitted.) Another example in which optimisations can remove tag clearing is below, in which the for loop may be optimised (e.g. by GCC's tree-loop-distribute-patterns) to a call to memcpy(p1, p0, sizeof(int *)). In CHERI C, memcpy must be implemented with capability-sized and aligned accesses where possible, to preserve pointers, so this optimised code would then preserve the capability and its tag. Our semantics makes such optimisations sound using an additional per-capability-value ghost state bit to mark the capability, after its representation was modified directly, as no longer suitable for memory access, resulting in UB in Line 9.
A memcpy of part of a capability must behave semantically like any other non-capability-sized and aligned representation access, using that ghost state bit rather than deterministically clearing the tag (which also makes sound optimisations that combine memcpy calls for adjacent memory regions, that at the hardware level could introduce tag preservation).
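The contrast between a bytewise loop and a whole-object memcpy can be sketched as follows. The function names are hypothetical and the code is an illustration, not the example from the test suite. On a conventional platform both functions produce a usable pointer; on CHERI hardware the unoptimised loop clears the tag of the destination, and in the CHERI C semantics described above the destination is marked in ghost state, so dereferencing it is UB even if an optimiser turns the loop into memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Copy a pointer one representation byte at a time. In CHERI C the
   result must not be used for memory access: each byte store is a
   non-capability write to the destination capability. */
static void copy_ptr_bytewise(int **dst, int *const *src)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    for (size_t i = 0; i < sizeof(int *); i++)
        d[i] = s[i];
}

/* Copy the whole pointer object. A CHERI C memcpy uses capability-sized
   and aligned accesses where possible, preserving the tag. */
static void copy_ptr_memcpy(int **dst, int *const *src)
{
    memcpy(dst, src, sizeof(int *));
}
```

Portable code that needs the copy to remain dereferenceable on CHERI should therefore use the second form, or plain assignment.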
This approach uses ghost state to record abstract-machine accesses to capability representations, to make subsequent accesses via such capabilities UB, but we also have to ask to what extent such a capability can be examined, or, in other words, what the ghost state "covers". The example below shows some scenarios we need to consider. Trying to access the memory using px in Line 11 should certainly be UB. For the other operations, we have to ask: (1) Whether the pointer-to-integer cast in Line 7, to obtain the capability address, is UB or yields some implementation-defined value? For example, in Morello, we know that the lower 64 bits of a capability contain its address and the compiler can reason how modifying the first byte will affect it. This knowledge is specific to a particular ISA and could be used only when targeting it.
(2) Whether the tag access in Line 8 is UB or returns an unspecified value?
(3) Whether the permission access in Line 9 is UB or returns an unspecified or implementation-defined value, and, if the latter, what is guaranteed about it?
To summarise, our current proposed solution is for the abstract CHERI C machine to record any non-capability write to a capability (via representation pointers or using standard library functions) using ghost state. Using such a manipulated capability to access memory is UB. Comparing it to other capabilities using the intrinsic cheri_is_equal_exact, or examining its tag via cheri_tag_get, will return an unspecified boolean value. This way we avoid declaring such checks to be UB. The effect of direct representation manipulation on other capability fields except the tag is implementation-defined. That will permit ISA-specific optimisations where the compiler is aware of the capability encoding for a target ISA.

Pointer equality
There are several possible definitions of pointer equality (==). Ignoring pointer provenance for now, we could take either: (1) bitwise equality of capability representations, with tags, (2) the same but without tags, or (3) equality just of their address fields, without all their capability metadata. Intuitively the first definition may seem most natural, with equality implying interchangeability, and that was the choice for the early CHERI C implementation. However, pragmatically it seems that porting code is most straightforward with the third option, so that is what we adopt here. In fact, even in ISO C, equality of pointers does not guarantee interchangeability, due to pointer provenance [18], so this is perhaps less of a departure from standard practice than it may seem.
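The three candidate definitions can be contrasted on a toy model. The sketch below is only illustrative: the struct model_cap, its fields, and the three function names are hypothetical, field-wise comparison stands in for bitwise comparison of representations, and no real CHERI encoding or intrinsic is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified abstract capability (not a real CHERI encoding). */
typedef struct {
    uint64_t address;
    uint64_t base;
    uint64_t limit;
    uint32_t perms;
    bool tag;
} model_cap;

/* Definition (2): all fields except the tag must agree. */
bool eq_exact_without_tag(model_cap a, model_cap b) {
    return a.address == b.address && a.base == b.base &&
           a.limit == b.limit && a.perms == b.perms;
}

/* Definition (1): all fields, including the tag, must agree. */
bool eq_exact_with_tag(model_cap a, model_cap b) {
    return eq_exact_without_tag(a, b) && a.tag == b.tag;
}

/* Definition (3): only the address fields are compared. */
bool eq_address(model_cap a, model_cap b) {
    return a.address == b.address;
}
```

Two model capabilities that share an address but differ in bounds are equal only under definition (3), and two that differ only in the tag are equal under (2) and (3) but not (1).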
Additionally, CHERI C provides the intrinsic cheri_is_equal_exact, which compares two capabilities (pointers or (u)intptr_t), comparing all fields, including meta-information such as the tag or permissions. If some of their fields, such as tag or bounds, are marked as unspecified in ghost state, its return value is unspecified as well.

Capability derivation in binary arithmetic
Here we implement array indexing via intptr_t arithmetic. In ISO C, the semantics of the addition will depend on the integer conversion rank of the size_t and intptr_t types. If intptr_t has a higher rank, then the first argument will be cast to intptr_t and then the addition of two intptr_t values will be performed. Otherwise, the second argument will be cast to size_t, the addition performed on size_t values, and the result will be cast back to intptr_t before assigning it to ip1.
Using the latter strategy in CHERI C would result in ip1 being derived from the null capability, and hence untagged. This means that, converted back to a pointer, p would be non-dereferenceable. To avoid that, CHERI C semantics requires that no other standard integer type shall have a higher integer conversion rank than intptr_t and uintptr_t. Additionally, for binary operations, the capability derivation picks as the source for the resulting capability the argument which was not the result of an implicit or explicit conversion from a non-capability type.
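The kind of code under discussion can be sketched in portable C as follows. This is our own minimal reconstruction, not the paper's listing: the function name index_via_intptr and the variable ip0 are assumptions, while ip1 and p follow the names used in the text. On a conventional (non-CHERI) implementation the choice of conversion is harmless; under CHERI C it decides which operand the resulting capability is derived from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read element i of arr, computing its address through intptr_t arithmetic. */
int index_via_intptr(int *arr, size_t i) {
    intptr_t ip0 = (intptr_t)arr;
    /* Mixed size_t + intptr_t addition: the usual arithmetic conversions
       decide which operand is converted before the addition happens.
       In CHERI C the result must be derived from ip0 (the operand that
       was not converted from a non-capability type), not from the index. */
    intptr_t ip1 = i * sizeof(int) + ip0;
    int *p = (int *)ip1;
    return *p;
}
```

For an array int a[4] = {10, 20, 30, 40}, the call index_via_intptr(a, 2) reads a[2] on mainstream implementations.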

Sub-object bounds
In C one routinely constructs pointers to a subobject of a data structure: to a member of a struct or an element of an array. For CHERI C, it is tempting to have the compiler automatically narrow the bounds of the corresponding capability to just that member or element, to implement the principle of least privilege, but it is also common in C to use pointer arithmetic and offsetof to move the resulting pointer to a different subobject, in array indexing and the "container-of" idiom.
Based on experience in porting code, the current default behaviour of CHERI C is to not enforce subobject bounds. The CHERI C semantics follows suit, though Clang/LLVM CHERI C provides options for stricter bounds enforcement, and the semantics should be revisited following further use.
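The "container-of" idiom mentioned above can be illustrated with a short sketch. The struct node and the helper node_of_payload are hypothetical names chosen for this example. If the capability for the member pointer were narrowed to the member alone, the pointer computed here would fall outside its bounds, which is why the default is not to enforce subobject bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container type used only for illustration. */
struct node {
    int key;
    int payload;
};

/* Recover a pointer to the enclosing struct from a pointer to its
   payload member, by stepping back offsetof(struct node, payload) bytes. */
struct node *node_of_payload(int *payload_ptr) {
    return (struct node *)((char *)payload_ptr -
                           offsetof(struct node, payload));
}
```

Given struct node n, the call node_of_payload(&n.payload) is expected to yield &n, after which n.key is reachable through a pointer that started out referring only to n.payload.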

Pointers to const-qualified types and permissions
In ISO C, objects created at const-qualified types are expected to be immutable, so it would be natural for a capability pointing to a const object to not have write permission, and CHERI C does this. Note that in ISO C it is allowed to cast a pointer to a non-const type to a pointer to the corresponding const type and later cast it back and modify the object; to allow this, in CHERI C those casts are no-ops on the underlying capability (in CHERI ISAs, clearing a permission would be irreversible).
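The round trip through a const-qualified pointer type that ISO C permits looks like the following sketch (the function name bump_through_const is ours). Because the object itself is not const, the final write is well defined, so the casts must not strip write permission from the underlying capability.

```c
#include <assert.h>

/* Increment *p via a pointer that was temporarily const-qualified. */
int bump_through_const(int *p) {
    const int *cp = p;   /* non-const to const: implicit conversion */
    int *q = (int *)cp;  /* cast back to the non-const type */
    *q += 1;             /* defined: the pointed-to object is not const */
    return *p;
}
```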

Abstracting capabilities across architectures
The main existing CHERI C implementations support Arm Morello (CHERI Arm-A) and CHERI RISC-V, both 64-bit, and there is another supporting the CHERIoT extension to RISC-V RV32E, the small RISC-V specification intended for embedded devices [3]. One should be able to write portable CHERI C programs across these architectures, or (sometimes) across just the first two. That means that the C semantics needs a common abstraction of hardware capabilities.
One part of this is the abstract address type denoted by the new ptraddr_t C type, an integer type with implementation-defined width and signedness. The list of permissions encoded in a capability can vary between architectures, but there is a common basic set which is always present. The object type field width and values could vary. Finally, the seal type is also architecture-dependent. Abstracting these types and their properties allows us to talk in the CHERI C language semantics about portable capabilities.
All existing CHERI architectures use a capability encoding scheme to compress capability address and bounds, in 128 or 64 bits for 64- or 32-bit architectures, and there is a trade-off between compression and the set of representable addresses, as some combinations of fields may become non-representable. To abstract from this, we make several design choices. First, we restrict the abstract scope of compression to four capability fields: address, flags, and upper and lower bounds. Other fields, such as permissions, are always represented exactly. This allows us to describe most of CHERI C semantics in terms of abstract capabilities, making it portable across current and future implementations. In cases where architecture-specific details matter, the corresponding parts of CHERI C semantics are clearly designated as implementation defined.

Capabilities and provenance
As recalled in §2.3, the in-progress ISO definition of provenance tracks provenance in the C abstract machine, to define undefined-behaviour cases that are important to legitimise current compiler optimisations based on their static analysis of provenance, but provenance data is not carried at runtime in conventional implementations. Meanwhile, CHERI implementations carry capabilities at runtime, and the CHERI ISA specification [44] also speaks of "provenance". How do these relate to each other?
The ISO C + PNVI-ae-udi semantics rules call for several checks based on provenance:
(1) Checking whether a pointer is inside the bounds of the corresponding memory allocation's footprint.
(2) Checking whether two pointers possess the same provenance when subtracted or compared.
(3) Checking whether a pointer refers to a live allocation.
We believe (though have not formally proved) that the first check is redundant in the presence of capabilities. Provenance tracking enables the static overestimation of capability runtime bounds. Capability bounds are initially set to align with the allocation's footprint and can be narrowed down but not extended, thus resulting in an "overestimation".
The second check presents a challenge. It might be assumed that pointers with matching or at least overlapping bounds share the same provenance. However, this is not valid in two scenarios: 1) when bounds have been narrowed through intrinsic calls, resulting in non-intersecting regions, and 2) in the absence of a capability revocation mechanism, provenance is temporally unique, while capabilities are not. Consequently, one could have a pointer to a heap object that has been killed and another pointer to a newly allocated object at the same address.
The third check is impossible in general in the absence of some sort of capability revocation mechanism.
In conclusion, the capability checks at runtime cannot subsume provenance checks at compile time. The two are complementary.
We codified CHERI C semantics, fleshing out many details associated with the high-level design choices described in the previous section, as an extension of Cerberus [27,28], a well-validated semantic model for a substantial fragment of ISO C. The resulting CHERI C semantics is executable and permits running small C test programs to investigate semantic questions.
Cerberus is expressed as a translation, from C into a small Core language, combined with a memory object model. Most CHERI C-related changes relate to the latter. We defined the CHERI C memory object model in Coq and extracted it to OCaml to integrate into Cerberus. Previous Cerberus memory object models have been in OCaml; this Coq definition should support future proofs about CHERI. The complete Coq definition can be found in the Cerberus git repository at https://github.com/rems-project/cerberus; we explain the main features in non-mechanised mathematics here.

Abstract capabilities
We defined abstract capabilities as a Coq module type which defines an opaque capability type and operations on it. We chose Arm Morello [5] for the implementation-defined aspects, giving a concrete executable implementation of CHERI capabilities. We used the existing ISA model for Morello [8] mechanically extracted from the Arm ASL reference, from which we extracted low-level Coq implementations of the relevant functions using Sail's support for multiple backends [7].

CHERI C adds the following new undefined behaviours:
UB_CHERI_InvalidCap is flagged on an attempt to dereference a pointer with the capability tag cleared.
UB_CHERI_UndefinedTag is flagged on an attempt to dereference a pointer with the capability tag marked as unspecified in the ghost state.
UB_CHERI_InsufficientPermissions is flagged on an attempt to perform a memory access (e.g. read or write) via a capability which does not have the permission bit set for the given operation.
UB_CHERI_BoundsViolation is flagged on an attempt to dereference an out-of-bounds pointer.
The ISO C UB012_lvalue_read_trap_representation is flagged when an attempt to decode a stored representation of a capability object fails.

CHERI C memory object model, in Coq
The Cerberus memory object model encapsulates all memory-related logic, providing a clean abstract interface to the rest of the semantics. Key data types such as the memory state, and pointer and integer values are opaque in the module interface. All memory model operations accessing the memory state are implemented in a memM monad, which maintains the state and facilitates error handling.
The standard Cerberus memory model interface provides functions for dynamic memory management (allocation, release, memcpy and memcmp); reading and writing memory values; pointer arithmetic, comparison, and alignment; and pointer/integer conversion. Since memory values are abstract outside the module, it also provides functions for relational and arithmetic operations on integer, floating point, and pointer memory values. These operations do not depend on access to the memory state and are not in the memM monad.
Our CHERI C memory object model fits nicely into this Cerberus memory model abstraction; its interface had to be extended only to add capability derivation (§4.4) and intrinsics type derivation (§4.5).
The memory state (see below) is a triple comprising information about allocations, PNVI-ae-udi related data, and the concrete representation of memory contents. The CHERI memory model state is similar to the Cerberus concrete memory model, with the memory-contents component extended as follows. As before, the memory content is stored in an integer-address-indexed dictionary of bytes. Each byte consists of provenance, an optional 8-bit numeric value, and an optional integer index. Additionally, for each capability-size aligned memory location, we add metadata consisting of the capability tag and a two-bit ghost state, stored in a new dictionary of capability metadata. The first bit of the ghost state for a given capability indicates whether the tag is unspecified, and the second bit indicates whether the address and bounds are unspecified.
Pointer values are capabilities, and tag, bounds, and permission checks are performed when they are used to access memory. When written to memory, a capability representation excluding the tag is written to the byte dictionary, and the tag is stored in the capability-metadata dictionary. Writing non-capabilities to memory marks all previously set tags for the corresponding address range as unspecified in the ghost state held in the capability-metadata dictionary.
Integer values are either pure numeric values for integer types, or capabilities (with a signedness flag) for (u)intptr_t types. This representation allows us to preserve all capability fields when casting pointers to (u)intptr_t and back.
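The split between byte contents and per-capability metadata can be sketched with a toy memory. Everything here is an assumption made for illustration: the names model_mem, cap_meta, store_cap and store_bytes, the fixed sizes, and the choice that a capability store resets both ghost bits. Provenance and the per-byte optional index are omitted, and the mechanised model is a Coq definition, not this C code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 64u
#define CAP_SIZE 16u /* assumed capability size in bytes */

/* Metadata kept per capability-aligned slot: tag plus two ghost bits. */
typedef struct {
    bool tag;
    bool tag_unspecified;    /* ghost bit 1 */
    bool bounds_unspecified; /* ghost bit 2 */
} cap_meta;

typedef struct {
    uint8_t bytes[MEM_SIZE];
    cap_meta meta[MEM_SIZE / CAP_SIZE];
} model_mem;

/* Capability store: representation bytes go to the byte array,
   the tag goes to the metadata; ghost bits are reset (assumption). */
void store_cap(model_mem *m, size_t addr, const uint8_t *rep, bool tag) {
    assert(addr % CAP_SIZE == 0 && addr + CAP_SIZE <= MEM_SIZE);
    for (size_t i = 0; i < CAP_SIZE; i++)
        m->bytes[addr + i] = rep[i];
    m->meta[addr / CAP_SIZE] = (cap_meta){
        .tag = tag, .tag_unspecified = false, .bounds_unspecified = false};
}

/* Non-capability store: every previously set tag in the written
   range is marked as unspecified in the ghost state. */
void store_bytes(model_mem *m, size_t addr, const uint8_t *src, size_t n) {
    assert(n > 0 && addr + n <= MEM_SIZE);
    for (size_t i = 0; i < n; i++)
        m->bytes[addr + i] = src[i];
    for (size_t s = addr / CAP_SIZE; s <= (addr + n - 1) / CAP_SIZE; s++)
        if (m->meta[s].tag)
            m->meta[s].tag_unspecified = true;
}
```

Note that the tag itself is left untouched by store_bytes; only the ghost bit changes, which is what lets the abstract machine treat the tag as unspecified rather than deterministically cleared.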
The memory monad used in Coq is a combination of state and error monads. The memory state is completely internal to the memory model implementation. All monadic calls to the memory interface cross the OCaml to Coq language boundary only in one direction.
To give a flavour of the new checks required for CHERI C, we give a formal semantics rule for the load operation below, using similar notation to [27]. This corresponds to the function load in the full Coq definition (module CheriMemory), which involves considerably more detail. CHERI-specific changes are highlighted in blue and marked with †.
The auxiliary bounds-checking predicate now takes a capability instead of just an address. The capability fields include the address, the tag, the permissions set (perm), the bounds (base, limit), and the ghost state bits (g_tag, g_bounds). The test succeeds if we have read permission, the tag is known and set, and the address is within the bounds.
[Rule: definition of the bounds_check_load predicate over a capability with fields (address, tag, perm, (base, limit), (g_tag, g_bounds), ...), followed by the load rule.]
The memory-object-model load operation takes a pointer, containing provenance information and a capability, and returns a memory value and a footprint annotation. The capability must not be a null capability and must pass the bounds check. The abstraction function must successfully interpret the memory bytes, and the capability metadata associated with them, as a C value of the requested type. Finally, the rule prevents reading from uninitialized memory, which would result in unspecified values.
The helper function expose, which remains unchanged from PNVI-ae-udi, accepts the abstract state and a set of tainted allocations as input. It designates a specified allocation as exposed if it was previously included in the set of allocations and marked as alive.
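The checks that the load rule performs on the capability can be approximated in C as below. This is a rough sketch under stated assumptions, not the Coq function load: the struct model_cap, the constant PERM_LOAD, the function check_load, the order in which the checks are applied, and the treatment of bounds as the half-open interval [base, limit) are all ours. The second ghost bit, provenance, null-capability and initialisation checks are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PERM_LOAD 0x1u /* hypothetical permission bit for reads */

/* Hypothetical abstract capability with the tag ghost bit. */
typedef struct {
    uint64_t address;
    uint64_t base;
    uint64_t limit; /* assumed exclusive upper bound */
    uint32_t perms;
    bool tag;
    bool g_tag; /* ghost state: tag is unspecified */
} model_cap;

typedef enum {
    LOAD_OK,
    UB_CHERI_InvalidCap,
    UB_CHERI_UndefinedTag,
    UB_CHERI_InsufficientPermissions,
    UB_CHERI_BoundsViolation
} load_result;

/* Classify a load of `size` bytes through capability c. */
load_result check_load(model_cap c, uint64_t size) {
    if (c.g_tag)
        return UB_CHERI_UndefinedTag; /* tag must be known */
    if (!c.tag)
        return UB_CHERI_InvalidCap; /* and set */
    if (!(c.perms & PERM_LOAD))
        return UB_CHERI_InsufficientPermissions;
    if (c.address < c.base || c.address > c.limit ||
        size > c.limit - c.address)
        return UB_CHERI_BoundsViolation; /* footprint must fit in bounds */
    return LOAD_OK;
}
```

Writing the bounds test as size > limit - address, after checking address <= limit, avoids unsigned overflow in address + size.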

Capability Derivation
For unary and binary operations on integer values involving at least one capability-carrying type, CHERI C needs to choose which will be used to derive the resulting capability. We made this derivation step explicit by elaborating it in the intermediate Core language.
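The selection rule for binary operations stated earlier (prefer the operand that was not converted from a non-capability type) can be sketched as a small decision function. The types operand_info and derivation and the function derive_binary are hypothetical, and the tie-break towards the left operand when both operands are genuine capabilities is an assumption of this sketch rather than something the text specifies.

```c
#include <assert.h>
#include <stdbool.h>

/* Static facts about one operand of a binary operation. */
typedef struct {
    bool carries_cap;           /* has a capability-carrying type */
    bool converted_from_noncap; /* produced by conversion from a
                                   non-capability type */
} operand_info;

typedef enum { DERIVE_FROM_LEFT, DERIVE_FROM_RIGHT, DERIVE_NONE } derivation;

/* Choose the operand from which the result capability is derived. */
derivation derive_binary(operand_info l, operand_info r) {
    bool l_genuine = l.carries_cap && !l.converted_from_noncap;
    bool r_genuine = r.carries_cap && !r.converted_from_noncap;
    if (l_genuine)
        return DERIVE_FROM_LEFT; /* left wins ties: assumption */
    if (r_genuine)
        return DERIVE_FROM_RIGHT;
    return DERIVE_NONE; /* no genuine capability operand */
}
```

For an expression adding a converted size_t index (left) to an intptr_t obtained from a pointer (right), this sketch selects the right operand, so the result remains derived from the original capability.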

Intrinsics
Many of the CHERI C intrinsics are polymorphic in the capability type they accept, and their return type may depend on it. This does not fit the standard C type system and to implement this in Cerberus we extended it with a special type derivation mechanism, implemented via an embedded DSL.

Validation and Experimental Comparison
We validate that our design decisions of §3 are appropriate for CHERI C by discussion with designers and implementers of the existing CHERI C implementations, and with developers who have ported large bodies of code to CHERI C; these discussions identified a number of previously unconsidered issues discussed there. We validate experimentally that our executable mechanised semantics has the intended behaviour, and that this and the behaviour of the Clang/LLVM and GCC implementations are consistent. We developed a test suite of 94 tests exercising and demonstrating various aspects of CHERI C semantics, especially where they may be unclear or differ from ISO C. Table 1 summarizes the semantic categories along with the number of tests that cover each category.
We compiled and ran all our tests using three CHERI C implementations and compared the results. We found that existing implementations are mostly compatible with this standard, with some minor bugs but no principal disagreements. Our assessment of their compliance with CHERI C, as defined by this document, is summarised below. The complete results of our testing are available at https://www.cl.cam.ac.uk/~vz231/asplos24/test-results/.
The output from a single test is presented in Appendix A. This test evaluates how both signed and unsigned integer types manage bitwise operations with intptr_t. With Clang, non-representability issues arise for cap&int and cap&uint as the operation clears the upper bits of the address, leading to a value below the lower bound. In contrast, GCC does not exhibit this issue, likely because of its memory allocator's address ranges. Cerberus demonstrates non-representability in the ghost state for cap&int, where the value falls beneath the lower bound.
Several interesting aspects of the semantics are related to potential compiler optimisations, and writing tests to exercise those well is (as usual) a challenging problem, which we leave for future work. Our focus here is on exercising the main semantic choices.
Table 1. Summary of the tests for which we compared the results on three CHERI C implementations.

Cerberus
This is our reference implementation of the semantics and it passes all our tests with the results we expect, modulo one known bug relating to const behaviour. A further known shortcoming is that not all intrinsic functions were implemented.

Clang/LLVM
This was the first CHERI C compiler and is the most mature. It supports three CHERI backends: Morello (CHERI Arm-A), CHERI-RISC-V, and CHERI-MIPS. It has proven to be quite robust and has been used to port and compile CheriBSD and other software such as KDE. The CHERI C language was developed and refined using this compiler as a testbed to try various aspects of CHERI C semantics. The compiler supports several modes of sub-object bounds enforcement, but we only tested the "conservative" setting as it is the one closest to our semantic definition. We compiled for Morello and CHERI-RISC-V and tested compiled binaries under CheriBSD running under CHERI-QEMU.
Not surprisingly, we found it to be mostly compliant with our CHERI C semantics definition. Our test suite independently identified two known issues that had been previously reported and acknowledged by the compiler team. It also rediscovered an upstream bug present in the LLVM version CHERI LLVM is currently based on but already fixed in later versions. Additionally, our suite detected one spurious warning message and two bugs in the realloc function of the CheriBSD jemalloc library. Some warning messages emitted by the compiler use terminology that is different from the terminology used in this paper (e.g., "capability provenance" vs "capability derivation"), but they are not otherwise incorrect.
Under CHERI-RISC-V (version 8), some test results do not match our semantics. These failures are caused by: 1) an exception when attempting to change the bounds of an untagged capability, and 2) an exception when attempting to modify the sentry capability. However, the current draft of version 9 of the ISA specification [43] changes these behaviours to changing bounds and clearing the tag, respectively. With these changes, it should be compatible with our semantics.

GCC
CHERI GCC is a relatively new arrival. There have been two public releases, and we have seen significant progress in CHERI C support between them. We run compiled "bare metal" binaries under CHERI-QEMU. Our test suite identified five issues in the latest public release of the compiler and runtime, all of which were reported to the developers. One was confirmed as a bug in the compiler. Two issues related to a memory allocator were deferred to a different project (newlib, the libc implementation used in this bare-metal environment). The two remaining issues have not yet been confirmed at the time of writing.

CHERIoT
CHERIoT utilises an LLVM-based compiler, specifically targeting embedded systems. As a result of its focus on embedded systems, executing our test suite would require extensive tooling and modifications to the tests, which we have not done. Nevertheless, based on discussion with CHERIoT designers and their review of our semantics specification, our CHERI C semantics are applicable to CHERIoT. It is important to note that CHERIoT provides additional temporal guarantees and defines certain aspects that we regard as undefined behaviour.

Related work
Memory safety has been a pivotal area of research for several decades [13]. While there exists a vast body of work on this topic, in this section, we focus on the work most closely related to ours. These can be broadly categorized into four groups below, with examples for the last three being non-exhaustive.

CHERI C semantics and program analysis
The most closely related work is Park et al. [34], which presents a formalized CHERI-C memory object model in Isabelle/HOL [33]. That memory model, which is based on the CompCert block/offset model [24], is more abstract than ours, which utilises a finite flat address space that aligns more closely with hardware architectures. They do not address non-representability, pointer-to-integer conversion, or potential optimisations eliminating capability invalidation from direct byte manipulations. Handling of (u)intptr_t types and capability derivation in arithmetic operations are also absent. They provide proofs of essential properties (which we do not), and their model is combined with the Gillian program analysis framework [25], with a front-end based on Clang and ESBMC [9] to support execution of concrete examples.
Brauße et al. present a bounded model checker for CHERI-C programs [9]. This is a viable approach to find some potential code safety violations in CHERI C programs, but does not provide a complete formal semantics for the CHERI C language, or even for the CHERI C memory object model.

Architecture-level memory protection
This category comprises systems like Hardbound [14] and Softbound [30] which have similar goals to CHERI, using additional metadata associated with pointers to provide memory protection. Comparisons between these systems and CHERI are available in the existing CHERI literature. Specifically, one can refer to the ISCA 2014 paper by Woodruff et al. [48] and the "Historical Context and Related Work" section of [44]. They tend not to consider the source-level C semantics beyond issues that arise during implementation. Providing such semantics presents similar challenges to CHERI C and, while the details would differ considerably, we expect that our approach would be equally applicable. Softbound illustrates an interesting example of this: the protection checks are added after the main compiler optimisation passes, and so their safety proof does not apply to the original source program. In work on sanitizers, Isemann et al. [20] demonstrate that this is a real problem because (for example) bad memory accesses can be optimised away and so the checks are never performed. This is exactly the difficulty that our semantics anticipates and allows for in Section 3.1.

C dialects with added memory safety
Without the hardware support of the CHERI ISA, these projects use a combination of compile-time and runtime checks. They employ static code analysis and often depend on type annotations.
There are several dialects of C, such as Checked-C [37] and CCured [31], that aim to provide memory safety. Broadly speaking, they are further away in their semantics and type system from ISO C than CHERI C is, and require additional type annotations and source code changes to ensure memory safety. In some cases, safety guarantees only apply to parts of the code (checked regions in Checked-C), and mixing checked and unchecked pointers is allowed. CCured relies on whole-program analysis, using different pointer representations for various pointer types. This poses some difficulties with separate compilation and the use of third-party libraries. At the same time, they provide a path for incremental migration to a type-safe language which does not require hardware support.
Castro et al. [10] employ static code analysis to enforce data flow integrity, preserving the standard ISO C semantics without modifications. However, this approach has notable drawbacks: it results in substantial memory and runtime overhead, requires major alterations to existing compilers, and demands extra efforts to instrument both the standard library and third-party binary code.
The Deputy [11] project augments the C language type system with dependent types. While it relies on type annotations provided by the programmer, in some cases types can be inferred. It was implemented through several compiler passes, including type inference, flow-insensitive type checking and instrumentation, and check optimisation. Beyond the added complexity of implementing these passes, dynamic assertions are generated for type constraints involving dynamic values, leading to additional runtime overhead. Porting existing code necessitates the addition of type annotations, and in some instances, requires the code to be rewritten to mark it as trusted.
Dynamic binary instrumentation frameworks, such as Valgrind [32], and tools based on them, like memcheck [40], are useful for debugging and testing to find memory safety problems. However, due to significant performance overhead and deployment complexity, they are not typically suitable for production use.
Backward-compatible bounds checking techniques, such as those in [2,15,16,22,38,39], modify the compiler and utilise a runtime library to track pointers' bounds information in a separate data structure at runtime. Limitations of these approaches, including interactions with uninstrumented libraries, the lack of support for integer-to-pointer casts ([15]), substitution of array bounds checks with coarser pool bounds checks ([2,15]), and reliance on the undecidable flow-insensitive points-to analysis [35], mean that they do not ensure the complete elimination of classes of memory problems, a guarantee that CHERI C promises to offer. Lastly, the non-negligible performance costs render them unsuitable for production use.

Memory-safe languages
The final category includes memory-safe languages like Rust [26]. The main appeal of CHERI C and other memory-safe dialects of the C language lies in their ability to port existing legacy C code without the need for substantial rewriting. With this in mind, we do not compare CHERI C to other such languages here.

Conclusion
Our mechanised semantics for CHERI C should provide clarity about what is (and is not) guaranteed by the language, helping to avoid any divergence between implementations and promote portability of CHERI C code; it has already clarified a number of the issues we describe. It moreover enables a wide range of potential future work.
The discussion here of the interaction between CHERI hardware architectural guarantees and C compiler optimisations and undefined behaviour makes clear that further work is needed to understand what precise security properties CHERI C implementations could reasonably provide.
The fact that our semantics is executable means that it could be used as a test oracle for more aggressive compiler testing, letting one use randomly generated tests without manually curating their intended results.
The fact that the memory object model is mechanised in a theorem prover (Coq) makes it potentially usable for proof about the language, e.g. to make precise properties such as provenance validity and capability integrity that are informally described in the CHERI architecture specification [45].
The semantics would provide a solid basis for program analysis or model-checking of CHERI C.
Finally, this work can provide a basis for extensions to CHERI temporal safety [17] and subobject bounds [36].