AutoCC: Automatic Discovery of Covert Channels in Time-Shared Hardware

Covert channels enable information leakage between security domains that should be isolated by observing execution differences in shared hardware. These channels can appear in any stateful shared resource, including caches, predictors, and accelerators. Previous works have identified many vulnerable components, demonstrating and defending against attacks via reverse engineering. However, this approach requires much human effort and reasoning. With the Cambrian explosion of specialized hardware, it is becoming increasingly difficult to identify all vulnerabilities manually.
To tackle this challenge, we propose AutoCC, a methodology that leverages formal property verification (FPV) to automatically discover covert channels in hardware that is shared between processes. AutoCC operates at the register-transfer level (RTL) to exhaustively examine any machine state left by a process after a context switch that creates an execution difference. Upon finding such a difference, AutoCC provides a precise execution trace showing how the information was encoded into the machine state and recovered.
Leveraging AutoCC's flow to generate FPV testbenches that apply our methodology, we evaluated it on four open-source hardware projects, including two RISC-V cores and two accelerators. Without hand-written code or directed tests, AutoCC uncovered known covert channels (within minutes instead of many hours of test-driven emulations) and unknown ones. Although AutoCC is primarily intended to find covert channels, our evaluation has also found RTL bugs, demonstrating that AutoCC is an effective tool to test both the security and reliability of hardware designs.

CCS CONCEPTS
• Security and privacy → Side-channel analysis and countermeasures; Tamper-proof and tamper-resistant designs; Information flow control; • Hardware → Best practices for EDA.

Figure 1: A microarchitectural covert channel. The Trojan in the victim process modifies (via permitted operations) microarchitectural state to encode a secret. The spy process observes this modification, directly or via a timing difference, to infer the secret. Sec. 2.1 gives an example of using a covert channel.

INTRODUCTION
The end of Moore's law has given rise to complex and heterogeneous System-on-Chip (SoC) designs, which are composed of diverse hardware blocks and intricate software systems [5,10,18,22,40,54,57,60,62]. Ensuring the security of these systems is becoming increasingly challenging due to the sheer number of hardware modules and their interactions [4,47,49]. In particular, microarchitectural covert channels, which exploit hardware state hidden by the instruction set architecture (ISA) [64], pose a significant threat to system security, allowing unauthorized information flow across security boundaries [33].
Uncovering covert channels in heterogeneous SoCs during simulation and emulation-based testing is akin to finding a needle in a haystack, requiring much engineering effort, time, and cleverness to create tests that exercise all possible vulnerabilities. Moreover, upon empirically observing a channel, it is difficult to find the root cause, as the state that leaks information is often not directly observable [64]. Even when this cause is found, verifying the effectiveness of RTL fixes is challenging, as design changes may alter the execution that previously exercised the issue.
Formal property verification (FPV) is a promising alternative to exhaustively and precisely find covert channels without relying on tests. However, FPV also presents several challenges, such as a steep learning curve, the difficulty of formalizing the security problem to find the desired behavior as property counterexamples (CEXs), and the exponential growth of FPV tool runtime with the increase in hardware state size.
Our approach: To tackle these challenges, we present AutoCC, a novel methodology that frames the problem of finding covert channels in time-shared hardware (as described in Fig. 1) into an FPV testbench (FT). We also introduce an automated flow that generates FTs implementing our methodology by simply providing the path to an RTL module and a target FPV tool. This systematic approach enables RTL designers to explore potential data leaks (between processes that time-multiplex the usage of a hardware IP block) without needing to reason about which states may leak. Our modular methodology makes it suitable for large designs, circumventing the exponential state growth. The automatic generation of FTs makes our methodology accessible to RTL designers without prior knowledge of formal methods.
The security of a hardware system depends on the security of each component; AutoCC enables designers to more efficiently and effectively identify and address covert channels in heterogeneous SoC designs, enhancing overall system security.
Our main technical contributions are:
• A modular FPV methodology that exhaustively searches for execution traces within a victim process that lead to execution differences observable to a spy process.
• An automated procedure to generate an FPV testbench that applies the above methodology without requiring any upfront user input or RTL details.
• Uncovering covert channels and hardware bugs in the mature open-source RISC-V CVA6 core and MAPLE accelerator.
We evaluate and demonstrate that AutoCC's methodology:
• Exercises previously known and new hardware issues in minutes (as opposed to hours of stress-test simulation).
• Finds the root cause of a CEX with little engineering effort since the length of the execution trace is minimal.
• Uncovers experimentally viable covert channels that we can validate in system-level RTL simulation.
• Validates that the RTL fixes to address covert channels are effective since they eliminate the CEXs.

BACKGROUND AND PRIOR WORK
Process isolation is fundamental to system security and the primary mechanism by which information is confined to appropriate domains. A covert channel is an information flow that uses a mechanism not intended for information transfer [33]; it enables information leakage across security boundaries of the operating system (OS) and between domains that should be isolated, violating the system's security policy. For example, a spy process may leverage a covert channel to extract a secret from a victim process. Covert channels can be categorized based on the source of their data leakage. For example, physical channels rely on measurable changes in the electromagnetic field or power draw to extract information [2,61]. Microarchitectural channels exploit hardware states invisible to the instruction set architecture (ISA) to enable unauthorized information flow [17,64]. Our work focuses on the latter; for the rest of the paper, when we say covert channels, we refer specifically to the microarchitectural ones.
Motivating Example: Let us assume a setup as shown in Fig. 1 to motivate the threat scenario. The victim and the spy are two applications running concurrently on shared hardware. They are (supposedly) isolated by a supervisor using an established mechanism for memory protection. However, this security boundary can be bypassed using a covert channel, for example, by a prime-and-probe attack on the L1 data cache: the spy first primes the data cache by accessing each element of a data array with the size of the data cache (prime buffer), filling the L1 data cache with it. During the victim's execution time slice, the embedded (malicious or unwitting) Trojan encodes a secret S into the microarchitectural state, in this example, by evicting S cache lines with its own data. Finally, the spy again accesses its entire prime buffer, measuring its execution time. Doing so, it observes a latency that linearly depends on the number of cache misses, through which it can infer the number of cache lines that the Trojan evicted and thus the victim's secret S.
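The prime-and-probe sequence above can be illustrated with a minimal software model. This is only a sketch under stated assumptions: the direct-mapped toy cache, the hit and miss latencies, the address bases, and all function names are hypothetical choices made for illustration, not taken from any evaluated design.

```python
HIT_CYCLES, MISS_CYCLES = 1, 10   # hypothetical latencies, chosen for illustration
NUM_SETS = 8                       # toy direct-mapped L1 data cache with 8 lines
SPY_BASE, VICTIM_BASE = 0, 1024    # disjoint address ranges: memory protection holds


class DirectMappedCache:
    def __init__(self, num_sets=NUM_SETS):
        self.tags = [None] * num_sets

    def access(self, addr):
        """Return the latency of one access and install the line on a miss."""
        index, tag = addr % len(self.tags), addr // len(self.tags)
        if self.tags[index] == tag:
            return HIT_CYCLES
        self.tags[index] = tag
        return MISS_CYCLES


def prime_or_probe(cache):
    """The spy touches its whole prime buffer and returns the total latency."""
    return sum(cache.access(SPY_BASE + i) for i in range(NUM_SETS))


def trojan_encode(cache, secret):
    """The Trojan evicts `secret` spy lines by touching its own data."""
    for i in range(secret):
        cache.access(VICTIM_BASE + i)


def spy_decode(probe_latency):
    """Invert the linear latency model: each miss costs MISS - HIT extra cycles."""
    all_hits = NUM_SETS * HIT_CYCLES
    return (probe_latency - all_hits) // (MISS_CYCLES - HIT_CYCLES)


def leak(secret):
    cache = DirectMappedCache()
    prime_or_probe(cache)            # prime
    trojan_encode(cache, secret)     # victim time slice
    return spy_decode(prime_or_probe(cache))  # probe and decode
```

In this model the spy never reads victim memory; it recovers the secret only from its own probe latency, which grows linearly with the number of evicted lines.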
Resource Sharing: A microarchitectural covert channel is possible when the spy and the victim processes share a resource. Exploitable resources are those holding state that depends on execution history, and that can impact the timing or behavior of future instructions. This includes the hardware units mentioned above and also subtle ones like arbiters, buffers, and FSMs. Regarding how the processes share the resource over time, we distinguish between hardware threads simultaneously sharing a resource (e.g., a pipeline or a shared cache) and software threads time-sharing a resource (e.g., time-multiplexing a core or an accelerator) [17]. Our threat model (detailed in Sec. 3.1) is based on time-shared hardware because (a) it is common in specialized hardware, and (b) a security domain may already prefer not simultaneously sharing capacity- or bandwidth-limited resources (e.g., instruction cache, TLBs, predictors, etc.) to avoid contention-derived information leakage.
Spy's Observation Model: For data to leak from the victim to the spy process, the spy must be able to observe some fraction of the victim's execution. Timing channels result from observable timing differences in the spy's execution originating from microarchitectural states whose value depends on the victim's secret [17]. Other channels might infer the contents of these states directly based on the outcomes of executing unauthorized operations. The latter are frequently regarded as hardware bugs in security literature, as unauthorized access attempts should not leave traces dependent on the requested data. As Sec. 3 explains, AutoCC detects differences at RTL module interfaces, and thus, its observation model is applicable to all microarchitectural channels.
Victim's Intent: Regarding the intention of the execution trace within the victim process that enabled the information leakage, the literature considers side channels as the subset of the covert channels where the victim process leaks inadvertently, while the rest rely on a malicious function (a Trojan) to use the secret in a specific way that actively leaks information across the security boundary. Our methodology is agnostic to intent, as it explores every possible execution that enables the covert channel.
Protections: The literature in security offers two alternative protections against timing channels: partitioning of hardware resources and constant-time implementations of cryptographic software [11]. In a simultaneous multi-threading processor, hardware partitioning spatially divides shared resources like caches or prediction tables. In a time-shared processor, shared resources are temporally partitioned via a flush [64]; this is the mechanism we evaluate in this work. Constant-time programming does not necessarily mean that the execution time is deterministic, but that it does not depend on the secret data [19]. This programming style avoids branches and array indexing based on secret data. This is done so that benevolent software does not inadvertently leak information (a side channel). Our methodology, by default, does not restrict the type of instructions that can be executed since we focus on finding covert channels to be closed in hardware. However, a user can also constrain the FPV environment generated by AutoCC to only explore executions that are allowed under constant-time programming. Such an environment would verify that a hardware design does not leak data while executing constant-time software. Sec. 5 further discusses the tradeoffs of protecting against covert channels in hardware versus restricting the software.
Detection: Information flow security in hardware has been actively explored since the early 2010s [3,42,52,53,71]. While these approaches focus on monitoring and controlling the flow of sensitive data through hardware components to mitigate security threats, they do so via RTL simulation. As such, they are as effective as the test cases provided. Although constrained-random testing and fuzzing can be used to generate a wide range of test cases [9,26,29,32,58], they are not as exhaustive as formal methods. Subtle timing differences can be exploited to extract secrets: if targeted efficiently, even a binary channel can leak a 256-bit AES key in under a second for a typical context switch frequency of 1 kHz [64]. Thus, formal methods are key to finding every channel.
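As a rough sanity check of the bandwidth figure quoted above, the arithmetic can be written out. The sketch assumes (our assumption, not a statement from [64]) that the binary channel transfers one bit per context switch; the function name and the `bits_per_switch` parameter are hypothetical.

```python
def leak_time_seconds(secret_bits, switch_hz, bits_per_switch=1):
    """Time to leak a secret if every context switch carries `bits_per_switch` bits."""
    return secret_bits / (bits_per_switch * switch_hz)


# A 256-bit key over a binary channel at a 1 kHz context switch frequency.
aes_key_time = leak_time_seconds(256, 1000)
```

Under this assumption the key leaks in roughly a quarter of a second; even if only every second switch carried a bit, the total would remain about half a second.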

Formal Methods for Hardware Verification
The first works to ensure RTL correctness through formal verification utilized model checking with SAT solvers and binary decision diagrams [6,41,50]. For a given design under test (DUT), a model checker generates a state space of all possible executions of the DUT, given its inputs and the specified assumptions. Assumptions constrain the state space exploration by preventing some behaviors, while assertions check that properties hold on all the explored paths.
FPV backend tools use a variety of solver engines [8,65] to search for property violations (counterexamples) exhaustively. Bounded Model Checking (BMC) is the method of choice for many solver engines today. In BMC, correctness properties are unwound to a bounded number of transitions k, reducing the problem of model checking to an instance of SAT. For AutoCC, this means proving the property for all k-cycle executions of the DUT; every successful proof increments k. What does this mean for completeness? A bounded proof of a property for k cycles means that the property holds for executions of at most k cycles; longer executions may still result in a property violation. To prove the property for unbounded executions, k must reach a completeness threshold [55]. A naive threshold is the number of states in the model; a tighter one is the shortest path between the two states furthest apart in the model [13]. In practice, reaching this completeness threshold is not always possible; the checker may run out of time or memory, or the threshold itself may be hard to compute.
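The meaning of a bounded proof can be illustrated with a small explicit-state sketch. Note the assumptions: real FPV backends encode the unrolled design as a SAT problem, whereas this sketch simply enumerates every input sequence; the function `bmc`, the toy saturating counter, and the property are hypothetical examples.

```python
from itertools import product


def bmc(reset_state, step, prop, inputs, max_bound):
    """Check `prop` on every execution of up to `max_bound` cycles.

    Returns (k, trace) for a shortest violating input sequence, or
    (max_bound, None) when the property holds for all bounded executions.
    """
    if not prop(reset_state):
        return 0, ()
    for k in range(1, max_bound + 1):           # each successful bound increments k
        for trace in product(inputs, repeat=k):  # all k-cycle input sequences
            state = reset_state
            for inp in trace:
                state = step(state, inp)
            if not prop(state):
                return k, trace                  # counterexample (CEX)
    return max_bound, None


def counter_step(state, inc):
    """Toy DUT: a 3-bit saturating counter that adds `inc` every cycle."""
    return min(state + inc, 7)


def never_five(state):
    """Safety property under test: the counter never holds the value 5."""
    return state != 5


bounded_proof = bmc(0, counter_step, never_five, (0, 1), max_bound=4)
violation = bmc(0, counter_step, never_five, (0, 1), max_bound=8)
```

With a bound of 4 the property holds, yet a bound of 8 exposes a five-cycle violation, which is exactly why a bounded proof says nothing about longer executions. For this toy counter, the two states furthest apart are seven transitions away from each other, so a bound of 7 would already be a completeness threshold.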
Prior work has leveraged FPV for different purposes: RTLCheck verifies RTL implementations of CPUs against their memory consistency models [39]; ILA generates a Verilog model of the design from its functional specification and compares it against the RTL implementation [24]; and AutoSVA checks the liveness properties of RTL module interactions [47]. Liveness properties specify that "something good will happen," e.g., a request is eventually acknowledged, while safety properties specify that "nothing bad will happen," e.g., a response must have had a request. In the context of covert channels, we are interested in safety properties that detect data leakage across processes. Sec. 3 elaborates on how AutoCC frames this detection as an FPV problem.
Formal methods have also been used to detect security vulnerabilities. InSpectre [21] creates formal models of processors to detect Spectre-like attacks that combine speculative execution and a covert channel. UPEC [14] uses FPV to detect memory leakages via side effects of non-permitted operations. However, UPEC is limited to uncovering memory leakages (e.g., through stale microarchitectural state) and does not consider leakage due to execution time.
To extend the scope of prior work based on formal methods, AutoCC uses FPV on hardware RTL to automatically detect microarchitectural covert channels originating from states whose value depends on a previous execution and impacts the timing of future instructions. AutoCC complements empirical covert channel measurement frameworks such as Channel Bench [16], which show the (non-)existence of some specific channels, but not all.

THE AUTOCC APPROACH
This section first presents the threat model we tackle in this paper, i.e., time-multiplexed executions of processes on shared hardware. Sec. 3.2 then describes how we formalize that threat model (into a problem that FPV engines can solve) in order to discover covert channels between these processes automatically. Sec. 3.3 explains how to apply this methodology to an RTL project using our automated flow, which generates the FPV testbench and tool bindings. Sec. 3.4 proposes a viable path for applying AutoCC to large projects via modularity. Finally, Sec. 3.5 introduces two strategies that leverage AutoCC to assist the correct design of temporal protections against covert channels.

The AutoCC Threat Model
The AutoCC threat model assumes two processes, an attacker and a victim, executing on time-shared hardware and separated via a context switch enforced by the operating system (OS). Both processes are untrusted, and the victim runs in a controlled environment where the OS restricts with whom the victim may communicate.
The attacker process possesses no special privileges and executes in a security domain of its own. In theory, no hardware state should leak data from the victim to the attacker since the processes are located in different security domains. However, an attacker could use a covert channel to extract information illegally. Its primary asset is a Trojan, i.e., a piece of code in the victim process that enables the data leak (as depicted in Fig. 1).
As a tool for hardware designers, AutoCC's emphasis is on sensitivity. That is to say, its goal is to expose the full set of possible covert channels to the designers, who then decide the course of action (Sec. 5 discusses decision tradeoffs). As such, we place no constraint on how the secret data is encoded into the state of the compromised hardware, i.e., the Trojan can be a malicious hidden function of the victim process or innocent code that leaks data inadvertently as a side effect of a legitimate operation. Aiming to find every covert channel, regardless of the intent of the code enabling it, allows us to prove stronger correctness assertions, i.e., hardware free of covert channels must also be free of side channels.
We further note that this threat model is not restricted to CPUs. Accelerators and other specialized hardware blocks are often shared between processes in a time-multiplexed manner, and they are also susceptible to covert channels. The operations available to these specialized hardware blocks can be considered as their ISA [70]. For the rest of the paper, design under test (DUT) refers to the top-level module we are testing, regardless of its level of specialization.

Formalizing the Threat Model for FPV
Having defined the threat model, we now explain how we formalize it as a problem for FPV by pushing the FPV tool closer and closer to modeling the scenario described above. For our formalization, we consider the following definitions:
Definition 1 (State). The state of a DUT is the set of all flip-flops, registers, and memory cells contained within that hardware module and its instantiated submodules. The DUT defines our universe of discourse; any RTL outside of the DUT is not considered. This distinction is especially relevant for our discussion on modularity in Sec. 3.4.
Definition 2 (Architectural State). The architectural state (Arch) of a DUT is the subset of the state that is readable via ISA instructions.

Definition 3 (Microarchitectural State). The microarchitectural state (μArch) is the subset of the state that is not part of Arch (not directly readable via ISA instructions).
A process executing on a DUT will naturally alter the values of both Arch and μArch. Accordingly, the isolation of these states (to the processes they belong to) is a responsibility shared by software and hardware. A well-implemented OS (1) guards the Arch that is only accessible via privileged mode and (2) swaps the values of Arch before another process begins. Well-designed and secure hardware will either partition or flush any μArch that could leak data from one process to another. In these terms, AutoCC assumes the correctness of the OS and checks the isolation of μArch.

Figure 2: Overview of the AutoCC methodology. The victim processes in universes α and β are free to take on any legal execution for an arbitrary number of cycles; the inputs to both processes are symbolic. At the end of this phase ①, both Arch and μArch of α and β may differ. The context switch then occurs, and once it completes at ②, the Arch of both α and β are the same, but differences in μArch may remain. (See Fig. 3 for details of the context switch.) We assert our Arch condition once the spy process begins execution ③. Holding inputs to both universes equal, AutoCC checks whether differences in μArch after the context switch cause observable differences in the spy's execution.

Data Leakage: Two conditions must be met for data leakage to occur. First, the values of μArch at the beginning of the spy process are determined from the behavior of the victim process. That is, based on different values of a victim's data, there exist at least two executions of the victim process that lead to different values of μArch. Second, there exist at least two executions of the same spy program starting from the same values of Arch that lead to different Arch, solely because of that difference in μArch. The goal is to set up an environment where the FPV tool explores any possible execution of victim and spy processes where these conditions are met.
AutoCC achieves this by setting up two instances of the DUT (universes α and β) as follows (see also Fig. 2):
• Both universes start from an identical reset state.
• Each universe has its own set of input and output signals.
• Because each set of input signals is driven separately by the FPV tool, each universe can take on any legal execution. (Sec. 3.4 elaborates on what makes an execution legal.)
Fig. 2 also defines three events that occur during the execution of the DUT. The first event is the end of the victim process (and the beginning of the context switch), where α and β can be in any reachable state after an arbitrary number of cycles. These states represent all possible executions of the victim process. Although the start of the context switch may be staggered, the end of it serves as a synchronization point between α and β, forcing the two universes (with hitherto different executions) into convergence. To do so, the context switch must ensure that upon completion (1) Arch_α and Arch_β are identical, and (2) the microarchitectural flush mechanism has been executed if it exists. With these two conditions met, α and β are assumed to now both be executing the same process, namely the spy process that was just switched in. The inputs for both universes are forced equal to ensure that any observed divergence is only the result of different values of μArch_α and μArch_β. In this post-switch world, we assert that on every cycle, Arch_α and Arch_β must be equal.
What would it mean if this assertion were violated? A counterexample (CEX) to this assertion means that on some cycle following the switch, α and β diverged in an observable way (at the resolution of a cycle) and that this discrepancy was caused by their differing executions before the switch. That is to say, there is a mechanism by which some code in the victim process can affect the execution of the spy process, i.e., a covert channel. Analyzing the CEX and determining the root of this divergence reveals how the channel is operated; we showcase how this encoding and observation occurs in Sec. 4.
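The two-universe setup can be mimicked in software on a toy design. The following is a minimal sketch under loud assumptions: the DUT is a hypothetical one-entry cache whose only output is a hit flag, the context switch is modeled as a single function rather than an instruction sequence, and explicit enumeration stands in for the FPV engine. None of the names below come from the AutoCC tool.

```python
from itertools import product

ADDRS = (0, 1)
RESET = (0, None)  # toy state: (arch_reg, cached_addr); cached_addr is microarchitectural


def step(state, addr):
    """One cycle of the toy DUT: load `addr`; the output is the cache-hit flag."""
    _, cached_addr = state
    hit = addr == cached_addr
    return (addr, addr), hit


def run_victim(trace):
    """Execute an arbitrary victim input sequence starting from reset."""
    state = RESET
    for addr in trace:
        state, _ = step(state, addr)
    return state


def context_switch(state, flush):
    """The OS restores the spy's architectural state; hardware optionally flushes."""
    _, cached_addr = state
    return (0, None if flush else cached_addr)


def find_cex(flush, victim_depth=2, spy_depth=2):
    """Search all victim pairs and all spy traces for an output divergence."""
    victim_traces = [t for n in range(victim_depth + 1)
                     for t in product(ADDRS, repeat=n)]
    for victim_a, victim_b in product(victim_traces, repeat=2):
        start_a = context_switch(run_victim(victim_a), flush)
        start_b = context_switch(run_victim(victim_b), flush)
        for spy_trace in product(ADDRS, repeat=spy_depth):
            state_a, state_b = start_a, start_b
            for cycle, addr in enumerate(spy_trace):   # equal inputs in both universes
                state_a, out_a = step(state_a, addr)
                state_b, out_b = step(state_b, addr)
                if out_a != out_b:                     # the per-cycle equality assertion
                    return victim_a, victim_b, spy_trace[:cycle + 1]
    return None
```

Calling `find_cex(flush=False)` returns a pair of victim executions plus a spy trace whose hit flag differs between the universes, while `find_cex(flush=True)` returns `None` because the flush removes the only state that survives the switch.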
Observation Model: In our threat model, the spy is a software program, so for a covert channel to be exploitable, it must be observable by software. In practical terms, this implies that the program's visible state is impacted, which is why Fig. 2 displays an assertion on Arch. However, given the variety of modern hardware designs, determining which states belong to Arch can be unclear, and manually specifying all the relevant signals becomes tedious. We posit that as long as there exist ISA instructions that allow a process to expose any subset of Arch to the DUT output interface, we can assert an equivalent correctness condition just on the DUT outputs of α and β without reasoning about their internal signals. Any difference between Arch_α and Arch_β on cycle t can, by a sequence of these instructions, be externalized by the FPV tool as a difference in outputs on cycle t + n for some bounded n. This allows the AutoCC tool to generate an FPV testbench (FT) without user input beyond providing the path to the DUT. Sec. 3.3 elaborates on how the FT is generated and how the user might need to manually specify the subset of Arch expected to be handled by the OS.
Modeling the OS: Our threat model assumes that the OS is trusted and correctly implements the context switch. Rather than reasoning about the sequence of instructions that the OS uses to switch between processes, we assume that its goal is achieved by the end of it. This is represented in Fig. 3 by showing that Arch differences between α and β and the symbolic Arch of the spy (y-axis) are resolved by the end of the context switch. Although α and β are in different symbolic Arch and μArch during the execution of the victim process, because we consider that the spy process begins when the Arch is the same in both universes, the FPV tool is only interested in exploring executions of the victim process that lead to this condition. The victim process and the OS are only separated for conceptual purposes, as hinted in Fig. 2 with the dashed line. In practice, there is no bright line between the execution of the victim process and that of the OS; we are agnostic to the timing and specific instruction sequence that lead α and β to the same Arch. This may result in CEXs that present covert channels that are not exploitable under a specific OS implementation, but we argue that it is useful for a hardware designer to be aware of them. Moreover, in FPV, it is best practice not to overconstrain the model, as this can miss exploring important behavior.
Measuring Context Switch Latency: For all its advantages, taking the end of the flush as the synchronization point between α and β admits one blind spot, as it assumes that the flushes in both universes finish on the same cycle. This precludes any CEXs originating from a difference in the latencies of the flush event itself. If a Trojan can modulate the flush latency and a spy can observe the difference, this latency may enable a covert channel. Nonetheless, AutoCC can further verify the DUT against this behavior by considering the start of the flush as the cycle on which α and β must converge. The flush event may then be considered part of the spy process, and our existing assertions will generate a CEX for any differences between the flush event in α and β.

Figure 3: AutoCC model of the context switch event. Instead of enforcing a discrete jump to a sequence of OS instructions, we simply require that the victim processes in α and β eventually converge to the same Arch (indicated here by the two victim executions converging on the y-axis). This is then the state of the incoming spy process. Since the microarchitectural flush is the last thing that executes before the spy process begins, this convergence must occur by the start of the flush. Note that the flush is free to start on different cycles in α and β; it is only required they complete together.

FPV Testbench (FT) Generation Flow
To make AutoCC accessible to hardware designers, we have developed a tool flow that requires minimal effort to set up. It creates, in under a second, a working FPV testbench (FT) from the path to the DUT and the choice of target FPV backend (Sec. 3.3.3). This FT has three major components: (1) a wrapper containing two instances of the DUT, (2) a property file that defines the properties to be checked, and (3) a backend-specific command file to invoke the FPV engines with the appropriate parameters. We implemented this FT generation flow in Python, leveraging the AutoSVA framework [44,47] to parse the DUT interface.

Generating the DUT Wrapper.
Based on the top-level RTL module we set as the DUT (e.g., a core, an accelerator, or a subset of them), the flow generates an FT in three steps.
First, the flow parses the interface signals of the DUT to create the wrapper's interface. The input and output signals of the wrapper are two sets of the DUT signals, each with a unique suffix (e.g., α and β), except for the signals we do not want to replicate, such as the clock and reset signals.
Second, the flow instantiates the DUT twice (as submodules of the wrapper) with different names, i.e., α and β.
Third, it connects each set of the independent, duplicated interface signals to the corresponding submodule and the common, non-duplicated signals to both submodules. If users want other interface signals of the DUT not to be replicated (e.g., a debug interface), they can specify them via a Verilog comment (//AutoCC Common) above each signal. This is equivalent to assuming that an input signal is equal throughout the entire execution, which may be useful to deal with illegal inputs, as we elaborate in Sec. 3.4. Making a signal common to α and β helps improve the FPV tool runtime at the cost of not searching the state space derived from that signal being different in both universes.

Listing 1: Property file generated by the AutoCC tool. It uses the signal that indicates that the μArch flush has finished in both universes to start the equality condition that defines the transfer period. After the transfer period is done, the spy process begins, i.e., inputs are assumed equal in both universes, and outputs are checked.
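The three wrapper-generation steps described above can be sketched as a small text generator. This is not the AutoCC implementation: the real flow parses the RTL through AutoSVA, whereas this sketch takes an already-parsed port list. The function `make_wrapper`, the `_a`/`_b` suffixes, the `dut_a`/`dut_b` instance names, and the `common` parameter (standing in for the //AutoCC Common annotation) are all hypothetical.

```python
def make_wrapper(dut, ports, common=("clk", "rst")):
    """Emit a Verilog wrapper that instantiates `dut` twice.

    ports: list of (direction, width, name) tuples describing the DUT interface.
    common: names of signals shared by both instances instead of duplicated.
    """
    def decl(direction, width, name):
        rng = f"[{width - 1}:0] " if width > 1 else ""
        return f"  {direction} wire {rng}{name}"

    decls, conns = [], {"a": [], "b": []}
    for direction, width, name in ports:
        if name in common:
            # Step 1 (exception): one shared port, wired to both instances.
            decls.append(decl(direction, width, name))
            for universe in conns:
                conns[universe].append(f"    .{name}({name})")
        else:
            # Step 1: duplicate the port with a per-universe suffix.
            for universe in conns:
                decls.append(decl(direction, width, f"{name}_{universe}"))
                conns[universe].append(f"    .{name}({name}_{universe})")

    # Steps 2 and 3: two instances, each connected to its own signal set.
    insts = [f"  {dut} dut_{universe} (\n" + ",\n".join(c) + "\n  );"
             for universe, c in conns.items()]
    return (f"module {dut}_wrapper (\n" + ",\n".join(decls) + "\n);\n"
            + "\n".join(insts) + "\nendmodule\n")


example = make_wrapper("fifo", [("input", 1, "clk"),
                                ("input", 8, "din"),
                                ("output", 1, "full")])
```

Printing `example` shows a module with one shared `clk` port, duplicated `din` and `full` ports, and two `fifo` instances.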
Listing 1 shows the template of the property file generated by AutoCC. Users are not required to provide a priori information about the internals of the DUT, as the properties generated solely use interface signals. Properties are written in SystemVerilog Assertions language (SVA) [27]. Assumptions are generated for DUT inputs and assertions for DUT outputs.
Transactions: When a valid signal governs a group of signals, we name it a transaction. We use this valid signal as a precondition for the properties reasoning about the payload of the transaction. This means that we do not check whether the payload of an outgoing transaction (from the DUT perspective) changes values while the transaction is not valid. However, if the RTL module to which the DUT is outputting wrongly uses an invalid payload, this would be detected by AutoCC when applied to this incorrect module since the input payloads are only assumed equal when the input transaction is valid. This careful management of interface transactions is crucial when verifying a large design via modularity (Sec. 3.4). We reuse AutoSVA's method to identify transactions automatically [47].
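The rule described above (assumptions on inputs, assertions on outputs, and a valid precondition for transaction payloads) can be sketched as a generator of SVA strings. The sketch does not reproduce Listing 1: the function `make_properties`, the property labels, the `_a`/`_b` suffixes, and the `payload_of` mapping are hypothetical, and only `spy_mode` is a name taken from the text.

```python
def make_properties(ports, payload_of=None, clk="clk"):
    """Return one SVA equality property per interface signal.

    ports: list of (direction, name) tuples for the duplicated DUT signals.
    payload_of: maps a payload signal to the valid signal of its transaction.
    """
    payload_of = payload_of or {}
    props = []
    for direction, name in ports:
        # Inputs are constrained (assume); outputs are checked (assert).
        kind, tag = ("assume", "am") if direction == "input" else ("assert", "as")
        precondition = "spy_mode"
        if name in payload_of:
            # Payloads only matter while their transaction is valid.
            precondition += f" && {payload_of[name]}_a"
        props.append(f"{tag}__{name}_eq: {kind} property (@(posedge {clk}) "
                     f"{precondition} |-> ({name}_a == {name}_b));")
    return props


example_props = make_properties(
    [("input", "req_valid"), ("input", "req_data"), ("output", "resp_valid")],
    payload_of={"req_data": "req_valid"},
)
```

Every generated property is a single-cycle implication over interface signals only, so no knowledge of DUT internals is needed to produce it.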
Defining the Architecture and Flush Conditions: By default, AutoCC does not identify the ℎ flush event or the set of ℎ signals.Users can modify these signals depending on the DUT to determine when a flush is considered finished and which state elements belong to ℎ.As we showcase in the evaluation section, we recommend adding states to the architectural_state_eq condition as CEXs are found to avoid overconstraining in advance.However, states that are clearly architectural because the OS manages them, e.g., the register file, may be added upfront.
Flush Completion: The flush event can be tricky to nail down as some DUTs do not have a well-defined signal for when the flush completes, and some do not have a flush operation at all.For instance, certain accelerators are designed under the assumption that when a new process begins utilizing the accelerator, there are no ongoing operations within its pipeline.That is to say, each stage of the pipeline must be idle when a new process begins; for these DUTs, flush completion can simply be defined as an idle pipeline.
Transfer Period: This concept is introduced to ease the definition of the flush completion on DUTs that have neither a flush nor an idle signal.The condition defining the transfer period is that for some cycles after the flush has finished, both ℎ and the interface signals are identical for  and , giving time for the pipeline stages in both universes to converge.As shown in Listing 1, the length of this transfer period is configurable via the THRESHOLD parameter.In theory, a transfer period of  cycles would eliminate CEXs that could only exercise within the first  cycles of the new process.In practice, as long as  remains smaller than the length of the OS operations between the flush completion and the transference of control to the spy process, these CEXs would not correspond to exploitable covert channels.As a heuristic, the length of the transfer period may be set to the longest path through the pipeline.
Spy Mode: The properties in Listing 1 only apply when the spy process is executing and the transfer period has elapsed (spy_mode is asserted). Until then, the inputs to both universes are free to be different, and the outputs are not checked.
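The relationship between flush completion, the transfer period, and spy mode can be sketched as a cycle-by-cycle model. This is an illustrative Python approximation, not the logic of Listing 1: the function name `spy_mode_trace` and the `threshold` argument (standing in for the THRESHOLD parameter) are assumptions, and the equality of state and inputs during the transfer period is treated as a given rather than modeled.

```python
def spy_mode_trace(flush_done, threshold):
    """Return the per-cycle value of spy_mode.

    flush_done: list of booleans, True in the cycle where the flush is
                considered complete in both universes.
    threshold:  transfer-period length in cycles (hypothetical stand-in
                for THRESHOLD).

    Assumption: during the transfer period the two universes are
    constrained to be equal, so only the cycle count is tracked here.
    """
    spy_mode = []
    flushed = False
    elapsed = 0  # cycles elapsed since the flush-completion cycle
    for done in flush_done:
        if flushed:
            elapsed += 1
        flushed = flushed or done
        # Properties are only checked once the transfer period has elapsed.
        spy_mode.append(flushed and elapsed >= threshold)
    return spy_mode
```

With a flush completing in cycle 1 and a threshold of 2, this model asserts spy mode from cycle 3 onward; before that, outputs are not checked.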

AutoCC's FPV Backend Support.
The adoption of formal methods is frequently hindered by access to FPV engines, as well as by the need for training to use them effectively. To ease their usage, our tool also generates the backend-specific commands and binding files required to use FPV engines, based on their documentation [8,65]. We have tested AutoCC with two different backends: JasperGold [7] and YosysHQ's SBY [65,67]. Once the properties and bindings are generated, our tool invokes the backend to start the property-checking process. Our methodology only uses single-cycle properties, which are efficient for FPV engines to verify and are supported by the open-source part of SBY. Thus, our tool is potentially amenable to an end-to-end open-source tool flow via SBY when applied to Verilog projects.

Reducing the State Space via Modularity
Covert channels can potentially be exploited from any state that a victim touches. Thus, AutoCC should be applied to all the RTL modules impacted by that software process. Proving the assertions of Listing 1 (or achieving a deep-enough bounded proof) is often infeasible for SoC designs of realistic size.
The state space exploration in FPV (and thus the backend tool runtime) grows exponentially with the RTL size and the search depth (time in cycles). As a baseline mitigation, we adopt the standard technique of minimizing the size of parameterized modules, such as TLBs and caches [55]. Provided that the downsized module is still able to exercise all the relevant features, this technique does not affect the coverage of the evaluation. However, this technique is often not enough to achieve a sufficiently deep bounded proof to provide confidence in the correctness of the design. To that end, we adopt two techniques: blackboxing and modularity. (Since blackboxing is a form of modularity, we discuss them together.) The implications of both techniques are very similar, but they differ in the location of the abstracted module. Blackboxing means that a submodule of the DUT is abstracted away from the verification engine, while modularity means that we create a new FT where the DUT is a submodule of the former top module. In practice, blackboxing can be thought of as if the submodule was moved outside the DUT while the wires that connect it to the DUT are left intact. These wires now become part of the DUT interface and are subject to the same constraints as the other DUT inputs and outputs, i.e., upon entering the spy mode, the wires that are outputs of the DUT (and inputs to the blackboxed module) are checked to be equal in both universes, while the inputs to the DUT are assumed equal.
To the FPV engine, the internals of a blackboxed module do not exist; it does not follow any state evolution. Thus, a module should only be blackboxed if the user does not care about any leaks originating from within it. (This could be because the OS is assumed to flush the module's state or the module has already been verified.)
Advantages: First, since the DUT contains less state, the combinatorial search size is reduced exponentially. Second, the exploration depth required to exercise the relevant features of the DUT is reduced, since the FPV tool is driving the inputs of the DUT directly.
Disadvantages: The CEXs found are less informative since we do not know how the inputs of the DUT were produced. For blackboxing, this refers to the outputs of the blackboxed module, which drive the rest of the logic within the DUT. Moreover, the CEXs are more likely to be spurious since inputs to the DUT may be illegal.

Definition 4 (Illegal Input Sequence). An input sequence to the DUT is considered illegal if it is unreachable when the DUT is instantiated within the full SoC (driving the DUT inputs).
Based on the above definition, the user could create assumptions to limit the inputs to legal values, e.g., do not receive a memory response if a request was not sent. A hardware designer may decide not to include these assumptions in their RTL module if the rest of the SoC is untrusted (e.g., resulting from integrating third-party IP). Alternatively, one may add individual assumptions to the FT to limit the inputs to legal values. To ease the modeling of the DUT's outgoing transactions, our tool flow can also generate this modeling from AutoSVA annotations [44]. However, we argue that in FPV, it is good practice to add assumptions and modeling only upon encountering spurious CEXs, as this is a good way to learn about the design and avoid overconstraining the verification process.
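The example assumption mentioned above (no memory response without a prior request) can be expressed as a simple trace check. The following Python sketch is illustrative only; in an FT this would be written as an assumption on the DUT inputs, and the function name `response_is_legal` and the trace encoding are hypothetical.

```python
def response_is_legal(trace):
    """Check that no response arrives without an outstanding request.

    trace: iterable of (req_sent, resp_received) boolean pairs, one per
           cycle, from the DUT's perspective.

    Assumption: a response cannot arrive in the same cycle in which its
    request is sent, so responses are checked before requests are counted.
    """
    outstanding = 0
    for req_sent, resp_received in trace:
        if resp_received:
            if outstanding == 0:
                return False  # illegal input: response with no request
            outstanding -= 1
        if req_sent:
            outstanding += 1
    return True
```

An input sequence rejected by such a check corresponds to an illegal input sequence in the sense of the definition above, so a CEX that relies on it would be spurious.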

SoC-level Verification:
To apply AutoCC at the SoC level, we recommend first creating FTs for RTL modules with the simplest interfaces, e.g., modules connected to the network-on-chip (NoC). This makes it much easier to deal with illegal inputs, as the NoC protocol is usually well-defined. Our properties in Listing 1 are designed to be modular so that RTL modules can be independently verified for the absence of covert channels. However, modularity results in more effort, not because of creating the FTs (which is automated in AutoCC), but because the DUT inputs are arbitrarily driven by the FPV tool, making the CEXs more prone to be spurious.

AutoCC during RTL Development
The properties of Listing 1 are expressed using interface signals, making them implementation-independent. This, along with their modular nature, allows designers to utilize AutoCC properties for test-driven development (TDD), where CEXs help to refine the design [56].
TDD is particularly useful for designing the microarchitectural flush mechanism. The overall flush mechanism would be correct if every module involved in the victim process effectively flushes its exploitable microarchitectural state and the orchestration of the flush signals across modules is properly implemented. We propose two methods that use AutoCC to identify the minimal set of microarchitectural states that need to be flushed to provide full temporal partitioning (i.e., no observable differences).
Algorithm 1 incrementally builds the flush mechanism by adding flushes to the states that cause CEXs for the AutoCC properties.
Algorithm 2 starts with the assumptions that the entire microarchitectural state is being flushed and that the AutoCC properties achieve a proof. Then it iteratively takes a state from the set of candidates and removes it from the flush signal as long as a proof is still achieved. The candidate set is a subset of the flushed states, since there may not be an incentive to remove a state flush if it does not impact performance. Both approaches assume that FPV returns in a finite amount of time, and the user is responsible for determining when a bounded proof yields confidence.
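The two methods can be sketched in Python against an abstract FPV oracle. This is a minimal illustration under stated assumptions, not the paper's algorithms verbatim: `run_fpv` is a hypothetical callback that returns `None` on a proof or the name of a state implicated by a CEX, and `make_toy_fpv` is a toy oracle used only to exercise the loops.

```python
from typing import Callable, Iterable, Optional, Set

# Hypothetical oracle standing in for an FPV run: given the set of flushed
# states, return None on a proof, or the name of a state blamed by a CEX.
Fpv = Callable[[Set[str]], Optional[str]]


def build_flush(run_fpv: Fpv) -> Set[str]:
    """Sketch of the incremental method: flush each state that causes a CEX."""
    flushed: Set[str] = set()
    while (culprit := run_fpv(flushed)) is not None:
        if culprit in flushed:
            raise RuntimeError("CEX persists although the state is flushed")
        flushed.add(culprit)
    return flushed


def prune_flush(run_fpv: Fpv, all_states: Set[str],
                candidates: Iterable[str]) -> Set[str]:
    """Sketch of the pruning method: start from a full flush and drop
    candidate states while the proof still holds."""
    flushed = set(all_states)
    if run_fpv(flushed) is not None:
        raise ValueError("the full flush must achieve a proof first")
    for state in candidates:
        trial = flushed - {state}
        if run_fpv(trial) is None:  # proof still achieved without it
            flushed = trial
    return flushed


def make_toy_fpv(required: Set[str]) -> Fpv:
    """Toy oracle: a proof is reached once all `required` states are flushed."""
    def run(flushed: Set[str]) -> Optional[str]:
        missing = sorted(required - flushed)
        return missing[0] if missing else None
    return run
```

In practice each oracle call is an FPV run that may take hours and may only reach a bounded proof, so the user still decides when a result is trustworthy.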

EVALUATION AND RESULTS
This section presents our evaluation of AutoCC on four open-source projects: the 32-bit RISC-V Vscale core [38]; the application-class 64-bit CVA6 core [43,68]; the MAPLE memory access engine [45,46]; and an accelerator for AES encryption [38]. We chose these projects because they represent a diverse set of designs in terms of complexity and pipeline depth. Table 1 lists the valuable CEXs we found. We consider a CEX valuable if it uncovers (a) a behavioral difference in the execution of a spy process based on the state left by a victim process or (b) unexpected or unintended behavior in the RTL based on legal execution. In contrast, a spurious CEX is caused by an illegal input sequence (see Definition 4). Table 1 also shows the depth of the CEX (length of the execution trace) and the runtime of the FPV tool. Although we have validated the AutoCC methodology with both SBY and JasperGold, we chose to perform evaluations with the latter due to familiarity with its GUI and because we are also evaluating SystemVerilog projects.
During the rest of the section, we walk the reader through the steps of applying AutoCC to the RTL projects listed above, including generating the FTs, refining the architectural state signal upon CEXs, and finding the CEXs indicated in Table 1. In the case of CVA6 and MAPLE, we (a) found hardware bugs and exploitable covert channels and reproduced a leak in system-level RTL simulation, (b) fixed these bugs and vulnerabilities in RTL and re-ran AutoCC to confirm that the CEXs were no longer found, and (c) merged these fixes into the upstream repositories of these open-source projects.

The 32-bit Vscale RISC-V core
Step-by-step use-case. Because Vscale is the first DUT presented, we will walk the reader (as a potential user) through how we applied the AutoCC methodology to it (see specific commands in Sec. A.5).
First, we create the FT by running the AutoCC Python script, indicating the path to the top-level module of Vscale (vscale_core.v). Second, we start the exhaustive exploration by running JasperGold and indicating the path to the generated FT. Note that this first run uses the default values for the flush and architectural state signals (see Listing 1). The CEXs shown in Table 2 result from iteratively refining the definition of the architectural state.
V1. The first CEX we observed was caused by a jump to an address in a register. Recall that the default assertions in the FT only check whether the output interfaces of the DUT are equal. Thus, the formal engine searches for an execution path to expose different internal states at the output interfaces. We refined that CEX by adding a condition to architectural_state_eq to check that pipeline.regfile.data is equal in both instances of the Vscale core. We could have added this condition from the beginning, but we chose to add conditions as we found CEXs for three reasons: (1) because we had not looked inside the core's internal state before, and so the CEX helped us find the path to each signal name; (2) to validate that the methodology can find covert channels based on an unflushed state; and (3) because it is good practice to start with the simplest precondition possible to make sure we do not overconstrain the state exploration.
V2. The second CEX was caused by a jump to a register previously fetched from the CSR module. The OS is responsible for protecting and managing the CSR registers, so these should be considered part of the architectural state. Since the CSR module contains many registers, it was more convenient to blackbox it and follow the procedure described in Sec. 3.4.
V3. The third CEX was caused by the PC being different in both universes, causing the next instruction fetch to have a different address. We refine this CEX by adding the PC registers along the core's pipeline to the architectural state.
V4 & V5. The fourth and fifth CEXs are caused by the fact that the Vscale core does not have a temporal fence like the version we used for CVA6 [43]. In particular, our fifth CEX of Table 2 showed a case where an interrupting instruction in the write-back stage of one universe, left over from the execution before the context switch, was causing stalls in the fetch stage of the pipeline for the spy process. However, since the OS code that manages the context switch has more instructions than Vscale has pipeline stages, it seems reasonable to consider that all instructions inside the pipeline should be equal in both universes when the spy process is about to start. For this evaluation, we assume a trusted and correct OS. Nonetheless, if an AutoCC user prefers not to assume that, this CEX could constitute a covert channel in that threat model.
Bounded proof. After refining the last CEX, the FPV engine kept searching until it reached our limit of 24 hours. At that moment, it had reached a bounded proof of depth 21. Since Vscale does not have caches or deep units, and the previous CEX had depth 9, we believe it would not find more CEXs even if it ran longer.

The 64-bit CVA6 RISC-V core
CVA6 is a mature application-class RISC-V core, fully implementing the I, M, A, F, D, and C extensions (ISA v2.3) and three privilege levels (M, S, U). CVA6 has been taped out numerous times into silicon [12,15,34,69] and offers several cache, MMU, and core configurations, including 32-bit and 64-bit variants.
Configurations. We used the 64-bit one with all the extensions, defined by their cv64a6_imafdc_sv39_config_pkg configuration file. However, we shrank the size of the caches (16 lines), TLB (4 lines), and branch predictor table (16 entries) to reduce the state size while still exercising their functionality. Leveraging the modularity of AutoCC, we disabled the floating-point unit to lighten the FPV process, as this IP block could be evaluated separately. There are three adaptations of CVA6 that implement different versions of the fence.t instruction (a microarchitectural temporal partitioning mechanism) with increasing levels of flush exhaustiveness [63].
Validating previously-found covert-channels. Our work began with the second implementation, full flush, which clears the caches, TLBs, branch predictors, and other states in smaller units, such as arbiters. We set the flush-completion condition to the point at which fence.t has completed in both universes, i.e., when the write-back data cache (D$) has invalidated its lines. One of the first CEXs we found (after we added the PC, register file, and CSR to the architectural state signal) was caused by executions where one universe had an outstanding AXI (Advanced eXtensible Interface) request going into the flush while the other did not. Since the arrival of the flush signal kills all outstanding AXI transactions, the instruction cache (I$) of the first universe, which was making the request, transitioned to a KILL_MISS state while that of the second remained in IDLE. This divergence of microarchitectural state can lead to an observable timing difference after the flush event, for instance, by issuing another cache request. A natural solution is to stipulate that the flush must first wait for all outstanding AXI requests to be completed. We still found another CEX after assuming that all AXI requests are satisfied before the flush. In this new CEX, the page table walker (PTW) takes longer to flush in one universe because it had an active memory request to the D$. These CEXs confirm and extend prior findings about the full flush fence.t in Wistoff et al. [63]. The observation that subtle, hard-to-find components may produce a covert channel (when not cleared systematically) was their primary motivation for the third implementation of CVA6's microarchitectural flush: microreset.
Evaluating the safest configuration. Unlike the full flush, microreset targets the entire microarchitectural state rather than attempting to identify a subset of vulnerabilities (only the architectural state is left unflushed). Microreset also enforces that the fence.t latency be independent of any previous execution, padding it to the worst case: the latency of a full D$ write-back. Flushing all microarchitectural state and padding to a constant latency is the most thorough temporal partition a designer can apply against covert channels in hardware, so we were not expecting to find any relevant CEXs; however, we found three, presented below.
C1. First, we found a CEX where an I$ fetch results in an exception in both universes. Since the exception is a valid response for this transaction, icache_dreq_i.valid is asserted even though the fetch did not hit the I$. In the frontend, CVA6 loads icache_data with whatever data payload it receives from the I$, as long as the response is valid. This payload is an input into the instruction realigner; the crux of the CEX is that the realigner sets its valid signal (for the output back to the pipeline) based on a bit of this payload without knowing that the payload came from an invalid I$ line. The difference in the output of the realigner then results in a PC mismatch between the two universes. We tentatively fixed this, in order to continue exploring, by zeroing out the data payload if we do not hit in the I$.
C2. Second, we faced a CEX caused by an invalid FSM transition in the PTW. This CEX begins with a TLB miss in both universes, resulting in both going on a page table walk; the flush signal from fence.t arrives while the walk is ongoing. The FSM logic for the PTW dictates that if the PTW is looking up a page table entry (PTE) when flush gets set, it should wait for a response before going to IDLE. (The intended transition is PTE_LOOKUP to WAIT_RVALID, then WAIT_RVALID to IDLE on receiving a valid response.) This is exactly what one universe does. However, while the other universe is in WAIT_RVALID, it also handles an exception, causing flush to get set again. As a result, its FSM transitions to IDLE on the next cycle, terminating the page walk before it gets a response. We reached out to the CVA6 maintainers to discuss this corner case and proposed a fix, which has been merged upstream.¹ This CEX showcases that AutoCC finds not only potential covert channels but also errors in the design.
C3. Third, we hit a CEX where one universe observes a chain of events involving the I$, TLB, PTW, and D$. Initially, the I$ experiences a miss, whose memory translation also results in a TLB miss. Subsequently, the PTW starts fetching PTEs, which results in a D$ request, right when the flush signal arrives. Although the TLB and PTW eventually get flushed, the D$ ends with a valid line after the flush completes. This CEX shows that a sequence of events initiated before the flush leads to an effect observable after the flush ends, constituting a potential covert channel. Based on this CEX, we find that draining D$ transactions after writing back the D$ and before clearing the design's flip-flops is insufficient; D$ transactions need to be drained before and after the write-back. We have made a corresponding fix for microreset.²

The MAPLE Memory-Access Engine
MAPLE is an accelerator for fetching memory patterns that supports fetching single array elements, array ranges, and indirect memory accesses. It also contains a memory-management unit (MMU) for virtual memory translation. In addition to load and consume operations, the API offered by MAPLE exposes several registers to configure the hardware queues and the MMU. In particular, the API offers an init operation to allocate a MAPLE instance (by mapping its memory-mapped configuration registers into virtual memory), a close operation to de-allocate the instance, and a cleanup operation to invalidate these configurations and flush the TLB between processes. The cleanup operation is performed as a first step of the initialization process.
Flush mechanism. We used the FSM that controls the invalidation process to set up the flush signal: the flush is considered complete when the invalidation state transitions to idle. Although MAPLE queues could be considered architecturally visible, these are flushed by the cleanup operation, so we did not add them to the architectural state condition.
M1. The first CEX, which we found quickly, was caused by several other requests being in the NoC protocol's output buffer in one universe when the flush signal was set. Although this could potentially yield a covert channel under special timing conditions (an old request being backpressured from the NoC), we chose to continue exploring CEXs by assuming that this buffer is empty during the context switch.
M2. The next CEX was caused by the TLB being disabled in one universe while it was enabled in the other. The TLB is enabled by default at reset, but MAPLE's API allows disabling it. We found from the CEX trace that the flip-flop that records whether the TLB is enabled is not flushed during the context switch. This flip-flop could be used as a binary covert channel, provided that the Trojan could disable the TLB and the spy could observe a page fault. We fixed this in MAPLE's RTL by resetting this flip-flop during the flush.
M3. The third CEX, found after a couple of hours, was caused by another register not being flushed. This one is the base address of the array for which subsequent data fetches can be offloaded to MAPLE by indicating an array index. To better describe this covert channel and how to exploit it in practice, we recreate a data leak with a test written in C.
Listing 2: Pseudocode of the program that lets a spy process recover the secret that a Trojan is actively leaking. MAPLE has a function (dec_set_array_base) that sets the base address of an array so that subsequent loads from it are offloaded to MAPLE by simply indicating the array index to load (dec_load_word_async). Since AutoCC found that this base address is not properly flushed, we can use it to leak the secret. The secret is leaked a byte at a time, by using it as an offset to set the base address of the array. Since the spy has allocated an array where array[index]==index, this offset is inferred from the loaded value.
Exploiting M3 at system-level. Listing 2 shows the leak function that allows a Trojan to encode a byte of the secret per iteration and the observe function that allows the spy to recover it. To evaluate this test,³ we first built an RTL simulation environment of MAPLE integrated with the OpenPiton SoC [4] following the tutorial in the MAPLE repository. Then, we performed the test bare-metal using VCS O-2018.09-SP2. It took under a minute for VCS to simulate the test on the OpenPiton SoC with MAPLE, where the spy recovers 8 bits per iteration, e.g., a 32-bit secret could be recovered with 4 iterations in less than 6,000 clock cycles.
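The leak and observe roles described for Listing 2 can be mimicked with a toy Python model. This is a sketch under strong simplifying assumptions, not the C test from the paper: `ToyMaple` models only the array-base register, the signatures of `dec_set_array_base` and `dec_load_word_async` are simplified guesses, and `cleanup`, `flush_base`, and `transfer` are hypothetical names introduced for illustration.

```python
class ToyMaple:
    """Toy stand-in for MAPLE that models only the array-base register."""

    def __init__(self, memory):
        self.memory = memory   # flat, word-addressed memory
        self.array_base = 0    # reset value of the base-address register

    def dec_set_array_base(self, base):
        self.array_base = base

    def dec_load_word_async(self, index):
        # Simplification: the load returns its value immediately.
        return self.memory[self.array_base + index]

    def cleanup(self, flush_base):
        # Models the context-switch cleanup. flush_base=True models an RTL
        # fix that resets the base register; False models the leaky design.
        if flush_base:
            self.array_base = 0


def leak(maple, secret_byte):
    """Trojan: encode one byte of the secret as the array base offset."""
    maple.dec_set_array_base(secret_byte)


def observe(maple):
    """Spy: load index 0; with array[i] == i the value reveals the offset."""
    return maple.dec_load_word_async(0)


def transfer(secret, fixed):
    """Run one leak/cleanup/observe round per secret byte."""
    maple = ToyMaple(list(range(512)))  # memory where array[i] == i
    recovered = []
    for byte in secret:
        leak(maple, byte)
        maple.cleanup(flush_base=fixed)
        recovered.append(observe(maple))
    return bytes(recovered)
```

In this model, a 4-byte secret is recovered in 4 rounds when the register is not flushed, and the spy reads only zeros once the cleanup resets it.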
Closing the covert channels. We have merged the RTL fixes to close the M2⁴ and M3⁵ covert channels into the upstream repository of MAPLE. For fabricated chips that include MAPLE [15], these channels could be closed in software by explicitly writing these registers to their reset values during the invalidation process.

An AES Accelerator
The AES accelerator we evaluated takes a 128-bit plaintext and a 128-bit key as input and produces a 128-bit ciphertext as output. It is a pipelined accelerator with 40 stages. We applied our methodology by following the same steps as in the previous section. We first ran the default FT generated by AutoCC, without specifying the flush signal. This accelerator does not contain any architecturally visible state but rather follows a request-response protocol.
A1. We found a CEX at depth 42 in a few seconds; one universe contained several ongoing requests, while the other had none. Since the flush signal (set free) appeared while the accelerator pipeline in the first universe was processing requests, a timing difference appears when that universe eventually responds and the other does not.
Using accelerators concurrently. The design of this AES accelerator assumes that it will only be used by one process at a time, as it does not offer any invalidate or flush signals. This would work well in a scenario where another process cannot use the accelerator until all the requests have been responded to. This is a reasonable assumption in the context of a well-programmed allocation of system resources. Hence, we refined this CEX by defining the flush signal as the condition of both universes having no ongoing requests. Once this was added, the tool found a full proof in 5 hours.
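The effect behind A1, and why an idle pipeline is a sensible flush condition here, can be illustrated with a toy shift-register model in Python. This is only a sketch: the functions `run_pipeline` and `is_idle` are hypothetical, and the model tracks request occupancy per stage rather than any AES computation.

```python
from collections import deque


def is_idle(in_flight):
    """Flush condition used in the refinement: no ongoing requests."""
    return len(in_flight) == 0


def run_pipeline(depth, in_flight, cycles):
    """Return the per-cycle response-valid output after a context switch.

    depth:     number of pipeline stages (40 for the accelerator above).
    in_flight: set of stage indices holding a request at the switch.
    cycles:    number of cycles to simulate with no new requests.
    """
    stages = deque((i in in_flight for i in range(depth)), maxlen=depth)
    response_valid = []
    for _ in range(cycles):
        response_valid.append(stages[-1])  # last stage drives the response
        stages.appendleft(False)           # shift; the last stage drops out
    return response_valid
```

In this model, a request left in the first stage produces a response 39 cycles after the switch, which the other universe never sees; requiring both pipelines to be idle at the switch removes the difference.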
Heterogeneous SoCs may lead to subtle vulnerabilities. In the era of heterogeneous hardware, system designers have to be very careful when integrating third-party IP blocks, as they might not be aware of the assumptions made by other designers. Otherwise, integrating an IP block similar to this AES accelerator (without a hardware invalidation mechanism) in a system that does not assume the OS to shield the allocation of hardware resources (e.g., waiting for all ongoing requests) may enable a covert channel.

DISCUSSION: HW/SW PROTECTIONS
We understand that security is not the task of hardware alone. Designers often have to make trade-offs between PPA⁶ and security; by identifying covert channels, our methodology helps them make informed decisions by knowing which hardware blocks, features, or optimizations may cause data leakage. Our approach also provides concise traces of the execution that led to a particular state and how that state led to an observable difference in the spy process.
Tradeoffs: With this knowledge, a hardware vendor can better decide whether to close the covert channel in RTL or warn against it in its security specification.⁷ For example, if a hardware-based division operation is found to be susceptible to a covert channel and fixing it would significantly slow down the operation for non-security-critical applications, the hardware vendor may decide not to fix it but flag it, so that programmers prioritizing security avoid using divisions on sensitive data. However, addressing the channel in hardware may be worth it if it has a minor impact on PPA. This is the case for the covert channels found in this paper, where enhancing the existing flush mechanism fixed them with negligible PPA implications. The hard part of fixing these channels was knowing about their existence, which is what AutoCC provided.
The Cost of Flushing Microarchitectural State: Although analyzing the PPA impact of flushing the microarchitectural state is out of the scope of this paper, we can make some observations. Flushing this state may affect runtime in two ways: (1) the time it takes to flush the state, and (2) the time it takes to restore the state after the flush. The first is determined by the unit that takes the longest to flush; much of the state can be flushed in a single cycle, but some units may take longer (e.g., write-back caches). For the second, the concern is the performance loss due to the unavailable state after the context switch, e.g., more misses may occur because the cache is flushed, or the branch predictor might need to relearn the branch history. Prior work found that this impact mostly depends on the period between context switches and the size of these structures [63]. For example, since on-core caches are small (typically much smaller than the program working set [16]), the lines of interest to the second process are likely evicted by the cache replacement policy anyway, and so there is no performance impact due to the flush.
We regard the problem of preventing covert channels as a challenge in hardware-software co-design. Hardware must provide the means to partition shared resources so that an OS can use these as necessary when reallocating those resources from one security domain to another. To that end, AutoCC can assist in designing and verifying temporal partitioning mechanisms for RTL modules.

FURTHER RELATED WORK
Information flow tracking (IFT) monitors the flow of sensitive data through hardware components via RTL simulation [3,42,52,53]. Like AutoCC, IFT techniques provide a precise trace of the leakage; however, they rely on input tests and user-provided security properties. Prior works in IFT are in part orthogonal to AutoCC since they focus on SoC-level simulation, while AutoCC formally verifies hardware components, potentially early in the design phase.
Other works in the area of information flow security propose new hardware description languages that integrate aspects of type systems to prevent illegal information flows. Caisson [36] statically analyzes designs written in its language to guarantee noninterference. Sapper [35] offers the same static guarantee by automatically inserting runtime checks into a Verilog design. SecVerilog [71] extends Verilog with a label-based type system to allow for dynamic labels that depend on runtime values. All of these approaches must be applied end-to-end on the entire design and require significant modification and annotation of existing RTL. This, in turn, requires reasoning about design internals and their security properties.
Like AutoCC, Simarel [31] uses bounded model checking to verify relational invariants between core executions. They focus on inductive invariants to prove information isolation. However, Simarel generally reasons about flows between levels in a security lattice, and no testing occurs against a formalized context switch.
While prior work is effective at tracking hardware state being read and propagated, it does not directly consider how timing in the program execution may also be used to extract information.

CONCLUSION
Our work introduces an FPV-based methodology that, given an RTL module, exhaustively searches for execution traces of a victim process that lead to execution differences observable by a supposedly isolated spy process. We demonstrated the effectiveness and efficiency of this methodology by applying it to four open-source hardware components. In particular, we found that AutoCC: (1) exercises previously-known issues within minutes, compared to lengthy stress-test simulations or emulations; (2) helps find the root cause of a CEX with minimal engineering effort due to the short length of the execution trace; (3) exposes new hardware bugs and covert channels in the mature RISC-V CVA6 core and the MAPLE accelerator; (4) uncovers experimentally-viable covert channels, as we reproduced one via system-level RTL simulation; (5) validates that RTL fixes to close covert channels are effective.
Users: AutoCC holds much value for hardware designers, empowering them to systematically search for covert channels in RTL during or after development. We believe AutoCC is most useful for developers of RTL modules or for those integrating third-party modules into a larger system. To make AutoCC accessible and practical for our potential users, we have: (a) developed an automated flow to generate FPV testbenches implementing this methodology, eliminating the need for upfront user input or RTL details; (b) proposed a test-driven approach to assist the design of hardware that requires temporal isolation, i.e., flushing the microarchitectural state between processes; (c) open-sourced AutoCC and added its artifact evaluation to showcase how to apply AutoCC to more RTL modules.

Table 1: Description, DUT execution depth, and FPV tool runtime (in minutes and hours) of the CEXs found in Vscale (V), CVA6 (C), MAPLE (M), and AES (A) that uncover hardware bugs or possible covert channels.

CEX   Description                                   Depth   Time
V5    Interrupt in the WB stage stalls pipeline     9       < 10 min.
C1    Leaks invalid I-Cache data to the next PC     76      < 30 min.
C2    Wrong transition in the FSM of the PTW        80      < 6h
C3    Valid D$ line after flush caused by PTW       80      < 6h
M2    Leak whether the TLB was disabled             21      < 30 min.
M3    Leak the value of a configuration register    23      < 3h
A1    Request in the pipeline during the switch     42      < 1 min.

Table 2: Description, depth, and FPV tool runtime (in seconds) of every CEX found in our experiments with Vscale starting from the default AutoCC FT, in order.