Augmented Symbolic Execution for Information Flow in Hardware Designs

We present SEIF, a methodology that combines static analysis with symbolic execution to verify and explicate information flow paths in a hardware design. SEIF begins with a statically built model of the information flow through a design and uses guided symbolic execution to recognize and eliminate non-flows with high precision or to find corresponding paths through the design state for true flows. We evaluate SEIF on two open-source CPUs, an AES core, and the AKER access control module. SEIF can exhaustively explore 10-12 clock cycles deep in 4-6 seconds on average, and can automatically account for 86-90% of the paths in the statically built model. Additionally, SEIF can be used to find multiple violating paths for security properties, providing a new angle for security verification.

Unfortunately, symbolic execution infamously suffers from the path explosion problem. The number of paths through a design grows exponentially with the number of branch points in the design. Hardware designs have the added complexity of reasoning about paths over multiple clock cycles in order to realize complete flows of information from an input port (source) signal to an output port (sink) signal. Current solutions to the path explosion problem have been to consider small but security-critical designs [24], [25], or to constrain the hardware design space by analyzing how information flows for a particular software program [21], [26].
We take a different approach. We start with static analysis, using an existing tool [27] to build a graph that overapproximates how information flows through a design. Such a graph is useful for designers, allowing them to explore their design and find possible illegal or insecure flows. For our purposes, the graph provides useful information that can be used to guide symbolic execution: a sequence of landmark points in the hardware design that execution must reach in order to realize a given path of information flow. Using these landmarks as a guide, we use symbolic execution to improve the graph's efficacy: finding a realizable path through the design state, along with the inputs needed to take that path, corresponding to a path in the graph; recognizing and eliminating from the graph paths which are unrealizable in execution; and recognizing and eliminating from the graph paths which are realizable, but do not represent a true flow of information.
This paper presents SEIF (pronounced "safe"), a toolflow that combines symbolic execution with static analysis in the form of the information flow graph. SEIF takes as input the statically built information flow (IF) graph and the source signals of interest in the design. For each IF-graph path, three outcomes are possible: 1. SEIF finds that the path is unrealizable or does not represent a true flow of information, so it requires no further scrutiny from the security engineer; 2. SEIF returns a sequence of input values that will drive the design along the IF-graph path to realize the flow of information; or 3. the complexity of the search space leaves the IF-graph path unaccounted for.
To find that a path is unrealizable or does not represent a true flow of information, SEIF uses two mechanisms: a check for mutually contradictory constraints and symbolic analysis. If the first mechanism reports that a path is unrealizable, then it is unrealizable regardless of the number of clock cycles the design is allowed to run. If the second mechanism reports that a path is unrealizable, then it is unrealizable within the clock-cycle bound used by SEIF. In our experiments, this was the case for 5-7% of the paths.
To find and return a sequence of input values that will drive execution along an IF-graph path, SEIF uses symbolic execution guided by the IF graph and heuristics we develop. The returned sequence of input values will drive execution along the IF-graph path, either starting from the design's reset state or from an intermediate state. In our evaluation (Section 7) we differentiate these two cases.
In this paper, we develop SEIF, an algorithm and tool to search for and eliminate false paths of information flow from a static analysis of a hardware design and then to further explicate the paths that remain. We show that by using the static analysis as a guide, we can guide symbolic execution toward more probable paths and eliminate impossible paths early. Our contributions are:
• Define SEIF, an augmented symbolic execution methodology for information flow analysis.
• Implement the methodology and search heuristics on top of the symbolic execution engine discussed in [28].
• Evaluate the augmented symbolic execution strategy on four open-source designs.

Threat Model
Information-flow analysis is a part of the security validation activities [29] that take place during the design phase of the hardware lifecycle [30]. The goal is to find weaknesses, vulnerabilities, and flaws in register transfer level (RTL) designs that may be exploitable post-deployment.
Flaws that result from logic and physical synthesis tools, manufacturing, or the supply chain cannot be discovered by SEIF. We target flaws that occur by benign human error in the specification, design, or implementation phases. Although our analysis may find maliciously inserted flaws, they have a lower chance of being uncovered than benign flaws, as the attacker will likely take steps to hide their work so that the security engineer does not recognize the malicious flow of information as dangerous. Flaws maliciously inserted after the security validation is complete, e.g., analog Trojans [31], cannot be discovered by SEIF.

Problem Statement
We approach the problem of information-flow analysis by transforming it into a graph reachability problem over a labeled, directed graph representing signal connectivity, extracted from the Verilog RTL design. We use symbolic execution of the RTL to determine which paths through the labeled directed graph represent true flows of information through the design in execution.
Given a hardware design and a particular input signal of interest, the goal is to return: 1) the set of realizable information flows through the design originating at that signal; and 2) for each information flow found, a sequence of input values to the design that will drive the information flow.

Preliminaries
It is useful to keep in mind three models: the state diagram of the design showing machine states and transitions between them; the labeled, directed, signal-connectivity graph, which we call the Information Flow (IF) graph; and the symbolic execution (SE) tree, showing execution paths through the RTL along with the associated (symbolic) states and path conditions. We describe these three in the following sections, but first we introduce a fragment of Verilog RTL as a toy example to help illustrate the three models.

Toy Example
The code snippet of Figure 1 shows a flow from an input, secret, to an output, led. The flow is guarded by an internal, state-holding variable, and the secret will only flow to the LED output in the clock cycle after state = 3. Note that with non-blocking assignments ("<=") all right-hand side expressions are calculated at the same time and assignments take effect at the next clock cycle. Blocking assignments ("=") take effect immediately.

State Diagram
We model a hardware design as a tuple, D = (S, S_0, I, δ, ω), where
• S is the set of states of the design;
• S_0 ⊂ S is the set of initial states;
• I is the finite set of input strings;
• δ : S × I → S is the transition function;
• ω : S → O is the output function, mapping each state to an output in O.
A state s ∈ S is a vector of valuations to state-holding internal variables of the design, s = ⟨v_0, v_1, ..., v_|s|⟩. We use v_i to indicate the variable and ⟨v_i = x⟩_{s_j} to indicate that the value of variable v_i is x in state s_j. As shorthand, we sometimes use v_i to refer to both the variable and its value, when it is clear in the text what we mean. The design powers up in an initial state, s_0 ∈ S_0. Many state-holding variables are reset to 0 in the initial state.
An input string i ∈ I is a concatenation of values to input variables of the design. Inputs are provided on every clock cycle. Similarly to state-holding variables, we refer to the value of input variable v_j at any given clock cycle as ⟨v_j = x⟩ or simply v_j. The clk signal is a special input that synchronizes reading input values and state transitions, which happen on clock cycle edges. The output function is the identity function over a subset of the design's variables.
For example, Figure 2 shows a sequence of state transitions for the toy example, starting with the initial state, in which information flows from secret to guard to output variable led. In this example, the initial state s_0 = ⟨prev = 0, state = 0, guard = 0⟩ produces output ω(s_0) = ⟨led = 0⟩, and transitions to state s_1 = ⟨prev = 0, state = 1, guard = 0⟩ when enable is high on a positive clock edge.
Figure 2: State transitions of the toy example (Figure 1) in which information flows from secret to led.
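The tuple model can be made concrete with a small executable sketch. Figure 1 is not reproduced here, so the update rules below are an assumption reconstructed from the prose (the counter advances when enable is high, guard captures secret when state == 3, and led exposes guard when prev == 3); the names State, delta, omega, and run are illustrative and not part of SEIF. Register widths are not modeled.

```python
from typing import NamedTuple

class State(NamedTuple):
    prev: int
    state: int
    guard: int

# Reset state s_0 from the text: <prev = 0, state = 0, guard = 0>.
RESET = State(prev=0, state=0, guard=0)

def delta(s: State, enable: int, secret: int) -> State:
    """Transition function on a positive clock edge.

    Every right-hand side reads the old state `s`, mirroring
    non-blocking assignment semantics.
    """
    prev, state, guard = s.prev, s.state, s.guard
    if enable:            # assumed: the counter only advances when enable is high
        prev = s.state
        state = s.state + 1
    if s.state == 3:      # assumed condition on the secret -> guard hop
        guard = secret
    return State(prev, state, guard)

def omega(s: State) -> int:
    """Output function: led carries guard only when prev == 3 (assumed)."""
    return s.guard if s.prev == 3 else 0

def run(inputs):
    """Apply a sequence of (enable, secret) input pairs starting from reset."""
    trace = [RESET]
    for enable, secret in inputs:
        trace.append(delta(trace[-1], enable, secret))
    return trace

# Four cycles with enable held high drive the secret to the output.
trace = run([(1, 0xA5)] * 4)
```

Under these assumptions, trace[1] matches the state s_1 given in the text, and the output stays 0 until the state reached after the fourth clock edge.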

Information Flow (IF) Graph
The Information Flow (IF) graph is a labeled, directed graph that captures signal connectivity and provides additional information, taken from the Verilog source, about the conditions under which two signals are connected [27]. Nodes represent the variables (wires and regs) of the design, and edges indicate a possible flow of information from one variable to another. An edge (v_1, v_2) exists when either there is an assignment from v_1 to v_2 (e.g., v_2 <= v_1) or v_1 appears in a condition (e.g., if (v_1)) and v_2 appears on the left-hand side of an assignment in either branch. The edge is labeled with the line number of the relevant Verilog statement and lists the surrounding conditions in the code that must be true for the information flow to take place. For example, in Figure 3, which shows the IF graph for the code in Figure 1, the edge (secret, guard) would be labeled with the condition that state == 3. Note that this graph inherently has no notion of timing or clock cycles.
Each edge in the IF graph represents a viable 1-hop flow of information in the design. However, multi-hop paths through the IF graph may not correspond to viable information flows. In other words, if we view the IF graph as an information-flow relation, taking the transitive closure of the relation yields an over-approximation of information flow through the design. To demonstrate, consider the code in Figure 1, but with the last line replaced with the following: assign led = (prev == 2) ? guard : 0; The IF graph would have the same nodes and edges, but the path from secret to guard to led does not correspond to any flow of information through the design.
There are two reasons why a path through the IF graph may not correspond to a true information flow. The first is that, as in the example above, the sequence of conditions needed for each edge cannot be satisfied. The second is that a path through the IF graph from x to y may not correspond to a true flow of information in the sense that the value of y depends on the value of x. A common example of this is the assignment y = x ⊕ x.
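A minimal sketch of an IF graph as a labeled edge map, with purely structural path enumeration, illustrates why the result is an over-approximation. The edge set, edge kinds, and condition labels are assumptions for the toy design (Figure 3 is not reproduced here), and this is not the data structure used by the hyperflow graph tool [27].

```python
# Edge map for the toy design: (source, sink) -> (kind, condition label).
# The edge set and labels are assumed for illustration.
IF_EDGES = {
    ("secret", "guard"): ("explicit", "state == 3"),
    ("state", "guard"): ("implicit", "state == 3"),
    ("guard", "led"): ("explicit", "prev == 3"),
    ("prev", "led"): ("implicit", "prev == 3"),
    ("enable", "state"): ("implicit", "enable"),
    ("enable", "prev"): ("implicit", "enable"),
    ("state", "prev"): ("explicit", "enable"),
    ("state", "state"): ("explicit", "enable"),
}

def successors(node):
    """Nodes reachable from `node` in one hop."""
    return [dst for (src, dst) in IF_EDGES if src == node]

def enumerate_paths(src, sink, path=None):
    """Yield every simple (cycle-free) IF path from src to sink as a list of edges."""
    path = path or [src]
    if path[-1] == sink:
        yield list(zip(path, path[1:]))
        return
    for nxt in successors(path[-1]):
        if nxt not in path:
            yield from enumerate_paths(src, sink, path + [nxt])

secret_paths = list(enumerate_paths("secret", "led"))
enable_paths = list(enumerate_paths("enable", "led"))
```

The enumeration never inspects the condition labels, so changing the label on (guard, led) from prev == 3 to prev == 2 would leave secret_paths unchanged. Deciding whether such a path is a real flow requires the later analysis phases.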

Symbolic Execution
In symbolic execution, concrete input values are replaced with abstract symbols. The design is executed using the symbols in place of literals. When a branch point (e.g., if (enable)) is reached, both paths are separately explored. For each path, the branching condition that must be true for that path (e.g., enable == 1) is maintained in the path condition. At the end of a single path of symbolic execution, satisfying assignments to the constraints in the path condition can be used as concrete input values to drive concrete execution down that same path.
Symbolic execution is modeled as a directed tree. Each node n in the tree is associated with a line of code in the design, a symbolic state, σ, and a path condition, π. A node's children are the possible next lines of code to symbolically execute. A path from the root node to any leaf node represents a realizable path through the design.
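The forking behavior and the role of the path condition can be sketched for a single clock cycle with two branch points. This is an illustrative stand-in, not the Sylvia engine: the branch list is an assumption about the toy design, and exhaustive search over small finite domains replaces the SMT solver.

```python
from itertools import product

# Symbolic variables and the small finite domains searched in place of an SMT solver.
DOMAINS = {"enable": range(2), "state": range(4)}

# Branch points of one clock cycle, in program order: (label, predicate).
BRANCHES = [
    ("enable == 1", lambda a: a["enable"] == 1),
    ("state == 3", lambda a: a["state"] == 3),
]

def solve(path_condition):
    """Return one assignment satisfying every (predicate, wanted) pair, or None."""
    names = list(DOMAINS)
    for values in product(*(DOMAINS[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(pred(assignment) == wanted for pred, wanted in path_condition):
            return assignment
    return None

def explore(index=0, labels=(), path_condition=()):
    """Fork at every branch point; yield (path labels, witness) for feasible leaves."""
    witness = solve(path_condition)
    if witness is None:  # infeasible prefix: prune the whole subtree
        return
    if index == len(BRANCHES):
        yield labels, witness
        return
    label, pred = BRANCHES[index]
    for wanted in (True, False):
        tag = label if wanted else f"not ({label})"
        yield from explore(index + 1, labels + (tag,),
                           path_condition + ((pred, wanted),))

leaves = list(explore())
```

Each leaf pairs a root-to-leaf path with a concrete witness that would drive concrete execution down that same path.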
The number of paths to explore grows quickly. For example, the symbolic execution of the design in Figure 1 for the four clock cycles necessary to find the information-flow path from secret to led would yield the tree of nodes shown in Figure 4.
A single path through the IF graph can correspond to many paths through the symbolic execution tree. For example, enable can remain low for 0, 1, 2, ... clock cycles between each update to state. Each of these options represents a separate path through the symbolic execution tree. This example, although simple, is not all that contrived. It may be that state in another module of the design can take varying time to compute an action before enable becomes high again. Two problems become apparent: 1) The choice of path in the current clock cycle can determine whether there exists a path in a future clock cycle that will allow the flow of information to continue. 2) Once exploration starts down one path, it is not clear at what point (after how many clock cycles) the current path should be abandoned as incorrect and a new path should be tried. For example, there are infinitely long paths in which prev never gets to 3 and enable remains low.
Figure 4: Symbolic execution tree of the design in Figure 1 after four clock cycles.

Methodology: Symbolic Execution for Information Flow
Given a design and an input signal of interest, src, our goal is to find how information can flow during execution from src through the design. Our approach is to first use the IF graph to enumerate all potential paths of information flow through the design from src. As this is a static analysis, complexity grows linearly with the number of variables in the design and the length of the RTL code. Then, for each enumerated path, SEIF uses symbolic execution to either find a corresponding information-flow path through the design, or determine that no such path exists.

Overview
Once the IF graph is generated, the analysis proceeds in three main phases: pruning globally unrealizable paths, symbolically executing the design to find realizable paths through the design, and analyzing the semantics of each found path to find true paths of information flow. In the following sections, we describe each phase in more detail.
If SEIF returns a path, it is a true path through the design corresponding to the path in the IF graph. Depending on the post-processing option used, this will either be a path starting at the design's reset state or an intermediate state.
If SEIF does not return a path, there are three possibilities. First, the path in the IF graph has been identified as infeasible within a bounded number of clock cycles. Second, the path in the IF graph is feasible in the design, but does not represent an actual flow of information; this result is sound with one caveat discussed in Section 5.4. Third, the path in the IF graph cannot be accounted for. These options are discussed in Section 5.3.5 and evaluated in Section 7.2.

Pruning Globally Unrealizable Paths from the IF Graph
In the first phase, our goal is to quickly and cheaply eliminate paths through the IF graph that are easily falsified before moving on to the next, more expensive phase. Consider the example code in Figure 5. The variable temp carries the input secret only when the input signal enable is high. The secret information is conditionally passed on to result and from there to led2. The corresponding IF graph is shown in Figure 6. While the IF graph appears to show a flow of information from secret to led2 via temp, the constraints for edges (secret, temp) and (temp, result) require enable to be high and low, respectively. Since both edges must occur in the same clock cycle, this flow cannot be realized.
Let us take the following IF path in Figure 6 as an example: ⟨(secret, temp), (temp, result), (result, led2)⟩. This path has two segments. The first segment is the two-hop sequence, ⟨(secret, temp), (temp, result)⟩, made up of a continuous assignment and a non-blocking assignment. The second segment, ⟨(result, led2)⟩, is a single hop and a continuous assignment.
For every segment in a given IF-graph path, the conditions involved in that segment are collected and checked for co-satisfiability. If the hops in any one segment have mutually contradictory constraints, that path is discarded. In Figure 6, the segment ⟨(secret, temp), (temp, result)⟩ has contradictory constraints, as the first hop requires that enable is high, while the second hop requires it to be low.
This pruning analysis is sound (only unrealizable paths are discarded) as long as the co-satisfiability check considers only state-holding signals and input signals in the satisfiability query, as these signals do not change value in the middle of a clock cycle.
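The per-segment check can be sketched as follows. Two simplifications are assumptions made for illustration: a segment is taken to end at a non-blocking assignment (inferred from the two-segment example above), and each hop condition is an equality on a one-bit signal, checked by enumeration rather than by the Z3 query SEIF uses.

```python
from itertools import product

# Each hop: (edge, assignment kind, condition over input/state-holding signals).
# Conditions are modeled as {signal: required value}; {} means unconditional.
PATH = [
    (("secret", "temp"), "continuous", {"enable": 1}),
    (("temp", "result"), "nonblocking", {"enable": 0}),
    (("result", "led2"), "continuous", {}),
]

def split_segments(path):
    """A segment ends at a non-blocking assignment, i.e. a clock-cycle boundary."""
    segments, current = [], []
    for hop in path:
        current.append(hop)
        if hop[1] == "nonblocking":
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

def co_satisfiable(segment):
    """Is there a 0/1 assignment that meets every hop condition in the segment?"""
    signals = sorted({sig for _, _, cond in segment for sig in cond})
    for values in product((0, 1), repeat=len(signals)):
        assignment = dict(zip(signals, values))
        if all(assignment[s] == v
               for _, _, cond in segment for s, v in cond.items()):
            return True
    return False

def globally_unrealizable(path):
    """A path is discarded if any single segment has contradictory conditions."""
    return any(not co_satisfiable(seg) for seg in split_segments(path))
```

For PATH, the first segment requires enable to be both 1 and 0 within one clock cycle, so globally_unrealizable(PATH) holds and the path would be discarded before any symbolic execution.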

Symbolic Execution to Find Paths through the Design
In the second phase, the goal is to find true paths through the design for each remaining path in the IF graph. We use symbolic execution to find a sequence of machine states and a corresponding sequence of input signals (for example, as seen in Figure 2) that aligns with the path outlined by the IF-graph path.

Symbolic Execution Guided by IF-Graph Path Segments.
The segment analysis done in the first phase provides information about where the clock cycle boundaries lie; the IF graph also provides information about which lines of code must execute for each hop in a segment. SEIF uses this information to drive symbolic execution along the path outlined by the IF graph.
In each clock cycle, the symbolic execution engine is restricted to following only those design paths which include the lines of code that must be executed for the current IF-path segment to be realized. For example, in Figure 1, the symbolic execution engine only considers paths which take the if branch at line 8, when state == 3. By doing so, the search space is significantly reduced.
However, there may still be many possible paths through the design to consider, only some of which allow the complete IF path to be realized. Continuing with our example, Figure 7 shows the symbolic execution tree for one clock cycle of the code in Figure 1. Each node in the tree represents a line of code, or non-branching sequence of code (e.g., lines 3-4), to be executed.
The path of interest, this time annotated with which line of code needs to execute for each hop to be realized, is ⟨(secret, guard)_{line 8}, (guard, led)_{line 13}⟩. Examining the symbolic execution tree in Figure 7, it would appear that two of the four possible paths achieve the desired flow. But annotations in the IF graph tell us that the sequence of conditions (state == 3)_{s_3}, (prev == 3)_{s_4} needs to be met. For that to happen, lines 3-4 need to execute in the first four clock cycles and lines 8, 13 need to execute in only the fourth clock cycle. While this is clear to see when examining the state transition diagram (Figure 2), there is nothing in the IF graph, or even the code itself, indicating that it will take four clock cycles to realize this flow. Finding the desired path through the multi-clock-cycle symbolic execution tree is a search problem. We discuss the search strategies we developed to guide search in SEIF in Section 5.3.4.
Continuing with our example from Figure 1, at the start of the initial clock cycle, the symbolic execution engine checks whether the condition required for the first hop in the IF graph (state == 3) is mutually contradictory with the initial symbolic state (in which state == 0). Indeed, it is, and the symbolic execution engine discards any paths that would include line 8, the line of code required for the first hop in the IF graph. At this point, SEIF recognizes that realizing the first segment of the IF graph at the current state (state s_0) is infeasible.

Stalling the IF-Graph Path to Advance to a New Machine State.
The second strategy used by SEIF is to pause the search for realizing a segment of the IF path in order to advance the design to a next-state when needed. In our example, the first segment of the IF graph cannot be realized from the initial reset state. SEIF symbolically executes the design for a single clock cycle, without considering the constraints required by the next IF path segment, to advance the design to a new state. SEIF then checks whether the IF graph segment can be realized from this new state.
There are many possible next states and SEIF must find one that satisfies two criteria: 1) The next state advances the design toward a state in which the next IF segment can be realized, and 2) The next state does not undo any prior progress along the IF graph path that has already been made. We discuss search strategies for finding valuable next-states in the next section. The second criterion is trickier. During normal execution, it is likely that information written to a reg in one clock cycle gets overwritten in a subsequent clock cycle. For example, consider the code in Figure 8, which is similar to that of our first example (Figure 1), but made slightly more complex by the addition of two new registers: guard0 and clear. The IF path of interest is now from secret to guard0 to guard to led. To achieve the second flow segment, ⟨(guard0, guard)⟩, SEIF needs to first advance the design to a state s′ = ⟨state == 3⟩. However, it is important that while the design advances to state s′, the clear signal is never set, as a 0 written to guard0 would undo the information flow from secret to guard0 from the prior IF path segment.
SEIF uses information from the IF graph to stall the information flow while advancing the design to a next-state. We define stalling as symbolically executing the design for a single clock cycle, such that the design transitions to a next state, but the position along the IF path remains unchanged. To stall, SEIF prevents the symbolic execution engine from considering any paths of execution that will undo information flow from prior segments in the IF path. To do this, SEIF considers the node n in the IF path in which information currently "resides." In our current example, this would be the node guard0. SEIF then uses the IF graph to find all edges incident to node n, which represent flows of information from variables in the design to n and are associated with lines of code. Explicit flows need to be prevented during stalling, but implicit flows do not need to be prevented, as they do not cause the value in n to be overwritten. SEIF avoids exploration of any paths through the design which would execute a line of code in which n is written to. In this way, the information in n is not lost while stalling.
There are two edge cases to consider. The first is self-loops. Direct flows from n to n (e.g., n <= n + 1) are allowed, as the information in n stays in n. The second is the case when n is assigned a constant (e.g., n <= 1). SEIF checks this corner case during symbolic execution and abandons any path in which it occurs. If the assignment by a constant happens regardless of the rest of the state, then stalling cannot occur at this point in the IF graph.
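The stalling rule, including the two edge cases, amounts to a filter over the writes to n. The sketch below is illustrative only: the Write record, the line numbers, and the set of unconditionally executed lines are hypothetical, and in SEIF this information would come from the IF graph and the symbolic execution engine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Write:
    line: int        # hypothetical source line number
    target: str      # variable on the left-hand side
    sources: tuple   # variables read on the right-hand side; () for a constant

def lines_to_avoid(writes, n):
    """Lines that would overwrite the information held in n while stalling.

    Self-loops (n appears on its own right-hand side) are allowed; any other
    write to n, including assignment of a constant, must be avoided.
    """
    return sorted(w.line for w in writes if w.target == n and n not in w.sources)

def can_stall(writes, n, unconditional_lines):
    """Stalling is impossible if a forbidden write to n executes on every path."""
    return not any(line in unconditional_lines
                   for line in lines_to_avoid(writes, n))

# Hypothetical writes loosely modeled on the guard0/clear example.
WRITES = [
    Write(10, "guard0", ("secret",)),      # explicit flow into guard0
    Write(12, "guard0", ()),               # constant written when clear is set
    Write(15, "guard", ("guard0",)),       # next hop of the IF path
    Write(20, "guard0", ("guard0",)),      # self-loop, allowed while stalling
]
```

With these records, lines 10 and 12 must be avoided while information resides in guard0, and stalling remains possible as long as neither line executes unconditionally.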
Because of stalling, the number of clock cycles needed to verify the information flow may exceed the length of the IF path.

Search Strategies.
The goal is to find a sequence of design states, and corresponding input values, that correspond to an IF path, or determine that no such sequence exists. The search space is large; an IF path with n segments requires at least n clock cycles through the design. When stalling is needed, the number of clock cycles required is unbounded (although finite).
Information from the IF graph is used to prune the symbolic execution tree at each clock cycle, but a single IF hop can correspond to many paths through the symbolic execution tree. This is because a segment of the IF path requires only a small number (typically fewer than 5) of lines of code to be executed. The input space is partially constrained to ensure those few lines of code are executed, but most of the input space is unconstrained, and therefore there is freedom in how most of the design is explored at each clock cycle.
We developed and implemented four search strategies.
Baseline 1. The key idea behind this strategy is that, for each segment, we can either symbolically execute until a design path is found in which the segment conditions are satisfied (termed a continue), or we can stall for some bounded number of cycles. For an IF path, we build and exhaustively search a list of all possible continue/stall combinations. If SEIF is unable to complete the IF path for a given continue-stall pattern, it moves on to the next pattern. The list of continue-stall combinations is in truth-table order to allow the SEIF engine to explore as deeply as possible first, aiming to verify the shortest path possible with no stalls. In this context, depth equates to the number of IF-path segments successfully traversed, and for which SEIF has realized a partial path of execution.
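The continue/stall pattern list can be sketched with a Cartesian product. Modeling exactly one continue-or-stall choice per IF-path segment is an assumption made for illustration; the names CONTINUE, STALL, continue_stall_patterns, and render are not taken from SEIF.

```python
from itertools import product

CONTINUE, STALL = 0, 1

def continue_stall_patterns(num_segments):
    """All per-segment continue/stall choices in truth-table order.

    Counting up from all-CONTINUE means the stall-free pattern, which yields
    the shortest possible path, is tried first.
    """
    return list(product((CONTINUE, STALL), repeat=num_segments))

def render(pattern):
    """Human-readable form of one pattern."""
    return ["stall" if choice == STALL else "continue" for choice in pattern]

patterns = continue_stall_patterns(2)
```

The list grows as 2^n in the number of segments, which is one reason an exhaustive pattern search serves as a baseline rather than the preferred strategy.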
Baseline 2: Backtracking Only. In this search strategy SEIF begins by symbolically executing for one clock cycle for the first segment. If the flow is found, SEIF moves to the next segment in the IF path. If at any segment the flow is not found in some bounded number of clock cycles, or there are no more design paths to try, SEIF returns (or backtracks) to an earlier segment to find a different path that satisfies the same segment conditions.
Stalling with Backtracking. This strategy is a hybrid of baselines 1 and 2. For any given continue/stall pattern, after successfully executing consecutive continues and reaching a stall, SEIF stalls for a bounded number of clock cycles and attempts to find execution paths where SEIF can make forward progress in the next segments. If all symbolic execution paths are explored, or SEIF times out (according to some pre-determined bound), it backtracks.
Stalling with Heuristic. This strategy builds on top of stalling with backtracking. Our heuristic relies on the UNSAT core, the subset of constraints in a SAT query for which no satisfying assignment exists. If SEIF stalls, it is searching for a new machine state that will satisfy the conditions of the next IF path segment. In this case, SEIF pushes the symbolic state and the constraints from the next segment to the SMT solver, which returns the UNSAT core. For each path explored while stalling, SEIF checks if the UNSAT core became smaller. If it did, SEIF continues searching for a new machine state along that path. If it grew, SEIF prioritizes the next candidate stall path.

Post Processing to Find Reset.
SEIF begins exploration from a symbolic state, and therefore the design paths it generates inputs for may not start from the reset state. We mitigate this by checking whether the found design path has constraints that conflict with the design's reset state. If not, the path can start from reset. If so, the path starts from an intermediate state of the design, and SEIF cannot guarantee that it is a reachable state. Most often, SEIF finds paths that can start from reset and we evaluate this in Section 7.2.
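The reset check can be sketched as a comparison between the reset valuation and the constraints a found path places on its first state. This is a simplification under stated assumptions: constraints are modeled as required concrete values (SEIF would check general constraints with the solver), and the reset values are those of the toy design.

```python
# Assumed reset values for the toy design.
RESET_STATE = {"prev": 0, "state": 0, "guard": 0}

def conflicts_with_reset(initial_constraints, reset_state=RESET_STATE):
    """Return the variables whose required initial value differs from reset.

    initial_constraints maps a state-holding variable to the concrete value
    the found design path requires in its first state; unconstrained
    variables are simply omitted.
    """
    return sorted(
        var for var, value in initial_constraints.items()
        if var in reset_state and reset_state[var] != value
    )

def starts_from_reset(initial_constraints, reset_state=RESET_STATE):
    """True when the path's first state is compatible with the reset state."""
    return not conflicts_with_reset(initial_constraints, reset_state)
```

A path whose first state requires state == 3 would be reported as starting from an intermediate state, while a path that only requires guard == 0 can be replayed from reset.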

Semantic Analysis to Identify True Information Flows
Once per execution path, SEIF performs a semantic analysis check to prune flows that represent viable design paths, but not true flows of information. This can happen when a textual flow does not represent an information flow. For example, y <= x xor x would yield a path showing x flows to y even though there is no flow from x to y. SEIF prunes explicit textual flows which do not represent information flows.
If there is an implicit textual flow that is not a true information flow, SEIF cannot eliminate that false positive. For example, if (x XOR x) y <= 0; else y <= 1; (Here, there is no path in which y is set to 0, and SEIF does recognize that.) In the case of reconvergent fan-out, SEIF may or may not find the flow. In the example of Figure 9, x is an input and blocks 2 and 3 represent different areas of the design (i.e., modules, always blocks). There are four cases to consider:
1) The writes to y and y' are both unconditional and there is no flow from x to z because z = 3x − 3x. SEIF performs the check and correctly detects no flow.
2) The writes to y and y' are conditional, and depend on the same conditions. SEIF detects there is no flow.
3) The write to y' is conditioned on something that is mutually UNSAT with the condition for y. In this case, there is always a flow from x into z, and SEIF detects it.
4) The write to y' is conditioned on something mutually satisfiable with the condition for y, where the condition for y is different. If SEIF follows a design path where both conditions are true at the same time, it detects no flow, while there may be other design paths through block 3 which would enable a flow, and vice versa.
Unless SEIF is able to exhaustively explore, it may report an incorrect result.
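The dependence question behind this check can be illustrated by exhaustive enumeration over a small bit width, used here as a stand-in for SEIF's symbolic analysis. The function z below is an assumed reading of the reconvergent fan-out example (blocks 2 and 3 each write 3x, and z is their difference); the condition flags and the 4-bit width are hypothetical.

```python
def depends_on(fn, width=4):
    """Does fn's output change for any value of its single input x?

    Enumerating all inputs of a small bit width stands in for the
    symbolic check.
    """
    outputs = {fn(x) for x in range(1 << width)}
    return len(outputs) > 1

def z(x, cond_y, cond_y_prime, y_old=0, y_prime_old=0):
    """Assumed reconvergent fan-out: z = y - y', with conditional writes of 3x."""
    y = 3 * x if cond_y else y_old
    y_prime = 3 * x if cond_y_prime else y_prime_old
    return y - y_prime

explicit_xor = depends_on(lambda x: x ^ x)                 # y <= x xor x
both_written = depends_on(lambda x: z(x, True, True))      # cases 1 and 2
only_y_written = depends_on(lambda x: z(x, True, False))   # case 3
```

Case 4 corresponds to a design in which both of the last two situations are reachable, so the verdict depends on which design path is followed. This is why exhaustive exploration matters for the result.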

Implementation
We implemented SEIF using the Sylvia symbolic execution engine [28] and using hyperflow graphs [27] as our IF graph engine. Both Sylvia and the hyperflow graph toolchain were built using Python 3. Sylvia implements the Verilog semantics according to the IEEE 1364-2005 standard using pyVerilog and the Z3 solver for SMT solving. SEIF also uses Z3 for preprocessing and path removal.
When considering information flow paths that span multiple modules, enumerating all possible paths for even a single source/sink pair becomes too expensive. We manage this complexity by following the divide-and-conquer approach of Ryan et al. [28]. SEIF first finds the partial IF paths within a module, and then uses the segment conditions to find the next module to explore. SEIF uses the SMT solver to ensure that the path fragments can be stitched back together to form a valid information flow path from source to sink. This approach reduces repeated work within a module when exploring paths across multiple modules.

Evaluation
We evaluate SEIF over four open-source designs to study its viability as a means for accounting for information flows within a hardware design. The evaluation addresses the following questions: 1) Can SEIF recognize and eliminate paths through the IF graph that are unrealizable in practice? 2) Can SEIF find paths through the design, along with the sequence of inputs to realize each path, that correspond to paths through the IF graph? 3) Can SEIF be meaningfully applied to security-relevant signals in hardware designs to give experts feedback on the security of the design or new areas to explore?
The experiments are performed on a machine with an Intel Xeon E5-2620 V3 12-core CPU (2.40 GHz, a dual-socket server) and 62 GB of available RAM.

Accounting for Paths in the IF Graph
We first examine SEIF's ability to account for paths in the IF graph, either by finding paths through the design that correspond to the IF path, or by eliminating the IF path as infeasible. In these experiments we look at the OR1200, MSP430, and ACW designs, which are the largest of the four.
We identified 20 security-critical signals in the OR1200 to use in our experiments that appear in security properties of the OR1200 collected from the security literature [36] [37] [38] [39] [40]. We selected 10 sources to analyze in the MSP430 by finding signals roughly analogous to those in the security properties for the OR1200. For the ACW, we chose 20 main internal signals that appear in the security properties manually and automatically generated by [20] [41] and map to several known CWEs.
For each source signal there can be tens of thousands of IF paths. (See the numbers in Tables 1 and 2, discussed in Section 7.5.) For the efficacy and performance evaluations in this and the next two sections, we analyze a subset of the total paths. For each source signal of interest, we randomly selected 300 paths from the IF graph for analysis. For the security analysis case study (Section 7.5), we analyze all paths from a given source.
Figure 10 summarizes SEIF's ability to account for the IF paths. For 86% to 90% of the IF paths on average, SEIF either finds the corresponding path through the design or eliminates the IF path as infeasible or not representing a true flow of information. The majority of accounted-for IF paths, 58% to 77% on average in the three designs, are true paths in the design, indicating that the static analysis done to build the IF graph is a reasonable approximation of information flow through the design. We further break down these numbers to show the percentage of the found IF paths for which SEIF returns a design path that starts at the reset state vs. a design path that starts at some intermediate state. Paths that start at the reset state are better for the engineer as they can be immediately replayed from the known reset state.

Evaluation of Search Strategies
In the following we evaluate the four search strategies discussed in Section 5.3.4. Figure 11 reports, for each design, the percentage of IF paths found by each of the four search strategies. These are paths for which SEIF found a corresponding path through the design. As expected, the heuristic guided search outperforms the other strategies in all three designs, improving over the baselines by 26% on average and over bounded stalling with backtracking by 11% on average. We note that baseline 2, which does not include stalling, is the least successful at finding corresponding paths in the design. This highlights the value of SEIF: many IF paths give an incomplete picture of a path through the design and include points where the design must advance to a new state before the IF path can continue. Without SEIF, it would be up to the engineers to figure out how and whether to advance the design state. Figure 14 shows that the amount of backtracking that is required is lowered when we incorporate bounded stalling.

Eliminating Information Flow Paths
We examine how IF paths that do not correspond to information-flow paths through the design are falsified in Figure 17. The experiment used the 300 randomly chosen paths for the 20 security-critical signals in the OR1200. The largest percentage of eliminated paths are found statically before symbolic execution begins. This is good news, as that is the cheapest and quickest phase of the analysis. There is a non-trivial portion, 5% to 7%, that are eliminated because they do not represent true flows of information through the design. SEIF's use of symbolic execution allows for this precise analysis, which taint tracking may not be able to provide.

Case Study: Security Property Verification
When starting with a property, as is often done in security verification tasks, SEIF goes beyond producing a single counterexample. In traditional, assertion-based formal methods, once the formal or bit-level engine produces the first counterexample, it takes manual manipulation of the property or environment to generate subsequent violating traces. SEIF is able to find multiple realizable traces through the design that exhibit the vulnerable behavior and can guide the security engineer to other areas of the design they may be interested in exploring.
We demonstrate the approach for two security-critical properties from the TrustHub Security Property/Rule Database [?], [35], one for the MSP430 and one for an AES implementation. The MSP430 property asserts that the program counter's value should not be readable from the debug access port during normal operation. The AES property verifies that the secret key material is not accessible to any unprivileged internal data registers [42].
SEIF generates all the paths from the source of interest to the security-critical sink automatically. In order to produce the violating paths, SEIF adds a constraint to the solver specifying the desired precondition. If we find a candidate violation of the security property, we ensure it is replayable from the reset state of the design. The results for the MSP430 and AES are presented in Tables 1 and 2.

Related Work
Symbolic Execution of HW Designs for Information Flow Analysis. EISec uses netlist-level symbolic execution to verify information-flow safety and quantify confusion and diffusion in cryptographic modules [25]. Our work improves upon EISec by allowing analysis at the RT-level and enabling verification of a wider class of information-flow properties. Other tools use symbolic simulation (e.g., [43]) to verify particular binaries running on the hardware [21], [24].
Symbolic Execution of SW for Information Flow Analysis. The software community was perhaps the first to leverage symbolic execution to verify information flow. The approach has been used in combination with taint tracking [44], to find and mitigate side channels [45], [46], [47], [48], and to identify programs that are vulnerable to transient execution attacks [49].
Symbolic Execution of SW or HW to Find Exploitable Flaws. There is a long history of using symbolic execution in software to find exploitable security flaws (e.g., [50], [51], [52]). In hardware, symbolic execution has been used to find violations of and exploits for security-critical assertions [53] and to find and trigger trojans in the Verilog RTL [54]. As with SEIF, the main challenge is guiding search through the tree to find the salient paths.
Information Flow Tracking in HW. The state of the art for information flow analysis in hardware is information flow tracking (IFT), which instruments a design with tracking logic [2]. Many tools operate at the netlist level, although some operate at the RTL level [11]. IFT has also been used in analog designs [14], and tools exist to synthesize designs that incorporate tracking logic [16], [18]. IFT can be used to check hyperproperties and has been used to verify the safety and security of many different systems [55], [56] [3], [15] [57] [19], [20] [12] [9] [4]. IFT has also been used to automatically generate information flow properties for use with formal verification engines [41], [58]. We used these properties in our evaluation.
Formal Analysis for Information Flow. Proof-checking approaches have been used for detecting security vulnerabilities in hardware designs [22] [10]. These approaches are often less automated, more time intensive, and tackle smaller designs, for stronger results that are both sound and complete. VeriCoq translated Verilog to Coq for proof-carrying designs [8]. Another approach is to use self-composition, or program products, to verify information-flow properties [1]. Security extensions in the hardware description language can enforce information flow policies at the language level [5], [6], [7], [13], [59], [60].

Conclusion
SEIF combines static analysis and symbolic execution to find information flows in hardware designs. SEIF improves over static analysis by eliminating false-positive flows and finding replayable paths through the design for true flows. In our experiments, SEIF accounts for 86-90% of statically identified flows in three open-source designs. SEIF also leverages static analysis to explore the designs for 10-12 clock cycles in 4-6 seconds on average. Additionally, SEIF can be used to find multiple violating paths for security properties, providing a new angle for security verification.

Appendix 11.1. Determining the Stall Bound
We empirically determined the number of clock cycles to stall for each heuristic that involved stalling: baseline 1, bounded stalling with backtracking, and the UNSAT core heuristic. Once we do not see any significant gains from stalling for an additional clock cycle, we have found our bound. The results for each of the three heuristics are shown in Figures 18, 19, and 20.

Figure 1: Toy example. clk, enable, and secret are input wires. state, prev, and guard are state-holding regs. Not shown is the initialization, which sets state, prev, and guard to 0. led is an output wire. secret flows through guard to led after four clock cycles.

Figure 3: An IF graph for the code in Figure 1. Dashed lines represent implicit flows of information and solid lines represent explicit flows. Labels are omitted for space.

wire temp = (enable) ? secret : 0;

always @(posedge clk) begin
  if (enable) begin
    result <= 0;
  end else begin
    result <= temp;
  end
end
assign led2 = result;

Figure 5: A toy example illustrating globally unrealizable paths. clk, enable, and secret are input wires and led2 is an output wire. result is a state-holding reg. secret cannot flow through temp to result and led2.

Figure 6: The partial IF graph for the code shown in Figure 5, showing only the paths through temp. Although the graph shows a path from secret to led2, an SMT query finds that the constraints along the path will never be cosatisfiable.

This analysis requires knowing where clock cycle boundaries are. In the IF graph, an edge corresponding to a non-blocking assignment (for example, result <= temp) denotes a clock cycle boundary. When state is updated in one clock cycle, the updated value can be read in the next clock cycle. At the start of this phase, the given path through the IF graph is divided into segments. One segment of an IF-graph path is a sequence of hops in the IF graph. These hops could be any implicit or explicit flows. However, the explicit non-blocking assignments are of particular interest to us in determining how we should break the IF path into segments. Each non-blocking assignment represents exactly where we reach a clock cycle boundary in the IF path and thus break off a new segment after that flow. If a path

Figure 7: Symbolic Execution Tree of Paths

5.3.2. Pruning Unrealizable Paths at Clock Cycle Boundaries. As a first strategy, the symbolic execution engine prunes unrealizable paths at each clock cycle boundary. At each clock cycle, the engine first checks the cosatisfiability of the conditions required in the current IF segment, similar to the check done to prune globally unrealizable paths (Section 5.2). However, this time the SMT query includes the current symbolic state along with the conditions required for the IF segment. As with the global pruning step, the check considers only the state-holding variables in the segment conditions, as the value of combinational logic variables may change during the course of a clock cycle. Continuing with our example from Figure 1, at the start of the initial clock cycle, the symbolic execution engine checks whether the condition required for the first hop in the IF graph (state == 3) is mutually contradictory with the initial symbolic state (in which state == 0). Indeed, it is, and the symbolic execution engine discards any paths that would include line 8, the line of code required for the first hop in the IF graph. At this point, SEIF recognizes that realizing the first segment of the IF graph at the current state (state s0) is infeasible.
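The per-cycle check described above can be illustrated with a small Python sketch. This is only an illustration under stated assumptions, not SEIF's implementation: SEIF issues an SMT query, whereas the sketch substitutes brute-force enumeration over tiny value domains; the 2-bit register widths, the helper name cosatisfiable, and the "later" state are all hypothetical.

```python
from itertools import product

def cosatisfiable(constraints, domains):
    """Return one assignment satisfying every constraint, or None if none exists.

    Brute-force stand-in for an SMT query (an assumption of this sketch).
    """
    names = sorted(domains)
    for values in product(*(domains[name] for name in names)):
        env = dict(zip(names, values))
        if all(check(env) for check in constraints):
            return env
    return None

# Assumed 2-bit state-holding registers; real widths are not given in the text.
DOMAINS = {"state": range(4), "prev": range(4)}

# Condition required by the first hop of the IF path in the running example.
first_hop = [lambda env: env["state"] == 3]

# Symbolic state right after reset: the registers are initialized to 0.
at_reset = [lambda env: env["state"] == 0, lambda env: env["prev"] == 0]

# Hypothetical later state, reached after advancing the design a few cycles.
later = [lambda env: env["state"] == 3, lambda env: env["prev"] == 2]

# The hop is pruned at reset because state == 0 contradicts state == 3.
prune_at_reset = cosatisfiable(at_reset + first_hop, DOMAINS) is None

# In the later state the same hop is feasible, and a witness is returned.
witness_later = cosatisfiable(later + first_hop, DOMAINS)
```

Only state-holding registers appear in the constraints, mirroring the restriction in the text; combinational signals are left out because their values may change within a clock cycle.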

always @(posedge clk) begin
  if (enable) begin
    prev <= state;
    state <= state + 1;
  ...
  ... state == 0) begin
    guard0 <= secret;
  end else if (clear) begin
  ...
assign led = (prev == 3) ? guard : 0;

Figure 8: A design demonstrating the challenges of stalling. The IF path of interest is now from secret to guard0 to guard to led. To achieve the second flow segment, ⟨(guard0, guard)⟩, SEIF needs to first advance the design to a state s′ = ⟨state == 3⟩. However, it is important that while the design advances to state s′, the clear signal is never set, as a 0 written to guard0 would undo the information flow from secret to guard0 from the prior IF path segment.

SEIF uses information from the IF graph to stall the information flow while advancing the design to a next-state. We define stalling as symbolically executing the design for a single clock cycle, such that the design transitions to a next state, but the position along the IF path remains unchanged. To stall, SEIF prevents the symbolic execution engine from considering any paths of execution that will undo information flow from prior segments in the IF path. To do this, SEIF considers the node n in the IF path in which information currently "resides." In our current example, this would be the node guard0. SEIF then uses the IF graph to find all edges incident to node n, which represent flows

Figure 10: Accounting for IF Paths

Figure 11: Finding Design Paths Corresponding to IF Paths

Figures 12 and 13 report on the performance of the four search strategies, both in terms of average time taken to find a corresponding design path and average number of clock cycles through the design for the found path. Again, the heuristic guided search outperforms the other strategies, completing the search for each IF path in 3-6 seconds. Figure 14 shows that the amount of backtracking that is required is lowered when we incorporate bounded stalling.

Figure 12: Time to Find Design Paths

Figure 13: Clock Cycles to Find Design Paths

Figure 14: Frequency of Backtracking

To better understand how SEIF is finding flows over time, we explore all IF paths from a single source signal, the program counter, in the MSP430. We track how many IF paths are found in the design after 1 clock cycle of search, 2 clock cycles of search, etc. The experiment was done with heuristic-guided stalling turned on. Figure 15 shows the results. There were a total of 19060 IF paths, and SEIF found design paths for 89.93% of them. The complete search took 16 clock cycles; however, most of the paths were found within the first 8 clock cycles. The experiment took 3.5 days to run.

Figure 15: Finding design paths over time

7.3.1. Determining the Stall Bound. For all the experiments in the previous sections, the number of stalls per IF path segment was set to be 5, 5, and 4 for baseline 1, bounded stalling with backtracking, and the UNSAT core heuristic, respectively. (As a reminder, baseline 2 is backtracking only, with no stalling.) We determined these numbers empirically by selecting at random 5 of the security-critical source signals from the OR1200, and for each of these source signals selecting at random 300 paths to evaluate, and then running the experiments with an increasing number of stalls allowed until we saw the number of IF paths found begin to flatten out. Finding the bound for the heuristic-guided stalling strategy is shown in Figure 16. The graphs for the other three search strategies are in the appendix.
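The "flatten out" criterion used to pick the stall bound can be sketched in Python as follows. The text does not quantify what counts as a significant gain, so the min_gain threshold, the function name find_stall_bound, and the sample counts below are all assumptions made for illustration; the counts are invented and are not results from the experiments.

```python
def find_stall_bound(paths_found, min_gain=0.01):
    """Pick the smallest stall bound after which extra stalls stop paying off.

    paths_found[k] is the number of IF paths found when up to k stalls are
    allowed per IF path segment. min_gain is an assumed relative-improvement
    threshold; the source only says the curve should "begin to flatten out".
    """
    for k in range(1, len(paths_found)):
        previous = paths_found[k - 1]
        gain = (paths_found[k] - previous) / max(previous, 1)
        if gain < min_gain:
            # Allowing a k-th stall added almost nothing, so stop at k - 1.
            return k - 1
    # The curve never flattened within the measured range.
    return len(paths_found) - 1

# Invented counts for illustration only (index = stalls allowed).
example_counts = [100, 150, 180, 195, 200, 201, 201]
example_bound = find_stall_bound(example_counts)
```

With these invented counts, the relative gain first drops below 1% when moving from 4 to 5 stalls, so the sketch selects a bound of 4.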

Figure 17: Breakdown of how flows are falsified by SEIF

Figure 19: Empirically determining the stall bound for bounded stalling with backtracking.

Figure 20: Empirically determining the stall bound for stalling with the UNSAT core heuristic.

TABLE 2: Security Property Verification: Secret Key in AES Implementation