‘Put the Car on the Stand’: SMT-based Oracles for Investigating Decisions

Principled accountability in the aftermath of harms is essential to the trustworthy design and governance of algorithmic decision making. Legal philosophy offers a paramount method for assessing culpability: putting the agent 'on the stand' to subject their actions and intentions to cross-examination. We show that under minimal assumptions automated reasoning can rigorously interrogate algorithmic behaviors as in the adversarial process of legal fact finding. We model accountability processes, such as trials or review boards, as Counterfactual-Guided Logic Exploration and Abstraction Refinement (CLEAR) loops. We use an SMT-based oracle to discharge queries about agent behavior in factual and counterfactual scenarios, as adaptively formulated by a human investigator. For a decision algorithm A, we use symbolic execution to represent its logic as a statement Π in the decidable theory QF_FPBV. We implement our framework in a tool called soid with an accompanying GUI, and demonstrate its utility on an illustrative car crash scenario.


Introduction
Our lives are increasingly impacted by the automated decision making of AI. We share roads with autonomous vehicles, as healthcare providers use algorithms to diagnose diseases and prepare treatment plans, employers to automate hiring screens, and even judges to analyze flight and recidivism risks. Though the creators of AI often intend it to improve human welfare, it is a harsh reality that algorithms often fail. Automated decision makers (ADMs) are now deployed into roles of immense social responsibility even as their nature means they are not now, and likely will never be, trustworthy enough to do no harm. When autonomous vehicles drive on open roads they cause fatal accidents [66]. Classification and scoring algorithms perpetuate and exacerbate existing biases.
We have also implemented an example of a domain-specific graphical user interface (GUI) that allows operation of soid without requiring technical expertise, but here describe how it works directly. The investigator starts from the logs of A, the factual information within capturing the state of the world as the agent perceived it. These logs can be easily translated into a first-order formalism as a sequence of equalities. For example, in the left panel of the car crash diagrammed in Figure 1, the information in the logs of A (blue at bottom) about the other car (red at left) might be encodable as a system of equalities φ ≡ (agent1_pos_x = 1.376) ∧ ⋯. In the GUI all such statements are translated into formal logic automatically. This formalism also allows us to reason about whole families of counterfactuals, which can be defined by relaxing the constraints and negating the original factual state as a valid model. In this way, hypothetical-but-similar scenarios of interest to the investigator ('what if that car was outside instead of inside the intersection?', 'what if the car was signaling a turn instead of going straight?') can be rigorously formalized to enable automated analysis of the agent's behavior, as sketched below.
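To make the translation concrete, the following sketch shows how logged factual values might be rendered as a Z3 conjunction of equalities, and how a family of counterfactuals can be obtained by relaxing and puncturing it; the variable names and interval bounds are illustrative assumptions, not soid's actual encoding.

```python
from z3 import Real, And, Not

# Hypothetical log entries for the other car at the keyframe.
log = {'agent1_pos_x': 1.376, 'agent1_pos_y': 0.442}
v = {name: Real(name) for name in log}

# Factual state phi: a system of equalities, one per logged value.
phi = And([v[name] == value for name, value in log.items()])

# A family of counterfactuals: independently relax each constraint to an
# interval (bounds assumed), and negate the factual state as a valid model.
family = And(v['agent1_pos_x'] >= 1.0, v['agent1_pos_x'] <= 2.0,
             v['agent1_pos_y'] >= 0.0, v['agent1_pos_y'] <= 1.0,
             Not(phi))
```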
We model the adaptive, semi-automated process soid implements as a counterfactual-guided logic exploration and abstraction refinement (CLEAR) loop. We use SMT-based automated reasoning for the oracle that answers the factual and counterfactual questions the investigator asks. We rely on the Z3 SMT solver [57] and - for our benchmarks - KLEE-Float for symbolic execution [16,53], though through a modular API soid supports any symbolic execution engine able to generate output in the SMT-LIB format. We demonstrate soid on a car crash scenario caused by a reinforcement-learned agent within a simulated traffic environment, and show that soid abstracts away sufficient detail of the code of the simulated autonomous vehicle to be used through an intuitive GUI by investigators without technical expertise.
Our CLEAR loop is inspired by the CEGAR loop [21]. The main difference is that, while the CEGAR loop produces counterexamples automatically, we rely on human experts to formulate the queries. Each query and its answer helps the expert investigator build a body of knowledge, the Facts, about the decision logic used in the autonomous agent. The investigator also decides when to terminate the CLEAR loop and when they have enough information for the final judgment of an agent's culpability.
Outline. In §2 we discuss the motivation for soid and give a few examples that also underlie our benchmarks. In §3 we overview SMT-based automated reasoning and counterfactual theories of causation and responsibility. In §4 we present our CLEAR procedure, before in §5 detailing how we logically encode agent decision making and counterfactual scenarios. We then describe the use of the soid tool through quantitative and qualitative evaluation in §6, before concluding in §7.
Technically, our approach differs from the already significant body of work in counterfactual analysis of algorithmic decision making in two significant ways: in our analysis of executable code 'as it runs' - rather than just of a partial component (such as an ML model in isolation) or of some higher-level mathematical abstraction of the system - and in our reliance on formal verification. By analyzing the code itself, rather than an idealized abstraction or particular model, we can capture behaviors of the entire software system: preprocessing, decision making, postprocessing, and any bugs and faults therein. This makes our analysis more complete. In Section 6.2, for example, we consider a hypothetical case study where an instance of API misuse - rather than any mistake of logic within the code itself - undermines a machine learning decision.
The decision and its consequences cannot be analyzed by considering the (correct on its own terms) decision model alone.
Meanwhile, verification technologies allow us to analyze all possible executions obeying highly expressive pre- and post-conditions. SMT-based methods in particular provide the full expressiveness of first-order logic.
As such, our approach can encode entire families of counterfactuals in order to provide a broad and thorough picture of agent decision making, so as to better interpret responsibility. Prevailing, often statistical, methods commonly focus more on gathering explanations prioritized on informal measures, such as minimality or diversity criteria, in order to demonstrate causality; see e.g., [72,56]. Informally, in computer science terms, this distinction is analogous to that in automated reasoning between methods for verification - which emphasize overall safety and the correctness of the set of all program traces provably meeting some logically expressed property - and methods like testing or fuzzing that focus on finding or excluding representative executions believed to exemplify that property [2]. Of course, (SMT-based) verification does have costs - it carries substantially more computational complexity than testing approaches, which can increase compute costs and limit scalability. Accordingly, we implement and benchmark the empirical efficacy of our method in a laboratory environment in Section 6. In human terms, our approach is analogous to enabling asking broad, positive questions about agent behavior under a coherent family of scenarios, rather than asking questions aimed primarily at generating or falsifying a particular claimed explanation for a (factual or counterfactual) decision. The work of [17] and VerifAI [31] are the related approaches of which we are aware that are most similar to our own in goals and method, although the former requires stronger modeling assumptions while the latter sacrifices some formal guarantees for scalability.
Our work also joins recent efforts to apply formal methods to legal reasoning and processes [54,29,63,46].
The recent workshop series on Programming Languages and the Law (ProLaLa) has featured various domain-specific languages for formal reasoning about laws and regulations, such as formalizing norms used in privacy policy regulations, legal rental contracts, and property law, just to name a few. The need for a formal definition of accountability, and for establishing a connection between software systems and legal systems, has also been recognized in the Designing Accountable Software Systems (DASS) program started by the National Science Foundation (NSF), the main funding body for foundational research in the USA. In his influential CACM column, Vardi [71] has advocated for the importance of accountability in the computing marketplace.

Motivation
To illustrate the design and purpose of soid, we continue with the crash from Figure 1. In the left panel the autonomous vehicle A (blue at bottom) perceived that the other car (red at left) had its right turn signal on. Call this time t*. When A entered the intersection - believing the action to be safe, even though it lacked the right-of-way - the other car proceeded straight, leading to a collision. Because it did not possess the right-of-way, A is culpable for the crash. This scenario forms the basis for our benchmarks in Section 6, where the specific question we investigate with soid is 'with what purpose did A move, and so to what degree is it culpable for the crash?' Note that the actions of A are consistent with three significantly different interpretations: (i) a reasonable (or standard) A drove carefully, but proceeded straight as is common human driving behavior given the (perception of an) indicated right turn; (ii) an impatient A drove with reckless indifference to the risk of a crash; and (iii) a pathological A drove to opportunistically cause crashes with other cars, without unjustifiable violations of traffic laws such as weaving into an oncoming lane (if no opportunity presented itself, the car would move on).
Even for the same act, these different interpretations will likely lead to drastically different liabilities for the controller under criminal or civil law. Interestingly, the natural language explanations for the (i) reasonable and (iii) pathological cars are identical: 'moving straight would likely not cause a crash, so proceeding would be acceptable.' They can, however, be distinguished by a pair of counterfactual queries: interpretation (i) is consistent with the answers (yes, no) to that pair, (ii) with (no, no), and (iii) with (no, yes). Note the adaptive structure of our questions, where the second query can be skipped based on the answer to the first. The goal of soid is to enable efficient and adaptive investigation of such queries, in order to distinguish the computational reasoning underlying agent decisions and support principled assessment of responsibility. Although autonomous vehicles provide an insightful example, CLEAR is not limited to cyberphysical systems. In Section 6.2 we use soid to analyze a buggy application of a decision tree leading to a health risk misclassification.
We also give another, more nuanced motivating self-driving car crash example in Appendix A, drawn directly from the self-driving car and smart road network industries.

Legal Accountability for ADMs
Before presenting the technical details of soid, we also overview how the philosophy and practice of legal accountability might apply to ADMs, and so motivate the analysis soid is designed to enable. We work in broad strokes, as the legal liability scheme for ADMs is still being developed, and so we do not want to limit our consideration to a specific body of law. This in turn limits our ability to draw specific conclusions, as disparate bodies of law often place vastly different importance on the presence of intentionality, negligence, and other artifacts of decision making.
A core principle of legal accountability is that the 'why' of a wrongful act is almost always relevant to evaluating how (severely) liable the actor is. In the words of the influential United States Supreme Court justice Oliver Wendell Holmes, 'even a dog distinguishes between being stumbled over and being kicked.' As every kick has its own reasons, bodies of law often distinguish further - such as whether the 'why' is an active intent to cause harm. Though holding algorithmic agents accountable raises the many technical challenges that motivate soid, once we understand the 'why' of an algorithmic decision we can still apply the same framework of our ethical and legal practices we always use for accountability [47,40]. The algorithmic nature of a harmful decision does not invalidate the need for accountability: the locus of Holmes' adage lies in the harm to the victim, who is justifiably more aggrieved to be injured on purpose or due to a negligent disregard of the risk of a kick than by an accidental contact in the course of reasonable behavior. In practice, even though criminal law and civil law each place different emphasis on the presence of attributes like intention and negligence in a decision, intention in particular almost always matters to - and often intensifies - an agent's liability. Any time the law penalizes an unintentional offense it will almost always punish an intentional violation as well, and should intention be present, the law will usually apply the greatest possible penalties authorized for the harm. Given the importance of recognizing intention, soid is designed to support rigorous and thorough findings of fact about algorithmic decisions from which a principled assessment of their 'why' can be drawn.
Taking a step back, it is deeply contentious whether ADMs now and in the future can, could, or should possess agency, legal personhood, or sovereignty, and whether they can ever be morally and legally responsible [58]. Even the basic nature of computational decision making is a significant point of debate in artificial intelligence and philosophy, with a long and contentious history [14,23]. For the moment, ADMs are not general intelligences. They will likely not, for the foreseeable future, possess cognition, agency, values, or theory of mind, nor will they formulate their own goals and desires, or be more than 'fancy toasters' that proxy the decision making agency and responsibility of some answerable controller. An algorithm is no more than a computable function implemented by symbolic manipulation, statistically-inferred pattern matching, or a combination thereof. Nonetheless, even working off the most stringent rejection of modern ADMs forming explicit knowledge or intentional states, following Holmes we can see there is still value in grading the severity of a harmful decision. It is deeply ingrained in our governing frameworks for legal and moral accountability that when acting with the purpose of harm an agent (or its controller) has committed a greater transgression than in the case where the harm was unintended.
In this work we sidestep whether and how computers can possess intentionality by viewing intention through a functionalist lens. Even for conscious reasoning, it is impossible to replay a human being's actual thought process during a trial. So in practice, legal definitions refer instead to an ex post rationalization of the agent's decisions made by the accountability process through the finding of fact. A person is assessed to have, e.g., purposely caused harm if the facts show they acted in a way that is consistent with purposeful behavior.
We can approach computational reasoning in much the same way, with an investigator making an ex post descriptive rationalization capturing their understanding of an ADM's decision making. This understanding then justifies a principled assessment of the controller's responsibility. For example, a controller can be assessed to have released into the world an ADM that the facts show acted in a way that is consistent with a purposeful attempt to cause harm. The design and algorithmic processes of the agent are otherwise irrelevant. How the ADM actually makes decisions - whether through statistical inference or explicit goal-oriented decision logic or otherwise - is relevant only with respect to our ability to interrogate its decision making. This approach is consistent with soid, which is capable of analyzing arbitrary programs.
An investigator using soid to label an ADM as 'reasonable' or 'reckless' or 'pathological' or similar is, however, only the start. How such an assessment should then be interpreted and used by an accountability process is, ultimately, a policy question. The unsettled nature of the laws, policies, and norms that govern ADMs, both for now and into the future, means there are many open questions about the relevance of the intent of an ADM and its relationship to the intent of the controller. But we can consider the ramifications in broad strokes. For individuals harmed by ADMs (whether as consumers, other end-users, or just unlucky 'bystanders'), the situation seems little different than for human misconduct: the finding of intent amplifies the harm, and the victim can reasonably expect the accountability process to penalize the transgressor appropriately. More specific questions are harder. Should apparent intent in both the controller and ADM be assessed more harshly than in one or the other alone? Or would apparent intent in the controller render the actions of the ADM relevant only in how successfully the intent of the controller was carried out? How should an emergent 'algorithmic intent' traceable to software faults interact with any documented, contrary evidence of the intent of the controller? These questions lie beyond the scope of this work, but they are each dependent on our capacity to first recognize and distinguish the functional intent of the ADM, motivating our research goals.
For the controllers of ADMs (whether as programmers, vendors, owners, or sovereign states), it is a natural starting point to view them as responsible for the actions of their computational agents, just as they would (most often) be responsible for human agents acting on their behalf. With reward comes responsibility.
If a controller profits from deploying an ADM, so must they bear the costs of its harms. Legal concepts governing humans acting on behalf of or through each other or organizations are well-founded throughout, e.g., agency and criminal law [40,50]. These mechanisms may be either directly applicable or can form the basis for analogous systems governing algorithmic accountability. For example, just as a business is expected to adequately prepare (i.e., train) a human agent to operate on their behalf without causing harm, a controller can be expected to adequately prepare (i.e., design or train) a computational agent. What standard the controller sets internally ex ante before deploying the ADM is primarily relevant insofar as it provides confidence to the controller that the ADM will not be found ex post to have operated in a way consistent with an intent to harm - and so carry with it a corresponding increase in liability.
Grounding our approach in the functionalist perspective also helps us manage difficult questions about the validity of anthropomorphizing algorithmic systems through the use of language like 'intent', 'beliefs', or 'reasonableness', as we ourselves have done throughout Section 2. It is not immediately clear such language is intrinsically confusing or harmful: the use of such labels in characterizing automated decision making is decades-old, to the extent that consideration of whether and how machines can form intentional states has informed how prevailing approaches in the philosophy of action now capture whether and how humans form them [14,15]. Moreover, as accountability processes begin to wrestle with algorithmic decision making some anthropomorphization is perhaps unavoidable, due to the often heavily analogical nature of legal reasoning [51]. We ourselves invoked the analogy of Holmes to frame our discussion. The validity of some such analogies is in some cases already contentious. For example, whether the 'creativity' required to earn authorship under copyright law must necessarily be human is under active consideration in litigation and scholarship concerning generative models [30,1]. On the other hand, the negative consequences of anthropomorphizing ADMs have been widely recognized in scholarship and science fiction dating back decades: it can cause us to, e.g., ascribe to machines and their actions non-existent morality and common sense, or grow attached to them in ways that cause us to disregard their harms or cloud our judgment of their true capabilities and limitations.
To avoid conflation, perhaps machine analogues to terms like 'intention' will arise. But wherever the legal and policy language settles, the core philosophical principle - that a functional interpretation of the 'why' of a decision matters for accountability - will hold. So long as the philosophical (and computational) principles remain, the goals of our research should likewise remain applicable no matter what norms of language develop.

Technical Background
In this section we present some relevant foundations for CLEAR from formal and automated reasoning.

Programs and Traces
That is, by definition A cannot define how the environment E[A] updates the v^e_i, as that capacity is exactly the distinction between a program and the environment it runs within.
We work with statements over the program variables in the logic QF_FPBV. The available domains are those of floating points and bitvectors. An expression e is built from variables composed with the constants and function symbols defined over those domains, e.g., (fp.to_real b011) + 2.34 · v_14. A formula φ is built from expressions and relation symbols composed using propositional operators, e.g., the prior expression could extend to the formula (fp.to_real b011) + 2.34 · v_14 > 0. We write σ ⊨ φ(V̂) for formulas over symvar(A) analogously to the concrete case, still in QF_FPBV. The path constraint π_σ(α_1, ..., α_n) is such a formula over the α_i, which captures their possible settings. We let e^σ ≡ ⋀_i e′_i, where e′_i ≡ e_i for symbolic-valued variables and e′_i ≡ (α_i = d_i) for concrete-valued variables in σ. Let refs_σ(v̂_i) be the set of j such that α_j is referenced within e_i at state σ. We define the reference closure of a symbolic variable recursively, as rcl_σ(v̂_i) = {α_j}_{j ∈ refs_σ(v̂_i)} ∪ ⋃_{k ∈ refs_σ(v̂_i)} rcl_σ(v̂_k). We then define e^σ|_i as the subformula of e^σ referencing exactly those α_j ∈ rcl_σ(v̂_i). In sum, e^σ|_i captures any and all constraints on the value of v̂_i when a symbolic execution of A reaches σ.
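As a concrete illustration of such mixed floating-point/bitvector terms, the snippet below builds the expression and formula above in Z3's Python API, reading the literal b011 as a 3-bit floating-point value (a small FP sort chosen purely so the bit-width matches; the relation > 0 is likewise only an example):

```python
from z3 import FPSort, BitVecVal, fpBVToFP, fpToReal, Real, Solver, sat

Tiny = FPSort(2, 1)                      # a 3-bit FP sort, so b011 fits
b011 = fpBVToFP(BitVecVal(0b011, 3), Tiny)
v14  = Real('v14')

e   = fpToReal(b011) + 2.34 * v14        # the expression from the text
phi = e > 0                              # extended to a formula

s = Solver()
s.add(phi)
r = s.check()
print(r, s.model() if r == sat else None)
```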
We often work with a special class of formulas we call relaxations. Relaxations encode formulas that start from a concrete state σ, and then independently relax the variable constraints so that they can take on a range of values. A punctured relaxation excludes a unique model (i.e., state).
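Schematically, and under the assumption of per-variable interval bounds (the l_i and u_i below are placeholders), a relaxation and its punctured form can be written as:

```latex
\varphi_{\mathrm{relax}}(\hat{I}) \;\equiv\; \bigwedge_{i} \big( l_i \le \hat{v}_i \le u_i \big),
\qquad
\varphi_{\mathrm{punct}}(\hat{I}) \;\equiv\; \varphi_{\mathrm{relax}}(\hat{I}) \,\wedge\, \neg \bigwedge_{i} \big( \hat{v}_i = \sigma(v_i) \big).
```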

SMT-based Program Analysis
We overview SMT solving and symbolic execution, and refer the reader to [57] and [16,10] respectively for greater detail.
SMT Solving. Satisfiability modulo theories (SMT) solving is a form of automated theorem proving that computes the satisfiability (and, by duality, validity) of formulas in certain fragments of first-order (FO) logic. SMT solvers - we use the state-of-the-art Z3 [57] - are also able to return satisfying models when they exist. In the case of validity queries, these models are concrete counterexamples to the disproven theorem.
An SMT formula Φ is a FO-formula over a decidable theory T. In this work, we set T = QF_FPBV, the combination of quantifier-free formulas in the theories of floating points and bitvectors [11]. We require support for floating-point statements due to their centrality in machine learning. In addition to the normal syntax of QF_FPBV, we will need to support a modal counterfactual operator, x □→ y, which we formally define in Section 3.3. It is known that modal logics are 'robustly decidable' with a generic transformation into a decidable theory [70]. We however instead use a custom SMT encoding of our counterfactual relation, described in Section 5. This representation phrases queries so that satisfying models produced by the solver encode concrete counterfactuals.
Symbolic Execution. One of the great successes of SMT-based program analysis, symbolic execution explores the reachable paths of a program P when executed over V̂. Concrete values are computed exactly, assignments to or from symbolic-valued variables update their expressions, and branching conditions update the path constraints. For a branch condition b(V̂) reached at symbolic state σ̂_j, such as a guard for an if-statement or while-loop, an SMT solver is invoked to check which branches are feasible under π_j, i.e., whether Φ ≡ b(V̂) ∧ π_j and/or Φ′ ≡ ¬b(V̂) ∧ π_j are satisfiable. If only one is, the execution continues along it and its path constraints are updated. For example, if only Φ′ is satisfiable then π_j ← π_j ∧ ¬b(V̂).
If both are satisfiable, the execution can fork in order to explore all reachable paths and produce a set of constraint formulas {π_i}_{i ∈ [ct]} encoding each path at termination. By setting initial constraints on the input variables, symbolic execution can narrow the search space to only the paths of executions meeting preconditions. For our benchmarks we use KLEE-Float [16,53], a symbolic execution engine for C with support for floating-point operations, backed by Z3. KLEE-Float generates SMT queries in QF_FPBV.
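The feasibility check at a branch can be pictured with a small Z3 sketch; the path constraint and branch condition below are stand-ins for illustration, not KLEE-Float internals:

```python
from z3 import BitVec, BitVecVal, UGT, And, Not, Solver, sat

x = BitVec('x', 32)
pi_j = UGT(x, BitVecVal(10, 32))                 # path constraint so far
b = (x & BitVecVal(1, 32)) == BitVecVal(0, 32)   # branch condition: x even

def feasible(f):
    s = Solver()
    s.add(f)
    return s.check() == sat

take, skip = feasible(And(pi_j, b)), feasible(And(pi_j, Not(b)))
if take and skip:
    pass                        # fork: explore both branches
elif take:
    pi_j = And(pi_j, b)         # only the true branch is feasible
else:
    pi_j = And(pi_j, Not(b))    # only the false branch is feasible
```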

Counterfactual Reasoning
Counterfactuals are essential to modern theories of causation and responsibility in philosophy and law [55,12,67,52,72], and are already quite prominent in formal methods for accountability [8,9,7,19,34,27,41,42,72]. Causation refers to the influence an input of some process has on its output, e.g., in an MDP how a choice of action influences the resultant distribution on (some property of) the next state, or for a program how changes in the inputs influence the outputs. In the simplest possible case, an action a is an actual cause of an outcome, such as a harm h, if under every counterfactual a is both necessary and sufficient for the harm to occur, notated as a □→ h and ¬a □→ ¬h. Here a and h are some domain-specific formal objects, while the modal notation x □→ y, popularized by Lewis [52], means 'if x had happened, then y would have happened.' The canonical unified logical and computational treatment of counterfactuals are the works of Pearl and Halpern [41,42], which provide a directed acyclic graph-based formalism able to inductively model causal effects in far more complicated dependency structures.
Responsibility is a higher-order property than causality. As raised in Section 2, counterfactuals of decision making are essential to interrogating intention and with it responsibility. Put simply, counterfactuals enable challenging and verifying proposed explanations for an agent's decisions. Importantly, counterfactual analysis is well-defined independent of any specific decisions, or the inferences about agent intention made on the basis of them. So although we often frame our discussion in terms of behaviors that match common understanding of human decision making, we stress our formal approach would generalize to future models of algorithmic decision making interested in very different attributes and behaviors than those we apply to humans.
Formalism. Working ex post, we formalize counterfactual algorithmic decision making by starting from a set of factual traces T^f = {τ^f_1, ..., τ^f_k} encoding a history of A's harmful or otherwise relevant decisions. A counterfactual τ^cf = (τ^f, τ^pp, t*) is a tuple of a factual trace τ^f, a past possibility trace τ^pp, and an integer t* ∈ N that we call the keyframe. We write τ^cf.fst = τ^f and τ^cf.snd = τ^pp. Intuitively, we want counterfactuals to represent the decisions that A would have made in revealing alternate circumstances.
What makes a counterfactual 'revealing' is a deep and nuanced question, but the philosophy of action highlights the importance of particular attributes for counterfactual scenarios to be meaningful. We enforce these tenets as predicates, in order to guarantee that our method works for counterfactuals possessing them.
1. Non-backtracking: Counterfactuals should encode scenarios with a meaningful relationship to observed events, and should not require us to 'replay' the evolution of the world under significant changes to past history. Formally, both τ^f and τ^pp must be defined at t*, and must agree up until it: nbt(τ^cf) ≡ ∀ t < t*. τ^f(t) = τ^pp(t). Every past possibility trace forms a non-backtracking counterfactual for t* = 1, so the choice of keyframe will usually come first from some a priori understanding the investigator has about the critical decision moments leading to a harm. Note that we do not place any restrictions on the (implied) transition τ^f(t* − 1) → τ^pp(t*). In particular, we do not require that (τ^f(t* − 1), τ^pp(t*)) ∈ R_A. Rather, we allow the investigator to 'wave a magic wand' in order to define alternate scenarios of interest.

2. Scope of Decisions: In order to clarify the purpose of an agent's actions, what an agent might have done is less important than what it would have decided to try to do. In complex systems the former is often contaminated by the decisions of other agents and the evolution of the environment, as agents rarely have complete control over outcomes. To clarify this distinction, we enforce a scope to the decision making of A by limiting past possibility traces to internal reasoning. Therefore, no transition after t* may update the valuations of E.
This scope constraint can be interpreted as formalizing that we do not require or use access to an environmental model E[A]. We can think of this assumption in terms of the transition relation on states that constrains τ^pp. We would need E[A] in order to define a total transition relation on states R′_{E[A]}(σ_i, σ_{i+1}), given that A by itself cannot define how the environment changes. By working without an E[A], we are limited (enforced by the predicate) to τ^pp produced by the partial relation R_A(σ_i, σ_{i+1}) only defined on pairs of states that agree on all v^e_i. Note, however, that this predicate does not preclude A from incorporating a history of data about E[A] into a single decision: the v^e_i capture just the freshest environmental information. Any historical data from previous polls of the sensors or network can be passed as various v^s_j as A propagates them between executions.
An admissible counterfactual is both non-backtracking and limited in scope: admit(τ^cf) ≡ nbt(τ^cf) ∧ scope(τ^pp, t*). In order to use automated reasoning to interrogate A's decision making history (in the form of T^f), we need to formalize the semantics of two different families of trace properties: factuals - when faced with φ at time t, did A do β by time ℓ? - and counterfactuals - if faced with φ at time t*, would A have done β at time ℓ?
We begin with factuals. In order to formulate a useful semantics for this predicate, we need a reasonable interpretation of the subsequence τ^f(t) ... τ^f(ℓ) that the property implicitly analyzes. Working after-the-fact justifies one: as a window of agency, during which either A made a decision or failed to do so as a harm played out. If the window of agency was still open, we could not be working ex post. We can then formulate a semantic definition in which φ(Î) specifies preconditions on the inputs to A, and β(D̂) then specifies post-conditions on its decision variables, limited in scope and to the window of agency.
In contrast, for counterfactuals it is not obvious that we can assume a known and finite window. As τ^pp(t*) may have never been observed it could lead to A looping forever, and without an E[A] we cannot know how long the window would last. However, as counterfactuals are objects of our own creation, we will assume that the investigator can conjecture a reasonable window [t*, ℓ] within which the decision of A must be made in order to be timely, with responsibility attaching to the agent if it is unable to make a decision within it. This assumption guarantees termination.
With this philosophically distinct but mathematically equivalent assumption, we are able to define the semantics of the counterfactual operator: τ^cf ⊨ φ(Î) □→_{A,t*,ℓ} β(D̂) holds when admit(τ^cf) is true, τ^pp(t*)|_I ⊨ φ(Î), and τ^pp(ℓ)|_D ⊨ β(D̂). In practice, our use of symbolic execution will abstract away these details by framing the scope and the window of agency so that admit(τ^cf) is true by construction. We discuss this formally in Section 5.
Lastly, we will oftentimes discuss families of counterfactuals, which are sets of counterfactuals that share a factual trace and keyframe. Families of counterfactuals can naturally be defined implicitly by a context tuple ctx = (τ^f, t*, φ), as T^cf_ctx = {τ^cf = (τ^f, τ^pp, t*) : admit(τ^cf) ∧ τ^pp(t*)|_I ⊨ φ(Î)}. Choice of context ctx will be our usual way of delineating families, especially as φ then provides a descriptive representation.

Formal Reasoning for Accountability
Our method, CLEAR, has both a practical and theoretical basis. Practically, CLEAR is intended for counterfactual-guided logic exploration (CLEAR). Given a program A and a log of factual executions T^f, CLEAR provides an interactive, adaptive procedure for the investigator to refine a set Facts of trace properties capturing how A behaves in T^f and related counterfactuals, just as in a legal finding of fact.
Theoretically, CLEAR can also be understood as an instance of semi-automated counterfactual-guided abstraction refinement (CLEAR), in the style of the automated CEGAR [21]. This theoretical interpretation provides potentially fertile ground for future automated extensions of CLEAR and soid to bolster the explanatory power of the logic exploration.

Counterfactual-Guided Logic Exploration
Given a program A and a log of factual executions T^f, CLEAR aims to provide an interactive, adaptive procedure for the investigator to refine a set Facts of trace properties capturing how A behaves in T^f and related counterfactuals, just as in a legal finding of fact. We call this counterfactual-guided logic exploration.
Our end goal is for its implementation in a tool like soid to enable continuous refinement of a formal representation of A's decision making. Each fact in Facts is composed of a (counter)factual trace and a property that holds over it, as proven by an SMT solver. Since we do not assume access to some overarching property P(A) that we aim to prove, Facts is the ultimate product of the counterfactual-guided logic exploration. The human investigator is trusted to take Facts and use it to assess A's responsibility for a harm.
Our method relies on an oracle interface, O_A(·), into the decision logic of A. We specify factual queries as q = (φ, β), pairing an input constraint φ and a behavior β. Such a query asks whether the factual program execution starting from the program state encoded by φ results in the agent behavior encoded by β, or more formally, whether τ^f ⊨ φ(Î) →_{A,t,ℓ} β(D̂).
We specify counterfactual queries as q = (1_∃, φ, β), composed of a 'might'/'would' (existential/universal) indicator bit 1_∃, an input constraint φ, and a behavior β. This quite minimal information is sufficient for the oracle to resolve the information needed to improve Facts. Note that as T^cf_ctx excludes the factual trace as a valid continuation from t*, it is not possible for a counterfactual query to resolve (positively or negatively) on the basis of the factual execution - only counterfactuals are considered.
An Example. Consider an investigator trying to understand the facts under which the car in Figure 1 did, would, or might enter the intersection. If φ of Equation 1 represents the critical moment at which the car moved into the intersection, then the investigator could query q_1 = (φ, move = 1), where move is a decision variable. If, for example, r_1 = (1, -), then Algorithm 1 will set Facts ← Facts ∪ {(τ^f, φ →_{A,t,ℓ} move = 1)} to capture the now confirmed fact that the car chose to move into the intersection (rather than, say, had a brake failure). Note that to do so the investigator needs only to know the input constraints φ and the specific decision variable move. All other aspects of the self-driving car's decision logic are hidden by the oracle interface, and the output is a clear and interpretable answer to exactly the question posed. Adaptively, the investigator might then decide to skip Equation 2, and instead move on to querying using φ′′ from Equation 3, e.g., to ask whether under the family of counterfactuals T^cf_ctx defined by φ′′ there exists a circumstance where the car would not have entered the intersection. If then, for example, r_2 = (1, M), where the model M encodes a concrete counterfactual scenario, the investigator can update Facts ← Facts ∪ {(M, φ′′ □→_{A,t*,ℓ} move = 0)} and continue on from there.
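A minimal sketch of how such a query might bottom out in Z3, with a toy stand-in for Π (the real Π is produced by symbolic execution) and illustrative variable names:

```python
from z3 import Int, Bool, If, Not, Solver, sat, unsat

move   = Int('move')              # the decision variable from the example
signal = Bool('signal_right')     # perceived turn signal (illustrative)

Pi = move == If(signal, 1, 0)     # toy stand-in for the decision logic

# Factual query q1 = (phi, move = 1): the query holds iff the negation
# of the implication is unsatisfiable.
phi = signal == True
s = Solver(); s.add(phi, Pi, Not(move == 1))
r1 = (1, None) if s.check() == unsat else (0, s.model())

# 'Might' counterfactual over the punctured relaxation Not(phi): is there
# a scenario where the car would not have moved? A sat result returns a
# model M encoding a concrete counterfactual.
s = Solver(); s.add(Not(phi), Pi, move == 0)
r2 = (1, s.model()) if s.check() == sat else (0, None)
print(r1, r2)
```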
The Method. We define the CLEAR loop and oracle interface that together underlie soid in Algorithms 1 and 2, where calls written in italics and ending in '?' indicate manual interventions that must be made by the investigator. The investigatory procedure starts from the set of factual traces T^f = {τ^f_1, ..., τ^f_k} observed from A's executions. At each iteration of the loop, the investigator adaptively formulates and poses the next question in a sequence Query = ⟨q_1, ..., q_i, ...⟩. The responses Resp = ⟨r_1, ..., r_i, ...⟩ are then used to build up the set Facts of trace properties regarding A's decision making under both T^f and the set of counterfactual scenarios T^cf = {τ^cf_1, ..., τ^cf_{k′}}, defined within the q_i by the investigator. Each entry in Facts is rigorously proven by the verification oracle O_A(·), with access to the logical representation Π of A as expressed by A. We leave to the investigator the decision to terminate the investigatory loop, as well as any final judgment as to the agent's culpability. In Section 5 we explain the encodings Φ used within Algorithm 2 in detail, and further prove that they correctly implement the semantics of →_{A,t,ℓ} and □→_{A,t*,ℓ} as defined in Section 3.3.
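The control structure of Algorithm 1 can be sketched as follows; the callback names (formulate_query, done) are our own shorthand for the manual investigator steps, and the bookkeeping is simplified:

```python
def clear_loop(T_f, oracle, formulate_query, done):
    """Sketch of the CLEAR loop: the investigator adaptively formulates
    queries, the oracle (Algorithm 2) resolves them, and proven trace
    properties accumulate in facts."""
    facts, queries, resps = [], [], []
    while not done(facts):
        q = formulate_query(T_f, facts, queries, resps)  # manual step
        r = oracle(q)                                    # (holds, model)
        queries.append(q)
        resps.append(r)
        holds, model = r
        facts.append((model, q, holds))  # record the proven (counter)fact
    return facts
```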
Design Goals. We briefly highlight how CLEAR and soid meet some critical design goals to support principled analysis for legal accountability.
1. The oracle design pushes the technical details of how A works 'across the veil', so that an investigator needs to know no more than the meaning of the input/output API exposed by A (over the variables in I and some subset of D, respectively) in order to construct a query q_i and interpret the response r_i ← O_A(q_i). To this end, we designed the oracle query to place as minimal a burden as possible on the investigator.
2. The method emphasizes adaptive construction of Facts, so that the investigator may shape the i-th query not just by considering the questions ⟨q_1, ..., q_{i−1}⟩ asked, but also using the responses ⟨r_1, ..., r_{i−1}⟩ already received. We aim to put the agent on the stand, not just send it a questionnaire. Crucial to this goal is to return concrete traces from counterfactual queries, so that their corresponding facts can help guide the construction of the next. Using the ability of SMT solvers to return models for satisfiable formulas, when 1_∃ = 1 and there exists τ^cf ∈ T^cf_ctx such that τ^cf ⊨ φ(Î) □→_{A,t*,ℓ} β(D̂), we are able to explicitly inform the investigator of the fact (τ^cf, φ(Î) □→_{A,t*,ℓ} β(D̂)). Conversely, when 1_∃ = 0 and φ(Î) □→_{A,t*,ℓ} β(D̂) is not true for all τ^cf ∈ T^cf_ctx, we can also explicitly return the fact (τ^cf, φ(Î) □→_{A,t*,ℓ} ¬β(D̂)) for some counterexample τ^cf ∈ T^cf_ctx encoded by the output model M.
3. The method is interpretable. When an investigator poses a question, soid pushes everything 'smart' the method does across the oracle interface to the verification, so that even non-technical users can understand the relationship between query and response. In a sense, our method benefits from a simple and straightforward design, so that its process is direct and interpretable to the investigators using it. As with a human on the stand, we just want the answer to the question that was asked, no more and no less. As this design goal describes what not to do rather than what to do, we informally meet it by not introducing unnecessary automation.

Counterfactual-Guided Abstraction Refinement
Automated counterexample-guided abstraction refinement (CEGAR) [21] is a tentpole of modern formal methods research, underlying many recent advances in program synthesis and analysis. The premise of CEGAR is that instead of trying to prove a safety property φ of a program A directly, φ can be checked against an abstraction Â that over-approximates the feasible traces of A. If the checking of φ fails due to a counterexample trace τ ∈ Â (i.e., if τ ⊭ φ), the method determines if τ is spurious or real. If real, the property is proven false; if spurious, the abstraction Â is refined in order to exclude τ explicitly, and the checking restarts. This iterative process of checking and refinement until the property is proven is the CEGAR loop. The strength of CEGAR lies in automated discovery of relevance: Â begins small, and only grows when necessary to accommodate functionality of A that is relevant to its correctness under φ.
Formally, an abstraction is given by a surjection h : D → D̂, where D̂ is an abstract domain. Our overloading of the ·̂ notation from the symbolic domain is intentional. This abstraction function induces an equivalence relation: for σ, σ′ ∈ D, σ ≡ σ′ iff h(σ) = h(σ′), as well as an over-approximate abstract transition relation over these equivalence classes: R̂(σ̂, σ̂′) holds iff R(σ, σ′) holds for some σ ∈ γ(σ̂) and σ′ ∈ γ(σ̂′). Spurious traces appear because of the equivalence of states produced by a coarse abstraction. If there exists a transition σ → σ′, and σ ≡ σ′′, then R̂ includes a transition σ′′ → σ′ within Â even if it does not exist in A itself. If a spurious counterexample τ includes such a transition σ′′ → σ′ along its execution, the refinement process automatically splits the abstraction so that σ ̸≡ σ′′, removing this trace.
Like CEGAR, the goal of our CLEAR is to automatically analyze only relevant executions of a program, in order to efficiently determine its behavior. Unlike in CEGAR, we cannot automate this process completely, because we are unable (by assumption) to assume computable models of either relevance or of correctness. Just as when questioning a human, by asking questions and receiving answers - the CLEAR loop - the investigator iteratively clarifies their understanding of the agent's decision making. As a high-level sketch, consider an investigation into the behavior of a program A invoking the autonomous vehicle control code in Figure 2, as shown in Figure 3. When the investigator poses a counterfactual query q_i, the resulting symbolic execution follows a set of symbolic transitions R̂_{q_i} ⊆ D̂ × D̂ to visit a set of symbolic states {σ̂_{t*}, ..., σ̂_ℓ}.
The same is true of factual queries, albeit where each σ̂_j is equivalent to a corresponding concrete state σ_j, i.e., σ_j ⊨ σ̂_j uniquely. In the figure we use such symbolic and concrete states interchangeably. R̂_{q_i} and the σ̂_j can be used to refine Â. If a symbolic state is distinct from Â, that is, if σ̂_j ⊓ σ̂′ = ∅ for all σ̂′ ∈ D̂, then D̂ ← D̂ ∪ {σ̂_j} directly. When the meet is not empty, a new symbolic state can be generated by conjoining the variable expressions and path constraints of σ̂_j and σ̂′, which concretizes to γ(σ̂_j) ∩ γ(σ̂′). The symbolic transitions in R̂_{q_i} can then be added to R̂ as appropriate to maintain the over-approximation.
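A sketch of this refinement step, treating symbolic states as Z3 formulas so that an empty meet coincides with an unsatisfiable conjunction (the representation is ours, chosen for brevity):

```python
from z3 import And, Solver, unsat

def meet_empty(s1, s2):
    """The meet of two symbolic states concretizes to the intersection of
    their concretizations, so it is empty iff their conjunction is unsat."""
    s = Solver()
    s.add(And(s1, s2))
    return s.check() == unsat

def refine(D_hat, sym_j):
    """Add a visited symbolic state to the abstract domain D_hat."""
    overlaps = [s_ for s_ in D_hat if not meet_empty(s_, sym_j)]
    if not overlaps:
        D_hat.append(sym_j)               # distinct: add directly
    else:
        for s_ in overlaps:               # non-empty meets: conjoin to form
            D_hat.append(And(s_, sym_j))  # states concretizing to the overlap
    return D_hat
```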
At the moment, the observation that the product of iterative symbolic executions can be related to abstraction refinement -as has been previously observed in the other direction, see e.g., [13] -is purely theoretical for CLEAR.Nonetheless, we find this abstraction refinement perspective to have great potential.
In normal CEGAR, the trace and trace formulas used during refinement are spurious - instead of being added to the abstraction directly, the abstraction is refined by using them to find predicates that exclude them.
Finding the optimal refinement is NP-hard [21], so numerous methods have been developed to efficiently and effectively extract powerful predicates that capture relevant program behavior [43]. For example, prior work combining CEGAR and symbolic execution applies interpolation methods [13]. A potential direction for future work might be to determine whether such algorithms can be adapted to automatically produce higher-level abstract representations of critical program behavior useful to CLEAR. Such an extension might be able to, e.g., synthesize counterfactual families that capture a set of concrete states recurring across a number of queries, in order to suggest a counterfactual generation query primed to produce a particularly informative example. By using abstraction refinement, we believe CLEAR might be enhanced for better explanations without unbalancing the interpretability vs. automation tradeoff.

Figure 3: Counterfactual-guided abstraction refinement of the code in Figure 2. A factual query q_1 and a pair of counterfactual queries, q_2, q_3, all with the same β, produce the diagrammed symbolic traces and the resultant abstraction Â at bottom right. Concrete states are rendered in a typewriter font, while symbolic states are rendered in Garamond.
An Example. Figure 3 shows the evolution of a counterfactual-guided abstraction refinement over our running example inspecting lpr from Figure 2. Through a sequence of three queries ⟨q_1, q_2, q_3⟩, where the first is factual and the latter two counterfactual, the investigator attempts to find a concrete counterfactual where the acceleration guard will not only be activated, but set to a particular value. As the queries are resolved, the symbolic executions reach nine symbolic states, all but two of which can be concretized into multiple concrete states. The meets of some of these states are non-empty, e.g., σ̂_3 ⊓ σ̂_6 ≠ ∅, as the conditions of σ̂_6 imply those of σ̂_3, so γ(σ̂_6) ⊆ γ(σ̂_3). Those meets form distinct symbolic states within the abstraction, with the transition relation capturing the possible executions appropriately.

Representations and Queries
We specify a factual query as a tuple (φ, β), the former a logical formula specifying the inputs to the program at a critical decision moment (as in Equation 1), the latter encoding a description of the possible agent decision being investigated. Implicitly, φ defines a factual scenario (τ^f, t, φ), where φ encodes the program state at that critical moment τ^f(t). Counterfactual queries are encoded similarly, but with the addition of the existential indicator bit 1_∃. They are also able to encode many different possible program executions, captured by the notion of a family of counterfactuals T^cf_ctx. The last necessary statement required to invoke an SMT solver is Π(V̂), the decision logic of A constrained to the scenario(s) implied by φ. We generate Π(V̂) dynamically given a (counter)factual query using symbolic execution.

Representing Agents and Scenarios
We represent scenarios by tight constraints on the variables in Î as encoded by τ^f(t), and counterfactual scenarios as punctured relaxations of τ^f(t*). With this restriction, in order to distinguish constraints over E and S we are able to redefine φ(Î) = φ(Ê) ∧ ψ(Ŝ). Working solely with relaxations for specifiable counterfactuals is an ergonomic tradeoff to limit the expressive power of queries, in order to promote adaptive sequential querying of statements amenable to symbolic execution. In principle, CLEAR supports counterfactuals written as arbitrary φ(Î) ∈ QF_FPBV, and soid could do so as well.
Factuals. Factual scenarios are encoded by an (admittedly paradoxical) 'tight relaxation', in the sense that every input variable is constrained by an equality.
In practice, a factual query is specified by a (φ, ψ) pair that have a unique satisfying model over I. Evaluating a factual scenario is functionally equivalent to a concrete execution, since τ^f is the only possible program trace. We tie factual analysis into our framework for completeness, and because unlike traditional 'opaque' assertion-based testing soid supports writing complex behavioral conditions on all of V, including both internal and output variables. Additionally, our factual representations also naturally generate circuits for (zero-knowledge) proofs-of-compliance, another promising tool for algorithmic accountability [47,59].
Counterfactuals. A counterfactual is encoded as a punctured relaxation which removes the original factual state as a model. A family of counterfactual scenarios is a tuple (T^cf_ctx, φ, ψ, F), where the set T^cf_ctx contains every counterfactual whose past possibility trace satisfies φ(Ê) ∧ ψ(Ŝ) ∧ ¬F(Î) at the keyframe t*. In practice, a counterfactual query is specified by a (1_∃, φ, ψ, F) tuple where τ^f(t*) is excluded as a model by the negation of the formula F(Î) tightly encoding it.
Behaviors. A behavior is just an arbitrary formula over D̂.

Definition 5.3. A behavior is a formula β(D̂).
Let vars(β) ⊆ [n] be the set of indices such that i ∈ vars(β) if v̂_i is referenced by β(D̂). We then define e^σ|_β ≡ ⋀_{i ∈ vars(β)} e^σ|_i as the formula describing the assignment constraints of all variables in β at a given symbolic state σ.
Decision Logic. The decision logic of the agent is composed of two sets of interrelated components. The first set of components are the path formulas π_i for i ∈ [ct] generated by a symbolic execution, when constrained by φ(Ê) ∧ ψ(Ŝ) and up to some maximum step length ℓ_max. This 'maximum time' guarantees each symbolic trace τ̂_i of the symbolic execution will terminate at some state σ̂_{ℓ_i} for ℓ_i ≤ ℓ_max. The second set are the e^{σ̂_{ℓ_i}}|_β describing the assignment constraints for variables in β at those terminating states. Composed together, these components produce a formula Π(V̂) that constrains the possible output values of the variables in β given φ(Ê) ∧ ψ(Ŝ) as preconditions.

Definition 5.4. The decision logic of A is a formula Π(V̂) ≡ ⋁_{i ∈ [ct]} (π_i ∧ e^{σ̂_{ℓ_i}}|_β), where {π_i}_{i ∈ [ct]} is the set of path constraints produced by a symbolic execution of A with π_1 ≡ φ(Ê) ∧ ψ(Ŝ) as the initial path constraint and ℓ_max as the timeout, and where σ̂_{ℓ_i} is the terminating state of the i-th symbolic trace of the execution.
For concision, we continue to write Π(V̂). A symbolic execution engine like KLEE-Float can be coaxed to automatically append each e^{σ̂_{ℓ_i}}|_β to π_i by assuming every v̂_i referenced in β is equal to a fresh symbolic variable right after v̂_i is (symbolically) computed.
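Assembled naively, Π(V̂) is just the disjunction of the per-path constraints; the sketch below combines two hypothetical SMT-LIB path dumps (the dump contents are invented for illustration, not KLEE-Float output, which would be read from files):

```python
from z3 import And, Or, parse_smt2_string

# Two hypothetical per-path constraint dumps, as a KLEE-like engine
# might emit them.
paths = [
    '(declare-const move Int) (assert (= move 1))',
    '(declare-const move Int) (assert (= move 0))',
]

# Pi(V): each path contributes the conjunction of its assertions, and the
# decision logic is the disjunction over all feasible paths.
Pi = Or([And(list(parse_smt2_string(p))) for p in paths])
print(Pi)
```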

Resolving (Counter)factual Queries
Given these representations, it is possible for us to encode the semantics of our factual (→_{A,t,ℓ}) and counterfactual (□→_{A,t*,ℓ}) operators as SMT queries in QF_FPBV. In the following arguments, we assume correctness of symbolic execution, i.e., that Π(V̂) exactly represents the possible executions of A under φ(Ê) ∧ ψ(Ŝ) up to some time ℓ ≤ ℓ_max.
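The encoding Φ referenced throughout the arguments below does not survive intact above; from the proof structure, which repeatedly discharges cases by falsifying 'the LHS of the implication', it has roughly the following shape (a reconstruction under our reading, with ¬F(Î) joining the precondition in the counterfactual case):

```latex
\Phi \;\equiv\; \Big( \varphi(\hat{E}) \wedge \psi(\hat{S}) \wedge \Pi(\hat{V}) \Big) \rightarrow \beta(\hat{D}),
\qquad
\Phi_{\mathrm{cf}} \;\equiv\; \Big( \varphi(\hat{E}) \wedge \psi(\hat{S}) \wedge \neg F(\hat{I}) \wedge \Pi(\hat{V}) \Big) \rightarrow \beta(\hat{D}).
```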
⇒: By assumption, Definition 5.1(ii), and Definition 5.1(iii), τ^f(t)|_I ⊨ φ(Ê) ∧ ψ(Ŝ) uniquely. Therefore Φ is true for any valuation M of V̂ for which M|_I ≠ τ^f(t)|_I, as the LHS of the implication is false. So, the validity of Φ reduces to its truth when M|_I = τ^f(t)|_I.
By assumption scope(τ^f, t) is true, implying τ^f(ℓ)|_I = τ^f(t)|_I. By the correctness of the symbolic execution, Π(V̂) is satisfiable if and only if φ(Ê) ∧ ψ(Ŝ) is as well, and in particular τ^f(ℓ) ⊨ Π(V̂) uniquely for some ℓ ≤ ℓ_max. Therefore for any valuation M ≠ τ^f(ℓ) it follows that M ⊭ Π(V̂), and so Φ is true as the LHS of the implication is again false. That leaves only the case of τ^f(ℓ). As we have already shown τ^f(ℓ) ⊨ Π(V̂), and as by assumption further τ^f(ℓ)|_D ⊨ β(D̂), we may conclude that τ^f(ℓ) ⊨ Φ.
⇐: By the validity of Φ, τ^f(ℓ′) = M′ ⊨ Φ for every ℓ′. First, by the correctness of the symbolic execution, τ^f(ℓ′)|_I = τ^f(t)|_I always, implying scope(τ^f, t) is true. We show there exists some ℓ such that τ^f(ℓ) = M ⊨ Π(V̂), from which it follows that τ^f(ℓ)|_D ⊨ β(D̂), allowing us to conclude that τ^f ⊨ φ(Î) →_{A,t,ℓ} β(D̂). That there exists some unique ℓ ≤ ℓ_max such that τ^f(ℓ) = M ⊨ Π(V̂) follows from the correctness of the symbolic execution. Finally, since τ^f(ℓ) ⊨ Φ, the conclusion follows. Notice that our proof is unaltered by dropping property Definition 5.1(i) from the definition, since as noted the restriction to relaxations is an ergonomic decision.
By assumption and Definition 5.2.1(ii), τ^pp(t*)|_I ⊨ φ(Ê) ∧ ψ(Ŝ) ∧ ¬F(Î), and T^cf_ctx contains every such τ^cf by Definition 5.2.2. Therefore Φ is true for any valuation M of V̂ for which M|_I ≠ τ^pp(t*)|_I for every τ^cf ∈ T^cf_ctx, as the LHS of the implication is false. Moreover Definition 5.2.1(iii) implies that τ^f(t*)|_I ⊭ ¬F(Î), and so for any M for which M|_I = τ^f(t*)|_I the LHS of the implication is again false. Together, the validity of Φ reduces to its truth in the case that M|_I = τ^pp(t*)|_I for some τ^cf ∈ T^cf_ctx. By the correctness of the symbolic execution, Π(V̂) is satisfiable if and only if φ(Ê) ∧ ψ(Ŝ) is as well, and in particular τ^pp(ℓ) ⊨ Π(V̂) for some ℓ ≤ ℓ_max for every τ^cf ∈ T^cf_ctx, or τ^f(ℓ) ⊨ Π(V̂). If the latter, τ^f(ℓ) ⊭ ¬F(Î) by Definition 5.2.1(i) and Definition 3.2, and so Φ is true as the LHS of the implication is again false. Otherwise, if instead the former, we split into three cases, where τ^cf, τ′^cf ∈ T^cf_ctx are arbitrary but distinct counterfactuals.

¬1_∃, ⇐: By the validity of Φ, τ^pp(ℓ′) = M′ ⊨ Φ for every ℓ′ for every τ^cf ∈ T^cf_ctx. First, by the correctness of the symbolic execution, τ^pp(ℓ′)|_I = τ^pp(t*)|_I always, implying scope(τ^pp, t*) is true. Second, as Φ does not constrain any t′ < t*, nbt(τ^cf) follows trivially, and so admit(τ^cf) is true as well. We show that for every τ^cf ∈ T^cf_ctx there exists some ℓ ≤ ℓ_max such that τ^pp(ℓ) = M ⊨ Π(V̂), from which τ^pp(ℓ)|_D ⊨ β(D̂) follows, allowing us to conclude that τ^cf ⊨ φ(Î) □→_{A,t*,ℓ} β(D̂).


Evaluation

The soid tool is implemented in Python, and invokes the Z3 SMT solver [57] for resolving queries. To begin, and outside of the scope of soid, the investigator uses their knowledge of the harm under investigation to extract the factual trace τ^f from the logging infrastructure of A. Note that our tool assumes that both the τ^f and A used in the analysis correspond to the real-world execution. Accountable logging [74] and verifiable computation [61] can bolster confidence in these assumptions, and further that the program execution pathways being analyzed by soid are those applicable in deployment and are not being manipulated by a 'defeat device' [24]. At present soid also assumes deterministic programs, though symbolic execution of randomized programs is an active area of formal methods research with developing tooling that could in the future be used to extend our method [69].

Research Questions. We concretely evaluate soid with respect to three quantitative research questions.
They concern specific technical details of soid's efficiency: RQ1) Can ex post analysis, paired with symbolic execution that constrains attention to only feasible paths, limit the consequences of the combinatorial explosion often present in verification tasks?
More specifically, by logical duality, 'will the car always move?' and 'does there exist a scenario where the car does not move?' produce equivalent outputs, despite the former being a universal 'verification' query while the latter is an existential 'counterfactual generation' query. So, we investigate whether 'will the car always move?' is significantly more expensive to resolve than 'does there exist a scenario where the car does not move?' for the same counterfactual family. RQ2) What is the cost of including floating-point range constraints in a query? RQ3) Does soid avoid exploring extraneous program paths when information irrelevant to the decision is present in the scenario?

Three Cars on the Stand: A Case Study
In this section, we evaluate soid on the crash example from Section 2 (and Figure 1). We pose and resolve the queries from the example in a simulated driving environment, and show that soid is able to produce Facts that distinguish between three different machine-learned self-driving car agents.
For our environment we employ Gym-Duckietown [18] with a simple intersection layout. A rendering of our example crash in our environment is given in Figure 1. For our three agents, we used the same general C codebase, but used reinforcement learning - specifically Q-learning [73] - to train three different versions of the decision model it invokes, each based on a different reward profile. Informally, we deemed these reward profiles 'standard', 'impatient', and 'pathological'. The 'standard' profile is heavily penalized for crashing, but also rewarded for speed and not punished for moving without the right of way, so long as it is 'safe'. The 'impatient' profile is only rewarded for speed. The 'pathological' profile is rewarded significantly for crashes, and minimally for speed to promote movement over nothing. The reward functions for these profiles are provided in Appendix B; a rough sketch of their shapes is given below. The simulation environment is completely invisible to soid, which only analyzes program executions on the basis of its code and logs.
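For intuition only, the three profiles might take shapes like the following; the actual reward functions are those of Appendix B, and every coefficient below is an assumption:

```python
# Illustrative reward shapes for the three Q-learning profiles; all
# constants are assumed, not those of Appendix B.
def reward_standard(crashed, speed, has_right_of_way, safe):
    r = -100.0 if crashed else 0.0   # heavily penalize crashing
    r += 1.0 * speed                 # reward speed
    if not has_right_of_way and not safe:
        r -= 10.0                    # unsanctioned only when moving is not 'safe'
    return r

def reward_impatient(crashed, speed, has_right_of_way, safe):
    return 1.0 * speed               # only speed is rewarded

def reward_pathological(crashed, speed, has_right_of_way, safe):
    r = 100.0 if crashed else 0.0    # significant reward for crashes
    r += 0.1 * speed                 # minimal speed reward to promote movement
    return r
```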
On top of Gym-Duckietown we designed and implemented a web GUI to enable non-expert interaction with soid. GUIs that automatically generate representations of the driving environment are already deployed in semi-autonomous vehicles, such as those produced by Tesla. While simulating the environment, a drag-and-drop and button interface allows the user to manipulate the environment by, e.g., introducing new cars, manipulating a car's position or angle, or changing a car's destination or which car possesses the right of way. After a factual trace plays out, a slider allows the investigator to select a step of the execution, before a drop-down and button interface allows specifying a counterfactual family and behavior (whether or not the car moved). We provide still images of the GUI's interfaces in Figure 4.
Results. The results of our benchmarks are summarized in Table 1. All statistics were gathered on an Intel Xeon CPU E5-2650 v3 @ 2.30GHz workstation with 64 GB of RAM. Each heading in Table 1 specifies a set of constraints φ(Î), and implicitly a behavior β(D̂). The rows list the trained model invoked within A, the output of the evaluations, average timings, and the total number of feasible paths. Note that the symbolic and solving timings do not exactly sum to the total timing, due to some overhead.
We find that soid met each of our design goals on the way to successfully distinguishing the three cars, especially when enhanced by the GUI. The tool provides an interpretable and adaptive oracle allowing the investigator to query a sequence of counterfactuals without directly interacting with A or the machine-learned model underlying it. Most of our queries resolved within < 20s, providing effective usability. The results of the queries demonstrate the distinctive behaviors expected of the three conflicting 'purposes' described in detail in §2, allowing a thoughtful investigator to distinguish them as desired. With this data, we can also draw the following conclusions about our research questions.
RQ #1. We find that our universal and existential formulations took nearly identical time to resolve for all evaluated families of counterfactuals. Moreover, positive and negative results for each query also took nearly identical time. This finding highlights the benefit of ex post, 'human scale' analysis for combating combinatorial explosion. By limiting the scope of the queries to small, adaptively formulated relaxations, the set of program paths and the hardness of the resulting queries were kept manageable for the solver, regardless of the nature of the query or its truth. This runs counter to the prevailing experience in ex ante SMT-based program verification. RQ #2. We find the inclusion of a floating-point range query notably increases the cost of solving, with a ∼6x increase in the number of feasible program paths and a ∼20-30x increase in the time required.
As floating-point ranges are a natural query constraint for machine learning-based systems, this increase in cost may be a significant limitation for practical deployments of soid. Note that most of the increase lies in symbolic execution (∼50x increase) rather than solving (∼10x increase). This suggests that advances in floating-point symbolic execution could greatly reduce this liability. The recent focus on SMT-based decision procedures for floating-point and real-valued solving in the context of neural network verification [44,45] could also be a source of relevant advances.
RQ #3. We find that soid successfully 'ignores' irrelevant information, in the sense that no extraneous paths were explored, albeit at some increased cost in solving the path constraint queries, which were nonetheless enlarged by that information. Specifically, including a third car (agent2) outside of the intersection, but nonetheless present as an input to A, did not increase the number of paths found, but did increase the cost of solving by ∼2x. This is likely due to the increased size of the π_i and Π in the queried SMT formulas.

Health Risk Decision Tree Misclassification
To demonstrate that soid is more general in application than just to cyber-physical systems, we also consider a second motivating example of incorrect statistical inference. We train a decision tree to infer the health risk status of individuals using the Pima Indians dataset, a classic example in the counterfactual explanation literature due to [72].
Notably, we consider a program A with an implicit unit conversion bug: A computes the BMI input to the decision tree from the height and weight parameters in its input. However, it is written to expect metric inputs in kg and m, while the inputs are instead provided in imperial in and lb. This is a flaw of the software system as a whole: both the decision tree and the program are individually correct, but end-to-end the system misclassifies many inputs, since for the same physical quantities (kg/m²) ≫ (lb/in²).
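The following is a minimal sketch of the flaw; the specific weight and height values are illustrative, and Figure 5 gives the actual inference code.

```python
# Minimal sketch of the unit conversion bug: the BMI computation assumes
# metric units (kg, m) but is fed imperial units (lb, in). Values are
# illustrative, not drawn from the dataset.
def bmi(weight: float, height: float) -> float:
    return weight / (height ** 2)   # correct only for kg and m

weight_lb, height_in = 165.0, 70.0
buggy = bmi(weight_lb, height_in)                    # ~0.034, far too small
fixed = bmi(weight_lb * 0.4536, height_in * 0.0254)  # ~23.7, a plausible BMI
print(buggy, fixed)
```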
Unlike statistical counterfactual methods like those of [72,56], which only analyze the (correct) decision model, the end-to-end nature of soid allows it to analyze everything, including the conversion bug. Figure 5 displays the inference code and the incorrect decision due to the conversion error.
We then ran a small case study on this decision tree health risk misclassification example. The results, summarized in Table 2, show soid is able to efficiently resolve the counterfactual in the affirmative. Of our empirical research questions, only RQ #2 is directly applicable.
RQ #2. We find that, even with the additional cost of floating-point operations, solving remains quick, despite those operations producing SMT constraints that incorporate exponentiation and division over floating-point values, in addition to the (bounded-depth) recursion over floating-point comparisons needed for the decision tree inference itself.

Conclusion
We conclude by highlighting some promising future directions for extending CLEAR and soid to both support more complex agents and make the investigatory process more intuitive.
Exploiting Counterfactual-Guided Abstraction Refinement. As described, a promising direction is to add further automation to soid by drawing on the abstraction refinement literature to generate informative representations of critical states and logic that recur across many queries.
Supporting DNNs. Many modern machine-learned agents rely on models built out of deep neural network (DNN) architectures. Extending soid to support such agents - most likely by relying on SMT-based neural network verifiers as subroutines [44,45] - is an important open direction for increasing the utility of our method and tools.
Alternative Generation of Π(V̂). Symbolic execution is not the only way to generate Π(V̂). In particular, predicate transformer semantics [43] provide another direction for generating decision logic representations compatible with SMT. Comparing the relative strengths of various methods may show that other approaches are superior; e.g., weakest precondition generation could be used to precompute a single large Π(V̂) for all possible program executions, amortizing the cost of finding Π(V̂) in exchange for an increase in the effort required to solve Φ. A small sketch of this idea follows.
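The following is a minimal, illustrative sketch (not part of soid) of the textbook weakest precondition rule for assignments, wp(v := e, Q) = Q[e/v], which such a generation pipeline would apply across a whole program.

```python
# Illustrative sketch of predicate transformer semantics: the weakest
# precondition of an assignment v := e against a postcondition Q is Q[e/v].
from z3 import Real, substitute

def wp_assign(v, e, Q):
    """wp(v := e, Q): substitute e for v throughout Q."""
    return substitute(Q, (v, e))

x, y = Real('x'), Real('y')
post = (y > 0)                   # a postcondition over a decision variable
pre = wp_assign(y, x + 1, post)  # yields x + 1 > 0
print(pre)
```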
Programming Counterfactuals. Although soid is adaptive, that does not necessarily mean it needs to be interactive. A further possible direction would be to design a counterfactual calculus as the basis for a programming language that would invoke soid to solve for elements of the Facts as part of the semantics.
Such a language could potentially be the basis for formalizing certain legal regimes for which counterfactual analysis forms a critical component. A related direction would be to integrate with a scenario specification language like SCENIC [35], and the VerifAI project more generally [31], to add another layer of capability onto the specification of families of counterfactual scenarios.

→ Did the observation of the pedestrian lead to a decision to (more sharply) brake?
→ If A's sensors also misidentified the pedestrian as a cyclist, would that have led to a decision to (less sharply) brake?
Note again their adaptive nature: the exact formulation of the later questions depends on the answer to the first. These queries explore under what conditions the car will reconsider transiting the intersection, and the relative responsibilities of the three agents. The answers (yes, yes, yes, no) support the IM being significantly culpable through its misidentification, as the agent's decisions are consistent with believing it braked sufficiently to avoid a (faster moving) cyclist. In contrast, (no, no, yes, no) support A as uniquely culpable among the ADMs, as the misidentification by the IM had limited influence on its decisions.

B Reward Functions
We document the reward profiles used to train the 'standard', 'impatient', and 'pathological' cars respectively, up to some minor edits for clarity. Note that in order to cleanly test our intersection dynamics, every car attempts to avoid tailgating, reducing spurious collisions on entry.

Figure 1 :
Figure 1: A broadside car crash rendered in the soid GUI.
method of symbolic execution, we can answer queries about what the car did under these constraints by checking formula entailment. Posing counterfactual queries requires manipulating the state of the world, since such queries ask how the agent would react if the situation were different. Counterfactuals can be encoded by substituting constraint values, e.g., φ′ ≡ φ[(agent1_pos_x = 1.376) → (agent1_pos_x = 1.250)].
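A minimal sketch of such a substitution, using Z3's term substitution over an illustrative constraint (soid's internal representation may differ):

```python
# Illustrative sketch: encode a counterfactual by substituting a constraint
# value, mirroring phi' = phi[(agent1_pos_x = 1.376) -> (agent1_pos_x = 1.250)].
from z3 import Real, RealVal, substitute

agent1_pos_x = Real('agent1_pos_x')
phi = (agent1_pos_x == RealVal('1.376'))   # factual constraint from the logs
phi_cf = substitute(phi, (RealVal('1.376'), RealVal('1.250')))
print(phi_cf)                              # the counterfactual constraint
```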

Let A be the program instantiating a decision algorithm A. A operates over a finite set of program variables var(A) = V = {v_1, . . ., v_n}. We view var(A) as a union of disjoint subsets V = I ∪ D. The set D = {vd_1, . . ., vd_{n_D}} is the set of internal decision variables. The set of input variables I = E ∪ S is itself partitioned into sets of environment variables E = {ve_1, . . ., ve_{n_E}} and state variables S = {vs_1, . . ., vs_{n_S}}. Therefore n = n_E + n_S + n_D. E is composed of variables encoding input sources external to the agent, while S is composed of variables encoding internal state. Every v_i is associated with a domain D_{v_i}. A state is the composition of the variable assignments at that point of the execution, σ ∈ D = (D_{v_1} × · · · × D_{v_n}). Given σ = (d_1, . . ., d_n) ∈ D, we denote the restriction to only environment variables as σ|_E = (d_1, . . ., d_{n_E}) ∈ D_{v_1} × · · · × D_{v_{n_E}}, and similarly for σ|_S, σ|_I, and σ|_D. A trace τ = σ_1 σ_2 σ_3 . . . is a (possibly infinite) sequence of states. We access states by τ(t) = σ_t, and values of variables at states by σ(v_i) = d_i. The set of possible traces is governed by a transition relation R ⊆ D × D, so that τ(t) = σ and τ(t+1) = σ′ may occur within some τ only if (σ, σ′) ∈ R. The program A encodes a partial transition relation, R_A. Given a formula φ(V) over V and σ = (d_1, . . ., d_n) ∈ D, we write σ ⊨ φ(V) if the constant formula that results from substituting each d_i for v_i evaluates to True. We use φ and ψ when writing formulas over I that represent scenarios, β when writing formulas over D representing decisions made, and Π when writing formulas over V representing whole program executions. A symbolic state σ̂ is defined over a set of symbolic variables symvar(A) = V̂ = {v̂_1, . . ., v̂_n}, with domain D̂ = (D̂_{v̂_1} × · · · × D̂_{v̂_n}) and a path condition π_σ̂; Î, Ê, Ŝ, and D̂ are defined analogously. Each D̂_{v̂_i} augments the concrete domain D_{v_i} by allowing v̂_i to reference an expression e_i over a set of symbolic values {α_i}_{i∈[k]}, e.g., e_i = α_i or e_i = 2α_j + 3.0.
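As a small illustration of these definitions, a symbolic state can be viewed as a map from program variables to expressions over symbolic values, plus a path condition. The variable names and expressions below are ours, for exposition only.

```python
# Illustrative sketch of a symbolic state: program variables v_i bound to
# expressions e_i over symbolic values alpha_j, plus a path condition pi_sigma.
from z3 import Real, And

alpha1, alpha2 = Real('alpha1'), Real('alpha2')   # symbolic values
symbolic_state = {
    've1': alpha1,             # an environment variable bound to a fresh symbol
    'vs1': 2 * alpha2 + 3.0,   # a state variable bound to an expression
}
pi_sigma = And(alpha1 >= 0, 2 * alpha2 + 3.0 < 10.0)   # path condition
print(symbolic_state, pi_sigma)
```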

Figure 2 :
Figure 2: Longitudinal Proper Response Guard (see Def. 4.1 of [64]) written in C. Although not used by the simulated self-driving cars of our case study, this sort of decision functionality is representative of what soid is designed to investigate.

Figure 4 :
Figure 4: Still of the soid GUI (with a small section cut out for brevity). At top left is the critical moment from the program logs as chosen by the investigator. At bottom right are the counterfactual conditions the investigator has specified.

Figure 5 :
Figure 5: Our decision tree example. At top, the relevant decision subtree for a misclassification based on health data, with the incorrect path taken in red - and the correct branch missed in blue - as the unit conversion bug leads to a significantly smaller BMI input than is correct. At bottom, the (otherwise correct) decision tree inference logic in C.

Figure 6 :
Figure 6: A pair of self-driving car crash scenarios. (L): Ex. 1. A broadside crash between an agent car A (blue semicircle) and another car (orange semicircle). See Figure 1 for a rendering of this scenario in our GUI. (R): Ex. 2. An agent car A (blue semicircle) hits a pedestrian (green triangle) at t*_2 in an environment with an obscuring truck (orange semicircle) and an automated intersection (red cross).

Table 1 :
Experimental results for our car crash case study.
→ If A had arrived before the other car, and that other car was not signaling a turn, would A have waited? (e.g., to 'bait' the other car into passing in front of it?)
at t*_1 → Could a different turn signal have led A to remain stationary?

Table 2 :
Experimental results for our decision tree misclassification case study.

The results of our benchmarks are summarized in Table 2, and were gathered on the same Intel Xeon CPU E5-2650 v3 @ 2.30GHz workstation with 64 GB of RAM. In addition to a simple factual verification query as a baseline, we posed a single counterfactual query ⟨A, t*, ℓ⟩ with φ* ≡ φ_fact[(weight = 249.973) → ⊤]: at t*, does there exist a weight for which the instance is ever classified as high risk?
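The following is an illustrative sketch of how such a relaxed query can be posed to Z3. The fixed height and the high-risk threshold are assumptions for exposition; the real query runs against the Π(V̂) extracted from the buggy program.

```python
# Illustrative sketch of the relaxed counterfactual: drop the factual weight
# constraint (weight = 249.973 -> True) and ask whether any weight yields a
# 'high risk' classification. Threshold and height are assumed values.
from z3 import Real, Solver, sat

weight, height = Real('weight'), Real('height')
bmi = weight / (height * height)   # the buggy pipeline divides lb by in^2
phi_star = (height == 65.0)        # remaining factual constraints; weight relaxed away
high_risk = (bmi > 29.0)           # hypothetical decision tree threshold

s = Solver()
s.add(phi_star, high_risk)
print(s.check() == sat)            # True: some weight is classified high risk
```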