"What Will You Do Next?" Designing and Evaluating Explanation Generation Using Behavior Trees for Projection-Level XAI

Explainable AI (XAI) is a subfield of human-agent interaction concerned with designing and developing methods that generate explanations and elicit more transparent behavior from AI agents. In this work we present three contributions that advance XAI research in the context of Human-Robot Interaction (HRI). First, we extend explanation generation using behavior trees to include projection-level XAI, i.e., the ability to query an agent for explanations about its future actions. Second, we developed algorithms that answer queries about the pre- and post-conditions of an action, which we hypothesize improves user comprehension of the agent. Third, we present an experimental design using a robot arm and GUI to evaluate the efficacy of the explanation generation approach on user situational awareness, workload, and trust. All code is open source so that researchers can explore explanation generation with behavior trees in future human-robot interaction studies.


INTRODUCTION
Recent advances in artificial intelligence (AI), such as machine learning (ML) and robotics, offer the promise of impressive autonomous performance. Nevertheless, highly autonomous systems also present new challenges in terms of trust and transparency [8]. Explainable AI (XAI) systems aim to address these challenges by generating explanations and exhibiting more intelligible behavior.

Design and Evaluation of XAI Systems
As XAI research has grown over the past decade, an increasing body of work investigates how XAI systems should be developed. Mohseni et al. [15] present a design and evaluation framework for XAI that addresses the distinct needs of three AI stakeholder roles: "Developers/AI researchers," "Domain Experts," and "Lay Users." They argue that different stakeholders have different explainability needs, and that XAI designers must engage in an iterative process of development and engagement with developers, experts, and users to identify the XAI system goals, explainable interfaces, and algorithms.
Hoffman et al. [11] propose that for a statement to be an explanation, it must satisfy a user's goal or need for knowledge. They map user goals to "triggers," i.e., questions users generate to satisfy a need for knowledge about the AI. For example, a user's goal of local or global understanding of the system may trigger the questions "What did it just do?" and "How does it work?", respectively. Open-ended "Why?" triggers, e.g., "Why did it do x?", are the most challenging to respond to, as these triggers often encode a contrastive reasoning process within the user [14], e.g., "Why didn't it do y instead?" The iterative design methodology of [15] should therefore attempt to address the contrastive foils that satisfy the users' needs. The "What will it do next?" trigger may address a user's continuing aim to intervene during plan execution to prevent undesirable outcomes, or it may allow a user to calibrate their trust based on the system's expected outcomes.
An example of engaging with users to determine system goals is Barkouki et al. [3], who surveyed NASA engineers and scientists to determine the types of XAI they felt were most important in supervising a free-flying space robot and an environmental control analysis system. They reported that the most important type of XAI for NASA engineers supervising Astrobee [20] was "What will it do next?" This is "projection-level" XAI according to the SAFE-AI framework developed by Sanneman and Shah [19], which organizes XAI design around three levels (level 1 XAI: Perception, level 2 XAI: Comprehension, level 3 XAI: Projection) corresponding to the three levels of situational awareness (SA) from human factors research [7]. Level 1 SA (Perception) is understanding what a system is doing or has done; level 2 SA (Comprehension) relates to understanding why a system is behaving a certain way; and level 3 SA (Projection) is the ability to predict future outcomes.
A growing number of surveys of XAI methods exist, yet many focus on forms of XAI intended for machine learning rather than for embodied autonomous systems such as mobile robots and manipulators. As the focus herein is on HRI, we refer readers to recent surveys of XAI methods specifically for robotic agents, e.g., [1, 17, 18].

Explanation Generation Using Behavior Trees
Han et al. [9] use behavior trees to sequence and execute robotic actions in a structure that carries the hierarchical semantic set in equation (1), which enables explanation generation.
A basic understanding of behavior trees is a prerequisite for this work; see [9] and [5] for background. The explanation generation algorithms, accessible via a ROS service [16], extract causal information about the robot's behavior when queried by a user with one of the following triggers:
• "What are you doing?" • "What is your goal?" • "How do you achieve your goal?" • "What is your sub-goal?" • "How do you achieve your sub-goal?" • "What went wrong?" The algorithms take as input the currently executing leaf node and traverse the tree to extract the relevant information corresponding to one of the elements in (1).For example, in "What is your goal?", the response is the name of the root node of the tree in execution, i.e. the goal from the semantic set.To answer "How do you achieve your goal?", the response is composed by traversing the tree to the root node, then traversing and returning the concatenated names of all of the nodes immediately below the goal, i.e. the steps in (1).Refer to [9] for full pseudocode and implementation details.All of these aforementioned explanations fall into Levels-1 and -2 XAI according to [19].That is, they answer Perception and Comprehension triggers of a user, respectively.The objective of this project is to extend explanation generation using behavior trees with additional explanations, including Level-3 for Projection, as well as develop additional tools to enable HRI research using explanation generation with behavior trees.This paper is organized as follows: Section 2 discusses prior related research into projection-level XAI for agents; Section 3 describes our software contributions; Section 4 provides a proposed experimental design to evaluate explanation generation on user cognitive states; Section 5 describes limitations and thoughts on future work; and Section 6 concludes the paper.

RELATED WORK ON XAI FOR PROJECTION
According to the SAFE-AI framework [19], level 3 XAI for explainable agents and robots includes information about predictions, consequences, counterfactual cases, hypothetical model changes, and uncertainty. Here we discuss several works that develop methods for explaining predictions and consequences. [19] considers explaining a plan output by a planner to be level 1 XAI, while explaining the plan as it is executed by an agent is level 3 XAI. We therefore include works from the XAI planning (XAIP) literature only where there are implications for explainability during execution, as opposed to explaining a plan offline. Similarly, works on policy or reward function explainability from the explainable RL (XRL) field are not reviewed here.
The concepts of agent legibility and motion predictability are explored in [6]. Legibility is defined as behavior meant to expose the agent's intent, and predictability refers to behavior that matches a human's expectation. They showed humans could distinguish between goals and paths based on varying levels of legible and predictable motion of a robot arm. The notions of plan explicability and task-level predictability were introduced by [22]. The goal of predictability there, though, was not to inform the human of the robot's future actions, but to learn a user mental model, offline, which a robot could use to plan behavior that would most closely conform to the human's expectations. Plan explicability is linked to legibility in that it intends to convey information about the agent's plan, i.e., the goal, through behavior rather than explicitly through explanations.
Work on explicit explanations in the automated planning field was developed in [4] and [21], as methods that interpret a human's questions to the agent as signaling differences between the human's mental model and the agent's mental model, which the agent then reconciles through explicit explanation. By reconciling differences in the mental models through explanations, the human can better interpret and predict the actions of the agent. This assumes both the human and the agent plan optimally, and that the agent has a more complete and/or accurate mental model of the environment than the human.
Work has also been done on combining explicable behavior with explanations [21] in a framework that considers the mixed costs of deviating from an optimal plan for the sake of explicability as well as the human workload involved in receiving and interpreting explanations. In their user study, human subjects acting as scene commanders in a robotic urban search and rescue scenario were given an interface with options to request the agent's plan and an explanation. Because the agent performs model space search to generate explicable plans, the answer to a user's request for the plan constitutes a form of level 3 XAI.

ALGORITHMS
We developed four additional explanation generation algorithms, implemented in C++ as a ROS package [16], that answer the following triggers:
• "What do you do next if the current action succeeds?"
• "What do you do next if the current action fails?"
• "What are the preconditions of the current action?"
• "What are the postconditions of the current action?"
Algorithm 1 (Find Next Node if SUCCEED) takes the currently executing node and traverses the tree to find the next node to be executed, assuming the current node returns SUCCESS and the nodes used to build the tree are limited to those discussed in Section 5. Algorithm 2 composes the string presented to the user, e.g.:

answer ← "If " + n.short_description + " succeeds, my next action is " + next.short_description + "."
return answer

A similar process is used to explain the next action assuming the current node returns FAILURE. The algorithms for explaining pre- and post-conditions return the relevant pre/post-condition information from the scripting language discussed below.
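For illustration, the core traversal of Algorithm 1 can be sketched as follows, again using a simplified node structure of our own rather than the package's actual API, and assuming the tree contains only Sequence and Fallback control nodes:

#include <algorithm>
#include <string>
#include <vector>

enum class Type { Sequence, Fallback, Leaf };

struct Node {
  std::string name;
  Type type = Type::Leaf;
  Node* parent = nullptr;
  std::vector<Node*> children;
};

// Descend to the first leaf that runs when `n` is ticked.
const Node* FirstLeaf(const Node* n) {
  while (!n->children.empty()) n = n->children.front();
  return n;
}

// Return the node executed after `current` returns SUCCESS, or nullptr
// if that SUCCESS completes the whole tree.
const Node* NextOnSuccess(const Node* current) {
  const Node* child = current;
  for (const Node* p = current->parent; p != nullptr; child = p, p = p->parent) {
    // Under a Sequence, SUCCESS advances to the next sibling (if any);
    // under a Fallback, SUCCESS propagates upward to the parent.
    if (p->type == Type::Sequence) {
      auto it = std::find(p->children.begin(), p->children.end(), child);
      if (it + 1 != p->children.end()) return FirstLeaf(*(it + 1));
    }
  }
  return nullptr;
}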
Examples of the explanation outputs are displayed in Fig. 1 for a simple behavior tree. In this example, the next action after the IsDrawerOpen condition node depends on whether the condition is met: if it is, the robot will execute the DetectTools action; if it is not, the robot will execute OpenDrawer. The precondition of the CollectTools action is that the tools must be detected, so the robot will skip the collect action if this precondition is not met. The CloseDrawer action has postconditions that set a boolean variable indicating whether the drawer is open or closed, depending on whether the action succeeds.
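A tree of this shape could be written roughly as follows in BehaviorTree.CPP v4 XML, loaded from a C++ raw string. This is a hedged reconstruction from the figure description, not the paper's exact tree; the node and blackboard names are our assumptions, and the stub registrations exist only so the sketch runs.

#include <behaviortree_cpp/bt_factory.h>

static const char* kDrawerTree = R"(
<root BTCPP_format="4">
  <BehaviorTree ID="FetchTools">
    <Sequence>
      <Fallback>
        <IsDrawerOpen/>
        <OpenDrawer/>
      </Fallback>
      <DetectTools _onSuccess="tools_detected := 1"/>
      <CollectTools _skipIf="tools_detected != 1"/>
      <CloseDrawer _onSuccess="drawer_open := 0" _onFailure="drawer_open := 1"/>
    </Sequence>
  </BehaviorTree>
</root>
)";

int main() {
  BT::BehaviorTreeFactory factory;
  // Stub implementations so the sketch runs; a real robot would register
  // nodes that perform the actual perception and manipulation.
  auto ok = [](BT::TreeNode&) { return BT::NodeStatus::SUCCESS; };
  factory.registerSimpleCondition("IsDrawerOpen", ok);
  for (const char* name : {"OpenDrawer", "DetectTools", "CollectTools", "CloseDrawer"}) {
    factory.registerSimpleAction(name, ok);
  }
  auto tree = factory.createTreeFromText(kDrawerTree);
  tree.tickWhileRunning();
  return 0;
}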
The ROS package with these algorithms is developed in a fork of the original explanation generation using behavior trees first published in [9]. We additionally refactored the existing algorithms significantly to improve modularity, extensibility, and readability. Furthermore, we introduced a feature that enables the parameterization of node descriptions using blackboard values. Previously, explanations within a subtree were kept vague to allow reuse in different situations, such as "pick item." Now a node description can be set as "pick {item}," where "item" refers to a blackboard variable that is substituted with its current value at runtime. This feature allows for more specific explanations within subtrees.
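A minimal sketch of this substitution, assuming a simple string-valued blackboard; this is our illustration, not the package's actual implementation:

#include <map>
#include <string>

std::string ExpandDescription(const std::string& description,
                              const std::map<std::string, std::string>& blackboard) {
  std::string result;
  for (size_t i = 0; i < description.size();) {
    if (description[i] == '{') {
      size_t close = description.find('}', i);
      if (close != std::string::npos) {
        const std::string key = description.substr(i + 1, close - i - 1);
        auto it = blackboard.find(key);
        // Unknown keys are kept verbatim so the explanation still reads.
        result += (it != blackboard.end()) ? it->second
                                           : description.substr(i, close - i + 1);
        i = close + 1;
        continue;
      }
    }
    result += description[i++];
  }
  return result;
}

// e.g. ExpandDescription("pick {item}", {{"item", "wrench"}}) -> "pick wrench"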
The behavior tree C++ library BehaviorTree.CPP provides the framework needed to create and run behavior trees. To add the functionality required for explanation generation, [9] depends on a fork of the library. Our approach likewise relies on these modifications; however, we created our own fork with a branch that is up to date with BehaviorTree.CPP v4.3, the latest version at the time of this writing. BehaviorTree.CPP v4.3 allows scripting within the XML definition of behavior trees to execute script code before and/or after the execution of a node. The library provides preconditions, such as "skipIf," "successIf," and "failureIf," that will skip the execution of a node based on specified script conditions, and postconditions that will run script code based on the result of the node after execution. While this was intended as a convenience for behavior tree developers, it fortuitously provides a method of extracting pre/post-conditions directly from the executing node for use in explanations. [19] state that pre/post-conditions are a form of level 2 XAI; however, we argue that postconditions are more akin to consequences, which [19] consider to be level 3 XAI.
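As a sketch of how this extraction could look, the script strings can be read off the executing node and folded into an explanation. The accessor names below reflect our understanding of BehaviorTree.CPP v4 and should be treated as an assumption rather than a documented API:

#include <behaviortree_cpp/tree_node.h>
#include <string>

// Read post-condition scripts off an executing node for an explanation,
// assuming NodeConfig exposes the parsed script attributes as shown.
std::string ExplainPostconditions(const BT::TreeNode& node) {
  std::string answer = "After " + node.name() + ", the following will hold: ";
  for (const auto& [condition, script] : node.config().post_conditions) {
    // `script` is the raw string from the XML, e.g. "drawer_open := 0"
    // for an _onSuccess attribute.
    answer += script + "; ";
  }
  return answer;
}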

EXPERIMENTAL DESIGN
We briefly present the following experimental design, with a generic task using a robot arm, as a template for validating the effect of the explanation generation algorithms presented in Section 3 on user cognitive states. An implementation of this experimental design was presented at the HRI 2023 Workshop YOUR Study Design (WYSD), and feedback from the mentors has been incorporated. Additionally, to address the dearth of published human subject research incorporating XAI methods, we recommend and will follow the reporting guidelines from [2] to improve replicability.
We aim to understand the effects of varying levels of XAI on user cognitive states, to direct future research towards developing the XAI methods that most effectively achieve user goals. Specifically, we ask:
(1) Does increasing the level of XAI increase the level of situational awareness (SA)?
(2) Are levels of XAI and SA correlated? E.g., does providing level 3 XAI specifically improve level 3 SA?
(3) Does increasing the levels of XAI available produce an increase in trust accuracy?
(4) Does increasing the levels of XAI available produce an increase in workload?

Manipulated Variables:
This will be a between-subjects study. All subjects will first complete a familiarization session with the robot on a separate task, such as pick-and-place of several objects, with explanation generation provided via the GUI. For the main task, subjects will be randomly assigned to a treatment group that receives level 1 (baseline), 1+2, 1+3, or 1+2+3 explanations continuously throughout the experiment. The aim is to determine whether specific levels of XAI produce effects on specific levels of SA.

Dependent Variables:
We will assess trust using a validated XAI trust scale from [11].
Workload will be measured using the NASA Task Load Index (TLX) scale [10]. Situational awareness levels will be evaluated using the accuracy of responses to questionnaires about the robot's actions, its goals, and its use of fallback actions after prior action failures. Additionally, SA can be elucidated by programming deterministic robot failures into the task sequence and assessing whether users can predict and/or diagnose the failure, prompting the subject with a questionnaire about the future state of the robot task before the failure occurs. We refer readers to [19] and [11] for recommended metric alternatives specific to XAI.

LIMITATIONS AND FUTURE WORK
The current implementation follows and builds upon the hierarchical explanation generation framework of [9]. We tested our algorithms on behavior trees that conform to this framework and follow the semantic set in eq. (1). Furthermore, the types of control and decorator nodes tested are limited to the base forms of the common nodes. That is, we have not tested explanation generation on behavior trees containing nodes other than the control nodes Sequence and Fallback, and the decorator nodes SubTree, Inverter, Repeat, and Retry.
As agent behavior complexity increases, so does the need for explanation. Complex and robust agent behavior can be achieved through other nodes not currently supported by our contribution, such as ReactiveSequence and SequenceWithMemory, as well as the corresponding types of Fallbacks. For example, placing a ReactiveSequence as the parent of an asynchronous action gives an agent the capacity to autonomously preempt the action in execution if a condition is violated, as sketched below. Future users of explanation generation using behavior trees should explore the behavior they require and consider adding the necessary XAI functionality where needed.
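A minimal sketch of that preemption pattern, with made-up node names (BatteryOK, MoveArm) and synchronous stubs that exist only so the sketch runs; in a real tree, MoveArm would be an asynchronous action halted when BatteryOK returns FAILURE on a later tick:

#include <behaviortree_cpp/bt_factory.h>

// Under a ReactiveSequence, the condition is re-evaluated on every tick,
// so a FAILURE halts a still-running asynchronous child action.
static const char* kReactiveTree = R"(
<root BTCPP_format="4">
  <BehaviorTree ID="GuardedMove">
    <ReactiveSequence>
      <BatteryOK/>
      <MoveArm/>
    </ReactiveSequence>
  </BehaviorTree>
</root>
)";

int main() {
  BT::BehaviorTreeFactory factory;
  auto ok = [](BT::TreeNode&) { return BT::NodeStatus::SUCCESS; };
  factory.registerSimpleCondition("BatteryOK", ok);
  factory.registerSimpleAction("MoveArm", ok);
  auto tree = factory.createTreeFromText(kReactiveTree);
  tree.tickWhileRunning();
  return 0;
}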
Behavior trees are now included as core dependencies in ROS 2's planning and navigation stacks, Plansys2 [13] and Nav2 [12], indicating growing interest in their use in robotics. It is therefore incumbent upon HRI researchers to continue developing and testing XAI tools compatible with robots programmed with behavior trees, so that HRI study results align with the robot programming methods being deployed in the real world. Further work is needed to explore the compatibility of the Plansys2 and Nav2 implementations with the explanation generation approach.

CONCLUSION
In this paper, we presented an extension to previously published algorithms for explanation generation using behavior trees. We considered the explanation generation problem within an XAI evaluation metrics framework based on the human factors literature, and identified a need to include level 3 XAI for users requiring information on the agent's future actions. We developed algorithms that allow a user to query an agent's expected next action, as well as its fallback action if the current one fails. We also included queries on the pre/post-conditions of the current action. Lastly, we presented a planned human subject experiment to evaluate explanation generation. All code is open source, and the experiment design and results will be reported in a manner that maximizes transparency and replicability.