Exploring the Impact of Explanation Representation on User Satisfaction in Robot Navigation

The decisions made by autonomous robots strongly influence how humans perceive their behavior. One way to alleviate potential negative impressions of such decisions and to enhance human comprehension of them is to explain them. We introduce visual and textual explanations integrated into robot navigation that take the surrounding environmental context into account. To gauge the effectiveness of our approach, we conducted a comprehensive user study assessing user satisfaction across different forms of explanation representation. Our empirical findings reveal a notable discrepancy in user satisfaction, with significantly higher levels observed for explanations that adopt a multimodal format as opposed to those relying solely on unimodal representations.


INTRODUCTION
Providing reasons for the actions of robots has been proven to positively affect both human trust [15,50] and understanding [11,50]. Additionally, this practice plays a crucial role in fostering effective interactions between humans and robots [43], and robots that provide explanations are perceived as more socially adept [1]. Furthermore, the EU Data Protection Regulation [49] underscores the importance of furnishing individuals with explanations when machines make decisions that impact them. Despite the manifold advantages that robots bring, a deficiency in transparency and accountability persists concerning their decision-making processes [17]. This challenge is exacerbated by the complexity of most robot behaviors and their intricate social interactions with humans.
Among the many robot behaviors, our specific focus lies on motion planning in indoor environments and the pivotal importance of elucidating robot navigation for human counterparts. Consider an autonomous robot that is heading to the docking station in the lab and navigating its surroundings while contending with potential obstacles. Unforeseen events, such as the abrupt presence of an obstruction in its path (trajectory), e.g., a trash bin or a closed door, can cause the robot's planning system to diverge from its original trajectory or to fail, as depicted in Figure 1a. Such behavioral deviations can catch individuals interacting with the robot or monitoring it remotely off guard, leading to diminished confidence in the robot's intentions. To address this concern and cultivate a more secure environment for humans and robots, it becomes imperative for the robot to furnish explanations for its actions [47]. To heighten the transparency and user-friendliness of robot behaviors, we employ visual and textual explanations as an interactive medium to illuminate the robot's decision-making for individuals. Concurrently, we harness both unimodal and multimodal explanation modalities (representations) to enrich the comprehensibility and user-friendliness of robot navigation. Unimodal explanations are either textual or visual explanations of the influence of a robot's environment on its decision-making. Multimodality is reflected in the simultaneous provision of visual and textual explanations. Figure 1b presents a multimodal explanation illustrating the robot's behavior in the context of the depicted failure scenario.
We conduct two user studies focusing on providing explanations to remote monitors/users (explainees) via a screen-based interface. Initially, a pilot study was designed to check user preference for the visual explanation design, i.e., the visualization explanation scheme, as well as satisfaction with the chosen design. The outcomes of the pilot study were used to design the main study, in which we tested explainees' satisfaction with respect to the explanation representation type (visual, textual, visual-textual) and, respectively, its modality (unimodal, multimodal). The main objective was to find out whether there is a significant difference in user satisfaction between different explanation representations in the same settings. The main contributions of this paper are:
-We propose methods for generating visual and textual explanations of robot navigation (Section 3) and evaluate them in two user studies (Sections 4 and 5);
-In the pilot user study, we observe the users' preferences in terms of visual explanation schemes (Section 4);
-We discover that explanation recipients exhibit higher satisfaction when presented with multimodal (visual-textual) than with unimodal (visual or textual) explanations (Section 5).
The structure of this paper is as follows: In Section 2, we give an overview of related work and prior research. Section 3 introduces our explanation methodology, emphasizing its incorporation of environmental context into the explanation generation process. Section 4 delves into our pilot study, giving an overview of the study design, results, and discussion. The main study is presented and discussed in Section 5. Finally, Section 6 presents conclusions and points out future research directions.

RELATED WORK
Our research is positioned at the intersection of two key areas: Explainable Autonomous Navigation, and Representations and Interfaces in Autonomous Navigation and Human-Robot Interaction (HRI). We present an overview of related work in these areas, focusing on mobile robots as autonomous agents, which enables us to understand the current state of research and to position our own work.

Explainable Autonomous Navigation
Several navigation explanation methodologies have incorporated natural-language explanations. Notable examples include the work conducted by Perera et al. [39] and Rosenthal et al. [41], which focuses on explaining complete global trajectories by providing narrative summaries of robot paths. Their explanations are post-hoc, presented after navigation is finished or the trajectory has been calculated. Gavriilidis et al. [19] generate explanations independent of the autonomous agent's behavior, breaking the behavior down into natural-language components. To produce an explanation, they train a surrogate explainable model that approximates the agent's policy. Conversely, Stein's research [45] delves into model-informed natural-language explanations, leveraging the internal features of the underlying navigation algorithms. Natural-language explanations have also been used to support humans in rectifying navigational errors [11].
Visual explanations have also been employed in explaining autonomous navigation. One group of approaches [10,26] has explored the use of lights to indicate robot paths by projecting them on the floor in front of the robot. These approaches are targeted toward explainees in the robot's immediate neighborhood. Another group of approaches [3,4] employs expressive lights mounted on a robot, which reveal certain internal states of service robots. Halilovic and Lindner [23,24] explain the robot's local decisions by visualizing the navigating robot's neighborhood in a bird's-eye view. He et al. [27] focus on visualizing numerical information from trained neural-network path planners, while Bautista-Montesano and colleagues [5] explain fuzzy reinforcement learning planners. Both approaches are model-specific, relying on the internal workings of the underlying algorithms.
The most frequently addressed motion behavior in the explainable motion planning literature is planning failure. For example, Robb and colleagues [40] delve into navigational planning failures by examining users' cognitive frameworks when they are presented with explanations for failures of remote navigational robotic agents. In the same context, Brandao et al. [7] contribute a comprehensive taxonomy for explainable motion planning. This taxonomy formalizes explanations for instances of planning failures and includes explanations for deviations from planned trajectories and for users' trajectory preferences. Deviation explanations are usually contrastive, explaining why an alternative trajectory was taken instead of the original one. Halilovic and Lindner investigate deviations from initial paths at a local level, using both visual explanations [23] and visual-textual explanations [24]. Additionally, the study by Brandao and colleagues [8] explores the optimality of global paths. In this research, the authors present users with an alternative non-optimal global path and compare it to the optimal counterpart to better understand global path choices. Drawing from the literature [21-24, 29], we provide a top-down explanation approach that enriches explanations with details that include not only the objects around the robot, but also the objects' semantic features and the notion of human presence and motion. Our approach explains both trajectory deviations and planning failures, and our method can generate both unimodal and multimodal explanations. The visual and textual parts of multimodal explanations are coherent with each other (see Figure 1b).

Representations and Interfaces in Autonomous Navigation and HRI

Representations.
Human-robot interaction research is progressing to study interactive modalities that make communication easier and more natural. However, the high cost of developing them and of controlling expression design can be a major challenge in this research. Color can therefore be viewed as a low-cost and intuitive element of interaction design [44]. According to [9], green elicits the most positive emotions, whereas red elicits high excitement and may even evoke negative moods. Similarly, [13] supports the claim that red is related to excitement and intensity; yellow can be paralleled with attention and aggravation, whereas green signals safety and calmness. Netek et al. [36] focused on user preferences in the cartographic interpretation of heat maps. The authors chose traffic-accident data, provided participants with heat maps in different color schemes, and asked them to rate the most viable and satisfying interpretation for the use case. The participants were divided into cartographers and non-cartographers, and both groups chose the red color scheme as the preferred one. However, the two groups' opinions differed with respect to chromatic ranges: the cartographer group preferred convergent monochromatic ranges (e.g., increasing saturation of red as traffic intensity increases), whereas the non-cartographer group preferred a divergent symmetrical range, most commonly a transition from green (minimum intensity) through yellow (average intensity) to red (maximum intensity). This transition promotes quick and easy interpretation and has undergone rigorous research establishing it as one of the most user-friendly choices. Color is effective for visualizing stationary objects. However, when dealing with moving objects, alternative forms of visualization prove more advantageous. Moving objects can lead to hazardous situations, necessitating attention-grabbing visual cues for enhanced awareness and prevention. Blinking lights and other forms of animation have proven useful for warning users and shifting their focus to the moving object of concern [6,20]. As explained by [18], arrows originated in the prehistoric era with cave paintings and engravings and have since emerged as symbols for complex wayfinding. Arrows were first used for navigation and wayfinding in the 14th century and have now become almost synonymous with the concept of direction. Intuitively, the arrow symbol is structured such that the pointed arrowhead is understood as the direction in which the navigator should proceed. In a study [34], researchers attempted to understand the neural basis of navigation in humans. This positron emission tomography study found that wayfinding in a familiar town setting required significantly more strategy building and decision-making than following a trail of arrows, implying that following arrows is intuitive and requires little planning. This justifies the usage of arrows in complex navigation. Another symbol that is widely used to represent paths in navigation is the solid route line.
Dent et al. [12] have researched map design principles and delineate why solid lines are preferred for representing routes. These lines provide users with a clear and concise way to visualize the path they have to follow. Another advantage of solid lines is that the width and color of the line itself can carry additional meaning, such as the commonality or novelty of the path, the traffic on the route, etc. Numerous phone applications for navigation also use solid lines to represent the route [2]. These route lines generally have a color that contrasts with the rest of the interface (e.g., blue, green, red) so that the paths are visible at a glance. The different obstacle color schemes and trajectory visualizations constitute the independent variables in our pilot user study.

Interfaces.
The research on autonomous navigation interfaces has mainly been concerned with teleoperated robotic systems. Vaughan et al. [48] set the stage with their introduction of a teleoperation interface featuring a focus plus context (F+C) view, which is geared towards mitigating limitations in peripheral vision. Monocle [42] unveils an interactive detail-in-context interface tailored specifically for search and rescue tasks. In the domain of human-in-the-loop planning interfaces, Lemasurier et al. [30] make a comparative exploration of 2D keyboard-and-mouse interfaces and their 3D virtual reality counterparts, focusing on operator performance and usability. Olivares et al. [37] investigate the impact of windowing control on user performance, providing additional layers of understanding regarding user preferences based on expertise levels. Regarding control modalities, Santos et al. [14] evaluate the efficiency of eye tracking alongside touchscreen and standard mouse interfaces for pick-and-place tasks. Pose and Paste [31] introduces a novel interface paradigm for interaction between a single user and multiple robots. For navigating hazardous environments, the modular Human-Robot Interface [33] showcases adaptability and safety in real-world interventions. As Olivares et al. [35] pivot towards user-friendly teleoperation, their touch-based multimodal interface stands out for its accessibility for non-experts, proving successful in providing intuitive direction perception.
Our emphasis lies in providing explanations for the navigational behaviors of remote agents, a facet largely unexplored in the current literature. We are particularly focused on the design of visual explanation interfaces for such agents. The significance of such explanations becomes evident in scenarios where robots navigate through hazardous or inaccessible terrains, offering valuable insights into their decision-making processes. Notably, when agents operate in human-populated indoor environments, our target audience comprises robot monitors, i.e., individuals tasked with remotely overseeing robot behavior during navigation. This can prove instrumental in discreetly testing new robots within human surroundings without alerting individuals to the ongoing experimental phase. Furthermore, our approach extends its utility as an educational tool for children, facilitating remote learning about robot navigation.

VISUAL AND TEXTUAL EXPLANATIONS
We create an explanation layer on top of the navigation, which is agnostic to the navigation method. Our explanation navigation layer is implemented in the Robot Operating System (ROS) and plugs into the existing ROS navigation stack. We depict visual explanations using a localized visual "explanation layer" that surrounds the robot continuously as it navigates. Within this layer, objects with the potential to obstruct the robot's path are visually marked based on their proximity to the robot, creating a localized heatmap. We utilize different color schemes (see Figure 3) and use the yellow-green heatmap [36] (as seen in Figure 1b) as a descriptive example. Objects closer to the robot, indicating a higher likelihood of becoming obstacles, are shaded closer to yellow, while objects farther away lean more towards green. The color red is used exclusively when an object becomes an obstacle, significantly changing the robot's navigation behavior. This color scheme adheres to fundamental principles of color psychology, where red serves as a powerful indicator of a critical event, while yellow and, even more so, green signify a more peaceful and stable state [36]. Our design approach ensures that individuals can easily discern the environmental factors that the robot considers crucial for its navigation decisions. The robot maintains an understanding of social space, adhering to Hall's theory of social spaces [25] (as depicted in Figure 2a). Within this framework, the robot's social spaces are categorized into four zones; typically, the social zone extends from 1.2 to 3.6 meters from the robot. We model explanation as a social mechanism and set the visual explanation layer radius to 2.4 meters around the robot, which corresponds to the middle of the robot's social zone.
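To make the proximity-to-color mapping concrete, the following minimal sketch (our own simplification; the paper does not prescribe an exact formula) linearly interpolates between yellow and green within the 2.4 m layer and reserves red for objects that have become obstacles:

```python
# Minimal sketch: proximity-based heatmap coloring (RGB channels in [0, 1]).
LAYER_RADIUS = 2.4  # radius of the visual explanation layer in meters (middle of the social zone)

RED = (1.0, 0.0, 0.0)     # object has become an obstacle (critical event)
YELLOW = (1.0, 1.0, 0.0)  # object close to the robot, likely to become an obstacle
GREEN = (0.0, 1.0, 0.0)   # object far from the robot, unlikely to become an obstacle


def explanation_color(distance_m, is_obstacle):
    """Return the heatmap color for an object at `distance_m` from the robot."""
    if is_obstacle:
        return RED
    if distance_m >= LAYER_RADIUS:
        return None  # outside the explanation layer: not visualized
    # Linear blend: yellow at distance 0, green at the layer boundary.
    t = distance_m / LAYER_RADIUS
    return tuple((1.0 - t) * y + t * g for y, g in zip(YELLOW, GREEN))


if __name__ == "__main__":
    for d in (0.3, 1.2, 2.3):
        print(d, explanation_color(d, is_obstacle=False))
    print("obstacle:", explanation_color(0.5, is_obstacle=True))
```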
In addition to providing visual explanations, our system generates corresponding textual explanations. While the robot is in motion, our system continuously monitors the positions of both the robot itself and nearby objects, ensuring that our foundational ontology remains up to date (see [21,24]). This dynamic process allows spatial and semantic relationships between objects and the robot to be established. Our textual explanations comprise concise statements that succinctly convey information to explainees. These statements focus on illuminating the ongoing actions of the robot and the surrounding objects and provide insights into the reasons behind trajectory deviations and planning failures (see Figure 2b). Although we formulated the textual explanations ourselves, our primary emphasis was on adhering to principles established in the literature rather than on creating novel designs.
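As an illustration of how such concise statements could be assembled, the sketch below fills a simple template from a monitored state; the state fields and wording are hypothetical simplifications of what the ontology-backed system produces:

```python
# Sketch of template-based textual explanation generation.
# The state dictionary and templates are illustrative, not the exact implementation.

def generate_explanation(state):
    parts = [f"I am currently {state['action']}."]
    if state.get("blocking_objects"):
        objs = " and ".join(state["blocking_objects"])
        if state.get("replanned"):
            parts.append(f"I deviated from my original trajectory because {objs} blocked my path.")
        else:
            parts.append(f"I cannot continue because {objs} block my path and no alternative trajectory exists.")
    if state.get("request"):
        parts.append(f"Please {state['request']} so I can continue towards {state['goal']}.")
    return " ".join(parts)


if __name__ == "__main__":
    print(generate_explanation({
        "action": "navigating to the docking station",
        "blocking_objects": ["a trash bin", "a closed door"],
        "replanned": False,
        "goal": "the docking station",
        "request": "move the trash bin or open the door",
    }))
```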

PILOT STUDY
We conducted a pilot study to test user preference and satisfaction regarding the visual explanation design. Participants were provided with visual (unimodal) explanations. Furthermore, participants expressed their perceived satisfaction with their preferred visual explanation design through a Likert-scale questionnaire. The design most preferred by participants in this study was used as the visual explanation design in the main study.

Participants
For the pilot study, our sample consisted of 119 participants (49 males and 70 females), aged 18 to 45 years (M = 21.2, SD = 3.87). Participants came from various parts of the world. All participants had at least a high school degree and represented various fields of study. The participants had normal or corrected-to-normal vision and no color blindness. Participants were provided with all the information about the study, and informed consent was collected. Participants were recruited through mailing lists at two universities and mailing lists in the robotics and HRI community. All participants took part as volunteers, and each participant was sampled at random.

Procedure
The data was collected via the online survey web application LimeSurvey. The participants were provided with information and ethical considerations about the study and filled in demographic details. Following this, participants were shown six videos depicting different visualization schemes for an explanation, each approximately one minute long. Each video portrayed the autonomous robot attempting to fetch a book from a shelf in a library setting. The robot had to explain two path deviations that occurred while fetching the book. No textual explanation was provided to explain the robot's behavior. To give participants an idea of the library setting, a real-time bird's-eye view of the room was provided alongside the explanation video. The video order was randomized to reduce biases. Each scenario (video, visualization scheme; see Figure 3) portrays a combination of a specific color scheme and a specific path scheme. Among color schemes, we differentiated:
• red: only the objects (chairs in our example) that directly cause a significant change in navigation behavior (deviation, failure) are colored red.
• red heatmap with varying intensities: as the robot moves, a red heatmap with varying intensities (stronger red in the middle, lighter red towards the edge) is displayed around it; objects that cause a significant change in navigational behavior are still highlighted with the strongest shade of red; the decreasing intensity of red from the middle towards the edge of the visual explanation layer alludes to the decreasing probability of objects becoming obstacles.
• red-yellow-green heatmap with varying intensities: the scheme already explained in Section 3, where the yellow-green heatmap is continuously displayed around the robot during navigation and objects causing a deviation or failure are colored red.
Among path schemes, we distinguished:
• solid arrow: a set of arrows in turquoise, directed towards the navigational goal, representing a trajectory that starts from the robot's current position and ends at the navigation goal.
• solid line: a line in turquoise, representing a trajectory, connecting the robot's current position and the navigation goal.
In each of the visualization schemes, the names of objects within the scope of the visual layer were displayed, as well as the navigational goal and, after a deviation happened, the old path (in gray). Humans were additionally marked with blinking exclamation marks: red if they were closer than 2 meters to the robot, otherwise yellow. Additionally, arrows were displayed around moving objects to highlight the direction in which they could be moved.
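The six scenarios are simply the cross product of the three color schemes and the two path schemes; the snippet below enumerates them (the index-to-combination mapping is our own illustration, although Scenario 6 in the study indeed combined the red-yellow-green heatmap with the solid line):

```python
# Sketch: the 3 x 2 design of the pilot study's visualization schemes (labels are ours).
from itertools import product

color_schemes = ["red-only", "red heatmap", "red-yellow-green heatmap"]
path_schemes = ["solid arrow", "solid line"]

for idx, (color, path) in enumerate(product(color_schemes, path_schemes), start=1):
    print(f"Scenario {idx}: {color} + {path}")
```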
Q1 With the visualization scheme in the scenario I ranked the best, I'm able to understand the explanation.
Q2 I'm satisfied with the explanation provided through the visualization scheme in the scenario I ranked the best.
Q3 The visualization scheme in the scenario I ranked the best provides sufficient details of the explanation.
Q4 The visualization scheme in the scenario I ranked the best accurately explains the movement and actions of the robot.
Q5 The visualization scheme in the scenario I ranked the best provides reliable information about the robot's actions.
Q6 I could see the reasons behind choosing the visualization scheme for the explanation in the scenario I ranked the best.
Q7 The visualization scheme in the scenario I ranked the best explained the robot's actions/behaviors efficiently.
Q8 With the visualization scheme in the scenario I ranked the best, I'm able to predict the behavior of the robot.
Q9 The visualization scheme in the scenario I ranked the best explains the robot's actions and situation completely.
Table 1: Satisfaction questionnaire for the pilot study

Following this, participants were asked to rank the videos from best to worst in terms of their visualization scheme. The task was to assign each scenario a rank between 1 and 6, with 1 denoting the best video and 6 the worst. Participants were then asked to fill out the satisfaction survey containing nine questions (see Table 1), keeping in mind the video they ranked the best. The questionnaire was designed based on the recommendations for Explanation Satisfaction metrics by Hoffman et al. [28]. Explanation Satisfaction denotes the extent to which users perceive their understanding of the explained AI system or process, constituting a context-specific, retrospective evaluation of explanations. Users rated the items in the questionnaire on a 5-point Likert scale, which we used in both studies. The scale measures user attitude from 'strongly disagree' to 'strongly agree', coded linearly as values from 1.0 to 5.0. After filling out the questionnaire, feedback was collected about what could be improved in the visualization they ranked the best and what additional characteristics they would like to add. In total, the study took between 10 and 15 minutes.

Results
Participants watched six scenarios (Scenarios 1 to 6) and ranked them according to their preferences. Most participants, 50 out of 119, ranked Scenario 6 as the best. To better understand the preferences, we conducted a percentage analysis of these rankings: Scenario 1 (6%), Scenario 2 (4%), Scenario 3 (10%), Scenario 4 (20%), Scenario 5 (18%), Scenario 6 (42%). To gain further insight into the participants who selected Scenario 6 as the best, we conducted descriptive statistics on this subgroup by analyzing the satisfaction questionnaire consisting of 9 items (see Table 1). The mean responses to all Likert-scale questions were between 4 and 5, corresponding to Agree and Strongly Agree answers. This indicates that users who preferred Scenario 6 showed high satisfaction with their choice.

MAIN STUDY
Following the pilot study, we conducted the main study to compare and assess which mode of explanation is most satisfying and understandable, i.e., only visual, only textual, or both visual and textual (visual-textual). The visual representation used in the visual and visual-textual modes was the one chosen in the pilot study (red-yellow-green heatmap with varying intensities and paths as solid lines; Scenario 6). The same textual explanations were used in both the textual and visual-textual representations (see Figure 2b). Participants were asked to watch a video corresponding to one of the three conditions (visual/textual/visual-textual) in which the autonomous robot provided explanations about deviation, rerouting, failure, and requests to the user. The participants were then asked to answer the satisfaction questionnaire. Our study aimed to test whether there is a difference in user satisfaction between different explanation representations of the same navigation scenario. Following this notion, we formulated the following hypotheses:
• H1. There is a significant difference in user satisfaction with explanations between at least two explanation representations. We base this hypothesis on previous research [46], which has shown that different (types of) users prefer different explanation types (representations).
• H2. Multimodal explanations increase user satisfaction compared to unimodal explanations. We base this hypothesis on previous research [38,46], which has shown that multimodal (visual-textual) explanations result in better explanations, as visual and textual components complement each other.

Participants
To determine our sample size, we ran an a priori power analysis for the ANOVA (effect size f = 0.25, α error probability = 0.052, power (1 - β error probability) = 0.948) using G*Power 3.1.9.6 [16]. The power analysis determined that we needed 52 participants for each of the three study conditions, totaling 156 participants. The initial pool of 219 participants was reduced by excluding those who did not pass the attention check (two control questions in which participants had to choose the correct obstacle causing the robot's deviation or failure in a given scenario; we also measured the time needed to complete the survey). Further participants were then removed at random until the sample was reduced to 156. Of the 156 participants, 93 were male and 63 female, aged between 18 and 45 years (M = 24.88, SD = 6.28). Participants came from various parts of the world. All participants had at least a high school degree and represented various fields of study. Around a quarter of the participants had some experience with robotics/artificial intelligence and/or explainable artificial intelligence. The participants had normal or corrected-to-normal vision and no color blindness. Participants were provided with all the information about the study, and informed consent was collected. They were recruited through mailing lists at two universities and in the robotics and HRI communities. All participants took part as volunteers. Each participant was randomly assigned to one of the three condition groups after clicking on the user study link.

Table 2: Descriptive statistics of the pilot study results for Scenario 6 when it was chosen as the preferred option.
         Q1     Q2     Q3     Q4     Q5     Q6     Q7     Q8     Q9
Minimum  3.000  3.000  3.000  2.000  2.000  3.000  2.000  2.000  3.000
Maximum  5.000  5.000  5.000  5.000  5.000  5.000  5.000  5.000  5.000
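A rough sketch of this exclusion and balancing procedure is given below; the column names and the use of pandas are assumptions for illustration, and each condition is assumed to retain at least 52 valid participants:

```python
# Sketch (assumptions: one row per participant; hypothetical columns
# 'passed_attention_check' and 'condition').
import pandas as pd

def balance_sample(responses: pd.DataFrame, per_group: int = 52, seed: int = 0) -> pd.DataFrame:
    # 1. Keep only participants who passed the attention check.
    valid = responses[responses["passed_attention_check"]]
    # 2. Randomly down-sample each condition group to the target size.
    return valid.groupby("condition").sample(n=per_group, random_state=seed)
```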

Procedure
The data was collected via an online survey using LimeSurvey (the same as in the pilot study, see Section 4). Three different surveys were created for the three explanation conditions: (1) only visual explanation; (2) only textual explanation; (3) both visual and textual explanations. The distribution across the three conditions was randomized to remove biases. The participants were provided with information and ethical considerations about the study and filled in demographic details. Following this, they were provided with a bird's-eye view image of the office setting in which the autonomous robot was working (see Figure 4). The robot was supposed to fetch coffee from a coffee machine; along the way it encountered a deviation and a stoppage, issued a request to the user to move the obstacle, and eventually succeeded in the task. The participants were instructed to view the image carefully and familiarize themselves with the objects in the room. Then, an explanation video corresponding to their assigned condition was shown, i.e., with visual, textual, or visual-textual explanations. The survey was also timed to check whether the participants had watched the full video before answering further questions. At the end, participants answered a satisfaction questionnaire consisting of 9 items (see Table 3). The participants were also asked for feedback about potential improvements and changes to the current explanations.

Results
We calculated composite scores for satisfaction by averaging across all 9 items for each participant and performed descriptive statistics with the following results: Visual (N = 52, M = 3.953, SD = 0.386), Textual (N = 52, M = 3.737, SD = 0.573), Visual-Textual (N = 52, M = 4.190, SD = 0.486). These results indicate the highest average satisfaction for participants in the visual-textual condition, while on average participants were least satisfied with only textual explanations. To assess the significance of the differences in participants' satisfaction among the three experimental conditions observed in the descriptive statistics, we conducted a one-way analysis of variance (ANOVA). The results of the one-way ANOVA for participants' satisfaction scores revealed a statistically significant difference among the three conditions (F = 11.231, p < 0.001) (see Table 4). To examine whether there are significant pairwise differences in satisfaction between conditions, we further conducted a post-hoc test. The results of Tukey's Honestly Significant Difference (HSD) post-hoc test (see Table 5) show a significant difference between the visual-textual and textual conditions (p < 0.001) and between the visual-textual and visual conditions (p = 0.038), whereas no significant difference was found between the textual and visual conditions (p = 0.064).
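This analysis pipeline can be reproduced with standard statistics tooling; the sketch below (assumed data layout: columns Q1-Q9 with 1-5 Likert scores plus a condition label per participant) computes the composite scores, the one-way ANOVA, and Tukey's HSD test. The reported statistics above come from the study itself, not from this sketch.

```python
# Sketch of the analysis pipeline (assumed column names; not the original analysis script).
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def analyze(responses: pd.DataFrame):
    items = [f"Q{i}" for i in range(1, 10)]
    responses = responses.copy()
    responses["satisfaction"] = responses[items].mean(axis=1)  # composite score per participant

    # One-way ANOVA across the three conditions.
    groups = [g["satisfaction"].values for _, g in responses.groupby("condition")]
    f_stat, p_value = f_oneway(*groups)

    # Tukey's HSD post-hoc test for pairwise comparisons.
    tukey = pairwise_tukeyhsd(responses["satisfaction"], responses["condition"], alpha=0.05)
    return f_stat, p_value, tukey
```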

Discussion
We evaluated user satisfaction with different explanation representations provided by an autonomous robot navigating indoors. First, we conducted a pilot study in which six different visualization schemes of the same scenario were shown to the participants. They were asked to rank the videos from best to worst in terms of their visualization scheme and then to answer the satisfaction questionnaire for the video that they ranked as the best one. The results showed that participants preferred the scenario in which obstacles were visualized with the red-yellow-green heatmap [36] with varying intensities and the path was visualized as a solid line [12]. The schemes least preferred by participants were those in which only an object that causes a significant change in navigational behavior (deviation, failure) is highlighted. This shows that participants prefer visualization schemes that are richer in detail and continuously present during navigation. Regarding path visualization, participants preferred the solid line over the solid arrow in two out of three cases, showing that they prefer a connected and undirected path representation over an interrupted and directed one. The feedback provided by the participants showed that the visualization scheme was well understood without any need for detailed information about the meaning of the colors or of the chosen solid line, i.e., the explanations were highly intuitive. However, a few participants pointed out that there could be a broader range in the red-yellow-green continuum for a smoother transition between the colors. This would also correspond to using the whole social zone in Hall's social space representation. For path visualization, some participants suggested that the explanation should show all possible routes at the beginning.
Our main study assessed user satisfaction across three explanation conditions: visual, textual, and visual-textual. We ran a one-way ANOVA across these three conditions, which showed a statistically significant difference among them, leading to the acceptance of hypothesis H1. We further tested the pairwise condition differences for significance, which revealed statistically significant pairwise differences between the visual-textual (multimodal) and the visual/textual (unimodal) representations; this additionally confirms the acceptance of hypothesis H1 and leads to the acceptance of hypothesis H2. At the same time, no significant difference in satisfaction was found between visual and textual explanations. Feedback on the explanations in the main study was positive, except that some participants requested additional information from the robot about its decision-making and a visualization of all available paths. However, given the robot's speed and the information already on the screen, such additional information could lead to overload and distraction. For the textual explanation condition, recurring feedback was to include visual explanations, and vice versa, further supporting our hypothesis that visual-textual explanations are the most satisfactory.
Technical details: We establish an explanation layer atop the navigation, remaining agnostic to the specific navigation method employed. Implemented within the Robot Operating System (ROS), our explanation navigation layer is also agnostic to the robot platform, provided it operates with the ROS navigation stack. Visual explanations can be visualized in RViz over the map, whereas textual explanations can be displayed on a designated interface or screen. In addition to providing real-time explanations, our approach can be adapted for post-hoc analysis or for examining recorded navigation data. Semantic explanations require that a knowledge base for the environment be provided.
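As a rough illustration of how such explanations can be surfaced through ROS, the sketch below publishes a textual explanation as a std_msgs/String and one colored object marker as a visualization_msgs/Marker that RViz can render over the map; the topic names, frame, and message contents are assumptions rather than our actual plugin implementation:

```python
#!/usr/bin/env python
# Minimal rospy sketch (assumed topic names and frame; not the full explanation layer).
import rospy
from std_msgs.msg import String
from visualization_msgs.msg import Marker

rospy.init_node("explanation_layer_demo")
text_pub = rospy.Publisher("/explanation/text", String, queue_size=1)
marker_pub = rospy.Publisher("/explanation/markers", Marker, queue_size=10)

def publish_object_marker(marker_id, x, y, rgb):
    """Publish one flat cylinder marker at (x, y) in the map frame, colored with rgb."""
    m = Marker()
    m.header.frame_id = "map"
    m.header.stamp = rospy.Time.now()
    m.id = marker_id
    m.type = Marker.CYLINDER
    m.action = Marker.ADD
    m.pose.position.x, m.pose.position.y = x, y
    m.pose.orientation.w = 1.0
    m.scale.x = m.scale.y = 0.4
    m.scale.z = 0.05
    m.color.r, m.color.g, m.color.b = rgb
    m.color.a = 0.8
    marker_pub.publish(m)

rate = rospy.Rate(2)
while not rospy.is_shutdown():
    text_pub.publish(String(data="I stopped: a trash bin and a closed door block my path."))
    publish_object_marker(0, 1.0, 2.0, (1.0, 0.0, 0.0))  # obstacle shown in red
    rate.sleep()
```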

CONCLUSIONS
We have presented an approach to visually and/or textually explaining a robot's navigational decisions. The explanations are continuous and provided in real time, following the robot during its navigation. The results of the two user studies have shown that users are highly satisfied with their preferred visual representation scheme and that they prefer multimodal over unimodal representations. With this knowledge of user preferences regarding explanation design and form, we aim to build a holistic explainable motion planning framework, which will include the insights from this paper on designing explanation representation modalities, strategies for temporal features such as explanation duration and timing, and the personalization of explanations towards targeted explanation recipients. While our questionnaire included a question related to understandability, it is important to note that this study did not analyze understandability in depth. The analysis of understandability, including measures such as the NASA TLX and objective metrics of understandability, remains a prospect for future work. We acknowledge the need for further exploration into inclusive explanations, specifically addressing issues related to color blindness and accommodating individuals with hearing impairments. We intend to extend our investigations in subsequent work to encompass a broader range of accessibility and cultural considerations, promoting a more comprehensive and inclusive understanding of explanation interfaces.
Figure 1: The autonomous robot (TIAGo by PAL Robotics), indicated by the white-bordered circle, experiences a planning failure while trying to follow its initial trajectory (depicted in light grey) when it encounters obstructions (a trash bin and a closed door). The robot cannot produce an alternative trajectory to navigate around these obstacles. The primary factors responsible for this failure are highlighted in red and labeled in the visual representation. Other objects are color-coded with a heat map that indicates their proximity to the robot. The written explanation clarifies the main reasons for the robot's failure and suggests the actions a nearby human can take so the robot can continue towards its target.

Figure 2: (a) Technique of giving spatial and social context to a robot: a robot's social space is divided into four zones (intimate, personal, social, and public), respectively increasing in distance from the robot. Each zone consists of two subzones: near and far. (b) Textual explanations are direct statements informing participants of the current robot action, nearby objects, and the reasons for its unpredicted behavior.

Figure 4: The office setting used in the main study

Table 3: Satisfaction questionnaire for the main study
Q1 I'm able to understand the actions/behavior of the robot with the given explanation.
Q2 I'm satisfied with the explanation provided.
Q3 The explanation provides sufficient details of the robot's actions and behaviors.
Q4 The explanation accurately describes the movement and actions of the robot.
Q5 The explanation provides reliable information about the robot's actions.
Q6 I could see the reasons behind choosing this method (visual, textual, or visual-textual) for the explanation.
Q7 The explanation describes the robot's actions/behaviors efficiently.
Q8 With the explanation provided, I'm able to predict the behavior of the robot.
Q9 The explanation describes the robot's actions and situation completely.


Table 4: One-way ANOVA results for the main study

Table 5: Results of the post hoc test in the main study