Exploring Tangible Explainable AI (TangXAI): A User Study of Two XAI Approaches

Explainable AI (XAI) has garnered significant attention as a theoretical subject in the research community. However, the practical application of XAI, particularly in the realm of user interfaces, remains limited. Moreover, evaluations of these interfaces from the perspective of end-users are scarce. In this paper, we introduce and evaluate two innovative tangible XAI interface concepts. The tangible interfaces capitalize on the widely recognized advantages of data physicalization, offering users a more intuitive and hands-on experience. We implemented two distinct XAI approaches within this tangible framework: feature relevance and local explanations. These approaches were applied to real-world use cases: recommending recipes and selecting jogging routes, respectively. The findings of our Wizard of Oz study indicate that participants had some challenges in distinguishing between the primary objectives of the XAI interface and the typical interactions associated with an AI recommender system. However, tangibility seems to support users’ understanding of AI’s explanations and enables users to reflect on their trust in the AI model.


INTRODUCTION
As the penetration of AI into our daily lives grows, there is a clear need for AI to be human-centric, ensuring user trust and promoting transparency. Yang et al. highlighted the unique challenges of HCI design for AI systems [18], and, concretely, Amershi et al. introduced 18 guidelines for human-AI interaction. For instance, Guideline 10 suggests that when there is uncertainty, the AI system should offer three or four alternative suggestions. Additionally, Guideline 15 emphasizes the importance of allowing users to provide feedback during their interaction with the AI [2]. Building on this, the field of Explainable AI (XAI) has emerged to specifically focus on explaining AI's decision-making processes to users [15,17]. Within XAI, several approaches exist, e.g., highlighting to the user which of the input parameters have the most significant impact on the AI's decision (feature relevance) or highlighting how much a particular parameter would need to shift to change the AI's conclusion (local explanations) [4]. Research on XAI has primarily concentrated on graphical user interfaces, e.g., enabling users to explore "what-if" scenarios with AI models by using touchscreen sliders [1,5].
Conveying the explanations of XAI effectively to users may be challenging; one solution may be found through the use of data physicalization [3,13]. Data physicalization translates digital data into tangible forms; e.g., a physical 3D-printed model might represent climate change data, making abstract concepts more tangible and understandable. Tangible user interfaces (TUIs) take this one step further, allowing users to interact with the data physically and leveraging the benefits of multiple human senses [12]. The research topic of "Tangible XAI" has recently been opened [6-9], exploring the design space for users to physically interact with AI explanations, deepening understanding and opening possibilities for collaborative data exploration.
Colley et al. [6] introduced a tangible XAI framework that includes four general approaches: simplified rule extraction, feature relevance, local explanations, and visual explanations. Expanding on their work, we conducted the first user study on Tangible XAI interfaces. Our study aimed to understand user interactions and perceptions of these tangible XAI interfaces. Specifically, we designed tangible interfaces based on two of the primary XAI approaches: feature relevance and local explanations [4]. These were then assessed through a Wizard of Oz user study [10].

STUDY DESIGN
In this section, we first describe our general approach to tangible XAI interface design. We then detail the design of prototype XAI interfaces for two use cases: recipe recommendation and jogging route selection. To gather as broad a range of data as possible in this exploratory research phase, a different XAI approach is applied to each use case.

Tangible XAI Interface Design
We selected two of the main XAI approaches proposed by prior work: feature relevance and local explanations [4]. In feature relevance, the importance of each input parameter to the AI's decision is scored. Hence, the user can understand which parameters matter and which are largely irrelevant in the AI's decision process. One weakness of feature relevance is that it does not consider possible interaction effects between the parameters. Local explanations are an alternative XAI approach that focuses on the AI model's behavior near a given decision, i.e., they do not provide a full explanation of the AI model over its entire range of inputs. Hence, 'counterfactual' local explanations aim to demonstrate the minimum change in input parameters needed to change the AI's decision.
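To make the distinction concrete, the sketch below contrasts the two approaches on a toy recipe model with two input parameters. The decision rule, weights, and step size are illustrative assumptions for exposition only; they do not correspond to any model used in our study.

```python
# Toy model with two input parameters (cost in euros, preparation
# time in minutes). The decision rule, weights, and step size are
# illustrative assumptions, not the models used in the study.

def recommend_quick_recipe(cost: float, time: float) -> bool:
    """Toy decision rule: recommend a 'quick' recipe while the
    weighted inputs stay under a threshold."""
    return 0.6 * time + 0.4 * cost < 40

def feature_relevance(cost: float, time: float) -> dict:
    """Feature relevance: score each parameter's contribution to the
    decision (here simply its weighted magnitude)."""
    return {"time": 0.6 * time, "cost": 0.4 * cost}

def counterfactual_time(cost: float, time: float, step: float = 5.0) -> float:
    """Local (counterfactual) explanation: smallest increase in 'time'
    that flips the decision, holding 'cost' fixed."""
    original = recommend_quick_recipe(cost, time)
    delta = 0.0
    while recommend_quick_recipe(cost, time + delta) == original:
        delta += step            # search in 5-minute increments
        if delta > 500:          # no flip within a plausible range
            return float("nan")
    return delta

print(feature_relevance(cost=20, time=30))    # {'time': 18.0, 'cost': 8.0}
print(counterfactual_time(cost=20, time=30))  # 25.0 (minutes)
```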
In an initial framework, Colley et al. speculated how the XAI approaches of feature relevance and local explanations could be presented through tangible user interfaces [6]. Following this direction, we created mock-up interfaces for two AI use cases: a cooking recipe recommendation and the selection of a jogging route. As the primary focus was on the user experience, a Wizard of Oz study approach was used [10], with the test moderator acting as the AI system according to a set of predefined rules.

Recipe Recommendation - XAI Approach: Feature Relevance
In this use case, a tangible bar chart was used as the interface to the XAI. Tangible bar charts are a much-used form of TUI, e.g., [16]. They function as a data visualization and provide affordance as a tangible input mechanism, e.g., by pushing and pulling the bars to change their height. In our study implementation, the two-column bar chart was formed using Lego Duplo bricks, where red bricks represented the cost of the recipe (5€ per brick) and yellow bricks the recipe's preparation time (5 minutes per brick). Prior work has noted the benefits of using Lego as a prototyping tool [14], e.g., compared to sketching and cardboard models. Following our Wizard of Oz protocol, the test moderator configured the tangible Lego Duplo bar chart to present the input parameters that had caused the AI system's recipe recommendation (Figure 1). The study participants were then invited to interact with the Lego Duplo bars by adding and removing bricks. Based on the selected parameters, the test moderator changed the recipe recommendation according to a predefined mapping table that was not visible to the test participants.
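The study's actual mapping table is not reproduced here; the following is a minimal sketch of how such a brick-count-to-recipe lookup could work. The brick-to-unit conversions match the prototype (5€ per red brick, 5 minutes per yellow brick), while the recipe names and thresholds are hypothetical placeholders.

```python
# Sketch of a Wizard of Oz mapping table: brick counts on the tangible
# bar chart map to a recipe recommendation. The recipe names and
# thresholds are illustrative assumptions, not the study's table.

EURO_PER_RED_BRICK = 5
MINUTES_PER_YELLOW_BRICK = 5

def recommend(red_bricks: int, yellow_bricks: int) -> str:
    cost = red_bricks * EURO_PER_RED_BRICK           # euros
    time = yellow_bricks * MINUTES_PER_YELLOW_BRICK  # minutes
    if time <= 20 and cost <= 10:
        return "Omelette"         # fast and cheap
    if time <= 45 and cost <= 25:
        return "Pasta Bolognese"  # moderate time and budget
    return "Beef Wellington"      # generous time and budget

# Example: the moderator reads 3 red and 6 yellow bricks off the chart
print(recommend(red_bricks=3, yellow_bricks=6))  # -> "Pasta Bolognese"
```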

Jogging Route Selection - XAI Approach: Local Explanations
In the jogging route selection case, participants were introduced to a scenario where an AI system, e.g., incorporated into their training app, had suggested a jogging route, "Forest Path", to them. A tangible XAI interface presented a local explanation around the decision point, highlighting two parameters: the time available to complete the jog and the preferred solitude of the route. All other parameters were fixed. In this condition, the AI system's decision boundary between the selected "Forest Path" and an alternative "City Loop" route was represented by a piece of string. Participants were invited to move a selection puck in the interface to understand how changes in the input parameters affected the AI's route recommendation, with the puck crossing the decision boundary line equating to a change in the recommendation. The moderator then explained how changing another parameter, the weather conditions, from sunshine to rain would cause a change in the decision boundary. This was demonstrated by moving the decision boundary string on the tangible chart interface.
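The geometry of this local explanation can be sketched as a linear decision boundary in a 2D parameter space, with rain shifting the boundary just as the string was moved on the tangible chart. All coefficients and thresholds below are illustrative assumptions, not the rules used by the moderator.

```python
# Sketch of the local-explanation geometry in the jogging prototype:
# a 2D parameter space (available time, preferred solitude) with a
# linear decision boundary between two routes. Moving the boundary
# when it rains mirrors moving the string on the tangible chart.
# All coefficients are illustrative assumptions.

def recommended_route(time_min: float, solitude: float, raining: bool) -> str:
    """Return the route on the puck's side of the decision boundary.

    solitude is a 0..1 preference (1 = wants to jog alone). Rain
    shifts the boundary so more combinations favor "City Loop".
    """
    threshold = 45 if not raining else 60  # rain moves the boundary
    score = time_min + 30 * solitude       # toy linear combination
    return "Forest Path" if score >= threshold else "City Loop"

# Same puck position, different weather, different recommendation
print(recommended_route(time_min=40, solitude=0.3, raining=False))  # Forest Path
print(recommended_route(time_min=40, solitude=0.3, raining=True))   # City Loop
```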

User Study Process
To evaluate the understanding and experience of using the tangible XAI interfaces, we arranged 5 user study sessions, each including 2 participants. Of the 10 participants, 6 identified as women and 4 as men. To improve the validity of the test, large screens were used to visualize the background context of each use case. At the start of the session, the test moderator introduced the general concept of AI. After this, each of the cases was presented and explored in turn. The presentation order of the cases was counterbalanced, with 3 groups starting with the recipe scenario and 2 with the jogging route. Participants thought aloud throughout the study, which was audio recorded for later analysis.
In each case, the test moderator first introduced the use case, explaining that the AI system had made a recommendation and prompting the participants to question why the recommendation had been made. After this, the prototype XAI interface for each case was introduced, and participants were invited to interact with it. The test utilized the Wizard of Oz protocol [10], with the moderator performing the system's actions depending on the participants' interactions. After each case, participants were questioned on their level of trust in the AI system. At the end of the session, a final interview was conducted, probing participants' general perceptions of the tangible XAI interfaces.
For the recipe recommendation case, following the first evaluation session, it was noted that participants had difficulty separating the AI system's recommendation from the XAI interface. Hence, for subsequent tests, an additional visualization of an Amazon Alexa device was added as the output for the recommendation.

FINDINGS
Based on the participants' think-aloud feedback while exploring the interfaces and on the final interviews, we identified the themes of misunderstanding XAI, the benefits of tangibility, and trust in the AI model.

Misunderstanding XAI
One fundamental observation was that some participants were initially confused about the general concept of XAI. In both use cases, participants felt they were primarily using an interface to select a recipe or jogging route rather than using the interface to understand and build trust in the AI model's operation. For example, one participant described using the XAI interface to select a recipe: "Because if you input just 'dinner', you might get 4000 recipes. After seeing that, you'd then refine your search, thinking 'I want to spend only 50€ and have just 45 minutes', and then you'd simply use the slider to adjust and find the most suitable results. But in this case, one really had to think a bit beforehand" (Participant 1). To address this, after the first test, an image of an Amazon Alexa voice assistant was added to the recipe recommendation case to highlight that the recommendation was not controlled by the XAI interface.

Benefits of Tangibility
The tangible recipe XAI interface received positive comments for its clear and concrete aspects. The Lego Duplo bricks received positive feedback for their usability, clarity, and aesthetic appeal. Participants commented that they understood that the height of the stack of bricks represented the input to the decision-making process, making the system's output easier to interpret. Some participants noted the added value in the tangible interface: "With a physical user interface, there's a lot of, well, added value for me at least. It makes things much more concrete, and I believe it has a lot of subconscious impact on it" (P7). Participant 1 highlighted the difference in interaction speed, and hence experience, compared to a touchscreen slider: "When using this [Lego Duplos], one needs to think more deliberately about the choices compared to a slider" (P1). However, one participant was uncomfortable assigning physical properties to intangible entities such as time: "It feels more appropriate to have a slider for money and time. Because [...] the money and our time blocks seem similar in size and appearance. But, are time and money truly equivalent units?" (P6).
All participants understood the function of the jogging route interface. However, it failed to provide clear value for participants due to its abstract nature and lack of clear outcomes. One participant noted that the jogging interface conveyed a feeling of uncertainty in the AI system's decision: "Compared to the recipe, which was straightforward, the jogging outcome kind of felt like it was hovering. It gave off a vibe like not being entirely certain" (P4). Another participant commented on the poor experiential aspects of the jogging interface: "In the recipe suggestion tool, with these blocks, there's an element of gamification that makes the task feel more relaxed. It doesn't come off as too serious. In contrast, the jogging suggestion tool feels more scientific in its visualization" (P8).

Trust and Acceptance of the AI Model
Many participants wished to know all the criteria that resulted in the AI system's recommendation, i.e., they wished the system to provide a fully transparent model. Further, they expressed the desire to be able to tweak the model's parameters to align with their preferences. This questioning approach suggests that participants' starting point is not to trust black-box AI models, even on fairly trivial topics such as the ones employed in the study. Rather than questioning the AI model itself, participants were generally disposed to question the accuracy of the data fed into the model, e.g., "Also, weather data, if exclusively sourced from stations, might lack localized information" (P9), and, referring to individuals listing items for sale, "I remain skeptical of the accuracy with which individuals label their listings..." (P5).
In the recipe recommendation use case, the parameters presented in the XAI interface (preparation time and cost) were perceived as essential and matched the participants' default expectations. This match with expectations created trust in the AI recommendation. The situation was reversed in the jogging use case, where participants felt that the parameters presented by the XAI interface (time, degree of solitude, and weather conditions) were irrelevant to them when deciding on a jogging route. Due to this, most participants expressed mistrust in the decisions of the simulated AI system. This finding highlights that interacting with an XAI interface can also result in reduced trust in the AI system; this may in fact be the desired result, inspiring users to critically question the AI system's outputs and place the appropriate level of trust in the system.
In both use cases, participants expressed the desire for the visibility of more parameters in the XAI interface, e.g., to include ones that they considered to be relevant. Others stated that their trust in the AI's recommendations would be increased by having visibility of more parameters, an understanding of the content of the recipe/jogging route databases, and information on how the reliability of the recommendation was ensured. This again suggests that the participants did not fully understand the intended purpose of the XAI and viewed the XAI interface as part of the recommender AI rather than as a way to gain an understanding of the decisions made by the AI system.

DISCUSSION AND CONCLUSIONS
Our study's findings highlight the need for fundamental research on the placement and role of an XAI interface in the context of AI system usage. Additionally, they offer valuable methodological insights for future user studies on (tangible) XAI. Given that our research is among the initial explorations of the topic, it provides foundational information for subsequent researchers.
Training Data and Trust. Interestingly, participants' discussion on trust focused on the accuracy of the AI model's input dataset (i.e., the training data) and the set of parameters exposed in the XAI interface. Taking an experimental approach to establishing trust in AI systems, some participants recounted comparing different route planning applications to build trust. However, the AI model's performance was not explicitly highlighted as a factor contributing to trust.
The primary goal of an XAI interface is to guide users to achieve the appropriate level of trust in an AI system. This means that after interacting with XAI, users might trust the AI system more or less than before. We note also that trust may be context-dependent, with positive trust for specific tasks and distrust for others [11]. If a generally low level of trust is appropriate, then the target of the XAI interface is to convey this to users. In their work defining performance metrics for XAI systems, [11] highlight the development of curiosity in the user as a critical factor in XAI performance. Hence, it is beneficial if the XAI interface encourages users to critically evaluate the AI system's outputs, even beyond the specific details shown in the XAI itself.
The Role of (Tangible) XAI. There was evident confusion among participants about the distinction between providing inputs to the AI model, i.e., essentially using the AI tool, and using the XAI system to comprehend and foster trust in the AI model. Of the two XAI approaches tested, feature relevance was the best understood as an interface, yet its purpose was also the most commonly misconstrued by our test participants. Participants valued the simple data physicalization of time and cost as Lego bricks. However, the format of the interface led them to incorrectly perceive its role as being the user interface to a selection tool. The local explanations XAI interface was generally less well understood, and future work should consider alternative design approaches to this XAI method. We also note that the tangible nature of the presented XAI interfaces slowed the pace of interaction compared to, e.g., a touchscreen interface. As pointed out by study participants, this may result in users thinking more deeply about the interaction and forming a better understanding of the AI model.
While tangible interfaces to XAI can enhance the user experience in scenarios where hands-on interaction and deep understanding are beneficial, they, like tangible interfaces in general, may not be suitable for applications requiring high interaction speed, scalability to cover multiple parameters, or portability.
Methodological Findings. Employing use cases deeply rooted in personal context in XAI user evaluations, such as recipe suggestions or jogging route recommendations, poses challenges. Participants prioritize the AI's decision accuracy and often hold preconceived notions about the primary decision influencers. This focus detracts from the actual evaluation of the XAI interface. Therefore, we suggest that future XAI studies opt for more generic use cases or, for instance, employ personas to convey that the AI solution is targeted to a third party.
We also suggest that in future XAI studies, the AI system and the full range of possible outputs be first introduced to participants, enabling them to form a comprehensive understanding of its operation before any XAI interface is introduced. We suggest that this should also include incorrect suggestions. In this way, the full range of the XAI interface's performance can be explored; for instance, enabling users to understand why the AI made incorrect decisions is a critical function of XAI. A vital role of XAI is to guide users to question the outputs of AI systems and place only an appropriate level of trust in the system.

Figure 2: Recipe Recommendation study. Left: The Wizard of Oz AI model used by the moderator that was hidden from the test participants. Right: An example state of the user exploring the model using Lego bricks.