Incremental XAI: Memorable Understanding of AI with Incremental Explanations

Many explainable AI (XAI) techniques strive for interpretability by providing concise salient information, such as sparse linear factors. However, users either see only inaccurate global explanations, or highly-varying local explanations. We propose to provide more detailed explanations by leveraging the human cognitive capacity to accumulate knowledge by incrementally receiving more details. Focusing on linear factor explanations (factors $\times$ values = outcome), we introduce Incremental XAI to automatically partition explanations for general and atypical instances by providing Base + Incremental factors to help users read and remember more faithful explanations. Memorability is improved by reusing base factors and reducing the number of factors shown in atypical cases. In modeling, formative, and summative user studies, we evaluated the faithfulness, memorability, and understandability of Incremental XAI against baseline explanation methods. This work contributes towards more usable explanations that users can better ingrain to facilitate intuitive engagement with AI.


INTRODUCTION
As Artificial Intelligence (AI) systems become prevalent, it is paramount for explainable AI (XAI) to be developed to support their proper use and understanding [1,3,4,33,38,55]. Although much work has shown that XAI can improve satisfaction and trust [26,33,41,45,60], many studies have failed to demonstrate measurably improved understanding [24,48]. Improving understanding requires users to ingrain AI explanations so that they can quickly recall and apply the knowledge for decision making. Providing short explanations like sparse linear models could help, but these would be too simplified to be faithful to the complex underlying AI decision, and would mislead users [45]. In contrast, more expressive explanations may be more faithful, but can be challenging to read or recall [2], hindering their accessibility. This is especially important for users to generalize their understanding of AI behavior to future scenarios [44]. Hence, XAI needs to be sufficiently detailed, yet memorable, to support effective understanding.
To help users develop a richer understanding of AI models, instead of inundating users with complex explanations, we propose to explain incrementally. This is inspired by pedagogy, where students learn a concept gradually rather than all at once. For example, physics students learn classical Newtonian mechanics for objects moving at common speeds, but later learn the theory of Special Relativity that describes objects at very high speeds with the Lorentz transformation. Understanding relativistic mechanics is very complicated and requires a foundational understanding of classical mechanics first. Thus, we argue that users can eventually understand complex explanations and models, but they should be grounded in simpler explanations and incrementally informed.
We propose a step towards elevating user understanding of complex AI explanations with Incremental XAI. This framework defines how to explain AI predictions from typical cases to outlier cases. We investigate this for simple surrogate explanation models, specifically, sparse linear explanations that describe linear factors that multiply against feature values. For example, a factor of $17k per bathroom explains the predicted price of a house based on # Bathrooms by indicating that each bathroom adds $17k, and that two bathrooms together would contribute $34k. We begin by partitioning the dataset into subspaces, training a linear model in the majority (typical) subspace with Base factors, and training a linear model in the minority (outlier) subspace with (Base + Incremental) factors. To minimize new information to learn, we regularize the Incremental factors to be 0 when possible. In our bathrooms example, the majority smaller houses could have a rate of $17k/bathroom, while minority larger houses can have costlier bathrooms at $17k + $51k, perhaps due to luxury fittings. We contribute:
• The Incremental XAI paradigm, which enables gradual delivery of complex explanations, gaining the benefit of multiple lightweight explanations that achieve higher faithfulness.
• A tree-based incremental explanation using linear model trees, additive factors, and factor sparsity regularization. We also developed a tabular user interface to convey explanations incrementally, and contrasted this with baseline variants.
• An evaluation of the faithfulness, usage, understanding, and memorability of Incremental explanations against Global, Subglobal, and Local baseline explanations in modeling, formative, and summative user studies. We compared Incremental explanations with Global explanations to evaluate whether providing more detailed explanations based on categories of cases (subspaces) helps understanding; with Subglobal explanations, a baseline subspace model that explains each subspace independently; and with Local explanations, since they are often singularly deployed primarily for instance-based explanations, but may be misused for general understanding.
• A discussion of how to generalize the Incremental XAI paradigm to other applications and AI explanations.

BACKGROUND AND RELATED WORK
Explainable AI (XAI) remains problematic for human interpretation due to the inaccuracy of overly simplified methods. Here, we give a primer on XAI and its cognitive demands, techniques to mitigate cognitive load, the need to provide multiple explanations, and XAI techniques partitioned into subspaces to improve accuracy.

Surrogate explanations of AI
Explanations of AI can improve user understanding by providing surrogate explanations of accurate AI models, or by making "glassbox" models that are intrinsically interpretable. However, the latter approach may have limited accuracy since these models tend to be overly simple. Instead, we focus on providing surrogate explanation models that approximate complex AI models, which retains the use of accurate AI models at the cost of some unfaithfulness in the explanation. Miller [44] identified two goals for explanations in AI: i) to select a small set of causes for an observation [39], and ii) to generalize observations into a conceptual model to predict and control future cases [22]. Wang et al. identified other reasoning processes that XAI should support [63]. Our research objective is to improve XAI techniques to better support the second goal of a generalized understanding. This requires explanations to be intuitive and memorable so that users can rapidly apply their knowledge to anticipate the AI model's behavior in future settings. Global explanations provide a suitable basis to support this goal. They answer the question "How does the AI model make predictions?" Techniques include explaining the AI decision in terms of linear factors [48], nonlinear partial dependence plots [27] and generalized additive models [2,12], and decision trees [49] and rules [29,32,52].
To support the former goal of explaining causes for an individual case, instance explanations are provided instead. These answer the question: "Why did the AI model make this prediction?" Techniques include feature attributions [5,15,42,59], and counterfactual explanations [11,62]. An instance explanation only explains the decision for a target instance and may provide a different explanation for another instance; thus it may not generalize to multiple instances. To overcome this, Ribeiro et al. proposed local explanations to train explainer models on instances similar to the target instance of interest [51]. Since these explanations focus on narrower sets of instances, they are more faithful to the underlying AI being explained, but require users to remember many models for dissimilar instances.
Given their ubiquity in XAI practice, we include Local explanations in our investigations. In this work, we aim to provide explanations that are memorable like Global explanations and faithful like Local explanations; we investigate Subglobal explanations that balance between the two, and propose Incremental explanations that improve the memorability of Subglobal explanations.

Cognitive demands of AI Explanations
Although XAI aims to improve user understanding, explanations are not necessarily easy for users to interpret [34]. High cognitive load harms user experience and the effectiveness of AI explanations [28,48]. Cognitive load is often measured by the number of attributes used in explanations [16] or the nonlinearity of visualizations [2]. Indeed, people consider simpler explanations as more probable than those with more clauses [40], but oversimplifying explanations will erode trust in XAI [47,67]. Explanations need to be delivered at the right level of cognitive effort to be effective [20,25,31,58].
A simple method to get users to understand explanations is to prompt users to think when reading them [10], but this does not ensure deep learning and understanding, nor does it make explanations less cognitively demanding. Several techniques have been proposed to reduce cognitive load. The most common is feature reduction to limit the number of attributes shown to users. This can be accomplished with sparsity regularization [45] and by constraining explanations to use integer coefficients instead of real numbers [61]. However, this limits the expressiveness of the explanations that users could consume. Another approach is to simplify more sophisticated visual explanations, such as nonlinear line graphs. Cognitive-optimized GAM (COGAM) balances cognitive load and accuracy by quantifying the visual cognitive chunks in line chart explanations, and providing a hybrid explanation with sparse linear factors and less curvy line charts [2]. However, these approaches only optimize one explanation at a time, and neglect the human cognitive capacity to accumulate knowledge.

Providing multiple AI explanations
Accurate understanding of an AI system requires detailed knowledge of its parameters and non-linear decisions, yet explanations need to be simple enough to be easily comprehended. To avoid information overload, detailed explanations can be provided on demand [36] or with progressive disclosure [58]. Users have various demands for explanations [35], diverse usage strategies of explanations [36], and use multifaceted explanations to understand AI decisions [37,66].
Hence, instead of considering XAI interpretation as independent interactions, it should be considered as a sequentially dependent accumulation of knowledge (e.g., dialogic [44]). People use explanations to build a mental model of the AI, so successful explanations can be measured through the goodness of the learned mental model [33,65]. Mental models play a key role in human-AI interactions [6], but can be formed poorly without intervention [18]. In this work, we propose a new paradigm of providing explanations incrementally by ensuring that the shallower, simpler explanations can smoothly transition into deeper, more detailed ones. This leverages the human ability for cumulative learning [57], and allows users to understand how the explanations relate [68] at different levels.

Table 1: Comparison of linear explanation models with varying faithfulness and memorability due to the # factors used, which affects their expressiveness and the # terms for the user to remember. The AI System typically has too many factors to be interpretable, while sparse linear explanations consider only a sparse set of factors. A Global explanation and a Local explanation each have 1 set of factors, but the latter requires many explanations to understand all use cases cumulatively. Subglobal explanations split the instances into subspaces, needing one set of factors per subspace to explain fully. Incremental explanations similarly can explain the same subspaces, but reuse some factors and can omit negligible changes, thus needing fewer factors in total.

Subspace-based XAI techniques
Several XAI techniques have been developed to address the shortcomings of global explanations being too coarse and local explanations being too narrow. We discuss methods that divide instances into subspaces and explain each subspace separately. These methods are based on trees, rules, or aggregation. Model agnostic multilevel explanations (MAME) [46] provides an explanation tree with weights at each node, representing a progression from a global explanation at the root to local explanations at the leaves. However, the method does not enforce stability between the linear models, so it would be difficult for users to learn each sub-explanation incrementally. Model Understanding through Subspace Explanations (MUSE) [30] provides decision sets for different subspaces by simultaneously optimizing for faithfulness and rule compactness, but the explanations are in terms of rules unlike our factors-based format, and the attributes are not necessarily the same for each subspace, thus not consistent. Equi-explanation Maps [14] divide the feature space into hyper-cuboid subspaces (i.e., defined within min/max ranges for specific attributes) that are consistent, and explain each subspace with linear classifiers, but its boundary definitions are much more complex. Submodular Pick LIME [51] leverages instance-based local LIME explanations to provide a global explanation by picking diverse LIME explanations that have high non-redundant coverage. This aims to limit the total number of Local explanations needed to achieve global understanding. GLObal to loCAL eXplainer (GlocalX) [56] iteratively merges local decision rules into global explanations to provide a smooth pathway from detailed local explanations to more general global explanations and vice versa. This was demonstrated for rule-based explanations of classification, unlike our regression prediction task with linear factors. Sparse LInear Subset Explanations (SLISE) [7] is a robust regression method that finds the largest subset in the data and trains a sparse linear model; our method could use this to learn the base model of the Incremental explanation. SLISEMAP [8] extends SLISE to group instances into clusters based on the similarity of their local explanations. This involves dimensionality reduction, so the resulting dimensions are not explicitly interpretable.
All these methods aim to explain each subspace faithfully, but neglect to account for users having to remember or relate across subspaces. Thus the inter-subspace consistency is low. In our work, we focus on first providing a base explanation for a majority subspace, and explain remaining smaller subspaces as incrementally different from the base. This new requirement stems from usability needs of XAI that the prior works neglect.

TECHNICAL APPROACH
We first describe baseline explanation approaches using sparse linear factors, then articulate our Incremental explanation approach. An AI System's prediction ŷ is typically generated from many input attributes (features), and the AI's decision may change nonlinearly with each attribute (e.g., price can increase exponentially with the living area of a house). Sparse linear models provide simple explanations by articulating only a few important attributes (hence sparse), and indicating how each attribute influences the prediction. These are typically presented as feature attributions, i.e., positive or negative numbers indicating the direction and magnitude of the influence. However, attributions are not particularly easy to track or interpret, since they vary inconsistently for different instances.

Figure 2: User interface (UI) of AI System with Global explanation showing: 1) attributes used for prediction, 2) their values $x^{(i)}$ for the given instance, 3) factors $w^{(i)}$ that the explainer multiplies with values, 4) partial contributions $\tilde{y}^{(i)} = w^{(i)} x^{(i)}$ of each attribute, 5) output estimation $\tilde{y} = \sum_i \tilde{y}^{(i)}$ from the AI Explainer, 6) prediction ŷ from the AI System, with inequality indication (< in this case), and 7) an indicator of how different the AI Explainer estimation is from the AI System prediction. Factors are the same for all instances and do not change. Different information may be hidden under various test conditions.
Instead, like Poursabzi-Sangdeh et al. [48], we focus on sparse linear factor explanations that compute the feature attribution of the $i$th attribute as a multiplication of a factor weight $w^{(i)}$ and the attribute value $x^{(i)}$, i.e., $\tilde{y}^{(i)} = w^{(i)} x^{(i)}$. For example, consider a house with 1¾ bathrooms, i.e., $x^{(1)} = 1.75$. A factor $w^{(1)} = \$17k$ means that each additional bathroom costs $17k more, so # Bathrooms contributes $17k × 1.75 ≈ $30k to the total house price. Users can apply these factors to other instances to calculate how the AI Explainer would estimate the AI prediction for those instances. For example, a house with 3 bathrooms would have its # Bathrooms contribute $17k × 3 = $51k to its price. Sparse linear factors can be applied broadly to all instances (Global explanation), semi-broadly to groups of instances (Subglobal explanation), or to individual instances (Local explanation). We introduce each of these explanation methods and then describe our approach for the Incremental explanation that extends Subglobal. Each type has varying faithfulness to the AI prediction and memorability for users to recall the factors, which we summarize in Table 1 and illustrate conceptually with a 1-dimensional example in Fig. 1. We refer to these explanation variants as XAI types.
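To make the arithmetic concrete, the following minimal Python sketch applies sparse linear factors to an instance. The attribute names and factor values are illustrative placeholders taken from the bathrooms example, not the study's actual implementation.

```python
# Sketch: applying sparse linear factors (factors x values = outcome).
# Attribute names and factor values are illustrative, not from a real model.
factors = {"bathrooms": 17_000, "living_area_ksqft": 95_000}  # $ per unit
adjustment = 0  # bias ("adjustment") term, assumed zero here

def explain(instance: dict) -> dict:
    """Return each attribute's partial contribution and the total estimate."""
    contributions = {attr: factors[attr] * value for attr, value in instance.items()}
    return {"contributions": contributions,
            "estimate": adjustment + sum(contributions.values())}

house = {"bathrooms": 1.75, "living_area_ksqft": 2.0}
print(explain(house))
# bathrooms contribute 17,000 * 1.75 = 29,750 (~$30k), as in the example above
```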

Global explanation
The simplest explainer uses a single linear factor model with one set of factors to explain all instances:

$$\tilde{y} = w^{(0)} + \sum_i w^{(i)} x^{(i)} \quad (1)$$

where $x^{(i)}$ is the $i$th feature value of the instance, $w^{(i)}$ is the explanation factor for that feature with $w^{(0)}$ as the bias term, and $\tilde{y}$ is the estimated AI prediction. The Global explanation is trained by fitting a linear regression model on the whole training dataset with mean squared error (MSE) as the training loss against the AI model's prediction, not the ground truth. Fig. 2 shows our user interface (UI) implementation of a Global explanation of how an AI System predicts the price of a house based on 4 attributes and the bias term, which we name the "adjustment".
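As a rough illustration, a Global explainer of this form can be fit with off-the-shelf tools; the sketch below fits a linear regression to a blackbox model's predictions rather than the ground truth. The data, model choice, and variable names (e.g., `blackbox`) are placeholders, not our actual pipeline.

```python
# Sketch: fitting a Global explainer to a blackbox model's predictions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))                       # 4 selected attributes
y_train = X_train @ np.array([1.0, 2.0, 0.5, -1.0]) + rng.normal(size=500)

blackbox = RandomForestRegressor(random_state=0).fit(X_train, y_train)  # AI System
y_ai = blackbox.predict(X_train)                          # targets for the explainer

global_explainer = LinearRegression().fit(X_train, y_ai)  # MSE loss vs. AI predictions
print("factors:", global_explainer.coef_, "adjustment:", global_explainer.intercept_)
```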

Subglobal explanation
While the Global explanation is simple for users to understand, its small number of factors limits its expressiveness, so it may not be very faithful to the AI System predictions, i.e., $\tilde{y}$ is not close to $\hat{y}$. Instead of adding more complexity for users to interpret, explanation faithfulness can be increased by partitioning instances into multiple subspaces. Each subspace is then modeled with a separate sparse linear factor explanation. We constrain explanations of each subspace to have the same attributes, and enforce the partition based on binary univariate rules, i.e., an inequality on one attribute (e.g., $x^{(2)} \geq 2.5$). Thus, a Subglobal explanation has the form

$$\tilde{y} = \sum_s [x \in X_s] \left( w_s^{(0)} + \sum_i w_s^{(i)} x^{(i)} \right) \quad (2)$$

where $w_s$ are the weights of the $s$th subspace explanation model, $[\cdot]$ is the Iverson bracket that is 1 if its expression is true and 0 otherwise, and $X_s$ is the set of instances in subspace $s$. Eq. 2 shows that each subspace has different weights (factors), which it applies to instances within its boundaries.
Training the Subglobal explanation model requires learning the partition boundaries of the subspaces and the weights of each subspace model. We achieve this by training a linear model tree [13] on the whole training dataset with MSE as the training loss. Such trees differ from common classification decision trees that predict a probability distribution over categorical labels $P(\hat{y})$ at the leaves, and from regression decision trees that predict a scalar number $\hat{y}$ at the leaves. Instead, linear model trees fit a linear regression model $f_s$ at each leaf, where each leaf represents a subspace $X_s$. During training, for each branch in the decision tree, the training algorithm iterates through all features and possible splits, training a linear model for each subspace ($x^{(i)} < \theta$ and $x^{(i)} \geq \theta$), measuring the combined loss for both models, and choosing the split with the lowest combined loss. We then assign the majority subspace with the larger dataset as "typical" and the minority one as "outliers" (although this can be flexibly adapted to fit user preferences or standard conventions). Though linear model trees are not a novel technique, they are seldom used in explainable AI, and we extend them for Incremental explanations as our technical contribution, described in the next subsection.

Figure 3: User interface (UI) of Subglobal explanations for a typical instance (top), and an outlier instance (bottom). Factors are different for each subspace but apply in a fixed way to any instance in each subspace. For example, while small houses with Living Area < 2.5 ksqft have each bathroom being worth $16k, larger houses have much costlier bathrooms at $57k.

Figure 4: User interface (UI) of Incremental explanation for an instance in the typical subspace with Living Area < 2.5 ksqft (top), and an outlier instance in the minority subspace with Living Area ≥ 2.5 ksqft (bottom). Factors are different for each subspace to fit them accurately. Unlike Subglobal explanations, an additional column (3b) is used to show how factors are incrementally different for the outlier cases. The main factors (3) are the same for both subspaces. For example, while smaller houses have a modest rate of price increase per living area ($95k/ksqft), larger houses have a rate that is $120k/ksqft higher ($215k/ksqft).
Fig. 3 shows our UI of Subglobal explanations with two subspaces: typical (Living Area < 2.5 ksqft) and minority outlier (otherwise). For simplicity, we specifically train a decision stump (one branch). Each subspace is defined with simple univariate decision boundaries that are easy to interpret. Note that training a logistic regression or linear support vector machine (SVM) would lead to less interpretable decision boundaries, e.g., "16(# Bathrooms) + 120(Grade) < 697", while training a decision tree yields a more interpretable rule, e.g., "# Bathrooms ≥ 5 and Grade < 5".
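A minimal sketch of this stump search is shown below; it assumes numpy arrays `X` and `y_ai` (the AI predictions) and is a simplified illustration rather than the linear model tree library used in our implementation.

```python
# Sketch: searching for the best one-branch split (decision stump) where each
# side of the split gets its own linear model fit to the AI predictions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def best_stump(X: np.ndarray, y_ai: np.ndarray, min_leaf: int = 10):
    best = None  # (combined loss, feature index, threshold)
    for j in range(X.shape[1]):
        for theta in np.unique(X[:, j]):
            left, right = X[:, j] < theta, X[:, j] >= theta
            if left.sum() < min_leaf or right.sum() < min_leaf:
                continue  # skip degenerate splits
            loss = 0.0
            for mask in (left, right):
                model = LinearRegression().fit(X[mask], y_ai[mask])
                loss += mean_squared_error(y_ai[mask], model.predict(X[mask])) * mask.sum()
            if best is None or loss < best[0]:
                best = (loss, j, theta)
    return best
```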
Figure 5: User interface (UI) of the Local explanation of an instance. Factors are specific to this and similar instances, and will be different for other instances. For example, for houses similar to the one shown, each increase in Grade decreases the house price by $7.1k, but this may not be the case for other houses that have very different attributes.

Incremental explanation
While Subglobal is more faithful to the AI System than Global, this comes at the cost of the user having to read and remember more factors. The factors are not necessarily consistent between subspaces either, so users would have to interpret them independently. To improve memorability, we propose Incremental explanations that provide general factors for the majority, typical subspace and incremented factors for special, outlier subspaces. We describe our approach for two subspaces, but it can be extended to multiple subspaces. We define an Incremental explanation as

$$\tilde{y} = w_0^{(0)} + \sum_i w_0^{(i)} x^{(i)} + \sum_{s>0} [x \in X_s] \left( \Delta w_s^{(0)} + \sum_i \Delta w_s^{(i)} x^{(i)} \right) \quad (3)$$

where $w_0$ are the base factors of the general explanation model, $\Delta w_s$ are the incremental factors of the $s$th special subspace explanation model, $[\cdot]$ is the Iverson bracket that is 1 if its expression is true and 0 otherwise, and $X_s$ is the set of instances in subspace $s$. Eq. 3 shows that while the typical subspace has Base factors, all other subspaces have factors defined as an additive Incremental adjustment on the Base factors.
We train the Incremental explanation model in a similar manner as for Subglobal explanations, but constrain a dependency in factors across subspaces, i.e., $w_s = w_0 + \Delta w_s$. Furthermore, to reduce the number of terms to remember, we aim to keep most incremental weights $\Delta w_s^{(i)}$ at zero. This is achieved by adding an L1 sparsity regularization to the original MSE training loss, i.e.,

$$\mathcal{L} = \sum_t \left( \tilde{y}_t - \hat{y}_t \right)^2 + \lambda \sum_{s>0} \sum_i \left| \Delta w_s^{(i)} \right| \quad (4)$$

where $t$ indexes the training instances and $\lambda$ is the regularization hyperparameter. This sparsity regularization makes incremental factors easier to remember, but trades off accuracy, so we hypothesize that the Incremental explanation $\tilde{y}_I$ is less faithful than the Subglobal explanation $\tilde{y}_S$. The training algorithm is similar to that of Subglobal explanations, but with non-independent parameters for the linear models, an extended loss function, and unified optimization of both sets of factors. For each candidate split, we set the majority subspace as "typical", assigning the base factors $w_0$ to it, and specify Incremental factors $\Delta w_s$ in the other minority "outlier" subspace.
Training is performed using gradient descent. Since the partitioning of subspaces is similar in both Incremental and Subglobal explanations, we keep them the same in the modeling and user studies to avoid partitioning being a confounder. Fig. 4 shows our UI implementation of an Incremental explanation for cases in the typical and outlier subspaces.
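The sketch below illustrates one way such a fit could be performed with gradient descent (here via PyTorch), with base weights shared across subspaces and L1-penalized incremental weights applied only in the outlier subspace. Variable names, the optimizer, and hyperparameters are illustrative assumptions, not our exact implementation.

```python
# Sketch: gradient-descent fit of Incremental factors.
# Base factors w0 apply everywhere; delta factors dw apply only where
# outlier_mask is 1 and carry an L1 penalty so most stay near zero.
import torch

def fit_incremental(X, y_ai, outlier_mask, lam=1.0, steps=2000, lr=1e-2):
    X = torch.as_tensor(X, dtype=torch.float32)
    y_ai = torch.as_tensor(y_ai, dtype=torch.float32)
    m = torch.as_tensor(outlier_mask, dtype=torch.float32).unsqueeze(1)  # Iverson bracket
    Xb = torch.cat([torch.ones(len(X), 1), X], dim=1)     # prepend bias column
    w0 = torch.zeros(Xb.shape[1], requires_grad=True)     # base factors (+ bias)
    dw = torch.zeros(Xb.shape[1], requires_grad=True)     # incremental factors (+ bias)
    opt = torch.optim.Adam([w0, dw], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        y_tilde = Xb @ w0 + (Xb * m) @ dw                  # cf. Eq. 3
        loss = ((y_tilde - y_ai) ** 2).mean() + lam * dw.abs().sum()  # cf. Eq. 4
        loss.backward()
        opt.step()
    return w0.detach(), dw.detach()
```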

Local explanation
We have described sparse linear factors to explain multiple instances, but they can also be used to explain individual instances.
Local explanations, such as LIME [51] and SHAP [42], are popular XAI techniques to explain AI decisions on a target instance by training a linear regression model on a dataset of instances that are local (similar) to the target instance. Explaining other instances requires retraining other explanation models locally around those instances, and the explanations are not necessarily similar to one another. Hence, Local explanations are faithful to the AI prediction only for instances that are similar, and not globally or subglobally. Consequently, users would need to view many Local explanations to have an overview of the AI behavior across all instances. Although local explanations are not designed for general understanding, their ubiquity encourages their misuse for this objective. We hypothesize that this makes it very difficult for users to estimate how the AI System would predict for new instances or to estimate general factors from the inconsistent factors of each instance. We define a local linear factor explanation around a target instance $x_t$ as

$$\tilde{y}_t = w_t^{(0)} + \sum_i w_t^{(i)} x^{(i)} \quad (5)$$

where $x$ is the instance being explained that is similar to $x_t$, $w_t$ are the weights of the model (factors) with $w_t^{(0)}$ as the bias term, and $\tilde{y}_t$ is the estimation of the local model. We implemented the Local explanation with LIME [51]. Fig. 5 shows our UI implementation of a Local explanation around an instance.
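For reference, a Local explanation of this kind can be obtained from the lime package roughly as sketched below; the feature names, data arrays, and `blackbox` model are placeholders, and the exact arguments we used may differ.

```python
# Sketch: a Local explanation with LIME around one target instance.
# X_train, X_test, and blackbox are assumed to exist (placeholders).
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X_train,                                              # training data (numpy array)
    feature_names=["bathrooms", "living_area", "grade", "age"],
    mode="regression",
)
target = X_test[0]
exp = explainer.explain_instance(target, blackbox.predict, num_features=4)
print(exp.as_list())  # local factors/rules that apply around this instance only
```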

EVALUATION
We evaluated Incremental explanations against baseline explanations (Global, Subglobal, Local) across multiple studies to investigate: i) faithfulness to estimate the AI prediction in a modeling study, ii) usage strategies and outcomes to interpret AI decisions in a qualitative formative user study, and iii) impact on decision duration, explanation recall, and AI decision understanding in a quantitative summative user study.

Modeling Study
We conducted a modeling study to evaluate how faithfully each explanation model estimates the AI. We evaluate on three datasets, and our approach can further generalize since we are using standard machine learning processes. We describe the dataset preparation, methods to train and test the models, and evaluation results.
4.1.1 Applications and datasets. We evaluated the sparse linear factor explanations on a regression prediction task, since the predictions remain linear, unlike classification, which would have tapered effects at high or low probabilities (e.g., logistic regression). Like Poursabzi-Sangdeh et al. [48], we evaluated on a housing price dataset due to the simplicity of the application scenario, which most users can readily understand and appreciate. However, we chose not to reuse their NYC dataset since, surprisingly, a linear global model is sufficient to predict its prices highly accurately, whereas real-world datasets tend to be more complex and require nonlinear models. Hence, we used the "House Sales in King County, USA" dataset [21] with 21,613 instances to predict the price of houses with 22 features. Prices ranged from $72k to $7.7M (Median = $452k). We performed feature selection to obtain four features (# Bathrooms, Living area, Grade, Age) to limit the cognitive load for users.
For generality, we further evaluate on two additional datasets: Heart Disease [23] with 1025 instances to predict heart disease using 14 common features, and Auto MPG [50] with 398 instances to predict the miles per gallon fuel efficiency using 7 features. Although the prediction task for heart disease is to classify whether a patient has heart disease, the predictor model produces a numeric confidence that can be interpreted as a continuous risk score. We train subsequent explainer models as a regression task to predict the risk score of the predictor model. For the heart disease dataset, to support human interpretability of the explainer models, we performed feature selection to obtain four features (Age, Resting blood pressure, Cholesterol, Max heart rate). Similarly, for the Auto MPG dataset, we performed feature selection to obtain four features (Cylinders, Displacement, Horsepower, Weight).
For simplicity, we partitioned each dataset into two subspaces and set the same rule boundary for both Subglobal and Incremental explanations. The optimal partitions were at Living Area ≥ 2.5 ksqft for House Sales, Age ≥ 58 years for Heart Disease, and Horsepower ≥ 92 hp for Auto MPG.

Results on performance of AI prediction models.
For each dataset, we trained a random forest regressor (House Sales, Auto MPG) or classifier (Heart Disease) [9] as the AI prediction model on a training set of 80% of instances, and evaluated on a held-out test set of 20% of instances. We then trained the four explanation (XAI) types to explain all instances in the test set. For the Local explanation, we averaged the performance across all instances. Using the training set, we performed 5-fold cross validation in all our analyses and report the mean and standard deviation of the validation performance averaged across folds; see Table 2. We report the performance on the held-out test set as follows. House Sales: mean absolute error (MAE) = $139k and $R^2 = 0.67$; Heart Disease: accuracy = 86% and test AUC = 0.86; Auto MPG: MAE = 3.12 mpg and $R^2 = 0.71$.
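A short sketch of the faithfulness computation used in the next subsection is given below; it reuses the placeholder `blackbox` and `global_explainer` names from the earlier sketches (and an assumed `X_test` array), and measures error against the AI prediction rather than the ground truth.

```python
# Sketch: unfaithfulness of an explainer = absolute error between its estimate
# and the AI System's prediction on held-out instances (not vs. ground truth).
import numpy as np

y_ai_test = blackbox.predict(X_test)          # AI System predictions (X_test: placeholder)
y_tilde = global_explainer.predict(X_test)    # explainer estimates

unfaithfulness = np.abs(y_tilde - y_ai_test)  # per-instance AE
print("Global explanation mean AE:", unfaithfulness.mean())
```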

4.1.3 Results on faithfulness of XAI types. Fig. 6 shows the unfaithfulness of the XAI types, calculated as the absolute error (AE) between the explainer estimate $\tilde{y}$ and the predictor prediction $\hat{y}$. Note that explanation faithfulness ($\tilde{y}$ vs. $\hat{y}$) does not measure the same thing as predictor performance ($\hat{y}$ vs. $y$). As expected, Global explanations had the worst faithfulness due to their low expressiveness (fewest factors), while Local explanations were the best because they were trained to be accurate in small local neighborhoods. However, they are not robust or memorable, so we expect users to not gain as much understanding from them compared to the other XAI types. Subglobal explanations had slightly better faithfulness than Incremental explanations, since the latter had another objective of simplicity (fewer incremental factors). However, we expect this difference to be negligible in practical use, and the problem of memorability or cognitive load would override the small benefit of faithfulness; we hypothesize that Subglobal explanations are less memorable and interpretable than Incremental explanations. We evaluate these hypotheses later in the summative user study.

4.1.4 Results on performance of XAI as glassbox explainers. We further evaluate whether Subglobal and Incremental explanations can serve as accurate interpretable "glassbox" models. In such cases, these models would be used for the AI prediction task and be intrinsically interpretable, thus avoiding any unfaithfulness of surrogate explanation models. We trained the models on the training dataset and report their performance. Since Heart Disease is a classification task, we accordingly apply a logistic activation to the linear regression outputs and change the training objective to the binary cross-entropy loss. The interpretability of the factor coefficients is affected, since the sigmoid transform in logistic regression applies a nonlinear distortion to all weights. However, the directionality and magnitude of the factors still provide more interpretability than blackbox models. We report the glassbox explainer performance as "AI Performance" in Table 2. In summary, Subglobal and Incremental models performed better than Global models, but still worse than the nonlinear AI model (a random forest, in this case).

Formative User Study
To investigate how people use the XAI types (Global, Subglobal, Incremental, Local) and identify usability issues, we conducted a formative study with 14 participants recruited from a local university. They were 23 years old on average (20 to 30), and 6 were female. All were undergraduate and graduate students from various disciplines (5 Sciences, 4 Business, 2 Engineering and Technology, 2 Arts, and 1 Healthcare). The study was conducted virtually over a Zoom video call with screen recording, and lasted 60 minutes.
Participants were compensated with a digital payment of $15 USD.

Method and Procedure. We conducted the study with a within-subjects experiment design, where each participant views multiple XAI types, so that they may directly compare among them. The experiment apparatus and procedure are similar to the subsequent summative study, which we describe later. To ensure that participants did not confuse attribute names, values, factors, and partial contributions, we trained them to distinguish the columns in the tabular UI, understand how each partial contribution is calculated as a multiplication of a factor (weight) and a value, and verified their understanding with screening questions. Each participant first used the Global explanation as a baseline, then 1-3 randomly selected XAI types as time permitted. For each explanation, the participant performed 3 trials of viewing an AI explanation to predict the price of a house instance. They were asked to estimate what they thought the AI System would predict based on the information provided by the explanation. The instances were chosen from the same dataset as the Modeling study, and used the same apparatus as the summative study (conducted later), with user interfaces shown in the Technical Approach section. We used the think-aloud protocol to elicit the participant's thought processes as they read and applied the explanations. The participant could also ask clarification questions at any time. Since the participants were guided and supported by the experimenters, we do not report the performance of their estimations of the AI system. With participant consent, we audio and screen recorded participant vocalizations and interactions with the UI.

Findings.
We conducted a thematic analysis on participant behaviors and report key findings.
We note that some participants may have conflated the AI and XAI behavior, but we do not require our users to treat them as separate, since both are meant to be presented as a unified agent in the AI's user interface. In this study, we focus on how participants interpreted each XAI type, rather than their trust or decision with respect to the AI prediction model.

a) Dynamic explanations perceived as more realistic than static, global explanations. Most participants preferred explanations to be dynamic rather than static like in Global explanations. Only P13 felt that "the fixed factors of [Global] are more intuitive and similar to how many people think." Perhaps she preferred rules of behavior to be consistent and unchanging. On the other hand, many participants appreciated the complexity of house price estimations and AI systems. P1 believed that the AI system "is a bit more dynamic in nature, or the equations will adjust accordingly to how much data is set into the thing." He felt that Global explanations did not reflect the AI system well since "it would just come up with one static figure because the factors itself is consistent and doesn't change across the house type." P7 expected to see different factors for instances of different categories, remarking that "it's not very realistic for the factors to be the same for different house types, like factor for bathrooms is always [the same]". This suggests she categorizes instances into types and expects rules to apply differently for each category. In contrast, P8 appreciated the adaptiveness of non-Global explanations and felt that "it's logical that the factors would change for different type of houses, ... since there might be other factors that influence the factor values for each attribute." Similarly, on seeing that "all the factors were the same" for the Global explanation, P6 remarked "that might not be good." He explained that for larger houses, the factor for Living Area "should be on a diminishing graph", i.e., a smaller factor than for smaller houses due to the smaller marginal utility of living area in an already large house.
Though less dynamic than Local explanations, participants found the subspace partitioning of Subglobal and Incremental explanations intuitive. P1 explained, "[Subglobal] is a lot more accurate [than Global] because it considers more things", referring to the two sets of factors given. P14 affirmed that "the additional factors [in Incremental] are helpful for the predictions in terms of accuracy". P3 remarked that "[the Incremental factors] makes sense, because for bigger houses the land would cost more." This also shows the relative understanding that P3 had to compare between subspaces, thus demonstrating the usefulness of explaining incrementally.
b) Incremental explanations perceived as more memorable and accurate than Subglobal explanations. Both Incremental and Subglobal explanations partition the subspaces similarly, but Incremental articulates the relationship between the two subspaces, while Subglobal treats them independently. Participants could appreciate the benefit of providing this context in Incremental explanations. P6 liked that "[Incremental] would be more informed since you are telling the user how they're changing the factors, that there is an addition". P11 even stated that the consistency of Incremental made him feel assured because "not all the factors are changing, like there were more considerations being made by the explanation." P4 mentioned that "[Incremental] would be easier for me to remember because there are fewer numbers" and P9 agreed that this is due to "rather than remembering two separate sets of factors". Furthermore, P12 believed "[Incremental] will give you more accurate values, which helps you make decisions quicker." Though this is not necessarily true, it suggests a positive halo effect of better usability leading to perceived correctness. Nevertheless, it suggests that this can help boost user confidence, trust, and usage of Incremental explanations. Despite these benefits, some participants faced usability issues. P7 felt that "[Incremental] feels logical... but more time-consuming since it's slightly complex due to the additional factors you have to add for the calculation."

Summative User Study
We conducted a summative user study to evaluate the interpretability and memorability of each XAI type. We investigate how well participants understand, remember, and apply explanations to anticipate behavior for future instances. While testing the impact on a downstream decision making task would be meaningful, it would impose experiment confounders, such as the participant's prior knowledge of the task [38], their varying underlying utility objectives (e.g., how much they care about cheap housing) [43], increased mental fatigue which limits the number of trials [2], and conflation between AI and XAI estimations. Thus, we leave that for future work.
Next, we describe our experiment design and hypotheses, experiment apparatus, procedure, analysis and results.

Experiment Design. We designed our experiment as a 4×2 factorial mixed-design experiment with the primary independent variable (IV) being XAI type (four levels: Global, Subglobal, Incremental, Local) and the secondary IV being Subspace segment (two levels: typical, special), to investigate if effects differ by instance type. XAI type was manipulated between-subjects due to the learning effect of participants sticking to one mental model of the AI Explainer after being trained on the first XAI type. Subspace was manipulated within-subjects by selecting 100 instances from the full dataset, balanced between 50 typical and 50 special. Each participant was tested on 30 randomly selected instances.
We measured several objective dependent variables to evaluate explanation recall, application, and understanding:
• Explanation recall measures how accurately the participant can infer or remember each factor $w^{(i)}$ of the XAI type, by typing them out. This explicitly measures memorability. For the $i$th attribute, given the participant's estimate $w_{human}^{(i)}$, we calculate the lack of recall by the MAE. We asked about the factors for all instances (global), and for typical or special instances (subglobal). Although participants with Local explanations never see general explanations, we ask them to infer broadly.
• Sustained understanding (without XAI) measures how well the participant can estimate the AI System's prediction. This is forward simulatability [16], a popular metric in XAI research and evaluations. Since we are modeling a regression problem (rather than the typical classification), we calculate this with a proxy metric for unfaithfulness, the absolute error (AE) of regression predictions, i.e., $|\hat{y}_{human} - \hat{y}|$. This measures how well the participant can apply knowledge gained from studying explanations to other instances without having seen their explanations. It measures deeper understanding than Supported understanding, which we also measure, described next.
• Supported understanding (with XAI) measures how well the participant can estimate the AI System's prediction, given that he/she can view an approximation from the AI Explainer $\tilde{y}$, i.e., $\hat{y}_{human} \mid \tilde{y}$. This is similar to Sustained understanding, but easier, since the participant can leverage $\tilde{y}$ to estimate his answer.
• Explanation evocation measures participant correctness in estimating the AI Explainer's estimation $\tilde{y}$. This is the forward simulatability of the AI Explainer, which we compute in reverse as $\tilde{y}_{human} - \tilde{y}$. Unlike Explanation recall, which directly elicits explanatory factors, this queries the participant about the explanation outcome, which implicitly evokes the explanation.
• Decision duration measures how long participants spent to perform the forward simulatability task without XAI. Since duration follows a long-tail distribution, we analyzed its logarithm.
For participant convenience, we measured numeric factors and prediction estimates with sliders to give users bounds when answering, but used the same wide range to avoid priming. Furthermore, we measured subjective opinions on Perceived helpfulness and Perceived ease-of-task to investigate how helpful the different XAI types were. Specifically, we asked whether the participant agreed or disagreed that: the AI was accurate (1), the explanation was helpful for estimating factors globally (2) and subglobally (3), and the forward simulatability tasks were easy with (4) or without (5) the explanation. These were measured on a 7-point Likert scale (-3 to +3). Table 3 summarizes our hypotheses and the subsequent findings from our results and analysis, described later.

Experiment Apparatus.
Our user interface was inspired by the linear factors explanation interface of Poursabzi-Sangdeh et al. [48], but we adapted it to distinguish the linear model explainer from the nonlinear model predictor, and extended it to support various XAI types: Global, Subglobal, Incremental, Local. Participants saw the exact interface as shown earlier in Figs. 2-5. See the Appendix for the full survey that participants saw. We also made the UI interactive (see Fig. 7) to facilitate participant learning and engagement by examining how explanations and predictions depend on instance values and factors. To improve interpretability and usability, we rounded most numbers to two significant figures, though the calculations are still done in full precision, and participants could see the precise numbers by hovering their mouse cursor. The intercept term is rounded to three significant figures, since it is a direct partial contribution component, unlike the other terms, which are factors. We included green meter bars to show the relative levels of attribute values to allow participants to interpret the sense of each number. During training, we show the % error between the AI Explainer and AI System to make this salient and accelerate learning about explanation faithfulness. We implemented our survey in Qualtrics and embedded the user interface. We provided an incentive bonus of £0.03 for each Sustained understanding task if the participant could estimate the AI System prediction correctly to within 10% relative error (max £2.70), and a maximum of £0.15 for each Explanation recall task (3 tasks) based on the mean relative error $\bar{e}$ on a test set of 100 instances, calculated as £0.15 × (1 − $\bar{e}$).

4.3.4 Participants. We recruited workers from Prolific.co, where 160 passed screening and 336 failed. Participants who completed the study had a median age of 37 years (26 to 81), and 35% were female. Participants completed the survey in a median time of 96 min, and were compensated with a base of £9.00 and a median bonus of £1.20 (£0 to £2.88).

Statistical Analysis. We performed a linear mixed effects model fit on each dependent variable as the response, with XAI type and Subspace, along with other confounding variables, as fixed effects, some interaction effects among the factors, and Participant as a random effect. See Table 4 for details. Note that Supported understanding and Sustained understanding are calculated from the same measure, forward simulatability, and differ only by when the task was posed, before or after showing the XAI, respectively. Since they share the dependent variable, we analyze these responses with a single linear mixed effects model with a Test-with-XAI factor to distinguish between the two types of understanding.
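For illustration, such a model could be specified with statsmodels roughly as sketched below; the data frame and its column names are hypothetical placeholders, and the actual models also included the confounders and interactions listed in Table 4.

```python
# Sketch: linear mixed effects model with fixed effects for XAI type and
# Subspace (plus their interaction) and a random intercept per participant.
# trials_df and its column names are hypothetical placeholders.
import statsmodels.formula.api as smf

model = smf.mixedlm(
    "explanation_recall ~ C(xai_type) * C(subspace)",   # fixed effects
    data=trials_df,                                      # one row per trial
    groups=trials_df["participant"],                     # random effect grouping
)
result = model.fit()
print(result.summary())
```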
The model fit was good for Log(Task time) ($R^2 = .628$), indicating that task time depended much on the fixed effects of XAI type, Subspace, trial sequence, and specific test instance, and the participant random effect. The model fit for Explanation recall was good ($R^2 = .763$), indicating that it was influenced much by two factors (XAI type and Subspace). The model fits for Perceived helpfulness and ease-of-task were also very good ($R^2 = .977$ and $.863$, respectively), though there was no significant effect due to XAI type, suggesting high variance due to participant individual effects. The model fit was slightly poorer for Explanation evocation ($R^2 = .483$), due to the difficulty of recalling the explanation factors and applying weighted-sum arithmetic to estimate the AI Explainer's prediction $\tilde{y}_{human}$, leading to increased variance in participant performance. The model fit for Understanding was somewhat low ($R^2 = .386$), because estimating the AI System's prediction regardless of explanation ($\hat{y}_{human}$ and $\hat{y}_{human} \mid \tilde{y}$) is even more difficult and uncertain than estimating $\tilde{y}_{human}$. When analyzing Supported and Sustained understanding in separate models instead of a larger model with the "Test with XAI" factor, we obtained better model fits ($R^2 = .415$ and $.479$, respectively), similar to that for Explanation evocation, but this does not properly account for viewing XAI as a causal factor.

Quantitative Results. Table 4 summarizes the model fits in terms of our hypotheses. We describe our results in terms of each dependent variable and summarize the findings with respect to each XAI type. All fixed effects reported are very statistically significant (p<.0001), and we describe specific comparisons based on contrast tests. We discuss i) how well participants could estimate the AI Explainer and AI System predictions given different XAI types based on the Forward simulatability trials, ii) their ability to recall the factors of each XAI type in the Factors recall session, and iii) their perceptions of XAI helpfulness and ease-of-task.

Forward simulatability tasks. Participant performance varied across XAI types, but was generally poorer for special than typical cases (p<.0001). See Fig. 9. Next, we discuss specific effects and interpret their effect sizes. Participants who were trained on Global explanations were worst (highest AE) by $97.3k ± $13.0k (95% CI), with 16.5% lower error for those trained on other explanation types ($M_G$ = $270.1k vs. $M_{S,I,L}$ = $172.6k, contrast test p<.0001). Furthermore, for Special cases, participants trained on Incremental and Subglobal explanations were better by $104.6k ± $13.0k, with 17.7% less error, than those trained on Local explanations ($M_{S,I}$ = $104.6k vs. $M_{G,L}$ = $292.8k, contrast test p<.0001); this suggests that subspace explanations help users to better understand special cases.
Explanation recall task. We analyzed how well participants could recall or infer each factor for any instance in general (globally) or for typical or special cases (subglobally). While recall for most factors across global/subglobal sessions was not significantly different, the recall for the explanation intercept term $w^{(0)}$ was notable. Fig. 10 shows the results of recalling $w^{(0)}$ for the factor recall sessions of any, typical, and outlier cases. Participants recalled factors from Incremental explanations significantly better (lower AE) by $456k ± $185k (95% CI) than from Global and Local explanations (contrast test p<.0001), which is practically significant compared to the intercept terms of −$1,040k (Combined), −$697k (Typical), and −$1,660k (Outliers). Furthermore, though recall of Incremental factors was slightly better than that of Subglobal factors, this difference was not significant (p = n.s.).
Perception ratings. We had posed multiple questions on Perceived helpfulness and Perceived ease-of-task, but found that all perception questions except ease-of-task without explanation were correlated. Thus, we averaged them into a Perceived helpfulness metric (Cronbach's α = .805). Fig. 11b summarizes the results of the perception measures. Participants perceived all XAI types as somewhat helpful (M = 0.84 on a -3 to +3 Likert scale), but found the forward simulatability task without XAI somewhat difficult (M = -0.75). There were no significant differences across XAI types.

Figure 10: Results of explanation recall of the intercept term $w^{(0)}$ for the global and subglobal test sessions.

4.3.7 Summary of results. We now summarize our results of how each XAI type compares to others.
• Global explanation was among the fastest types due to its simplicity, but it was not the most objectively helpful for understanding due to its low faithfulness.
• Local explanation supported better understanding when provided, but this understanding was not sustained for instances without explanations, since participants were unable to learn to infer factors for new cases.
• Subglobal explanation supported better recall and understanding than Global or Local explanations, but participants were slow when using it to estimate what the AI System would predict.
• Incremental explanation was the fastest to use (as fast as Global explanation), and best for Supported and Sustained understanding (equally good as Subglobal).

DISCUSSION
We have introduced the paradigm of Incremental XAI, implemented its capabilities, and validated its usefulness to help user understanding and recall. Here, we discuss its generalization and limitations.

Generalizing incremental linear factor explanations
Our implementation of Incremental explanations only had a partition along one attribute to divide instances into two subspaces. Nevertheless, since we used a tree-based partitioning method, our approach can apply to more splits and to splits on multiple attributes. The splits can also be done on categorical attributes, where each subspace can be defined by an individual label or a set of labels. However, adding more splits will add complexity to the user interface and more information for users to learn and understand, especially in a short online study. Future work is needed to investigate this. Furthermore, we had partitioned our subspaces with trees, but rules may be used instead to allow fewer terms. While tree-based subspaces cannot overlap due to the hierarchical execution of rules, rule sets for different subspaces may overlap, i.e., multiple rules with different features may be true [29]. This can be mitigated with a tie-breaker [29] or by using prioritized rule lists [32]. We had only investigated instances with numeric features, since we focused on training linear factors for explanation models. To accommodate non-numeric features, such as categorical features, standard approaches to convert them, such as one-hot encoding, could be applied. Each categorical level would be interpreted as a feature that is either present (1) or absent (0), with a linear factor that is only applied when the level is present.
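For instance, one-hot encoding a hypothetical categorical attribute could look like the following short sketch, where each level becomes a binary feature with its own factor; the column names are invented for illustration.

```python
# Sketch: one-hot encoding a categorical attribute so each level gets its own
# binary feature (and hence its own linear factor). Column names are hypothetical.
import pandas as pd

df = pd.DataFrame({"living_area": [1.8, 3.2], "roof_type": ["flat", "gabled"]})
df_encoded = pd.get_dummies(df, columns=["roof_type"])  # adds roof_type_flat, roof_type_gabled
print(df_encoded)
```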
While our approach reduced the number of factors among the incremented weights, the base weights can also be simplified. This way, users can view an initial explanation with very few attributes, then incrementally learn new attributes for special cases or further details. This can be accomplished by applying sparsity regularization to the base factors, and a loss penalty for adding new incremental factors (which permits new factors if beneficial to the loss). We had strongly limited the number of features shown to participants to four to manage their cognitive load. However, applications in machine learning could involve about 100 features. Incremental XAI that gradually shows more features can help users to eventually learn many features, and gain an understanding that will be highly faithful to the AI model. This could be implemented by applying modeling on one subspace, iterating over more than 2 steps, and regularizing against reusing features across steps. Further work is needed to model and evaluate on datasets with many features. However, user testing would be challenging in lab or online studies, since learning new features is harder than adapting prior knowledge about existing features, and learning many features may not be feasible in short durations. The partitioning of subspaces was determined in a data-driven manner with the tree model, but the factors can still be unwieldy. We had rounded the numeric factors for simplicity, but they could also be constrained to integers or multiples of integers [61]. The split levels and factors could also be made relatable [68] and presented verbally or narratively [53], so that users can make sense of them and better remember them.

Generalizing incremental explanations
We have investigated incremental explanations for linear factor explanations, but argue that this can be generalized to other explanation techniques, such as generalized additive models (GAM) and rules or decision trees. These models can be used for Global explanations by training on the full dataset, for Subglobal or Incremental explanations by training on subspaces, or for Local explanations by training on local neighboring instances. For example, a base function could describe a quadratic trend in a feature, while an incremental function could describe a suppressing cubic effect for special cases; or typical cases could be described with a rule of two features, while special cases could be described by substituting the second feature with a third one for a new rule.
First, we generalize to nonlinear models with independent features, such that each feature $x^{(i)}$ has a partial contribution $\tilde{y}^{(i)}$ to the prediction, i.e., $\tilde{y} = \sum_i \tilde{y}^{(i)}$. We had modeled the contribution of each feature by a linear factor, i.e., $\tilde{y}^{(i)} = w^{(i)} x^{(i)}$, but this contribution could be nonlinear, i.e., $\tilde{y}^{(i)} = f^{(i)}(x^{(i)})$. Indeed, this matches the form of GAMs that combine nonlinear effects of features additively, i.e., $\tilde{y} = \sum_i f^{(i)}(x^{(i)})$. An Incremental explanation can then be defined as

$$\tilde{y} = \sum_i \left( f_0^{(i)}(x^{(i)}) + \sum_s [x \in X_s]\, \Delta f_s^{(i)}(x^{(i)}) \right) \quad (7)$$

where $f_0$ is the base contributions of the typical explanation model, and $\Delta f_s$ is the incremental contributions of the $s$th special subspace explanation model. Here we consider that the incremental difference is additive, i.e., linear. Nonlinear effects could be investigated with multiplicative interactions or a kernel transformation. To keep Incremental explanations simple, we can constrain the incremental contributions $\Delta f_s^{(i)}$ with a sparsity regularization to reduce the number of terms, and with a smoothness regularization [2] to penalize overly curvy lines.
Next, we discuss generalizing Incremental explanations to models with interaction effects, i.e., multivariate functions that involve multiple features, e.g., $f(x^{(1)}, x^{(2)})$. Common models are rules and decision trees. While we have discussed several works that model subspaces with rules and trees, they do not support an incremental approach [30,46]. To do so, future work could first convert any rule representation into a decision tree, compute the similarity between trees in each subspace (e.g., by calculating a graph edit distance [19]), and minimize the difference. However, note that rules may overlap and lead to overlapping subspaces [29].

Scope of incremental explanations
We evaluated Incremental explanations for the understanding and memorability of explanatory factors in AI, specifically, for users to estimate the predictions that an AI would make (forward simulatability). This is meant to help human cognition toward decision making. Further work is needed to investigate whether these improvements carry over to downstream decision making, e.g., deciding whether to accept or reject a case based on quality estimations [64]; such a study would require careful framing and incentivization to ensure that users are correctly aligned and properly motivated for the task, while avoiding the confounder of prior knowledge, which can diminish the benefits of XAI [38]. We do not propose it for perception tasks (e.g., vision and audio) or language reasoning (NLP), since these involve innate mental processes driven by stimuli or low-level skills rather than deliberate reasoning.
Our paradigm of explanation incrementation assumes that users are novices who start with limited knowledge of the domain or AI application, and thus need to be taught gently. We do not expect Incremental explanations to be strongly beneficial for domain experts who can handle complex data and have established conventions [34].
Similar to Poursabzi-Sangdeh et al. [48], we evaluated Incremental explanations for only one application task, predicting housing prices. For applications that are less common (e.g., health diagnosis), or that involve critical but complicated numbers (e.g., decimals or fractions), Incremental and Subglobal explanations may still be overwhelming. Future work should validate our results across other application tasks.

Implications of incremental explanations
Our approach for Incremental explanations enables better learning of sparse linear factor models. This adds to the body of work on subspace-based explanations that take a divide-and-conquer approach, partitioning instances and explaining each subspace as similarly as possible. COGAM moderated the number of visual chunks in line graphs [2], and could be extended to incrementally allow more curviness so that users learn more details. GlocalX [56] provides rule explanations in detail and in aggregate by merging them. Future work could investigate which explanation formats (factors, line segments, or rules) are easier and more beneficial to learn incrementally.
Although our participants could learn and recall Incremental explanations well in our study, we acknowledge that the learning time was brief. Most learning that people do occurs over longer time periods with more repetitions. Hence, future work could deploy Incremental explanations to investigate their longitudinal benefits. We note that over longer durations, the learning of Subglobal explanations may also improve, but perhaps less so than Incremental explanations, due to their slightly higher cognitive load.
Our approach of Incremental explanations limited the explanations to the same type (sparse linear models). However, users have diverse preferences for and usage strategies of explanations [36,37], so incremented explanations could also be diverse, for example, first providing factors, then rules. This gives users diverse retrieval cues, which can reinforce their memory of the explanations. Future work can explore how to increment across explanation structures.

CONCLUSION
We have introduced Incremental XAI to help users better recall and apply explanations of AI. It provides a set of general base factors for typical instances and sparse incremented factors for special cases. In modeling and user studies, we found that Incremental explanations facilitate fast understanding like Global explanations that explain generally, and are easy to recall and understand like Subglobal explanations, which explain subspaces more faithfully than Global explanations. Incremental explanations are also more memorable than Local explanations, facilitating better recall and understanding performance. This work demonstrates the importance of supporting more memorable explanations to deepen user understanding of AI for more productive interactions.

Figure 1: Conceptual examples of XAI types with univariate (1D) data shown for simplicity; see Fig. A.1 for 2D multivariate examples with real data. a) Original AI System predicts output ŷ non-linearly with respect to attribute x. b) Global explainer that approximates ŷ with a linear equation ỹ ∝ x. c) Subglobal explainer increases faithfulness by segmenting along x to provide multiple linear explanations ỹ₁ ∝ x for x < t and ỹ₂ ∝ x for x ≥ t. d) Incremental explainer that is similar to Subglobal, but first explains the contiguous majority of instances (in this case, x < t) with a linear model ỹ₀ ∝ x, then explains outlier instances (x ≥ t) with an additive linear model ỹ₀ + Δỹ. e) Local explanation explains each instance with a linear equation ỹ ∝ x based on neighboring instances. Multiple local explanations are needed to represent the full input space.

4.1.5 Investigating explanations for multivariate attributes across subspaces. In Fig. A.1 in Appendix A.1, we show the 2D decision surfaces of the AI and explanation models for the House Sales dataset to demonstrate how the four different XAI types model the relationships between the AI prediction and multivariate attributes.

4.1.6 Investigating varying subspace thresholds. While Subglobal and Incremental explanations learn the feature space partitioning threshold automatically with the linear model tree, the threshold can also be set to custom values to fit explanation needs. We thus examine, in Appendix A.2, how selecting different partition thresholds affects the faithfulness of each subspace explainer model, how the factors change, and whether incremental factors are kept small.
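To reproduce this kind of threshold sweep, the following is a small sketch that splits on a user-chosen threshold, fits one linear explainer per subspace, and reports each subspace's unfaithfulness as MAE against the AI System's predictions. The variable names (e.g., y_ai) and the use of ordinary least squares are illustrative assumptions consistent with the modeling study rather than our exact implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

def subglobal_unfaithfulness(X, y_ai, feature, threshold):
    """Fit one linear explainer per subspace, split at a custom threshold on one
    feature, and return each subspace's MAE against the AI System predictions y_ai."""
    outlier = X[:, feature] >= threshold
    errors = []
    for mask in (~outlier, outlier):
        if mask.sum() < 2:  # skip degenerate subspaces
            continue
        lm = LinearRegression().fit(X[mask], y_ai[mask])
        errors.append(mean_absolute_error(y_ai[mask], lm.predict(X[mask])))
    return errors
```

Sweeping the threshold over a range of values then reveals how faithfulness and factor magnitudes trade off across partitions, as examined in Appendix A.2.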

Figure 6: Results of the modeling study showing the unfaithfulness of each explanation type, calculated as the absolute error (AE) between the AI Explainer estimation and the AI System prediction, AE(ỹ, ŷ), across three prediction tasks with different datasets: a) House Sales in King County [21], b) Heart Disease [23], c) Auto MPG [50]. Global explanations are least faithful, Local explanations most faithful, and Subglobal and Incremental explanations have similar faithfulness. The faithfulness of typical or outlier cases depends on the explainer models trained for each dataset.

Figure 7: User interface (UI) during testing with factors hidden, but editable. Participants can type their own numbers to explore how the AI Explainer would compute based on various factors. This helps users learn how factors work. Here, participants are asked to forward simulate both the AI Explainer and AI System without seeing any factor explanations.

Figure 8: Procedure of a trial in the summative user study to evaluate explanation understanding and memorability.

Figure 9: Results from forward simulatability trials to estimate the AI Explainer and AI System outputs without viewing explanations (b, d), and to estimate the AI System output with explanation, with timing (a, c). Error bars indicate 90% confidence interval.

Figure 11: Results of Perceived helpfulness of AI and XAI, and Perceived ease-of-task without XAI, on a 7-point Likert scale from Strongly Disagree (-3) to Strongly Agree (+3). Error bars indicate 90% confidence interval.

In A.1, we examine the decision surfaces of the predictor and explainer models to study how the different XAI types support linear or piecewise-linear relationships between multiple attributes and the prediction value. Based on the House Sales dataset, this provides a conceptual interpretation to the reader of i) how nonlinear the decision surface of the AI Predictor model is (a), and how the Local explanations cumulatively capture the nonlinearity (e); ii) how Subspace explanations (c) better fit the nonlinear decision surface of the AI predictor model (e) compared to the Global explanation (b); and iii) how Incremental explanations simplify memorability by first conveying a linear relationship (d, left), then showing a partial linear segment (d, right).

Figure A.1: Decision surface of the AI prediction model and various explanation models with two attributes (x-y axes), showing prediction output (color: darker is higher value) for the House Sales in King County, USA [21] dataset. Models are based on the dataset used in the modeling and user studies, showing only values for attributes 2 and 4. a) Scatter plot of AI System predictions for instances with various (x₂, x₄) values. Dashed line indicates the threshold to split for the Subglobal and Incremental explanation models. b) Contour plot of the Global explanation showing a linear slope mostly increasing in the direction of x₂, i.e., the factor for x₂ exceeds that for x₄ (see factors in Fig. 2). c) Contour plot of the Subglobal explanation model showing two linear models: gentler-sloped for typical instances (x₂ < 2.5) and steeper-sloped for outlier instances (x₂ ≥ 2.5). d) Contour plot of the Incremental explanation model showing the general model for all instances (Left), and the incrementally-sloped model for outlier instances (x₂ ≥ 2.5, Right). e) Contour plot of the accumulation of multiple Local explanation models showing the non-linear surface that manifests when learning from heterogeneous local explanations.

Figure A.5: Questions on users' prior experience with housing price estimation and AI background.

Figure A.7: Tutorial on the attributes, values, factors, and partial prices of the explanation interface. All participants are first trained on the Global explanation since it has the simplest format.

Figure A.8: Tutorial on estimated explanation price, correct AI System price, and the percent difference.

Figure A.9: Screening questions for the Global explanation to check users' comprehension.

Figure A.10: Tutorial on the Subglobal explanation.

Figure A.11: Screening questions for the Subglobal explanation to check users' comprehension.

Figure A.12: Tutorial on the Incremental explanation.

Figure A.13: Screening questions for the Incremental explanation to check users' comprehension.

Figure A.14: Tutorial on the Local explanation.

Figure A.15: Screening questions for the Local explanation to check users' comprehension.

Figure A.16: Sample of the unassisted forward simulation trial, where participants are asked to estimate the explanation and AI system outputs. The displayed UI is the Incremental condition.

Figure A.17: Sample of the assisted forward simulation trial, where participants are asked to estimate the AI system output based on the given explanation. The displayed UI is the Incremental condition.

Figure A.18: Review of performance on forward simulation trials, for participants to strengthen their understanding. The displayed UI is the Incremental condition.

Figure A.19: Explanation recall task for all instances overall (Global).

Table 2: Modeling results from 5-fold cross-validation of AI performance and XAI faithfulness across three datasets, showing mean ± standard deviation. AI performance indicates when an explainer is trained on the ground-truth dataset as a glassbox interpretable model. XAI unfaithfulness evaluates each explainer as a surrogate explanation with respect to the AI Model. Except for AI performance on Heart Disease, which is measured as % Accuracy, all other metrics are MAE, where smaller is better.

Table 4: Statistical analysis of responses due to effects (one per row), as linear mixed effects models with random effects, fixed effects, and their interaction effect. F and p values indicate ANOVA tests, and R² indicates model goodness-of-fit.