An Information Bottleneck Characterization of the Understanding-Workload Tradeoff in Human-Centered Explainable AI

Recent advances in artificial intelligence (AI) have underscored the need for explainable AI (XAI) to support human understanding of AI systems. Consideration of human factors that impact explanation efficacy, such as mental workload and human understanding, is central to effective XAI design. Existing work in XAI has demonstrated a tradeoff between understanding and workload induced by different types of explanations. Explaining complex concepts through abstractions (hand-crafted groupings of related problem features) has been shown to effectively address and balance this workload-understanding tradeoff. In this work, we characterize the workload-understanding balance via the Information Bottleneck method: an information-theoretic approach which automatically generates abstractions that maximize informativeness and minimize complexity. In particular, we establish empirical connections between workload and complexity and between understanding and informativeness through human-subject experiments. This empirical link between human factors and information-theoretic concepts provides an important mathematical characterization of the workload-understanding tradeoff which enables user-tailored XAI design.


INTRODUCTION
With the rapid development of powerful yet opaque artificial intelligence systems, enabling AI transparency through approaches that effectively explain the outputs of these systems to humans has become increasingly important. Recent advances in explainable AI (XAI), defined as "AI systems that can explain their rationale to a human user, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future" [27], have aimed to address the problem of AI transparency. Accounting for human factors related to information processing is at the core of producing effective explanations that support human understanding of these AI systems. Such explanations require not only computer science expertise, but also cross-disciplinary efforts with the fields of cognitive science, human factors, and the social sciences, more generally [50,51,60].
[...] that the complexity-distortion balance in IB can be effectively applied to model the workload-understanding tradeoff in human-centered XAI design and to generate user-tailored explanations, but that care must be taken to ensure such abstract explanations are visualized effectively.

RELATED WORK
Here we provide an overview of the literature related to the human factors constructs of workload and situation awareness as they apply to explainable AI, as well as existing XAI approaches to explaining functions (including reward functions) to humans. We also discuss the Information Bottleneck method, which we leverage to automatically generate abstract explanations of reward functions. In our experiments, we hypothesize that the information-theoretic concepts used to generate these abstract explanations correlate with the human factors constructs of workload and situation awareness.

Human Factors and Explainable AI
2.1.1 Human Mental Workload. Mental workload is a construct that has been widely studied in the human factors literature, and can be defined as the relationship between the mental resources demanded by a task and the resources available to be supplied by the human performing the task [54]. It has been researched in domains such as aviation [11,37], healthcare [58,69], usability in human-computer interaction [45], and workplace tasks [49], among others [46], and has been shown to correlate with task performance across a variety of settings [14-16,74]. Various models of mental workload have been proposed, such as the widely cited and applied multiple resource model (MRM) [72], which categorizes human cognitive resources into different "pools" that are available for information processing and can be filled independently. These pools are defined according to four dimensions: the modality of information representation presented to the human, the form of information encoding in the brain, stages of information processing, and response modality.
In the context of XAI, addressing workload considerations through application of models such as the MRM can help to ensure that explanations are effective and that the information they communicate is comprehensible to the recipient of the explanation. If XAI systems communicate too much information, a person may process only partial information (or none at all) from the explanation. At the same time, explanations must provide adequate information in order to be useful for the task at hand. A person's available mental capacity, therefore, informs both the choice of how much information to include in a given explanation and the frequency with which to provide explanations in effective XAI design [60]. A person's context, including all other tasks they must perform when they receive an explanation, must also be considered in explanation design. Beyond this, accounting for individual differences in baseline cognitive capacity between people is critical to supporting human task performance [75,76].
The impact of XAI on workload has been widely researched, with some studies finding that the addition of AI transparency reduces workload due to increased access to critical information [17,62,78], some finding that additional information provided by XAI systems increases workload [26,41,53,59], and others finding little impact of XAI on workload [14-16,74]. In these cases, workload was measured according to a variety of objective and subjective assessments. Objective measurements quantify workload in terms of a person's performance of primary and secondary tasks with varying cognitive demands [71]; subjective assessments often draw upon participant responses to workload scales, such as the widely applied and validated NASA TLX scale, which asks respondents to answer a set of Likert scale-based questions about their workload after completing a task [28]. In addition to these assessments, physiological measures of heart rate, skin-based properties, and ocular movements have also been applied to measure workload [13], although these assessments tend to be more invasive and difficult to implement outside of experimental settings. In this paper, since workload has been shown to be a critical factor in the design of effective XAI systems, we study how workload relates to an information-theoretic concept of complexity, and how this can inform explanation design.
Manuscript submitted to ACM
2.1.2 Situation Awareness and Human Understanding. Situation awareness (SA) is a human factors construct commonly defined according to the three-level framework proposed by Endsley [21]: the perception of elements in the environment within a volume of time and space (level 1), the comprehension of their meaning (level 2), and the projection of their status in the near future (level 3). It is, essentially, a structured way of defining a person's understanding of a given scenario and their ability to use this information to perform required tasks within the given context. In other words, the quality of a person's context-related understanding can be assessed according to how well that person perceives, comprehends, and projects the status of relevant elements of their environment.
Since SA defines a person's contextual informational needs, it has also been applied to the development of transparency frameworks that define the informational requirements XAI systems must meet for their users [17,60]. Within the field of human factors, validated measures such as the Situation Awareness Global Assessment Technique (SAGAT), which probes a user's understanding of critical information at various points throughout that user's performance of a task [21,22,24], have been proposed and leveraged for SA evaluation. Through application of these measures, SA has been shown to correlate with task-related measures such as performance and error frequency [23].
At the same time, XAI researchers have sought ways of effectively assessing user understanding of AI decision-making processes as a measure of explanation efficacy [32,40,50]. To date, understanding has been assessed through a variety of means (most of which have not been validated through human-subject experiments), such as scales for explanation goodness [32,40] and a user's ability to simulate an agent's optimal behavior or next decision [33,40]. Since the SA construct and SAGAT have been validated, a SAGAT-like approach can readily be applied to measure understanding in the context of XAI; however, the question of how to effectively probe a person's understanding of abstract concepts such as reward functions within the context of SAGAT remains. One recent study employed human-subject evaluations to validate reward alignment metrics that could be applied to measure a person's understanding of a reward function in the context of XAI [61]. We leverage a subset of these metrics (discussed further in Section 3.1.2) in our analysis to study how a person's understanding relates to an information-theoretic concept of informativeness (which we call distortion), and how this can be leveraged to design explanations that effectively trade off workload and understanding through the proxy measures of complexity and informativeness.

XAI Approaches to Explaining Functions
Many existing approaches to XAI strive to explain the often-complex functions that characterize models used in AI. For example, feature importance techniques explain the most important predictive features in regression [47,56], saliency maps unveil information about the gradients of the functions that characterize neural networks and related models [1,34,48], and rationalizations summarize agent policies that can be computed based on agent reward functions [18,20], among others. While some approaches have accounted for human workload by enabling explanations to provide variable amounts of information to users depending on their cognitive capacity [12,56,64], to our knowledge, none has formally accounted for the tradeoff between human workload and understanding by leveraging mathematical characterizations of these human factors constructs in XAI design. In this paper, we empirically demonstrate the links between these constructs and mathematically grounded information-theoretic concepts, which enables the automatic generation of explanations that trade off workload and understanding differently depending on user needs and capacities. We focus specifically on explanations of reward functions, which characterize desired autonomous agent behaviors in sequential decision-making problems such as reinforcement learning.
2.2.1 XAI and Reward Functions. Reward functions are one of the primary components of Markov decision processes (MDPs), which are often used to model autonomous agent planning problems [66]. Within an MDP, the reward function characterizes the reward an agent receives for taking different actions from different states; in other words, reward functions dictate what optimal agent behavior (often referred to as the agent's policy) will look like within a given domain.
Reward functions are often defined as follows:
$$R(s) = w^\top \Phi(s) \tag{1}$$
Here, $\Phi(s)$ is a set of features whose values can be calculated based on the agent's state in the world ($s$), and $w$ is a set of weights indicating the trade-offs between these features.
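As a minimal sketch of this kind of feature-based reward, assuming the common linear form in which the reward is a weighted sum of feature values, the snippet below evaluates a reward for one state. The function name, the two features, and the weight values are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def linear_reward(weights: Sequence[float], features: Sequence[float]) -> float:
    """Weighted sum of feature values for one state (assumed linear form)."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical state described by two features:
# (distance to goal, whether a hazard is adjacent).
weights = [-1.0, -5.0]
phi_s = [3.0, 1.0]
reward = linear_reward(weights, phi_s)
```

The relative size of the weights is what an explanation of such a reward would need to convey: here the hazard feature matters five times as much per unit as distance.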
Within existing XAI literature, reward functions have been explained through means including policy summaries which demonstrate roll-outs of optimal agent behavior originating from a variety of world states [4-6,29], language-based rationalizations of agent policies [18,20], techniques that reconcile a human's reward function with that of an agent [67], counterfactual demonstration-based explanations of key reward features [42], and decompositions of interpretable reward components provided to human users [7,36], among others. One recent study found that explaining reward functions through abstractions of reward features effectively balanced a workload-understanding tradeoff among different reward explanations [59]. Since the abstraction-based approach proved effective in that study, we also evaluate abstract explanations of reward features in this work.

Information Bottleneck
We leverage methods from Information Bottleneck (IB) literature to formalize a tradeoff between complexity and reward distortion, which we then connect to human factors. In canonical IB settings, one seeks to generate (lossy) representations, $Z$, of inputs, $X$, which are used to predict a downstream quantity, $Y$ [2,68]. In this work, we only consider predicting a reward, $Y$, from features, $X$, but the IB framework is more widely applicable. The IB maximization problem is formulated as a tradeoff between two information-theoretic terms:
$$\text{maximize} \quad I(Z; Y) - \beta I(X; Z)$$
where $\beta$ is a scalar parameter, $I(Z; Y)$ is the informativeness (measured as the number of bits about the reward $Y$ retained in $Z$), and $I(X; Z)$ is the complexity (measured as the number of bits about the features $X$ in $Z$). In other words, the IB formulation seeks to maximize informativeness while minimizing complexity. Notably, there is a theoretical limit for the maximum informativeness for a given complexity, but this limit shifts as a function of $\beta$. In our work, since we aim to explain reward functions, $X$ denotes the features of the reward function, $\Phi(s)$, $Y$ is the reward value, and $Z$ denotes the abstract representations grouping $X$ to predict $Y$. Therefore, as $\beta$ increases and complexity decreases, the above optimization will group features with similar rewards in the same abstraction. Lastly, we note that IB work is closely related to rate distortion theory where, rather than computing informativeness ($I(Z; Y)$), one measures the distortion, or error, in predicting $Y$ from $Z$ [79]. In our work, we measure the distortion in predicting a reward value, which we dub reward distortion.
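To make the tradeoff concrete, the following sketch evaluates an objective of the form informativeness minus a weighted complexity penalty for a deterministic abstraction on a toy discrete problem. It is an illustration under stated assumptions, not the optimization procedure used in this work: the names `mutual_information` and `ib_objective`, the dictionary representation of the joint distribution, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Mutual information in bits from a dict {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def ib_objective(p_xy, assign, beta):
    """Informativeness minus beta times complexity for a map x -> z."""
    p_zy, p_xz = defaultdict(float), defaultdict(float)
    for (x, y), p in p_xy.items():
        p_zy[(assign[x], y)] += p   # joint of abstraction and reward
        p_xz[(x, assign[x])] += p   # joint of input and abstraction
    return mutual_information(p_zy) - beta * mutual_information(p_xz)


# Toy problem: four equally likely inputs, two distinct reward values.
p_xy = {(0, -1): 0.25, (1, -1): 0.25, (2, 1): 0.25, (3, 1): 0.25}
by_reward = {0: "a", 1: "a", 2: "b", 3: "b"}   # groups inputs with equal reward
identity = {0: "a", 1: "b", 2: "c", 3: "d"}    # one abstraction per input

score_grouped = ib_objective(p_xy, by_reward, beta=0.5)
score_identity = ib_objective(p_xy, identity, beta=0.5)
```

In this toy case both abstractions retain the same one bit about the reward, but the identity map costs two bits of complexity instead of one, so the grouped abstraction scores higher once complexity is penalized.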
Beyond a purely mathematical formulation, several works in the fields of cognitive science, psychology, and behavioral economics have investigated aspects of IB tradeoffs in human cognition. Across domains and languages, naming systems (e.g., words for colors) are nearly perfectly efficient in the IB sense: maximizing the ability of listeners to reconstruct a speaker's meaning at a given complexity level [52,79-81]. In vision-based domains, people similarly create compressed representations of images via sketches that capture functionally useful details at the expense of visual fidelity [35]; this type of behavior is consistent with an IB system under complexity constraints. In economics, recent research points to the importance of information constraints in human behavior [8]. Even within XAI, Bang et al. [10] briefly explored the role of penalizing complexity to create more "interpretable" AI models for humans to understand, but they only used a fixed complexity in experiments. This evidence, collected across multiple fields, suggests that IB tradeoffs play an important role in human cognition; in our work, we connect notions from IB to human factors measures of explanation understanding.

RESEARCH AIMS
In this paper, we aim to establish an empirical link between human factors concepts that are relevant to the design of effective explainable AI systems and information-theoretic concepts. This link will enable a mathematical characterization of tradeoffs occurring between relevant human factors in XAI design, as well as provide tools to automatically generate abstraction-based explanations which trade off these factors, which is useful for meeting the varying informational and workload needs of individual users of AI systems. Specifically, we perform human-subject experiments in order to validate an information-theoretic measure of explanation complexity as an indicator of human workload and an information-theoretic measure of reward distortion as an indicator of human understanding. We next describe the human factors concepts and information-theoretic metrics we studied in greater detail, along with the hypotheses assessed in our experiments.
[...] have been applied to the study of workload within the field of human factors. One of the measures most commonly applied in the literature is the NASA Task Load Index (TLX) scale [28], which has also been applied to the study of techniques for explainable AI [26,41,53,59,77]. We therefore used the NASA TLX scale to assess workload in our own set of experiments.

3.1.2 Human Understanding. While a number of approaches have been proposed for assessing human understanding in the context of XAI [32,33,40], as discussed in Section 2.1.2, few have been validated through human-subject experiments.
One recent set of experiments validated reward alignment metrics which capture the similarity between a human's reward function and that of an autonomous agent [61]; these metrics can be applied to study either how aligned an agent's reward function is with a human's after a reward-learning process on the one hand, or how aligned a human's understanding of an agent's reward function is with the agent's true reward after an explanation of the reward is provided on the other.As our aim is to study human understanding of reward functions resulting from the provision of reward explanations at different levels of abstraction, we consider the latter metric in this work.
Sanneman and Shah [61] identified two categories of alignment in their experiments: feature alignment, which captures how aligned human and agent reward features and weights are; and policy alignment, which captures how aligned human and agent policies corresponding to these reward functions are. We leverage one of the validated metrics from each of these categories in our assessments of human understanding.
The feature understanding metric we apply in this work is a similarity metric called feature ranking, which can be defined as follows:
$$\mathrm{FR} = \frac{|P_{GT} \cap P_{H}|}{|P_{GT} \cup P_{H}|}$$
Here, $P$ is the set of pairwise comparisons of the magnitudes of the weights $w$ of a set of reward features, $\Phi(s)$, as in Equation 1 (e.g., one of these comparisons could be $w_i > w_j$, where $w_i$ is the weight of feature $i$, which is higher than $w_j$, the weight of feature $j$). $P_{GT}$ specifically captures the pairwise rankings of the feature weights in the ground truth reward function, and $P_{H}$ is the set of pairwise rankings from the human's reward function. In our experiments, we assessed the human's reward function by asking the human participants to rank a set of predetermined features in order of importance according to their understanding of the reward function upon receiving an abstract explanation of this reward. (We include examples of this assessment in Appendix D.) The feature ranking metric is the intersection over union of the ground truth rankings and the human's rankings, and thus captures the similarity between how important the human believes a set of reward features are relative to each other versus the ground truth relative importance of each feature.
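The intersection-over-union computation described above can be sketched as follows. This is an illustrative reading rather than the validated implementation: the helper names, the choice to represent each ranking as a dictionary of importance scores, and the decision to skip tied pairs are assumptions made for the example.

```python
from itertools import combinations


def pairwise_rankings(importance):
    """Ordered pairs (i, j) meaning feature i is ranked above feature j."""
    pairs = set()
    for i, j in combinations(sorted(importance), 2):
        if importance[i] > importance[j]:
            pairs.add((i, j))
        elif importance[j] > importance[i]:
            pairs.add((j, i))
        # Tied features contribute no pair (an assumption of this sketch).
    return pairs


def feature_ranking(ground_truth, human):
    """Intersection over union of the two sets of pairwise rankings."""
    p_gt = pairwise_rankings(ground_truth)
    p_h = pairwise_rankings(human)
    union = p_gt | p_h
    return len(p_gt & p_h) / len(union) if union else 1.0


# Hypothetical three-feature example: the human swaps features "b" and "c".
truth = {"a": 3, "b": 2, "c": 1}
human = {"a": 3, "b": 1, "c": 2}
score = feature_ranking(truth, human)
```

Swapping one adjacent pair leaves two of the three comparisons in agreement, but the disagreeing pair appears in the union in both orientations, so the score is 2/4 rather than 2/3.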
The policy understanding metric we apply is a regret-based metric called best demonstration, defined as follows: Here, $\xi^*$ is the optimal demonstration (i.e., a set of state-action pairs) of a given task according to a ground truth reward function, which in our case is the reward function being explained. $\xi_H$ is the human's best demonstration according to the reward function they understood from the explanation. Finally, $\xi^-$ is the worst possible demonstration in terms of the ground truth reward, which we assume to be calculable for a finite-horizon task. $R(\cdot)$ evaluates each of these trajectories according to the ground truth reward function.
In order to evaluate this metric in our experiments, we ask human participants to provide their optimal demonstration of a task given their understanding of the associated reward function after having received an explanation of it.(Examples of this assessment are also included in Appendix D.) The best demonstration metric essentially captures how close the human's reward function is to the ground truth reward in terms of the policies that result from these rewards for a given task.
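One way such a regret-based score could be computed is sketched below. The exact normalization is an assumption of this example: the human's regret relative to the optimal demonstration is divided by the gap between the optimal and worst demonstrations, and the result is subtracted from one so that higher values indicate better understanding. Demonstrations are also simplified to lists of visited states; the function names and the toy reward values are hypothetical.

```python
def trajectory_return(reward, trajectory):
    """Total ground-truth reward collected along a list of states."""
    return sum(reward(state) for state in trajectory)


def best_demonstration_score(reward, optimal, human, worst):
    """One minus normalized regret (assumed form; higher is better)."""
    r_opt = trajectory_return(reward, optimal)
    r_hum = trajectory_return(reward, human)
    r_worst = trajectory_return(reward, worst)
    span = r_opt - r_worst
    if span == 0:
        return 1.0  # every demonstration is equally good
    regret = (r_opt - r_hum) / span
    return 1.0 - regret


# Hypothetical ground-truth reward over three kinds of cells.
cell_values = {"low": 0.0, "mid": 1.0, "high": 2.0}
reward = cell_values.__getitem__

score = best_demonstration_score(
    reward,
    optimal=["high", "high"],
    human=["high", "mid"],
    worst=["low", "low"],
)
```

Normalizing by the optimal-to-worst gap keeps scores comparable across tasks whose reward scales differ.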

Information-Theoretic Measures of Explanation Complexity and Reward Distortion
3.2.1 Complexity. Drawing upon prior literature, we define complexity as the mutual information between an input, $X$, and an abstraction, $Z$ [70,79]. This measure is defined via the Kullback-Leibler divergence of the conditional distribution of $Z$ given $X$ from the prior over $Z$:
$$I(X; Z) = \mathbb{E}_{x \sim p(x)}\left[ D_{\mathrm{KL}}\left( p(z \mid x) \,\|\, p(z) \right) \right]$$
Complexity is minimized at 0 if all $X$ are represented via the same $Z$; beyond such uninformative representations, more complex representations include additional information about $X$ in $Z$. For example, more complex color naming systems use more distinct words: naming systems using the word "crimson" convey more information about a precise color than naming systems that only use less specific words like "red" [79]. More generally, in IB literature, by penalizing the complexity of representations, one imposes a bottleneck on how much information is stored in $Z$, which in turn induces lossy representations that do not enable perfect reconstructions of $X$ from $Z$ [2,68]. In our work, we both use existing IB methods to generate representations across a spectrum of complexity and calculate complexity as a metric, which we then relate to human workload.
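The following sketch computes complexity in bits as the expected Kullback-Leibler divergence between each input's distribution over abstractions and the marginal distribution over abstractions. The function name, the nested-dictionary encoder format, and the four-color toy example are assumptions introduced for illustration.

```python
import math


def complexity_bits(p_x, encoder):
    """Expected KL divergence of p(z|x) from the marginal p(z), in bits.

    p_x:     dict mapping each input x to its probability
    encoder: dict mapping each input x to a dict {z: p(z|x)}
    """
    # Marginal over abstractions: p(z) = sum_x p(x) p(z|x)
    p_z = {}
    for x, px in p_x.items():
        for z, pzx in encoder[x].items():
            p_z[z] = p_z.get(z, 0.0) + px * pzx

    total = 0.0
    for x, px in p_x.items():
        kl = sum(pzx * math.log2(pzx / p_z[z])
                 for z, pzx in encoder[x].items() if pzx > 0)
        total += px * kl
    return total


# Hypothetical example: four equally likely colors.
p_x = {"c1": 0.25, "c2": 0.25, "c3": 0.25, "c4": 0.25}
one_word = {x: {"color": 1.0} for x in p_x}
two_words = {"c1": {"warm": 1.0}, "c2": {"warm": 1.0},
             "c3": {"cool": 1.0}, "c4": {"cool": 1.0}}
four_words = {x: {x: 1.0} for x in p_x}

bits = [complexity_bits(p_x, enc) for enc in (one_word, two_words, four_words)]
```

A single shared label carries no information about the input, while finer vocabularies carry progressively more, mirroring the color naming example.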
3.2.2 Reward Distortion. We define reward distortion as a measure of how well one can predict a reward value, $Y$, from an abstract representation, $Z$. Formally, we measure reward distortion as the minimum mean squared error (MSE) for predictions of $Y$ from $Z$:
$$\min_{\hat{Y}} \; \mathbb{E}\left[ \left( Y - \hat{Y}(Z(X)) \right)^2 \right]$$
where, assuming access to a dataset of reward features ($X$) and rewards ($Y$), $Z(X)$ represents the abstraction generated from $X$, and $\hat{Y}(Z(X))$ represents the optimal prediction of $Y$ given $Z(X)$. With a small set of discrete representations, $Z$, computing an optimal predictor is equivalent to traditional methods for MSE regression. We note that reward distortion is similar to notions of informativeness ($I(Z; Y)$) from traditional IB literature (see Zaslavsky et al. [79] for discussion of connections to rate distortion theory). Informativeness measures, however, do not explicitly represent how some prediction errors are better or worse than others. Given the continuous nature of reward values, we therefore use reward distortion as our preferred metric. In our paper, we seek to connect reward distortion to feature rank (FR) and best demonstration (BD) metrics of human understanding. Given that we always measure the distortion in predicting reward, we at times refer to reward distortion simply as distortion.
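For discrete abstractions, the MSE-optimal prediction for each abstraction is the mean reward of the inputs assigned to it, so the minimum MSE can be computed directly. The sketch below does this under the assumption that all inputs are weighted equally; the function name and the toy rewards are hypothetical.

```python
from collections import defaultdict


def reward_distortion(rewards, assign):
    """Minimum MSE when predicting each input's reward from its abstraction.

    rewards: dict mapping each input x to its reward
    assign:  dict mapping each input x to its abstraction label
    """
    groups = defaultdict(list)
    for x, y in rewards.items():
        groups[assign[x]].append(y)
    # The mean of each group minimizes squared error within that group.
    means = {z: sum(ys) / len(ys) for z, ys in groups.items()}
    squared_errors = [(y - means[assign[x]]) ** 2 for x, y in rewards.items()]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical rewards for four inputs.
rewards = {0: 0.0, 1: 1.0, 2: 4.0, 3: 5.0}
one_group = {x: "all" for x in rewards}
two_groups = {0: "low", 1: "low", 2: "high", 3: "high"}
identity = {x: x for x in rewards}

distortions = [reward_distortion(rewards, a)
               for a in (one_group, two_groups, identity)]
```

Distortion falls as the abstraction becomes finer and reaches zero when every input has its own label, which is the other side of the complexity cost.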

Hypotheses
We investigated the impact of varying complexity and distortion on human workload and human reward understanding by evaluating the following hypotheses:
Hypothesis 1. The distortion of the abstraction-based explanations will be negatively correlated with human reward understanding, including both feature and policy understanding.
Hypothesis 2. The complexity of the abstraction-based explanations will be positively correlated with human mental workload.
Jointly, these hypotheses state that decreasing distortion will improve human understanding (H1), but increasing the complexity of abstractions will result in greater workload (H2). In other words, we aim to evaluate whether (the inverse of) distortion can serve as a suitable proxy for human understanding and whether complexity can serve as a suitable proxy for human workload in explanation design. Given theoretical bounds from IB literature showing a minimum distortion for a given complexity, this suggests optimal explanations will also trade off these two competing factors: minimizing distortion to achieve a desired level of understanding, while subject to bounds on workload (and therefore complexity).

Domains
We leveraged two domains for our set of experiments: a grid-based navigation domain and a color domain.
[...] reward functions within this domain [79]. For the continuous reward function, we set each color's reward equal to that of the blue value in the RGB representation (between 0 and 1). For the discontinuous reward function, we used a hand-specified function that divided the blue values into eight bins, which we assigned different rewards between -1 and 1; the exact function is in Appendix A.2. Thus, the color domain largely mirrored the grid navigation domain by establishing both continuous and discontinuous reward functions. The task that participants needed to perform for the best demonstration (BD) assessment in the color domain was a sample collection task, where the objective was to navigate through a grid and maximize the total value of samples collected along the path. The samples were represented by one of the colors in the original color grid (shown in Figure 1 c), and abstract representations of the color regions were provided as heat maps, as shown in Appendix C.
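The two kinds of color reward function can be sketched as below. The continuous version follows the description directly; the discontinuous version uses eight equal-width bins over the blue value, but the per-bin reward values are placeholders invented for this example, since the actual hand-specified function is given only in Appendix A.2.

```python
def continuous_reward(rgb):
    """Reward equal to the blue channel, with channels scaled to [0, 1]."""
    return rgb[2]


# Placeholder per-bin rewards in [-1, 1]; NOT the values used in the study.
HYPOTHETICAL_BIN_REWARDS = [-1.0, 0.5, -0.5, 1.0, 0.0, -0.75, 0.75, 0.25]


def discontinuous_reward(rgb, bin_rewards=HYPOTHETICAL_BIN_REWARDS):
    """Reward looked up from equal-width bins of the blue channel."""
    blue = rgb[2]
    n_bins = len(bin_rewards)
    # Clamp so that a blue value of exactly 1.0 falls in the last bin.
    index = min(int(blue * n_bins), n_bins - 1)
    return bin_rewards[index]


example_color = (0.2, 0.4, 0.6)
smooth = continuous_reward(example_color)
binned = discontinuous_reward(example_color)
```

With a binned reward, two colors with nearly identical blue values can receive very different rewards, which is what makes the function discontinuous.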

Information-Bottleneck Explanation Generation
Figure 2 (caption fragment): [...] (b-c), eventually recovering the underlying reward grid. Generating abstractions to recover the $x$ coordinate in the grid (red "X" curve and d), rather than the reward, led to higher distortion due to abstractions that did not align with the true reward structure.
We used existing methods from prior literature to generate explanations of our domain reward functions at different complexity and distortion levels. The embo package provides an implementation of the deterministic information bottleneck, which, given a joint distribution $P(X, Y)$, generates abstractions of inputs $X$ over the range of possible complexities [55,65]. For each domain, therefore, we computed this joint distribution by iterating over all possible inputs $X$ (e.g., the $(x, y)$ location of a cell in the grid world) and computing the associated reward $Y$ (e.g., the value at that location in the grid world). Further details of this process are included in Appendix A.
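A minimal sketch of building such a joint distribution for a grid world is shown below. It assumes that every cell is equally likely and that the reward is the negative Manhattan distance to a chosen cell; the grid size, the goal location, and the function names are hypothetical, and the sketch does not call the embo package.

```python
from collections import Counter


def manhattan_reward(cell, goal):
    """Hypothetical reward: negative Manhattan distance to the goal cell."""
    return -(abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))


def joint_distribution(width, height, goal):
    """Joint over (cell, reward) pairs, assuming uniformly likely cells."""
    cells = [(x, y) for x in range(width) for y in range(height)]
    p = 1.0 / len(cells)
    return {(cell, manhattan_reward(cell, goal)): p for cell in cells}


def reward_marginal(p_xy):
    """Marginal distribution over reward values."""
    p_y = Counter()
    for (_, reward), p in p_xy.items():
        p_y[reward] += p
    return dict(p_y)


p_xy = joint_distribution(width=3, height=3, goal=(1, 1))
p_y = reward_marginal(p_xy)
```

Because each cell maps to exactly one reward, the joint has one nonzero entry per cell; a table of this kind is the input that an information bottleneck solver compresses.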
For a given reward function, the IB method generated abstractions representing different optimal solutions trading off distortion and complexity; we dubbed such abstractions the "reward-optimal" abstractions. Increasing the complexity of reward-optimal abstractions led to more fine-grained abstractions and lower distortion, as shown in Figures 2 b (low complexity) and c (high complexity). In our experiments, however, we wished to explore the effects of varying distortion and complexity independently. Therefore, in addition to the reward-optimal abstractions, we generated additional abstractions using different "training objectives": alternate reward functions that were not necessarily relevant to the structure of the reward function being explained. The abstractions generated by these alternate training objectives resulted in higher distortion (with respect to the reward function being explained) for the same complexity as the reward-optimal abstractions.
For example, in the Manhattan grid, one training objective we used was predicting the $x$ location of each cell as the reward value. Abstractions generated from this function represented vertical strips in the grid, as shown in Figure 2 d. At the same time, we evaluated the distortion of such abstractions by measuring the MSE in predicting the actual reward in the Manhattan grid. Unsurprisingly, the $x$-location abstractions led to high distortion in predicting reward. Figure 2 a shows how, in general, using $x$-based abstractions led to higher distortion, for the same complexity, than using reward-optimal abstractions based on the true Manhattan reward.
The same trends held in the color domains as well, where we generated reward-optimal abstractions based on the continuous or discontinuous reward functions of a color's blue value (from its RGB representation), as discussed in Section 4.1.2. To generate non-reward-optimal baseline abstractions in these cases, we used the color's red value

Experiment Design
In order to empirically study the relationships between abstraction complexity and human cognitive workload and between distortion and human understanding, we performed human-subject experiments in both the grid navigation and color domains. In each domain, we studied explanations of two different types of reward function (continuous and discontinuous), as discussed in Section 4.1. For each type of reward function, we generated a set of abstract explanations spanning a range of complexities and distortions. (We include examples of abstractions at different complexities and distortions in Appendix C.) The independent variables were the complexities and distortions of the abstractions; dependent variables included both human mental workload (as measured via the NASA TLX scale) and reward understanding, measured according to feature ranking (FR) and best demonstration (BD) assessments.

Procedure
For each domain, participants were first asked a set of demographic questions, including a question about whether they were colorblind, since successful performance of the provided tasks relied upon interpretation of color. Next, participants received an overview of the experiment. In order to reduce learning effects, they were also presented with a set of example abstract explanations within the given domain, along with corresponding examples of correct responses to the two reward understanding questions that they were asked throughout the experiment (feature rank and best demonstration as detailed in Section 3.1.2).
Following the examples, participants were presented with two different scenarios in the given domain. In the grid domain, one scenario involved the Manhattan grid (based on a continuously varying distance from a point), and one scenario involved the random grid (a fundamentally discontinuous reward function). In the color domain, one scenario involved the continuous reward function, while the other involved the discontinuous reward function. The order of presentation of scenarios was counterbalanced for each experiment across all participants.
For each scenario, a participant was shown abstractions based upon a combination of training objective and complexity level. In the color domain, abstractions were selected from the set of three training objectives (continuous blue, discontinuous blue, and red) and five possible complexities, for a total of 15 possible abstractions. The five complexity values were chosen such that the abstractions spanned a range from one to eight abstract color regions. In the grid navigation domain, there were similarly three training objectives (Manhattan or random reward, $x$-based reward, and $y$-based reward) with five possible levels of complexity, for a total of 15 possible abstractions. The number of abstract color regions again spanned a range of one to eight. Since all abstractions trained according to the same objective were the same regardless of underlying reward function, there was overlap in the set of possible abstractions for each reward function in both domains (e.g., the abstractions trained to predict the "X" values in the navigation domain were the same for both the random and Manhattan reward functions). Therefore, we ensured no participant received the same abstraction across both scenarios.
Following each scenario, participants were asked the feature rank and best demonstration questions; they were also asked the six NASA TLX scale questions in order to assess cognitive workload, along with seven additional questions related to subjective assessment of the abstract explanation quality, which were adapted from a scale for team fluency proposed by Hoffman [31]. At the end of the experiment, participants were asked a set of open-ended feedback questions about the experiment, including what they found to be most challenging and whether they had additional feedback to provide about the experiment. In addition, we asked two attention-check questions during the survey: one before the two scenarios, and another immediately after.
Manuscript submitted to ACM We administered both experiments through the Qualtrics platform, and recruited our participants via the Prolific platform.Participants received no time limit, and took an average of 28 minutes to complete the color survey and of 19.5 minutes to complete the grid navigation survey.They were compensated with $7 for completing the grid navigation survey and $10 for completing the color survey, with bonus payments of $20 provided to the highest-performing participants in each case.As this experiment was survey-based and involved minimal risk, it qualified for exempt human-subject evaluation status according to the policies outlined by the institutional review board (IRB) at the university where this research was conducted.

Grid Navigation Domain
Fifty-one participants completed the grid navigation survey (20 women, 30 men, and 1 non-binary individual). The median age was 37 years (min=19 years, max=76 years). Data from six participants were omitted from analysis due to failed attention-check questions or incomplete responses. We first analyzed the Spearman correlations between distortion and understanding and between complexity and workload for each of the Random and Manhattan grids separately. We leveraged Spearman correlations due to the non-normality of the underlying datasets in this analysis, as well as the expected monotonic relationships between the correlated variables. We then analyzed the combined Random and Manhattan grid data through a linear mixed-effects analysis to account for individual differences in participants' responses to the reward understanding and workload questions.
We present results from the Random grid domain in Figure 3. Figures 3 a and b demonstrate that both feature rank (FR) and best demonstration (BD) scores were negatively correlated with reward distortion. Intuitively, this supports our hypothesis that metrics of understanding would decrease as reward distortion increased (H1). Quantitatively, these results were significant: using the Spearman correlation coefficient, we found $\rho(\mathrm{FR}, \mathrm{Dist}) = -0.83$ ($p < 0.001$) and $\rho(\mathrm{BD}, \mathrm{Dist}) = -0.68$ ($p < 0.001$). At the same time, Figure 3 c shows a significantly positive correlation between workload and complexity ($\rho(\mathrm{Work.}, \mathrm{Comp.}) = 0.53$, $p < 0.001$); that is, as the complexity of abstractions increased, so did the workload, supporting H2.
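For readers less familiar with the statistic, the Spearman coefficient is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a plain-Python illustration of that definition, not the analysis code used in the study; the function names and the sample numbers are made up for this example.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up numbers for illustration only (not study data).
distortion = [0.05, 0.10, 0.20, 0.40, 0.80]
feature_rank_score = [0.95, 0.90, 0.70, 0.75, 0.30]
rho = spearman(distortion, feature_rank_score)
```

Because only ranks enter the computation, the coefficient captures monotonic rather than strictly linear association, which matches the motivation given above for choosing it.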
We observed similar trends in the Manhattan grid domain, depicted in Figure 4: both understanding measures were negatively correlated with distortion, and the correlation was stronger for FR than for BD. Unlike with the Random grid, we did not identify a significant correlation between workload and complexity, although the trend was still positive: $\rho(\mathrm{Work.}, \mathrm{Comp.}) = 0.23$, $p = 0.13$.
Following the analysis of Spearman correlations for each grid type separately, we performed linear mixed effects modeling (LMEM) on the joint data from the Manhattan and Random grids and found significant trends supporting all our hypotheses. The models we applied for this analysis were formulated according to the following equations in Wilkinson notation [73]: $\mathrm{FR} \sim \mathrm{Dist} + (1 \mid \mathrm{ID})$, $\mathrm{BD} \sim \mathrm{Dist} + (1 \mid \mathrm{ID})$, and $\mathrm{Work.} \sim \mathrm{Comp.} + (1 \mid \mathrm{ID})$. Here, the models were fit using the participant $\mathrm{ID}$ as a grouping variable, with a random intercept to account for the individual differences between participants, which were not accounted for by the Spearman correlations (e.g., perhaps one participant would consistently report higher workload). In our joint analysis, we leveraged the fact that each participant answered one question about the Manhattan grid and one about the Random grid. We observed significant main effects for distortion and complexity within each model at the $p < 0.001$ level.
Overall, the results from our grid domain experiments strongly support our hypotheses that 1) decreased distortion would be associated with increased understanding and 2) increased complexity would be associated with increased workload. We found statistically significant support for all but one of our hypothesized results via Spearman correlation tests assessing each grid-based reward function separately, and we found support for all of our hypothesized results when evaluating the combined datasets through a linear mixed effects analysis which accounted for individual differences in participant responses.
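Written out, a Wilkinson formula with one predictor and a per-participant random intercept corresponds to the standard model below. The symbols ($\beta_0$, $\beta_1$, $u_i$, $\varepsilon_{ij}$, and the variance terms) are conventional notation introduced here for illustration; they do not appear in the original analysis.

```latex
\begin{align}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_i + \varepsilon_{ij}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2).
\end{align}
```

Here $i$ indexes participants and $j$ indexes the two scenarios each participant completed; $y_{ij}$ is the response (an understanding score or workload) and $x_{ij}$ is the predictor (distortion or complexity). The fixed slope $\beta_1$ is the main effect being tested, while $u_i$ absorbs a participant's consistent tendency to score higher or lower.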

Color Domain
We applied the same set of analyses as in the grid navigation domain to the color domain, again analyzing the results for the continuous and discontinuous reward functions separately via the Spearman correlation, then performing a joint analysis of the data with an LMEM. Forty-seven participants completed the color survey (11 women, 33 men, 1 non-binary individual, and 2 not reporting gender). The median age was 33 years (min=20 years, max=66 years). We omitted data collected from two participants from analysis due to failed attention-check questions or incomplete responses.
Results from the color domain experiments corroborate the key trends observed in the grid navigation domains, with the exception of the best demonstration (BD) understanding assessment. First, separating results by continuous and discontinuous reward functions, we established significant Spearman correlations for some of the hypothesized trends. Feature rank (FR) and distortion were negatively correlated for both reward functions ($p < 0.001$). For the continuous reward (shown in Figure 5 a), $\rho(\mathrm{FR}, \mathrm{Dist}) = -0.47$; for the discontinuous reward, $\rho(\mathrm{FR}, \mathrm{Dist}) = -0.40$. Correlations between complexity and workload were positive, but not significant at the $\alpha = 0.05$ level: for continuous (Figure 5 c), $\rho(\mathrm{Work.}, \mathrm{Comp.}) = 0.24$ ($p = 0.06$) and for discontinuous, $\rho(\mathrm{Work.}, \mathrm{Comp.}) = 0.18$ ($p = 0.11$). Lastly, correlations between the BD understanding assessment and distortion were weak, with no $p$-value below 0.15. We attribute the weak BD correlations to high chance-level performance at high distortion: even with completely uninformative abstractions, some participants selected the optimal path through random guessing.
Also of note is that the visualizations of the abstractions within the color domain were not provided in the same space in which the best demonstration task was performed (as opposed to the grid domain, where the abstractions were visualized in the grid itself), so another possible reason for this difference (and the added difficulty with high-complexity abstractions) is the extra step necessary to translate the abstract information into the task space.
Complementing our Spearman correlation analysis for each reward function separately, we again performed LMEM tests on all the joint data, grouped by participant, as we did for our grid navigation experiments. We applied linear mixed effects models of the same form, and found significant main effects for both distortion in the FR model and complexity in the workload model. The linear trends for FR vs. distortion and workload vs. complexity were significant at $p < 0.05$: $\mathrm{FR} = -0.70 \cdot \mathrm{Dist} + 0.34$ and $\mathrm{Work.} = 39.43 \cdot \mathrm{Comp.} + 156.23$. These findings support our two hypotheses that (H1) increasing distortion would decrease understanding and (H2) increasing complexity would increase workload. As before, however, the relationship between BD and distortion was not statistically significant: $\mathrm{BD} = -1.11 \cdot \mathrm{Dist} + 0.73$ ($p = 0.12$).

DISCUSSION
Across domains, we found evidence supporting our two hypotheses, with more mixed results in the color domain. In every domain, and with every reward function, we observed significantly negative Spearman correlation coefficients between feature-based understanding and distortion. While the significance of other trends varied slightly across domains, when we leveraged the within-participant aspect of our experiment design to account for individual differences in participant responses, we similarly found significantly positive relationships between workload and complexity. Although weaker than the FR and distortion correlations, we also found significant correlations between the BD measure of understanding (i.e., policy understanding) and distortion in all cases within the grid navigation domain. These weaker correlations track with previous experimental results, which demonstrated that the factor loadings for policy-based assessments of alignment (understanding) were weaker than those for feature-based assessments [61]; this is likely due to the additional challenge of translating a reward function into an optimal policy within a given environment.
Nonetheless, the overall trends in our results support the hypothesized link between human factors constructs and information-theoretic concepts.This enables us to leverage these information-theoretic concepts to mathematically characterize the workload-understanding tradeoff in XAI design and to automatically generate abstraction-based explanations which trade off these factors, allowing us to account for variable informational and workload needs between different users of AI systems.
Overall, we observed a larger number of significant results and stronger correlations in the grid-navigation domain compared with the color domain, particularly for the BD (policy understanding) results. This is likely related to the visualizations of the abstractions in each domain: in the grid-navigation domain, heat maps of square values were provided within the best demonstration task grid itself, while in the color domain, participants interpreted the heat maps and their relation to the color grid separately from the task grid, and then had to translate their reward understanding into an optimal policy in the task grid in an additional step. While we identified significant support for our key hypotheses related to the relationships between information-theoretic concepts and human factors constructs in abstract explanation generation across both experiments, the differences in the BD results corresponding to the different types of abstraction visualizations between the experiments highlight the importance of carefully considering how to visualize abstractions for effective communication in future explanation design.
Finally, we observed some evidence that continuous reward functions may be better candidates for abstraction than discontinuous ones. The correlations between FR (feature-based reward understanding) and distortion were stronger for the Manhattan grid (with a fundamentally continuous reward structure) than the Random grid in the grid-navigation domain, and for the continuously varying reward function than the discontinuously varying reward function in the color domain. This suggests that abstracting such continuously varying reward regions may lead to more natural explanations of reward functions than groupings of discontinuous reward regions. We leave additional exploration of the impact of the structure of reward functions and the inherent "abstractability" of their feature spaces on explanation efficacy as an area for future work.

Limitations and Future Work
Our work takes an important step toward connecting IB methods to human factors in understanding explanations, but we rely upon several simplifying assumptions. First, in our experiments, we used the "ground truth" reward function, as well as alternative reward function baselines, to generate abstractions. As shown in our results, generating low-distortion explanations is extremely important for understanding explanations, and lacking the ground truth reward function would necessarily lead to higher-distortion abstractions. For the purposes of this paper, we therefore scoped our work to assume access to such functions; future work may wish to relax this assumption, but would then need to propose methods for the use of alternative reward functions.
Second, in our experiments, we used exact IB methods for generating abstractions, which may not scale to larger domains. For example, as the number of grid locations in the navigation domain increases, standard IB methods will slow down considerably. Fortunately, recent complementary work proposed approximate discrete IB methods for rapidly generating abstractions at varying complexity levels [70]; we anticipate that such methods may be easily combined with our explanation work to incorporate approximate IB abstractions into explanations.
Finally, we scoped our work to study abstract explanations of reward functions in particular, but there are additional concepts related to AI decision making, such as policies, constraints, counterfactual decisions, and decision uncertainty (among others), which might also be effectively explained through the application of automatically generated abstract explanations. Beyond this, such techniques can be extended to explain complex concepts in larger-scale domains, such as reinforcement learning-based robotics applications and large language models (LLMs). In this work, we have established empirical connections between information-theoretic concepts and human factors constructs which we hope will apply to explanation design for this broader scope of AI concepts and domains. Future work can explore and confirm these relationships, and their applicability to human-centered XAI design, in an expanded assortment of settings.

CONCLUSION
In this work, we established empirical connections between the human factors metrics of explanation understanding and workload and the Information Bottleneck (IB) concepts of distortion and complexity. In the standard IB framework, a tradeoff exists between distortion and complexity; we established a similar tradeoff in which people's reward understanding improved as distortion decreased, but at the cost of increased workload as complexity increased. Our findings may be used directly to inform explanation design, especially in accounting for differing informational and workload needs between individual users of AI systems. For example, given a maximum acceptable workload, one could find the corresponding allowed complexity level for explanations, and, at that complexity level, promote understanding by minimizing distortion. More generally, our work establishes important connections at the intersection of human factors and information theory, which we hope future work will continue to explore.
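The selection rule in the example above can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than a method from the paper: it assumes a linear workload model whose slope and intercept would have to be fit for the target users, and the `Abstraction` class, the candidate values, and the parameter values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Abstraction:
    name: str
    complexity: float
    distortion: float

def predicted_workload(complexity, slope, intercept):
    """Assumed linear workload model; slope and intercept come from user data."""
    return slope * complexity + intercept

def select_abstraction(candidates, max_workload, slope, intercept):
    """Return the lowest-distortion candidate whose predicted workload
    stays within max_workload, or None if no candidate qualifies."""
    feasible = [
        c for c in candidates
        if predicted_workload(c.complexity, slope, intercept) <= max_workload
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.distortion)

# Hypothetical candidates and model parameters, for illustration only.
candidates = [
    Abstraction("coarse", complexity=0.5, distortion=0.40),
    Abstraction("medium", complexity=1.5, distortion=0.15),
    Abstraction("fine", complexity=2.5, distortion=0.05),
]
choice = select_abstraction(candidates, max_workload=40.0, slope=10.0, intercept=20.0)
```

With these made-up numbers the finest abstraction exceeds the workload budget, so the rule falls back to the medium one; a per-user slope and intercept would let the same rule tailor explanations to individuals.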

A GENERATING IB ABSTRACTIONS IMPLEMENTATION
We used the embo package to generate abstractions at different levels of complexity and reward distortion and employed these abstractions in human participant experiments to analyze the role of complexity and distortion in human understanding [55]. Here, we discuss how we generated our abstractions, and demonstrate that we induced meaningful variation in complexity and distortion. 1

A.1 Grid Domain
In the grid domains, we numbered each of the 25 grid cells with a unique id, and used four reward functions during the IB process: Manhattan distance, random reward, $x$ coordinate, and $y$ coordinate. For the Manhattan distance reward, depicted in the main paper in Figure 1 a, reward was set to +1 at cell location (1, 3) and decreased by 0.33 for each increase in Manhattan distance, to a minimum of −1.0. For the random reward, depicted in Figure 1 b, we selected a reward value for each cell uniformly at random in the range [−1, 1]. For exact values, we refer the reader to our code which, given the fixed random seed of 0, will exactly reproduce the random values in our experiments. Lastly, for the $x$ and $y$ coordinate rewards, we set the reward equal to the integer value of the $x$ or $y$ coordinate of each cell, ranging from 0 to 4, inclusive.
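As a concrete reading of the deterministic reward definitions above, the following sketch builds the Manhattan and coordinate reward grids. It is not the authors' code: the function names are hypothetical, and it assumes 0-indexed coordinates with the peak cell (1, 3) interpreted as (x, y). The random reward is left out because reproducing it depends on the original code and seed.

```python
GRID_SIZE = 5
PEAK_CELL = (1, 3)  # read as (x, y); this ordering is an assumption
STEP = 0.33         # reward lost per unit of Manhattan distance

def manhattan_reward(x, y):
    """+1 at the peak cell, minus 0.33 per Manhattan step, floored at -1."""
    distance = abs(x - PEAK_CELL[0]) + abs(y - PEAK_CELL[1])
    return max(-1.0, 1.0 - STEP * distance)

def x_reward(x, y):
    """Reward equal to the cell's x coordinate (0 to 4)."""
    return float(x)

def y_reward(x, y):
    """Reward equal to the cell's y coordinate (0 to 4)."""
    return float(y)

def build_grid(reward_fn):
    """Return rewards indexed as grid[y][x]."""
    return [[reward_fn(x, y) for x in range(GRID_SIZE)] for y in range(GRID_SIZE)]

manhattan_grid = build_grid(manhattan_reward)
x_grid = build_grid(x_reward)
y_grid = build_grid(y_reward)
```

The $x$ coordinate grid is constant within each column, which is why abstractions trained on it partition the grid into vertical regions.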
For each of the four training objectives used during the IB process, we evaluated abstractions with the Manhattan and random grid reward functions. For example, we generated abstractions via the $x$ coordinate reward function, which divided the grid into vertical regions, and evaluated distortion using such abstractions for the Manhattan and random grid rewards.
We chose these values to intentionally cause similar continuous blue values (e.g., 0.124 and 0.126) to have dissimilar rewards (e.g., 0.5 and −0.5).
We include results of evaluating all such abstractions using the continuous blue reward function in Figure 6.
Unsurprisingly, using the continuous blue reward to generate abstractions resulted in lower distortion for the same complexity compared with using the other two training objectives.Concretely, we observed that increasing the complexity of abstractions used for predicting a color's red value had virtually no effect on the distortion for predicting the color's blue value.Given the orthogonal nature of blue and red representations in RGB colors, this is unsurprising.
Overall, we used these three training objectives for generating abstractions to decouple changes in complexity and distortion, while exploring meaningful ranges for each value.
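The decoupling described above can be illustrated with a toy computation. The sketch below is not the IB procedure and does not use embo: uniform binning of a single color channel stands in for IB-generated abstractions, mean squared error in the blue value stands in for the distortion measure, and the 8-level color grid and all names are assumptions made for this example.

```python
import itertools

LEVELS = [i / 7 for i in range(8)]                  # 8 intensity levels per channel (arbitrary)
COLORS = list(itertools.product(LEVELS, repeat=3))  # all (r, g, b) triples
RED, BLUE = 0, 2

def bin_index(value, n_bins):
    """Uniformly bin a value in [0, 1] into one of n_bins groups."""
    return min(int(value * n_bins), n_bins - 1)

def blue_distortion(channel, n_bins):
    """Mean squared error in blue when every color is replaced by the
    mean blue value of its group; groups are bins of the given channel."""
    groups = {}
    for color in COLORS:
        groups.setdefault(bin_index(color[channel], n_bins), []).append(color[BLUE])
    total = 0.0
    for blues in groups.values():
        mean_blue = sum(blues) / len(blues)
        total += sum((b - mean_blue) ** 2 for b in blues)
    return total / len(COLORS)

for n_bins in (1, 2, 4, 8):
    red_based = blue_distortion(RED, n_bins)
    blue_based = blue_distortion(BLUE, n_bins)
    print(n_bins, round(red_based, 4), round(blue_based, 4))
```

In this toy setting, adding groups along the red channel leaves the blue distortion unchanged, while adding groups along the blue channel drives it toward zero, mirroring the qualitative pattern reported for the red and continuous blue training objectives.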

B COLOR DOMAIN RESULTS
In the main paper, we omitted some of the graphs from the color domain experiments for clarity; we include them here for completeness. The different rows in Figure 10 reflect abstractions generated using different training objectives. In the top row, we used the continuous blue reward function (the same function used for evaluation). In the second row, we used the discontinuous blue reward function, and in the third row we used a color's red value to generate abstractions.
As in the grid domains, we found that 1) increasing complexity led to more fine-grained abstractions and 2) using different training objectives in the IB process led to suboptimal abstractions that resulted in high distortion. For example, using the discontinuous blue reward function (second row) at high complexity (Figure 10 f), we observed that two abstractions have nearly identical average rewards of 0.17 and 0.18. Thus, the additional complexity incurred by having more abstractions comes with barely any decrease in distortion. This suboptimality is even more pronounced for abstractions generated with the continuous red reward function (bottom row). Using just two abstractions at low complexity (Figure 10 g), the color space is partitioned into two groups that are meaningful for predicting the redness of a color, but with almost no difference in blue value (mean values of 0.43 and 0.47).
Manuscript submitted to ACM

3.1 Measures of Human Workload and Human Understanding from the Field of Human Factors
3.1.1 Workload. As discussed in Section 2.1.1, multiple measures of human workload (both objective and subjective)

4.1.1 Grid Navigation Domain. In the grid-navigation domain, different reward values between -1 and +1 were assigned to squares in a 5x5 grid, as depicted in Figure 1. The task was to navigate between a start square and goal square while

Fig. 1. We conducted experiments in three domains. In the two grid-based domains (a-b), reward was based on location: either Manhattan distance from a fixed point, or randomly-distributed reward. In the color domain (c), reward was based on the blue value in a color's RGB representation.

Fig. 2. Complexity-distortion curves (a) and corresponding abstractions (b-d) for the Manhattan grid navigation domain. Using the true grid reward leads to low distortion as complexity increases (blue "Reward" curve), and more fine-grained abstractions (b-c), eventually recovering the underlying reward grid. Generating abstractions to recover the $x$ coordinate in the grid (red "X" curve and d), rather than the reward, led to higher distortion due to abstractions that did not align with the true reward structure.

(again, from RGB) as an alternate training objective; such abstractions were not useful in predicting the true reward value (which depended only on the blue value), leading to high distortion regardless of complexity. (See Figure 6 for complexity-distortion curves in the color domain.) Overall, by using a variety of training objectives to generate abstractions, we could test explanations using abstractions at the same complexity, but different distortion, levels. Appendix C includes examples of abstractions at different complexity levels, for different training objectives, in all domains.

Fig. 3. Results from the Random grid domain. As distortion increased, explanation understanding, as measured by Feature Rank (a) or Best Demonstration (b), decreased. At the same time, as complexity increased, workload increased (c).

Fig. 4. Manhattan domain results, connecting metrics of understanding to distortion (a-b) and workload to complexity (c). Increased distortion led to worsened understanding, and increased complexity led to increased workload.

Fig. 5. Color domain results, using the continuous reward function. Trends were weaker than those observed in the grid domains, but still reflected the hypothesized directions. Similar results for the discontinuous reward function are included in Figure 7 in Appendix B.

Figure 2 a in the main paper shows how abstractions generated using different reward functions resulted in different distortion values for the same complexity.

Fig. 6. Complexity-distortion curves (a) and corresponding abstractions (b-e) for the continuous reward function in the color domain. Using different rewards to generate abstractions (different curves) led to varying distortions for the same complexity. Using the continuous reward function led to optimal distortion-complexity tradeoffs, and varying complexity increased the number of abstractions (d-e).

Fig. 8. Manhattan grid abstractions for various training objectives (different rows) and complexity levels (different columns). Increasing complexity led to more and finer-grained abstractions. Some abstractions led to low distortion (top row), whereas others removed important information, leading to high distortion (bottom two rows).

Figure 10 includes visualizations of abstractions from the color domain, much like the previous visualizations of grid-based abstractions. Abstractions were evaluated according to the continuous blue reward function; the heatmap used for visualization shows the average blue value of each abstraction. As before, we selected three checkpoints at low, medium, and high complexity (corresponding to different columns).

Fig. 10. Color abstractions evaluated on the continuous blue reward function. Increasing complexity (left to right) increased the number of abstractions. The distortion associated with such abstractions varied greatly, however, depending upon the training objective used when generating abstractions. Using the continuous blue reward to generate abstractions (top row) led to evenly spaced abstractions with little distortion. Different training objectives (discontinuous blue in the second row, continuous red in the third) led to suboptimal abstractions with higher distortion.

Fig. 11. Example of the feature rank question in the grid navigation domain. The top left grid designates regions of the grid which are ranked by participants when answering the question. The top right grid depicts the abstract reward regions. The reward values associated with each reward region are depicted in the five numbered swatches below the grids. At the bottom, sample responses for the feature rank question are provided based on the given abstract grid.

Fig. 12. Example of the best demonstration question in the grid navigation domain. The same explanation (grid with the abstract reward regions) is shown to participants as for the feature rank question. For this question, participants must select the reward-maximizing path through the grid at the bottom of the figure (depicted through the selected green boxes with check marks).

Fig. 13. Example of the feature rank question in the color domain. The heat map and the associated numbered color swatches at the top indicate the abstract regions of colors from the color grid below and their corresponding reward values. At the bottom, sample responses for the feature rank question (a set of ranked colors from the color grid) are provided based on the given abstraction of the color grid.

Fig. 14. Example of the best demonstration question in the color domain. The same explanation (color grid with the abstract reward regions) is shown to participants as for the feature rank question. Here, participants must select the path through the grid at the bottom of the figure which maximizes the value of the collected samples, which are indicated by different colors from the original color grid. The selected path is, again, marked by green boxes with check marks.

Fig. 15. Questions from the NASA TLX survey [28]. The questions above are depicted for the grid navigation domain.

Fig. 16. Subjective assessment questions adapted from Hoffman [31]. The questions above are depicted for the grid navigation domain, and were largely similar in the color domain.