The Impact of Explanations on Fairness in Human-AI Decision-Making: Protected vs Proxy Features

AI systems have been known to amplify biases in real-world data. Explanations may help human-AI teams address these biases for fairer decision-making. Typically, explanations focus on salient input features. If a model is biased against some protected group, explanations may include features that demonstrate this bias; but when biases are realized through proxy features, the relationship between the proxy feature and the protected one may be less clear to a human. In this work, we study the effect of the presence of protected and proxy features on participants' perception of model fairness and their ability to improve demographic parity over an AI alone. Further, we examine how different treatments (explanations, model bias disclosure, and proxy correlation disclosure) affect fairness perception and parity. We find that explanations help people detect direct but not indirect biases. Additionally, regardless of bias type, explanations tend to increase agreement with model biases. Disclosures can help mitigate this effect for indirect biases, improving both unfairness recognition and decision-making fairness. We hope that our findings can help guide further research into advancing explanations in support of fair human-AI decision-making.


INTRODUCTION
Improving the fairness and trustworthiness of AI systems is often cited as a goal of explainable AI (XAI) [e.g., 4,17,19,20,39,42,62]. Research in XAI aims to improve fairness in human-AI decision-making by providing insights into model predictions, thereby allowing humans to understand and correct for model biases. On the other hand, in the context of human-AI decision-making, previous work has noted that humans often over-rely on AI predictions, and explanations can exacerbate this concern [9]. This is especially troubling if the underlying model contains systematic biases, which may go unnoticed even when the model is teamed with a human. For the human-AI team to succeed, the human needs to be able to determine when to rely on or override potentially biased AI predictions. Previous work has shown that explanations can help human-AI teams alleviate model biases when those biases depend directly on protected attributes [18,54], but little is known about the very common case in which protected attributes are not explicitly included and the features used for prediction instead contain proxies thereof (e.g., zip code for race, length of credit for age, and university attended for gender). In particular, it may be difficult for humans to identify and resolve biased model predictions based on the proxy features present in real-world data, even when explanations are provided.
In this work, we study whether explanations can help people identify model biases and calibrate their reliance on a biased AI model. We extend work in this space by moving beyond direct biases that are revealed through the use of protected (i.e., sensitive) features to indirect biases that are revealed through proxy features, which may be less obvious to a human. Further, we examine whether explicitly disclosing model biases and correlations between the proxy and protected features can help humans calibrate their trust in a biased model. Our study aims to evaluate whether explanations can help people notice model biases, even when the biases are obfuscated by the presence of proxy features, and whether explanations can help users correct model biases when they are known to be present, through the use of bias disclosure and correlation disclosure. We study the effect of these treatments (explanations, model bias disclosure, and proxy correlation disclosure) on fairness, including fairness perception and fairness in decision-making (measured by group-wise parity), as well as on the accuracy of the decisions made by human-AI teams.
We conduct our study in the context of micro-lending outcome prediction: a setting that entails judging whether a loan applicant will fulfill their loan request based on profile information about the applicant (e.g., size of the loan, borrower occupation, etc.). For our experiments, we use semi-synthetic data where the majority of the features in an applicant profile, as well as the final loan repayment status, come from the website Prosper (https://www.kaggle.com/datasets/yousuf28/prosper-loan). To incorporate fairness considerations, we add to the applicant profiles (binary) gender, which is a protected feature, and university, which, when considering women's vs co-ed colleges, can be a proxy for gender. Because we seek to test whether people can correct for model bias, we intentionally train a biased predictor with outcomes skewed against applicants with gender assigned as female or university assigned as a women's college.
We find that explanations alone can help people notice unfairness in the case of direct bias (through protected features, e.g., gender), but not in the case of indirect bias (through proxy features, e.g., university). Surprisingly, regardless of whether people notice the unfairness in the AI decisions, explanations lead people to accept the model's biased decisions, resulting in less fair outcomes. In the case of direct bias, as participants often recognize clear-cut gender bias before an explicit disclosure, disclosing model biases does not further affect participants' fairness perception. However, in the case of indirect bias, disclosing both model bias and the correlation between protected and proxy features, or disclosing partial information with the addition of explanations, significantly improves participants' awareness of the unfairness. Contrary to explanations alone, this change is not paired with a worsening of decision-making fairness. Instead, with these disclosures, people increase their rate of positive predictions for the disadvantaged group, improving decision-making fairness. Our work aims to highlight methods to assist users in effectively leveraging explanations, especially in scenarios where bias may be indirect and not apparent through explanations alone.

BACKGROUND AND RELATED WORK
Biases in Models and Humans. Both models and humans can be biased. Humans are known to exhibit many implicit and unconscious biases [31]. For instance, Bertrand and Mullainathan [6] find that an applicant with a "White-sounding name" on a resume that is otherwise identical to a resume with an "African-American-sounding name" is more likely to receive an interview callback.
Models, in turn, can inherit human-like biases (e.g., through biased data [3, i.a.]), even if this is not intended by the developers. For instance, Angwin et al. [1] show that training on data collected from a racist justice system can lead to a model that predicts that white defendants are less likely to recidivate than their black peers.
This paper explores how humans interact with predictions from a biased model, wherein AI systems may be able to uncover helpful patterns in existing data and humans may be able to apply their contextual understanding and societal awareness to contribute to correcting these biases within the model.

XAI and Decision-Making. The potentially complementary strengths and weaknesses of humans and machines raise a question of whether human-AI teams can overcome the biases that exist in each individually (e.g., in the case of recidivism prediction [e.g., 14,21,61]). Existing work in explainable AI (XAI) has focused on providing explanations of the model decisions to help improve the outcomes of human-AI decision-making [2, 8-12, 26, 29, 35, 37, 38, 43, 49, 55, 58, 60, 61, 63, 65]. However, these studies find varying utility of explanations. Much work has found that explanations can help humans collaborate more effectively with AI [11,26,29,30,37,38,58,63], for instance helping them answer trivia questions more accurately [26] or understand how the AI system works [13]. Other work has found that explanations can worsen human-AI performance [2, 8-10, 12, 35, 49, 55, 60, 61], even below the performance of the human or AI alone. Further, the utility of explanations can also vary based on the participant's level of expertise in the task [e.g., 61], the participant's math and logic skills [57], how easy the explanations are to understand [63], etc.
Beyond explanations, other work has considered how further transparency can or cannot be beneficial to a human-AI team, such as tutorials [38], disclosing model confidence [51], disclosing model accuracy [23], and disclosing whether test examples fall into the scope of model training data [15].
Building upon previous research, this paper investigates the impact of explanations on the behavior of a human-AI team, especially their influence on the fairness of human-AI decisions in cases where the underlying model exhibits bias.In addition to explanations, we draw inspiration from work considering other methods of improving transparency in human-AI decision-making [e.g., 15,23,51], exploring the implications of disclosing model bias and the correlation between protected and proxy features on the overall fairness of a human-AI team.
XAI and Fairness. Improving model fairness is often cited as a potential benefit of XAI systems [4,17,19,20,39,42,62]. XAI is hoped to help "diagnose the reasons that lead to algorithmic discrimination" [20], to "highlight an incompleteness" in problem formalization that leads to unfairness [19], or to show compliance with fairness requirements [62].
Previous work has examined how explanations affect humans' perceptions of AI systems' fairness [7,18,40,50,54,64]. Rader et al. [50] find that participants who are told that an AI system is being used in decision-making rate the system as significantly less fair even without any specific system information. Lee et al. [40] find that explanations of an AI system's general decision-making process do not increase perceived fairness, while input-output level explanations of individual outcomes have mixed effects on fairness perceptions. Binns et al. [7] consider how four different styles of explanations affect justice perception, finding no clear winner between the approaches. Dodge et al. [18] further study the explanation styles in [7] and find that local explanations (such as presenting outcomes for similar examples) help surface fairness discrepancies between different cases, while global explanations (such as describing how each feature influenced the decision for a given example) increase user confidence in their understanding of the model and enhance users' fairness perceptions.
As self-reported perceptions do not always align with observed behaviors in human-AI decision-making [8,47], recent work has begun to expand out of fairness perceptions and into observed fairness in decision-making [54,59]. Schoeffer et al. [54] study how explanations can help users appropriately rely on potentially unfair AI predictions. They find that explanations that highlight protected features negatively affect fairness perceptions and that decreases in fairness perception are associated with an increase in overrides of AI predictions, even on examples where this override is detrimental to the fairness of the human-AI team. Wang et al. [59] study the effects of the level of model bias and the presence of explanations on the fairness of human decisions. They find that explanations lead participants to make more unfair decisions, even when participants were no longer given access to model predictions or explanations.
Existing work has primarily studied fairness when the model decision is directly based on a protected feature, like gender or race. However, models can produce biased outcomes, even without access to protected features, by relying on proxy features [34,48]. For instance, a model that has direct access to a "race" feature and one with access only to features like zip code, name, or language spoken at home could produce similarly biased predictions. In contrast to existing work considering the relationship between explanations and fairness perceptions or decision-making fairness, we consider not only direct bias through a protected feature but also indirect bias through a proxy feature.

RESEARCH QUESTIONS
We study the effect of explanations and disclosures in improving the fairness perception and fairness of decisions made by human-AI teams. In our study, model biases can be direct, stemming from the protected feature (gender), or indirect, stemming from a proxy feature (university) that is correlated with the protected feature. For explanation, we consider an input-influence explanation of how each feature contributed to the AI's prediction. For disclosures, participants may be told about the demographic parity (described in §5.1) of the system (model bias disclosure) and the strength of correlation between the proxy and protected features (proxy correlation disclosure; see §4.1). Our study addresses the following research questions:

RQ1a: Are explanations beneficial to the fairness of a human-AI team?
RQ2a: Without explanations, does disclosing only model bias or disclosing model bias and proxy correlation benefit human-AI fairness?
RQ3a: With explanations, does disclosing only model bias or disclosing model bias and proxy correlation benefit human-AI fairness?
RQ4a: Does the joint intervention of adding explanations and disclosures benefit human-AI fairness?
RQ1-4b: Do the answers to RQ1-4a change when models exhibit direct (e.g., gender) vs indirect (e.g., university) bias?
We consider the utility of explanations and disclosures under three lenses: the accuracy of humans' perception of fairness, the improvement in demographic parity of human-AI decision-making over AI-only parity, and the decision-making quality (namely, accuracy, false negative rate (FNR), and false positive rate (FPR)) of human-AI decisions compared to the AI alone.
Beyond these primary research questions, we also consider: RQ5: Does dispositional trust ("an individual's enduring tendency to trust automation" [32,44]) affect decision-making and fairness perceptions when working with models exhibiting direct or indirect bias?
RQ6: Do explanations and disclosures affect self-reported learned trust (based on "past experience or the current interaction" [32,44]) in models exhibiting direct or indirect bias?
In our study, we vary conditions based on the directness of bias, whether explanations are shown, and the kind of disclosure the participant receives. We consider six conditions. In the first three conditions, we do not show participants the explanations. Here, we consider one Protected condition with bias disclosure, and two Proxy conditions: one Proxy with correlation disclosure and one Proxy without correlation disclosure. We similarly consider three conditions with explanations, allocating the biased feature and disclosure types in the same fashion.
We assess the effect of explanations (RQ1a) by comparing conditions with and without explanations (before any disclosures) in a between-subjects analysis. We assess the effect of disclosures without explanations (RQ2a) and the effect of disclosures with explanations (RQ3a) in a within-subject analysis comparing fairness perceptions and human-AI decisions pre- and post-disclosures. This allows us to study how disclosures may help participants identify model biases over what is apparent before any disclosures. These first three effects are summarized in Figure 1. Lastly, we assess the effect of explanations and disclosures jointly (RQ4a) by comparing conditions in which participants are not shown explanations pre-disclosures with conditions in which participants are shown explanations post-disclosures. These experiments are repeated for protected and proxy conditions to assess the differences in interventions therein (RQ1-4b).

STUDY DESIGN
To answer the research questions posed in §3, we study decisions made by human-AI teams. In this study, the AI teammate is a classification model trained on partially-synthetic data in the context of loan prediction. We choose the task of loan prediction from a micro-lending platform as it is a decision-making task performed by laypeople, which means that crowd-workers are more likely to have intuitions about the task and the features used in predictions. In our study, participants are shown either the protected feature of binary gender or the proxy feature of university.
Task Phases. In each task phase (P1, P2), the participant is shown 10 profiles of loan applicants: their features and the overall AI prediction. Depending on the condition, they may or may not be shown an explanation of the AI prediction (Figure 3 left and right, respectively). This profile will, according to the condition, include either a "gender" or a "university" feature but not both. Participants are asked to mark on a five-point scale whether they think the applicant will complete their loan on time or be late in repaying their loan (Figure 3, below). Their response to this question serves as the decision made by the human-AI team.

Figure 1: Summary of primary effects considered in our study. Participants are assigned to either with- or without-explanations conditions and then complete the study moving horizontally from phase 1 to phase 2. We then compare the results of different combinations of phases and explanation conditions to investigate the effects of explanations alone, disclosures without explanations, and disclosures with explanations.
In each phase, we control the distribution of gender and AI predictions. The participant sees applications from 2 women who are predicted as "Complete" and 3 women who are predicted as "Late", and vice versa for men. (This is true in the underlying data even if the participant and the model do not directly see each applicant's gender.) We hold this ratio constant to avoid any effect due to the gender distribution or the rejection rate observed by different participants.
To discourage participants from making decisions without any consideration of the prediction and (when applicable) explanation, we ask participants for a free-text justification of why they agreed or disagreed with the model prediction (or were neutral) after they have chosen their prediction on selected profiles. We randomly select one application in each gender + prediction combination for collecting these free-text justifications. These justifications also help us qualitatively assess the reasoning behind participants' decisions. Further, to help filter out low-quality responses, participants are shown an attention check question asking them to recall the previous AI prediction (Figure 9 in the Appendix) after seeing the first applicant in P1.
Disclosures. Before proceeding to P2, participants may be shown general explanatory materials or specific disclosures on model bias and feature correlations. In the model bias disclosure (Figure 4a), participants are told that the model they saw in P1 had a low demographic parity (below 80%) (see §5.1 for details about demographic parity). In the correlation disclosure (Figure 4b), participants are told the correlation between each university and gender in the model's training data. The bias disclosure is shown across conditions, whereas the correlation disclosure is only shown in the proxy conditions with correlation disclosure. In the proxy conditions without correlation disclosure, participants are only told that models can rely on proxy features to make biased predictions, without specifying the correlation between gender and university. This is done to make participants aware of potential biases without explicitly disclosing the correlations.
Based on the disclosures seen, participants are asked up to two comprehension questions (Figure 13 in the Appendix). All are asked whether the model's demographic parity was above 80%. Those who received the correlation disclosure are asked to select one university that is highly associated with women.
Note that participants are not encouraged or primed to consider fairness explicitly before or during the first phase of the task. We only refer to fairness directly after phase 1. This allows us to measure how well participants can notice, or account for, unfairness when they aren't explicitly told to look out for it in phase 1. Subsequently, in phase 2, we can measure how participants perceive and account for unfairness when they know it is a salient concern.

Figure 3: Example profile with explanation from the "protected" model (left), without an explanation (right), and question to the user (below): "For the applicant profile (above), an AI system has predicted that the applicant will complete the loan on time. The figure shows the weights assigned to different attributes of the applicant's profile by the AI system. Do you think that the applicant will be Late or will Complete the loan on time?" The predicted outcome is completing the loan on time. The labels on the left show the name of each feature. The labels on the right show the value of each feature for the current applicant and the percent/percentile of this value in the training data. For the explanation, on the x-axis positive blue values correspond to "Complete" predictions and negative red values to "Late". See Figure 8 in the Appendix for an example profile as shown in the study interface.
Surveys. The three surveys (S0, S1, S2) aim to capture participants' trust and fairness perceptions. All surveys include questions asking participants to rate their level of agreement with statements relating to trust (on a scale of 1-5) [33]. In S0, participants are asked about their trust in AI systems generally, assessing their dispositional trust (Figure 14 in the Appendix). In S1 and S2, participants are asked about their trust in the system presented in the task phases, assessing their learned trust in the AI system that they interact with in the study (Figure 15 in the Appendix).
In the post-task surveys (S1 and S2), alongside trust-related questions, participants are also asked about their perception of the fairness of the system they have been interacting with (whether "the AI system was fair across different genders"). Additionally, participants are asked the reason(s) that led to their disagreements with the AI, such as the explanations including irrelevant features or the decisions being unfair towards applicants of different genders.
Tutorial and Warm-up. In P0, participants are acclimatized to the task with a full tutorial example. They are shown one tutorial example with a walk-through of the task, the AI decisions, and explanations (when applicable). Then, they are shown warm-up examples. In the conditions without explanations, they are shown two examples with no AI prediction or explanation. In the conditions with explanations, they are first shown a version of this example with no AI prediction or explanation. This is designed to encourage participants to properly engage with the features present. Second, they are shown the same example with the AI feature explanation (still without any prediction), as this setting has been shown to benefit decision quality and support learning by encouraging participants to cognitively engage with explanations [27].

Participants
We recruit 369 participants for our study through the crowdsourcing platform Prolific. Each participant is restricted to taking the study only once. Participation is restricted to US participants, fluent in English. We compensate all participants at an average rate of US$15 per hour. We discard responses that fail more than one attention check, leaving a total of 350 participants, with 51, 48, and 45 participants in the protected condition, the proxy condition with correlation disclosure, and the proxy condition without correlation disclosure (all without model explanations), and 68, 69, and 69 participants in the three respective conditions with model explanations. 42% of participants self-identified as women, 52% as men, 3% as nonbinary/non-conforming, 3% as transgender, and 1% as a different gender identity, with 1% of participants opting not to respond. 59% of participants were between the ages of 18-25, 46% between 25-40, 27% between 40-60, and 6% over the age of 60, with 2% of participants opting not to respond.
Figure 4: a) Bias disclosure: "For decision-making tasks, such as microlending outcome prediction, AI systems can be biased against different demographic groups, such as gender, race, etc. These systems may be used to recommend acceptance for microlending applications (that is, to accept the loan request if the applicant will likely complete the loan on time and reject it if the applicant will likely be late on the loan). Unfairness in the AI systems can potentially limit the access to loans for certain demographic groups. To avoid discrimination, decision makers should follow the 80% rule: the acceptance rate for the disadvantaged group should be within 80% of the acceptance rate for the advantaged group. For the 10 applicants in phase 1, the model predicted 60% of the men would complete the loan on time and 40% of the women would complete the loan on time. This leads to the acceptance rate for the women to be about 65% of that of the men." b) Full correlation disclosure: "One thing to note is that AI systems can be discriminatory even based on features that you may not expect. For example, even if a system does not explicitly know applicants' gender, it can still discriminate against applicants who went to women's colleges." Proxy "no correlation disclosure" conditions include the top paragraph but with the example of a hiring system relying on the relationship between zip code and race. See Figure 10 and Figure 11 in the Appendix for how these disclosures are shown in the study interface.

SYSTEM OVERVIEW
We conduct our study using model predictions and explanations from logistic regression models trained on partially synthetic microlending data. Since the participants' perceptions of how the model interacts with the profile features are key to answering our research questions, we want to avoid any potential confounding effects from using artificial or Wizard-of-Oz model explanations, or entirely synthetic data. The scenario of predicting whether an applicant will complete microloan repayment on time or will be late is one that our participants will likely be sufficiently familiar with to have reasonable prior intuitions about what features are relevant. A challenge is that, under US law, protected features like gender cannot be considered when making loan allocation decisions [52] and, therefore, gender is not in the dataset that we consider. For this reason, we augment our data with a synthetic "gender" feature, which we correlate with outcome to induce model bias. We also generate a proxy feature, university, which allows us to finely control the level of correlation between the proxy and gender.
Data. Our loan prediction data comes from a modified set of microloans from the website Prosper. The original dataset contains 79 features of microloans, including their status (completed, past due, etc). We group the loan statuses into "Complete" (including "Final Payment in Progress"), "Late" (including "Defaulted" and "Charged-Off"), or "Other" (including "Current" or "Cancelled"). We keep the ∼14000 profiles with "Complete" or "Late" statuses (with a 7:3 train-test split). This grouped loan status is the feature that the participants and the model will predict. As showing all 79 features to the participant may be overwhelming [49], we select 5 features (the original amount of the loan, the category of the listing, the applicant's occupation and employment status, and their state of residence) that are both important to loan prediction and are likely interpretable by a layperson. (Our code and data are available at https://github.com/ctbaumler/protected-vs-proxy.)

As described above, we synthetically generate values for our protected characteristic (binary gender). The existing applicants are assigned a gender in such a way that the ratio of "Complete" to "Late" outcomes is 2:3 for female applicants and vice versa for male applicants. This simulates historically biased data, which will cause the model to associate femaleness with being late on loans and maleness with completing them.
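The biased gender assignment above can be sketched as follows. This is an illustrative reconstruction, not our released code: the sampling rates shown here are an assumption that, with roughly balanced "Complete" and "Late" classes, yields the stated 2:3 (female) and 3:2 (male) Complete:Late ratios.

```python
import random

# Assumed sampling rates P(female | outcome). With balanced outcomes, female
# applicants end up with Complete:Late = 0.4 : 0.6 = 2:3, and males with 3:2.
P_FEMALE_GIVEN = {"Complete": 0.4, "Late": 0.6}

def assign_gender(outcome, rng):
    """Assign a synthetic binary gender to an applicant given their true outcome."""
    return "female" if rng.random() < P_FEMALE_GIVEN[outcome] else "male"

rng = random.Random(0)
profiles = [{"outcome": o} for o in ["Complete", "Late"] * 5000]
for p in profiles:
    p["gender"] = assign_gender(p["outcome"], rng)

# Sanity check: the Complete fraction among female applicants should be near 0.4,
# i.e., a 2:3 Complete:Late ratio.
females = [p for p in profiles if p["gender"] == "female"]
complete_frac = sum(p["outcome"] == "Complete" for p in females) / len(females)
```

A model trained on data generated this way will pick up the association between femaleness and lateness, which is exactly the historical bias the study design calls for.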
Using the generated "gender" feature, we further generate the proxy feature (university). We include co-ed and women's colleges, setting the joint distribution of gender and university such that most co-ed universities have relatively balanced gender ratios (see Figure 4b). For women's colleges, the distributions reflect real-life statistics. We choose exclusively liberal arts colleges with similar US News rankings to avoid confounding due to the effect of perceptions of liberal arts vs non-liberal arts schools and perceptions of school rankings.
Since, in our biased dataset, gender is correlated with outcome and, of course, the existing features are correlated with outcome, all features may be weakly correlated with gender. To confirm that university is the only strong proxy in our data, we compare the correlation of each categorical and continuous feature with gender. For continuous features (and one-hot features of each university), we use Pearson's r coefficient. We find that the women's colleges have at least an absolute correlation of 0.273 across Proxy conditions, whereas the maximum absolute correlation for other continuous features is 0.014. Similarly, for categorical features, we use Cramer's V, finding that the university feature has at least an absolute correlation of 0.417 while the maximum absolute correlation for the remaining categorical features is 0.082. Overall, we see that university (especially women's colleges) has a much stronger correlation with gender than any other feature shown to the participants.

Models. For our AI predictions, we use logistic regression models, as explanations on simple models may be more useful to humans [37]. We train the models on 14 pre-selected features from the Prosper dataset (of which participants will only see 5) and, when applicable, the gender or university feature. These models have an average accuracy of about 65% when compared to the original ground-truth values before adding synthetic features. Since we are using logistic regression, we can create a simple input-influence explanation of the AIs' predictions using feature weights. For continuous features like LoanOriginalAmount, we multiply the normalized feature value by the corresponding feature weight. For categorical features like EmploymentStatus, we take only the feature weight corresponding to the feature value (e.g., the weight of the EmploymentStatus = Full-Time feature). These values are graphed as in Figure 3 (left).
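The input-influence computation described above can be sketched as follows. This is a minimal illustration with hypothetical feature weights, not the study's trained model: each continuous feature contributes (normalized value × weight), and each categorical feature contributes the weight of its active one-hot indicator.

```python
# Hypothetical logistic-regression weights (illustrative values only).
weights = {
    "LoanOriginalAmount": -0.8,           # weight for a continuous feature
    "EmploymentStatus=Full-Time": 0.5,    # one-hot weights, one per category
    "EmploymentStatus=Unemployed": -0.9,
}

def input_influence(profile_normalized, weights):
    """Per-feature contributions; positive values push toward 'Complete',
    negative values toward 'Late'."""
    contributions = {}
    for name, value in profile_normalized.items():
        if isinstance(value, str):
            # Categorical: take only the weight of the active category.
            contributions[name] = weights.get(f"{name}={value}", 0.0)
        else:
            # Continuous: normalized value times the feature weight.
            contributions[name] = value * weights[name]
    return contributions

applicant = {"LoanOriginalAmount": 0.25, "EmploymentStatus": "Full-Time"}
expl = input_influence(applicant, weights)
# expl == {"LoanOriginalAmount": -0.2, "EmploymentStatus": 0.5}
```

In the study interface, these per-feature contributions are what get plotted as the signed bars in Figure 3 (left).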

Metrics
We evaluate study outcomes based on participants' perceptions and decisions. We consider two fairness perception metrics based on questions in the post-phase surveys, and we consider the decisions made by the human-AI teams based on one fairness measure, demographic parity, and three decision quality metrics: accuracy, false negative rate, and false positive rate.
We measure all metrics in the two task phases across conditions. We count both "Likely Complete" and "Definitely Complete" as "Complete" and similarly for "Late". We count "Neutral" as agreement with the system prediction.

Decision-Making Fairness
Measure. We employ demographic parity [25, i.a.] as a measure of fairness in decision-making, which captures the independence between protected characteristics and prediction. There are other measures of fairness [46]; however, not all definitions can be simultaneously satisfied [16,36]. Demographic parity has been found to be more understandable to laypeople and to better capture their perception of fairness than competing metrics [53,56]. We calculate the demographic parity for human-AI teams in task phases 1 and 2 across conditions as

Parity_p = P(Ŷ_{p,i} = Complete | gender_i = female) / P(Ŷ_{p,i} = Complete | gender_i = male),

where Ŷ_{p,i} is the predicted decision for applicant i by participant p. We obtain one demographic parity score in this way for each participant's decisions in each phase.
A parity close to 1 means an equal acceptance rate. As the acceptance rate for the advantaged group increases over the disadvantaged group, parity becomes closer to 0. If the acceptance rate of the disadvantaged group increases above that of the advantaged group, then the parity can increase above 1. A parity of less than 4/5 is considered "evidence of adverse impact" under US anti-discrimination law [24]. In our model bias disclosure, we tell the participants about this 80% rule and that the model failed this test in phase 1, that is, that the demographic parity of the model is below 80% (Figure 4a).
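An illustrative sketch of computing a per-participant parity score and applying the 80% rule (the data and variable names here are hypothetical):

```python
def demographic_parity(decisions, groups, disadvantaged, advantaged):
    # Ratio of acceptance rates: disadvantaged group over advantaged group.
    # decisions[i] is 1 ("Complete") or 0 ("Late"); groups[i] labels the applicant.
    def rate(g):
        picked = [d for d, grp in zip(decisions, groups) if grp == g]
        return sum(picked) / len(picked)
    return rate(disadvantaged) / rate(advantaged)

# One participant's decisions over eight applicants (first four female, last four male).
decisions = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["female"] * 4 + ["male"] * 4
parity = demographic_parity(decisions, groups, "female", "male")
print(round(parity, 3))   # 0.333
print(parity < 0.8)       # True: "evidence of adverse impact" under the 80% rule
```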

Fairness Perception Measures.
Based on our post-task surveys (described in §4.1), we calculate two measures of how participants perceive the degree of model unfairness. First, we consider how much participants agree with the statement "The AI model was fair across different genders". Here, the participant's fairness rating is higher when they believe the model is more fair. We also consider whether participants mark "unfairness" as a reason that they disagreed with model decisions. Here, the participant's fairness saliency is higher when they have a greater belief that they disagreed with the model due to unfairness.

Decision Quality
Measures. Measures such as accuracy require y_i's: a ground-truth "Complete" or "Late" value for each applicant. We have access to the ground-truth loan completion status for the original applicants. However, as we discuss in §5, our study uses an edited set of applicants with synthetic gender and university features, which are made to be correlated with the outcome. We estimate the loan completion status for each edited profile from the ground-truth completion status of the original applicants and our defined sampling rates of the synthetic features using Bayes' rule. In turn, we compute an expected accuracy, expected FPR, and expected FNR using the estimated loan completion status as our decision-quality measures. See Appendix A for more details.

Statistical Analyses
To answer our research questions (§3), we perform separate multiway ANOVA tests for different treatments (explanations, disclosures without explanations, and disclosures with explanations) for both protected and proxy conditions. For each statistical test, we construct a linear model with a fixed effect term for each independent treatment variable and one fixed effect term representing the participant's dispositional trust, which is calculated by averaging the scores from the pre-study trust survey. Additionally, in the within-study comparisons (that is, moving from phase 1 to phase 2), we include the participant ID as a random effect.
The independent treatment variables are determined by the factors that vary between the conditions compared for the effect of interest. For instance, to estimate the effect of explanation (that is, the vertical arrow in Figure 1), the treatment variable is the presence of explanations. For the effect of disclosure (that is, the horizontal arrows in Figure 1), the treatment variables are: (1) whether only bias disclosure has been shown (i.e., is this a phase 2 measurement with no correlation disclosure), and (2) whether full bias and correlation disclosure has been shown. For the effect of adding both explanations and disclosures (that is, the diagonal in Figure 1 going from without explanations and disclosures in phase 1 to with explanations and disclosures in phase 2), we include all three treatment variables.
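A bare-bones sketch of the fixed-effects part of such a linear model, fit by ordinary least squares on made-up data (this omits the random participant effect and the ANOVA significance tests; in practice a statistics package such as statsmodels would be used):

```python
def ols(X, y):
    # Solve the normal equations X^T X b = X^T y by Gaussian elimination.
    n, k = len(X), len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(k)] for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    for c in range(k):  # forward elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [arc - f * acc for arc, acc in zip(A[r], A[c])]
            b[r] -= f * b[c]
    coef = [0.0] * k
    for r in reversed(range(k)):
        coef[r] = (b[r] - sum(A[r][c] * coef[c] for c in range(r + 1, k))) / A[r][r]
    return coef

# Toy design: intercept, explanation shown (0/1), dispositional trust (1-5 scale).
rows = [(0, 1.0), (0, 3.0), (0, 5.0), (1, 1.0), (1, 3.0), (1, 5.0)]
X = [[1.0, e, t] for e, t in rows]
# Noiseless outcome with a known explanation effect, so OLS recovers it exactly.
y = [0.70 - 0.10 * e + 0.02 * t for e, t in rows]
print([round(c, 3) for c in ols(X, y)])  # [0.7, -0.1, 0.02]
```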
In each ANOVA test, we consider the data from the relevant sections. For instance, to estimate the effect of explanations alone in the case of direct bias through a protected feature, we only consider the data in phase 1 of the "protected" conditions (left vertical section in Figure 1), and similarly for the "proxy" conditions.
We fit a separate model for each fairness perception and decision-making metric as the dependent variable for each of the above effects. Although the ratio of "Complete" and "Late" model decisions shown to each participant is kept the same in each phase across conditions (leading to a constant AI-only parity of ∼0.67), the model accuracy, FPR, and FNR vary across conditions. To account for this variation, we subtract the model score from the score of the human-AI team. Similarly, in assessing learned trust measures, we adjust for dispositional trust in AI by subtracting the participant's response to the corresponding question in the pre-study survey.

Figure 5: Effect of explanations alone on various metrics when bias stems from usage of a protected vs proxy feature. The marks show the average and standard error of the given metric across participants in the given condition.
In addition to our key metrics detailed in §5.1, we also study the effect of the treatments on learned trust.We follow the same procedure as before but with the learned trust measures as the dependent variable in the ANOVA test.In this case, however, we adjust participants' post-phase 1 or post-phase 2 survey responses based on their baseline responses, and we do not include the overall dispositional trust term.Lastly, we perform additional ANOVA tests to analyze the difference between dispositional and learned trust.For this, we use the participants' trust ratings in the pre-study survey and surveys after phase 1 or phase 2 as the dependent variable.We fit two linear models, one for each phase, testing whether the phase has a fixed effect on the participants' trust ratings in different conditions (with the participant added as a random effect).
We perform Benjamini-Hochberg correction to avoid multiple testing effects, with a false discovery threshold of 0.05 [5]. This leads to a significance threshold of 0.0175 for the reported results.
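The Benjamini-Hochberg procedure can be sketched as follows (the p-values here are invented for illustration; the 0.0175 threshold reported above comes from the study's actual p-values):

```python
def benjamini_hochberg(pvals, fdr=0.05):
    # Sort p-values; the largest p_(k) with p_(k) <= (k/m) * fdr sets the cutoff.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0.0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * fdr:
            cutoff = pvals[i]
    # Reject every hypothesis whose p-value is at or below the cutoff.
    return cutoff, [p <= cutoff for p in pvals]

# Hypothetical p-values from several ANOVA tests.
cutoff, rejected = benjamini_hochberg([0.01, 0.2, 0.03, 0.02])
print(cutoff)    # 0.03: the three smallest p-values survive correction
print(rejected)  # [True, False, True, True]
```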

QUANTITATIVE RESULTS
In this section, we report our findings on the effects of different interventions on the decision-making and fairness perception metrics. We first discuss the primary effects detailed in Figure 1: the effect of explanations (§6.1), the effect of disclosures without explanations (§6.2), and the effect of disclosures with explanations (§6.3).
Next, we examine the effect of the joint intervention of adding both explanations and disclosures (§6.4). Lastly, we discuss the effect of dispositional trust on decision-making and fairness perception, the effect of different interventions on participants' trust, and the differences in participants' dispositional trust vs learned trust in §6.5 (full results in Appendix B).

Effect of Explanations Alone
First, we consider the effects of explanations alone by comparing the first phase of the with-explanations and without-explanations conditions under either type of bias.
In the case of direct bias through a protected feature, we find that explanations alone have a significant effect on all metrics; however, the direction of the effect is not consistent. Explanations alone significantly improve participants' ability to recognize unfairness (Figure 5a). Surprisingly, despite participants being more able to recognize that the model is unfair when shown explanations, when considering decisions instead of perceptions, we see that explanations significantly decrease gender parity (Figure 5b). Looking closer, we find that explanations lead to a significantly lower acceptance rate for female applicants, whereas the acceptance rate for male applicants does not change significantly (Table 5 in the Appendix). This decrease in acceptance rates also leads to a significant increase in the FNR and a significant decrease in the FPR, with an overall higher accuracy (Figure 5c).

Figure 6: Effect of disclosures without explanations on various metrics when bias stems from usage of a protected vs proxy feature. The marks show the average and standard error of the given metric across participants in the given condition.
In the case of indirect bias through a proxy feature, we find that explanations alone significantly reduce gender parity, similar to the case of direct bias (Figure 5b). Analogous to direct bias, this occurs due to a significant decrease in acceptance of female applicants (Table 5 in the Appendix). Further, explanations also lead to a significant increase in accuracy in the case of indirect bias. However, unlike with direct bias, explanations have no significant effect on fairness perceptions in the case of indirect bias (Figure 5a).
Overall, we find that explanations can help people recognize unfairness in the case of direct bias but not indirect bias. This is in line with our intuition that indirect biases are harder for participants to notice. However, regardless of fairness perceptions, in line with Wang et al. [59], we find that explanations lead people to accept model biases, resulting in less fair decisions. This could be attributed to the presence of explanations assisting humans in rationalizing the AI's unfair predictions rather than challenging them.

Effect of Disclosures without Explanations
We consider the effects of disclosures without explanations by comparing phase 1 and phase 2 in the without-explanations conditions under either type of bias.
For both direct and indirect bias, we find that disclosing model bias alone does not have a significant effect on any of the outcome metrics (gender parity, fairness perception, accuracy, FPR, and FNR). However, in the case of indirect bias, when we disclose both the model bias and the relationship between the protected and proxy feature (i.e., that some universities in the study are women's colleges), participants were significantly more likely to report that the model is unfair or that this unfairness caused them to disagree with the model's decisions (Figure 6a). Interestingly, this still does not translate to fairer decisions: as seen in Figure 6b, the gender parity does not change significantly on disclosing both model bias and correlations in the case of indirect bias.
In sum, we find that, interestingly, being explicitly told that the model is biased does not affect participants' fairness perception of the model decisions (in both direct and indirect bias conditions). In the direct bias condition, this could be because the model is perceived as unfair even pre-disclosures. In the indirect bias condition, this might be because disclosures alone, without explanations, are insufficient for participants to fully acknowledge the bias in the model's predictions. But disclosing both the model bias and the correlation between protected and proxy features does lead to participants perceiving the model as less fair in the case of indirect bias. However, this is not sufficient to improve decision-making fairness. This may be because, although disclosures assist participants in recognizing the unfairness of model predictions, they still lack sufficient information to overturn individual predictions without additional guidance on how the model utilizes the correlation between protected and proxy features.

Figure 7: Effect of disclosures with explanations on various metrics when bias stems from usage of a protected vs proxy feature. The marks show the average and standard error of the given metric across participants in the given condition.

Effect of Disclosures with Explanations
We consider the effects of disclosures with explanations by comparing phase 1 and phase 2 in the with-explanations conditions under either type of bias.
In the case of direct bias through a protected feature, we find that bias disclosure with explanations has no significant effect on fairness perceptions (Figure 7a). Unlike with bias disclosure without explanations (§6.2), we find that bias disclosure with explanations significantly increases the acceptance rate for female applicants (participants flip the model's "Late" predictions for female applicants at a much higher rate), with the acceptance rate for male applicants unchanged (Table 5 in the Appendix). Bias disclosure with explanations also results in a significant increase in FPR and a significant decrease in FNR, leading to an overall insignificant change in accuracy (Figure 7c). Even with the higher acceptance rate for female applicants, the increase in gender parity is not significant (Figure 7b), likely due to the normalizing effect of the acceptance rate of male applicants, which also increases, albeit insignificantly.
In the case of indirect bias through a proxy feature, we find that disclosures with explanations have a positive impact on fairness both with respect to perceptions and decision-making. Similar to the without-explanations case (§6.2), disclosing both the model bias and the association between gender and university while including explanations significantly decreases perceived model fairness (Figure 7a). For the fairness rating, this effect is significant even without the correlation disclosure. Further, disclosing model bias alone, as well as disclosing model bias along with the correlation between the protected and proxy feature, with explanations leads to a significant increase in gender parity (Figure 7b). This stems from a significantly higher acceptance rate for female applicants, while the acceptance rate for male applicants remains unchanged (Table 5 in the Appendix). This also results in a higher FPR (significant) and a lower FNR (not significant), with an overall drop in accuracy (significant).
In sum, in the case of direct bias, even though bias disclosure with explanations does not improve recognition of model unfairness significantly (possibly because fairness ratings are low even pre-disclosures), it does reduce agreement with the model's biased decisions, leading to a significantly higher acceptance rate for female applicants (however, decision-making fairness does not improve, possibly because of the normalizing effect of the acceptance rate of male applicants, which also increases, albeit insignificantly). This is in stark contrast with the effect of explanations alone (§6.1), which improved recognition of unfairness but led to more biased decisions overall.
Further, in the case of indirect bias, disclosing the model bias and the correlations between protected and proxy features with explanations significantly increases both recognition of unfairness and gender parity in decision-making. We also observe an approximately 1% drop in accuracy in the case of indirect bias after disclosing model bias and correlations with explanations (which might be acceptable in certain cases). Overall, we conclude that neither explanations nor bias or correlation disclosures alone are sufficient. We observe better decision-making fairness outcomes when participants are not only shown the model explanations but also made aware of the biases underlying them.

Effect of Joint Intervention
We have seen that explanations alone decrease decision-making fairness, while disclosures with explanations can, in the case of indirect bias, have the opposite effect. Here, we consider the effect of both adding explanations and giving disclosures over including neither (i.e., comparing phase 1 without explanations to phase 2 with explanations).
We find that, for decision-making metrics (accuracy, FPR, and FNR), the effect of the joint intervention is never significant (Table 6 in the Appendix). For fairness perception metrics, since the effects of explanations alone and disclosures with explanations pointed in the same direction, as expected, adding both explanations and disclosures also significantly improves recognition of unfairness. In sum, the joint intervention of including both explanations and disclosures (over including neither) helps participants in recognizing model biases but not in correcting them. This is a curious finding, as we would expect that giving participants full information about bias in the model as well as an explanation of each individual prediction would lead to fairer decisions. We believe this indicates that even though disclosures help undo some of the over-reliance on the model's biased decisions stemming from the inclusion of explanations, users still tend to be more accepting of model decisions with explanations than without.

Effects of Additional Variables
Beyond the primary effects considered in our study, we also investigate the effect of participants' dispositional trust levels on decisions and perceptions (RQ5) and the effect of our interventions on learned trust (RQ6).We include a detailed discussion of these results in Appendix B, along with a discussion of the differences between dispositional and learned trust and the relationship between a participant's gender and their decisions and perceptions.

Does dispositional trust affect decision-making and fairness perception measures?
As discussed in §5.2, we include a measurement of a participant's dispositional trust in AI as a fixed effect in our linear models. The effects and their significance were generally not consistent across models. Overall, we find that dispositional trust does not affect fairness perception in the case of direct bias, but it leads to significantly higher perceived fairness in the case of indirect bias; that is, participants with higher levels of dispositional trust were also less able to recognize indirect bias. Additionally, we find that higher dispositional trust in AI was associated with making less fair decisions by relying more on the biased model. This is in line with previous findings that a person's dispositional trust significantly affects their reliance on a machine [45]. However, we find that this effect is alleviated after including disclosures, both in the case of direct and indirect model bias.

Do explanations and disclosures affect self-reported learned trust?
In addition to the fairness perception, decision-making fairness, and quality measures discussed above, we additionally consider the effect of the explanation and disclosure interventions on learned trust when compared to dispositional trust in AI generally. We find that our interventions generally have no effect on learned trust ratings in models exhibiting direct bias, except for explanations leading to significantly lowered feelings that the AI system works well. When model biases are indirect, full disclosures with explanations (or sometimes full disclosures without explanations) lead to a drop in learned trust. Lastly, explanations alone and full disclosures alone also lead to an increase in the perceived predictability of the underlying model in the case of indirect bias but not in the case of direct bias.

QUALITATIVE RESULTS
As described in §4.1, for a selected set of applicants in each phase, participants were asked to write a free-text justification for why they agreed or disagreed with the AI (or marked it as "neutral") after marking their agreement and confidence. In addition to encouraging careful thinking, this also helps us gauge the kinds of reasoning participants employ in their decision-making.
As the main goal of our study is to understand how humans interact with AI decisions when the AI is biased, we primarily focus our qualitative analysis on rationales concerning biases. To analyze how participants perceive and use (or discard) the biased feature (gender or university), we consider justifications that directly reference the protected (gender) or proxy feature (university) by using a set of keywords for both. We started with an initial keyword set (e.g., "gender", "female", "university") and, based on reading a subset of the justifications, expanded it to include spelling variations (e.g., "skool" and "collage") and other topically relevant words (e.g., abbreviated names of schools). We discuss our qualitative findings on justifications involving the protected feature in §7.1 and justifications involving the proxy feature in §7.2. Lastly, we discuss additional observations indicating over-reliance based on a random sample of justifications in §7.3.
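A minimal sketch of this keyword filter (the keyword sets below are abbreviated, illustrative versions of the ones described above, not the study's full lists):

```python
import re

# Abbreviated, illustrative keyword sets; the study's lists also included
# further spelling variants and the abbreviated names of the schools shown.
GENDER_KEYWORDS = ["gender", "female", "male", "woman", "women"]
UNIVERSITY_KEYWORDS = ["university", "college", "school", "skool", "collage"]

def mentions(justification, keywords):
    # Case-insensitive whole-word match against any keyword.
    text = justification.lower()
    return any(re.search(r"\b" + re.escape(kw) + r"\b", text) for kw in keywords)

print(mentions("I tried to ignore gender here.", GENDER_KEYWORDS))      # True
print(mentions("Never heard of Kenyon College.", UNIVERSITY_KEYWORDS))  # True
print(mentions("Good income and employment.", GENDER_KEYWORDS))         # False
```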

Justification Involving Protected Feature
Here, we analyze justifications that explicitly mention gender in the direct bias conditions. In these justifications, we assess how participants incorporate gender into their judgment of AI predictions or when making their own predictions. Pre-disclosure, when explanations are not provided, participants rarely discuss gender as a salient part of their justification. However, when explanations are provided, participants often mention trying to "ignore gender" when making their decision. Notably, participants who mentioned gender in predictions about female applicants tended to make "Complete" or "Neutral" predictions. Thus, even though explanations significantly decreased gender parity (§6.1) overall, they appear to help participants correct model biases in some cases.
Post-bias disclosure, justifications mentioning gender still predominantly appear in the condition with explanations. Nevertheless, some participants, even without explanations, mention gender bias and flip "Late" predictions for female applicants. For instance, one participant explained overriding such a prediction based on observing a male applicant with the same occupation predicted as "Complete". In the condition with explanations, many participants asserted that "gender should not be a deciding factor" and ignored gender when making their prediction. Some participants, when supporting a "Late" prediction for a female applicant, clarify that their decision was based on other features ("The large amount of negatives aside from gender still point towards being late."). However, even in the case of direct bias with both explanations and bias disclosure, some participants still align with model biases. For example, one participant agreed with a "Complete" prediction of a male applicant "[b]ecause according to AI, male gender is more likely to complete loan..."

Justifications Involving Proxy Feature
Here, we analyze justifications that explicitly mention universities in the indirect bias conditions and assess how participants incorporate universities in their judgments. Pre-disclosures, some participants mentioned that attending college generally increases the likelihood of repayment, regardless of the specific school (for example, mentioning that an applicant is "college-educated" and predicting "Complete"). However, other participants factor in the specific college when deciding to accept or reject an applicant. In conditions without explanations, we see evidence of participants relying on their own judgment of school quality. For instance, participants mention that they have "never heard of Kenyon College," or that a co-ed school in the application "...is not a particularly prestigious university" (even though in the underlying model, attending a co-ed school is counted as a positive). In contrast, with explanations provided, we instead see examples of participants aligning their evaluation of a school with the model's biases. For instance, on applicants from women's colleges, participants claimed "The applicant didn't go to a good college," or "...College history was a major contributing factor to being late on loan." On applications from co-ed schools, participants claimed that the applicant "...attended a good university," or "the university is listed as a good one." This supports our quantitative finding that explanations alone lead participants to align with model biases (§6.1).
After bias disclosure alone (and especially without explanations), mentions of universities were quite sparse. Some participants mentioned that the university feature is given excessive weight ("I just find it hilarious that the borrower's state and university is such a huge factor.") but may not recognize this as indirect bias. One participant, although aware that Bryn Mawr is a women's college, expressed uncertainty about identifying biased predictions without direct access to protected features: "After learning more about possible discriminatory predictions on the AI's part... I'm specifically concerned about gender and race... but don't quite know how to discern that from these charts... This applicant profile gave me pause because I *think* Bryn Mawr College is an all-women's college." After full bias and correlation disclosure, the "university" feature appears frequently in justifications. Here, participants continue to highlight the excessive weight assigned to the university in explanations, noting particularly unwarranted negative weight on women's colleges (e.g., "While the system said late, I thought this was unfair because it placed a strong negative value on the college, which might be a women's college."). Although some participants use this university bias as a justification for flipping model predictions, many acknowledge the bias and either make a neutral prediction or concur with predictions of women being late in repayment.
Additionally, we observed that some participants struggled to recall which universities were co-ed, which may have limited their ability to intervene and correct model biases.

Justifications Indicating Over-reliance
In addition to the positive examples of explanations and disclosures helping participants notice and correct model biases, we also observe instances where decisions were based solely on the AI prediction or the corresponding explanations, regardless of disclosures. For example, even after bias disclosure in a direct bias condition, a participant agreed with a prediction of a female applicant being "Late", saying that "They seem to have more negatives than positives." Similarly, after bias and correlation disclosures in an indirect bias condition, a participant changed the prediction for an applicant from a women's college from "Complete" to "Late," providing a similar justification. This indicates that some participants persist in using explanations containing known biases, since for these applicants, ignoring the biased features (gender and university, respectively) would have resulted in the positives outweighing the negatives.
We also find instances of over-trust in AI even after participants are told that the AI is biased, such as "I have no reason to disagree with the AI, if the AI is discriminating it probably has a good reason to," or "An AI is usually better than a professional let alone an amateur like me." This indicates that even despite explanations and disclosures, there is room for improvement in educating and training humans to avoid unwarranted trust in AI systems and promote fair decision-making.

DISCUSSION AND LIMITATIONS
In this work, we studied the effect of explanations and disclosures on fairness perceptions and decision-making when humans are provided predictions from models exhibiting direct or indirect bias. Our findings are summarized in Table 1. Regardless of intervention, we consistently observed that human-AI teams made fairer decisions than the AI alone. We found that explanations alone significantly improved participants' ability to notice unfairness in the case of direct bias only. However, explanations led participants to be more influenced by model biases, whether they noticed these biases or not. Disclosures were an effective tool for helping users recognize unfairness in the case of indirect bias, especially with the help of explanations. And we saw that this increased recognition of bias was paired with fairer human-AI decisions, showing that disclosures helped participants understand when and how to intervene on model decisions to produce fairer outcomes.
However, we found that the joint intervention of including both explanations and disclosures (over including neither) was only effective in helping participants recognize model bias, not correct it. If the main objective is to help users notice direct model biases, we recommend including explanations, and if it is to help users notice indirect model bias, we recommend including explanations and disclosing both the model bias and the correlations between protected and proxy features. But if the main objective is to help the human-AI team produce fairer outcomes, we did not find including explanations with disclosures to be an effective intervention. However, if explanations are to be used, then disclosures may help contextualize explanations and the potential biases, especially when these biases are indirect. While in a perfect world, such known biases could be addressed in the model itself instead of relying on human intervention, this may not always be possible. In many cases, we may have limited access to the underlying model (e.g., only having API access) or may not be able to non-superficially "debias" it [28].

Table 1: Summary of our main results. Arrows represent significant effects and point in the direction of the change. "BD" and "CD" represent bias and correlation disclosures, respectively.
Disclosures may help uncover these biases to humans, possibly leading to fairer human-AI decisions.
A key limitation of our work is that, since we show our participants partially-synthetic loan data, we cannot directly rely on the existing ground truth. Instead, we calculate the expectation of ground-truth-based metrics (accuracy, FNR, and FPR), which means that there are applicants for which neither choice is very likely to be "correct" (i.e., both P(Y_i = 1) and P(Y_i = 0) are close to 0.5). We handle this in part by adjusting for the baseline AI-only scores; however, using a fully non-synthetic dataset and the original ground-truth values may lead to cleaner results. This lack of a true ground truth, in part, led us to use demographic parity, which has been argued to be insufficient as a notion of fairness [22].
There is potential concern about the use of a loan prediction task, since the participants are not financial experts. As we discuss in §5, participants are shown a subset of the original Prosper features that we believe are relatively intuitive without more than a common-sense understanding of lending (e.g., the size of the loan being requested and employment status). We also hope that a task mimicking the loan approval process is high-stakes enough to encourage more care from the crowd-workers in their decision-making. However, more work is needed to study how our findings generalize to settings with varied task stakes or domain expertise.
Another limitation is that our study design forces participants to make decisions one at a time, without seeing the entire pool of applicants. It is our hope that the percent/percentile information given for each feature gave participants a better sense of how each applicant's profile compared to the general pool, even without seeing many profiles. However, we recognize that it may be difficult for participants to conceptualize what a "strong" or "weak" candidate looks like under this design. This may make it more difficult for participants who, for example, wish to increase the acceptance rate of women in phase 2 to decide which female applicants are "most deserving" of having their prediction flipped to "Complete".
Despite these limitations, our work provides insights into the effect of explanations on fairness in human-AI decision-making, especially when the biases are indirect (through proxy features). We conclude that neither explanations nor disclosures alone improve the fairness of decisions made by a human-AI team. Our findings serve to caution the wider community against treating explanations as a foolproof solution to human-AI collaborative decision-making: explanations may not always make model biases clear and may make people more prone to align with model biases, leading to less fair decisions. When people are repeatedly exposed to explanations that justify or rationalize biased predictions, they may begin to accept these biases as valid or even desirable, rather than critically questioning and challenging them. We highlight that explanations and disclosures in conjunction may be helpful to some extent. However, more work is needed to further examine how best to aid humans not only in identifying indirect model biases, but also in systematically correcting these biases.

A DERIVING METRICS USING PROBABILISTIC GROUND TRUTH
Here, we consider in more detail how to calculate the probability of ground-truth completion and the expected value of the decision quality metrics.

A.1 Probability of Loan Completion
In our study, each applicant $i$ is only evaluated by a single participant $j$ in a given condition, and each participant $j$ evaluates 20 applicants across the two phases (§4.1). We represent the set of observed decisions as $D = \{(i, j) \mid \text{participant } j \text{ sees applicant } i\}$.
For the $i$th applicant, we want to know the probability that the true outcome should be complete, that is, $P(y_i = 1)$. Let $x_i$ be the set of original features of the $i$th applicant and $s^*_i$ be the assigned (synthetic) gender or university. Note, we drop the subscript $i$ when referring to a general applicant. We can write the probability of the true outcome for an applicant with features $(x, s^*)$ as
$$P(y = 1 \mid x, s^*) = \frac{P(x, s^* \mid y = 1)\, P(y = 1)}{P(x, s^*)}.$$
Since we assign the values of the protected or proxy feature $s^*$ based solely on the ground-truth outcome, we can assume that $s^*$ and $x$ are conditionally independent given $y$. Then, removing any terms not containing $y$, which can be normalized away, we are left with
$$P(y = 1 \mid x, s^*) \propto \frac{P(y = 1 \mid s^*)\, P(y = 1 \mid x)}{P(y = 1)}.$$
In the protected case, we know the probability of acceptance given the synthetic feature (that is, $P(y = 1 \mid s^*)$) based on our selected 60/40 male/female acceptance ratio. In the proxy case, we estimate the probability of acceptance given the university using the joint distribution of gender and university and the probability of acceptance given gender. For each applicant, we estimate the probability of the applicant completing their loan $P(y = 1 \mid x)$ based on only the non-synthetic features ($x$) using a linear regression model that does not have access to the synthetic information. Finally, we can calculate $P(y = 1)$ based on the rate of ground-truth acceptances in the original data.
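The combination above can be sketched in a few lines of code. This is a minimal illustration of the normalized proportionality, with placeholder probability values rather than estimates from our data:

```python
def p_complete(p_y_given_x, p_y_given_s, p_y):
    """Combine the regression estimate P(y=1|x), the synthetic-feature
    term P(y=1|s*), and the base rate P(y=1) via the proportionality
    P(y=1 | x, s*) ~ P(y=1|s*) * P(y=1|x) / P(y=1),
    then normalize over the two possible outcomes y in {0, 1}."""
    unnorm_pos = p_y_given_s * p_y_given_x / p_y
    unnorm_neg = (1 - p_y_given_s) * (1 - p_y_given_x) / (1 - p_y)
    return unnorm_pos / (unnorm_pos + unnorm_neg)

# Hypothetical male applicant in the protected case: P(y=1|s*) = 0.6
# from the 60/40 ratio; regression estimate P(y=1|x) = 0.7; base rate 0.5.
print(p_complete(p_y_given_x=0.7, p_y_given_s=0.6, p_y=0.5))
```

Note that the normalization step recomputes the unnormalized score for $y = 0$ with each term complemented, which is what "normalized away" refers to above.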
A.2 Expected Accuracy, FNR, and FPR
Using our calculated probability of the ground-truth acceptance of a given applicant (with original features $x$ and synthetic feature $s^*$), we can calculate an expected accuracy, FNR, and FPR for a given human-AI team. For example, for expected FNR, we consider the expected number of false negatives over the expected number of ground-truth positives.
Let $\hat{y}_{i,j}$ be the human-AI decision for the $i$th applicant by the $j$th participant, such that $\hat{y}_{i,j}$ is 1 if the human-AI decision is "Complete" and 0 if it is "Late". We can write the expected number of false negatives as
$$\mathbb{E}[\text{\# False Negatives}] = \sum_{(i, j) \in D} (1 - \hat{y}_{i,j})\, P(y_i = 1 \mid x_i, s^*_i),$$
that is, the probability of the ground-truth label for the $i$th applicant being 1 but the human-AI decision for the same applicant being 0. Similarly, we can write the expected number of positives as
$$\mathbb{E}[\text{\# Positives}] = \sum_{(i, j) \in D} P(y_i = 1 \mid x_i, s^*_i).$$
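As a concrete sketch, the expected-FNR calculation amounts to two weighted sums over the observed decision set $D$. The decision data below are hypothetical, not drawn from our study:

```python
def expected_fnr(decisions, p_true_pos):
    """decisions: dict mapping (applicant i, participant j) in D to the
    human-AI label yhat (1 = "Complete", 0 = "Late").
    p_true_pos: dict mapping applicant i to P(y_i = 1 | x_i, s*_i).
    Returns E[# false negatives] / E[# ground-truth positives]."""
    exp_fn = sum((1 - yhat) * p_true_pos[i]
                 for (i, j), yhat in decisions.items())
    exp_pos = sum(p_true_pos[i] for (i, j) in decisions)
    return exp_fn / exp_pos

# Two decisions: applicant 0 labeled "Complete", applicant 1 labeled "Late".
decisions = {(0, 0): 1, (1, 0): 0}
p_true_pos = {0: 0.8, 1: 0.5}
print(expected_fnr(decisions, p_true_pos))  # 0.5 / 1.3
```

Expected FPR is analogous, using $\hat{y}_{i,j}$ in the numerator and $P(y_i = 0 \mid x_i, s^*_i)$ as the weight.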

B EXTENDED RESULTS
In this section, we report additional results regarding dispositional trust, learned trust, and participant gender. We additionally include a full table detailing the primary effects considered in the study (Table 6) as well as the effects of our interventions on reliance split by applicant gender (Table 5).

B.1 Does dispositional trust affect decision-making and fairness perception measures?
As discussed in §5.2, we include a measurement of a participant's dispositional trust in AI as a fixed effect in our linear models. The effects and their significance were generally not consistent across models (see Table 2).
We consistently find that dispositional trust has no significant impact on fairness ratings in the protected conditions, while it significantly increases fairness ratings in the proxy conditions. In other words, when biases are direct, people are equally able to notice model biases even when they tend to trust AI in general; however, when biases are indirect, people with higher dispositional trust in AI are less likely to believe that the model is unfair. For participants' fairness saliency, we see that there is never a significant effect in the case of direct bias, but higher dispositional trust significantly decreased the rate of disagreement only when considering disclosures without explanations.
We also find that, under models that consider the effect of explanations and disclosures with explanations, increased dispositional trust in AI significantly increased FPR and decreased FNR in the proxy conditions only. This is likely due to participants with higher trust in AI being more influenced by subtle indirect biases, leading to lower acceptance rates for female applicants. This is also supported by the models measuring the effect of explanations on gender parity. Here, we see that increased dispositional trust in AI significantly decreased parity under both types of bias.
Overall, people with a greater dispositional trust in AI tended to make more unfair decisions (when working with a biased model) and were less likely to notice indirect bias.

B.2 Do explanations and disclosures affect learned trust?
In this section, we discuss the effect of our interventions (explanations, disclosures without explanations, and disclosures with explanations; Figure 1) on learned trust, controlling for dispositional trust in AI generally. We perform statistical tests similar to the ones described in §5.2. We consider the different trust measures as the dependent variable and the treatment as the fixed-effect term. We also control for the dispositional trust level as a fixed effect. As seen in Table 6, we find that our treatments generally have no effect on trust ratings for models exhibiting direct bias, except for explanations alone leading to significantly lowered feelings that the AI system works well. In the case of indirect bias, we often see that full disclosure with explanations (and sometimes also full disclosure without explanations or bias disclosure with explanations) has a significant effect on learned trust. These effects demonstrate lowered feelings that the AI system works well, decreased feelings that the AI system can perform as well as an untrained human, decreased confidence in the system, decreased feelings of safety when relying on the system, and increased wariness of the AI system.
We also find that explanations alone significantly increase participants' perception of model predictability in the proxy conditions but not the protected conditions. Without explanations, full bias and correlation disclosure also significantly increased predictability. With explanations, however, disclosures do not increase predictability, likely due to high predictability ratings even with explanations alone. This is to say that our models are already seen as relatively predictable when biases are direct, but when biases are indirect, explanations or disclosure of model bias and the model's usage of the university feature help make the model more predictable.

B.3 Is learned trust different from dispositional trust?
In the previous section, we discussed how interventions affected learned trust when controlling for baseline dispositional trust. Here, we study whether there is a significant difference between dispositional trust and learned trust in the biased models across the questions described in §4.1. These results are shown in Table 3. We find that participants usually thought our model worked worse than AI does generally, that it inspired less confidence, and that it was less predictable. Participants also regarded our AI system as less safe than AI in general, but this is primarily true only in the case of direct bias. Surprisingly, participants did not consider our AI systems to be less safe than AI in general in phase 1 when they were given explanations (which would have directly indicated that the system used gender as a feature to determine loan outcomes). Participants' wariness of the biased models was not significantly different from their baseline wariness of AI.

B.4 Does participant gender correlate with decision-making and fairness perception measures?
Because our models exhibit gender bias, it stands to reason that participants of varied genders may react differently to the models. Namely, non-male participants may be more sensitive to bias against women. Using point-biserial correlation tests [41], we consider whether gender correlates with our fairness perception, decision-making fairness, and decision-making quality metrics, as well as the rate of "Complete" predictions for female and male applicants directly. We find no significant correlations between gender and behavior or perceptions in our task (see Table 4). However, we do find marginally significant correlations with the acceptance rate for female applicants, FPR, and accuracy. This suggests a weak trend of male participants accepting fewer female candidates ($p = 0.05$), leading to a lower FPR and higher accuracy.
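For reference, the point-biserial coefficient is simply Pearson's $r$ computed with a 0/1 grouping variable. A minimal self-contained version, applied to hypothetical data (not our study's measurements), could look like:

```python
from math import sqrt

def point_biserial(binary, scores):
    """Point-biserial correlation between a 0/1 grouping variable
    (e.g., participant self-described as male = 1) and a continuous
    metric. Equivalent to Pearson's r with the binary variable."""
    n = len(binary)
    mb, ms = sum(binary) / n, sum(scores) / n
    cov = sum((b - mb) * (s - ms) for b, s in zip(binary, scores)) / n
    vb = sum((b - mb) ** 2 for b in binary) / n
    vs = sum((s - ms) ** 2 for s in scores) / n
    return cov / sqrt(vb * vs)

# Hypothetical: male participants (1) with slightly lower acceptance
# rates for female applicants than non-male participants (0).
print(point_biserial([1, 1, 0, 0], [0.4, 0.5, 0.6, 0.7]))
```

A negative coefficient here would indicate that the "male" group tends to have lower values of the metric, matching the direction of the weak trend reported above.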

Figure 2 :
Figure 2: Order of study phases.

Figure 3 :
Figure 3: Example profile with explanation from the "protected" model (left), without an explanation (right), and question to the user (below). The predicted outcome is completing the loan on time. The labels on the left show the name of each feature. The labels on the right show the value of each feature for the current applicant and the percent/percentile of this value in the training data. For the explanation, on the x-axis, positive blue values correspond to "Complete" predictions and negative red values to "Late". See Figure 8 in the Appendix for an example profile as shown in the study interface.
In the figure below, you can see the associations between different colleges and binary gender. (This is based on the historical data used to train our AI system.) [Bar chart of association strengths: Mount Holyoke College (-0.288), Bryn Mawr College (-0.281), Denison University (0.013), Scripps College (0.024), Trinity College (0.049), Harvey Mudd College (0.056), Bucknell University (0.066), Lafayette College (0.074), Kenyon College (0.087), Macalester College (0.103).] The colleges towards the left (in purple) are more associated with women. On the other hand, the colleges towards the right (in green) are more associated with men. The values on the figure indicate the strength of association (the closer to zero, the weaker the association).

Figure 10 :
Figure 10: Bias disclosure showing the demographic parity of the model in phase 1.

Figure 11 :
Figure 11: Correlation disclosure showing the relationship between university and gender in our synthetic data.

Figure 12 :
Figure 12: In proxy conditions where participants are not given correlation disclosure, they are instead given this screen explaining that proxies can exist in general, without mentioning the relationship in our data.

Figure 13 :
Figure 13: Comprehension check screen testing both understanding of bias disclosure (Figure 10) and correlation disclosure (Figure 11). The correlation disclosure question is only shown in proxy conditions where the participants are given correlation disclosure.

Figure 14 :
Figure 14: Initial trust survey given before the task is introduced.

Figure 15 :
Figure 15: Example post-task survey. This is the version that is shown after phase 2 of proxy conditions. In protected conditions and after phase 1 of proxy conditions, the question about which feature might have led to gender bias is omitted.

Table 2 :
Effects of dispositional trust in AI on different outcome metrics (Perception, Parity, Accuracy, FPR, and FNR).

Table 3 :
Comparison of dispositional trust vs learned trust in varied conditions and phases.

Table 4 :
Correlation between participants self-describing as male and various performance metrics.

Table 5 :
Effect of interventions on acceptance rate for female and male applicants across conditions and phases.

Table 6 :
Overall results of tests regarding the primary effects of our study on fairness perceptions, decision-making fairness, decision-making quality, and learned trust.
C HUMAN STUDY INTERFACE