Towards User Guided Actionable Recourse

Machine Learning's proliferation in critical fields such as healthcare, banking, and criminal justice has motivated the creation of tools that ensure trust and transparency in ML models. One such tool is Actionable Recourse (AR) for negatively impacted users. AR describes recommendations of cost-efficient changes to a user's actionable features to help them obtain favorable outcomes. Existing approaches for providing recourse optimize for properties such as proximity, sparsity, validity, and distance-based costs. However, an often-overlooked but crucial requirement for actionability is a consideration of User Preference to guide the recourse generation process. In this work, we capture user preferences via soft constraints in three simple forms: i) scoring continuous features, ii) bounding feature values, and iii) ranking categorical features. We then propose a gradient-based approach to identify User Preferred Actionable Recourse (UP-AR) and carry out extensive experiments to verify the effectiveness of our approach.


INTRODUCTION
Actionable Recourse (AR) [30] refers to a list of actions an individual can take to obtain a desired outcome from a fixed Machine Learning (ML) model. Several domains, such as lending [28], insurance [26], resource allocation [6, 27], and hiring decisions [1], are required to suggest recourses to ensure trust in a decision system; in such scenarios, it is critical to ensure the actionability (the viability of taking a suggested action) of recourse, otherwise the suggestions are pointless. Consider an individual named Alice who applies for a loan, and a bank, which uses an ML-based classifier, that denies it. Naturally, Alice asks: What can I do to get the loan? The inherent question is what action she must take to obtain the loan in the future. Counterfactual explanation, introduced by Wachter [31], provides a what-if scenario to alter the model's decision, but it does not account for actionability. AR aims to provide Alice with an action set that is both actionable by Alice and suggests modifications that are as low-cost as possible.
While some features (such as age or sex) are inherently non-actionable for all individuals, Alice's personalized constraints may also limit her ability to act on certain suggested recourses (such as a strong reluctance to secure a co-applicant). We call these localized constraints User Preferences, synonymous with the user-level constraints introduced as local feasibility by Mahajan et al. [17]. Figure 1 illustrates the motivation behind UP-AR. Note how similar individuals can prefer contrasting recourses.
Actionability, as we consider it, is centered explicitly around individual preferences: similar recourses provided to two individuals (Alice and Bob) with identical feature vectors may not be equally actionable. Most existing methods of finding actionable recourse are restricted to omissions of features from the actionable feature set and box constraints [18] that bound actions.
In this study, we discuss three forms of user preferences and propose a user-provided score formulation for capturing these different idiosyncrasies. We believe that communicating in terms of preference scores (say, by providing a 1-10 rating on the actionability of specific features) improves the explainability of a recourse generation mechanism, which ultimately improves trust in the underlying model. Such a system could also be easily re-run with different preference scores, allowing for diversifiable recourse. We surveyed 40 individuals and found that a 60% majority preferred to provide their preferences on individual features for influencing a recourse mechanism, as opposed to receiving multiple "stock" recourse options or simply receiving a single option. Additional details of our survey are included in Section 7. We provide a hypothetical example of UP-AR's ability to adapt to preferences in Table 1.
Motivated by the above considerations, we capture soft user preferences along with hard constraints and identify recourse based on local desires without affecting the success rate of identifying recourse. For example, suppose Alice prefers to have 80% of the recourse "cost" come from loan duration and only 20% from the loan amount, meaning she prefers a recourse with a minor reduction in the loan amount. Such a recourse enables Alice to get the benefits of a loan on her terms, and can easily be calculated to Alice's desire. We study the problem of providing user-preferred recourse by solving a custom optimization for individual user-based preferences. Our contributions include:
• We start by enabling Alice to provide three types of user preferences: i) Scoring, ii) Ranking, and iii) Bounding. We embed them into an optimization function to guide the recourse generation mechanism.
• We then present User Preferred Actionable Recourse (UP-AR) to identify a recourse tailored to her liking. Our approach highlights a cost correction step to address the redundancy induced by our method.
• We consolidate performance metrics with empirical results of UP-AR across multiple datasets and compare them with state-of-the-art techniques.

Related Work
Several methods exist to identify counterfactual explanations, such as FACE [22], which uses the shortest path from high-density regions to identify counterfactual explanations, and Growing Spheres (GS) [16], which employs random sampling within increasing hyperspheres to find counterfactuals. CLUE [3] identifies counterfactuals with low uncertainty in terms of the classifier's entropy within the data distribution. Similarly, the manifold-based CCHVAE [21] generates high-density counterfactuals through the use of a latent space model. However, there is often no guarantee that the what-if scenarios identified by these methods are attainable.
Existing research focuses on providing feasible recourses, yet comprehensive literature on understanding and incorporating user preferences within the recourse generation mechanism is lacking. It is worth mentioning that instead of understanding user preferences, Mothilal et al. [18] provide a user with diverse recourse options in the hope that the user will benefit from at least one. The importance of diverse recourse recommendations has also been explored in recent works [18, 25, 31], which can be summarized as increasing the chances of actionability, as intuitively observed in the domain of unknown user preferences [13]. Karimi et al. [14] and Cheng et al. [5] also resolve uncertainty in a user's cost function by inducing diversity in the suggested recourses. Interestingly, only 16 out of the 60 recourse methods explored in the survey by Karimi et al. [13] include diversity as a constraint, where diversity is measured in terms of distance metrics. Alternatively, studies like Cui et al. [7], Rawal and Lakkaraju [23], and Ustun et al. [30] optimize a universal cost function. This does not capture the individual idiosyncrasies and preferences crucial for actionability.
Efforts to elicit user preferences include recent work by De Toni et al. [8]. The authors provide an interactive human-in-the-loop approach, where a user continuously interacts with the system. However, learning user preferences by asking users to select one of the provided partial interventions is a derivative of providing a diverse set of recourse candidates. In this work, we consider fractional cost as a means to communicate with Alice, where the fractional cost of a feature refers to the fraction of cost incurred from feature $i$ out of the total cost of the required intervention.
The notion of user preference, or user-level constraints, was previously studied as local feasibility [17]. Since users cannot precisely quantify the cost function [23], Yadav et al. [32] diverge from the assumption of a universal cost function and optimize over a distribution of cost functions. We argue that the inherent problem of feasibility can be solved more accurately by capturing and understanding Alice's recourse preferences and adhering to her constraints, which can vary between Hard Rules, such as being unable to bring a co-applicant, and Soft Rules, such as hesitation to reduce the loan amount, which should not be interpreted as unwillingness. This is the first study to capture individual idiosyncrasies in the recourse generation optimization to improve feasibility.

PROBLEM FORMULATION
Consider a binary classification problem where each instance represents an individual's feature vector $\mathbf{x} = [x_1, x_2, \ldots, x_D]$ with an associated binary label $y \in \{-1, +1\}$. We are given a model $f(\mathbf{x})$ to classify $\mathbf{x}$ into either $-1$ or $+1$. Let $f(\mathbf{x}) = +1$ be the desirable output of $f(\mathbf{x})$ for Alice. However, Alice was assigned the undesirable label $-1$ by $f$. We consider the problem of suggesting an action $\mathbf{r} = [r_1, r_2, \ldots, r_D]$ such that $f(\mathbf{x} + \mathbf{r}) = +1$. Since a suggested recourse only requires actions on actionable features, denoted by $A$, we have $r_i = 0 \ \forall i \notin A$. We further split $A$ into continuous actionable features $A_{con}$ and categorical actionable features $A_{cat}$ based on the feature domain. Action $\mathbf{r}$ is obtained by solving the following optimization, where $\mathrm{userCost}(\mathbf{r}, \mathbf{x})$ is any predefined cost function of taking action $\mathbf{r}$:
$$\min_{\mathbf{r}} \ \mathrm{userCost}(\mathbf{r}, \mathbf{x}) \quad \text{subject to} \quad f(\mathbf{x} + \mathbf{r}) = +1.$$
(1) User Preference Type I (Scoring continuous features): Users provide scores $\theta_i$ over the continuous actionable features $A_{con}$ with $\sum_{i \in A_{con}} \theta_i = 1$, where $\theta_i$ refers to the fraction of the total recourse cost Alice prefers to incur from feature $i$.
(2) User Preference Type II (Bounding feature values): Users provide bounds on the values an actionable feature may take, acting as box constraints $x_i + r_i \in [l_i, u_i]$ on the suggested action.
(3) User Preference Type III (Ranking categorical features): Users are also asked to provide a ranking function $R : A_{cat} \to \mathbb{Z}^{+}$ on $A_{cat}$. Let $R_i$ refer to the corresponding rank for a categorical feature $i$. Our framework identifies recourse by updating the candidate action based on the ranking provided. For example, consider $A_{cat}$ = {HasCoapplicant, HasGuarantor, CriticalAccountOrLoansElsewhere}, which Alice ranks {3, 2, 1}. The recourse generation system then considers suggesting an action on HasGuarantor before HasCoapplicant. Ranking preferences can easily be guaranteed by a simple override in case of discrepancies while finding a recourse.
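To make the three preference types concrete, below is a minimal sketch of how they might be encoded on the user's side. The class and field names are illustrative assumptions, not part of any published UP-AR implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class UserPreferences:
    """Illustrative container for the three preference types."""
    # Type I: fractional cost scores over continuous actionable features,
    # expected to sum to 1 (e.g., {"LoanDuration": 0.8, "LoanAmount": 0.2}).
    scores: Dict[str, float] = field(default_factory=dict)
    # Type II: per-feature (lower, upper) bounds on the post-action value.
    bounds: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    # Type III: ranks over categorical actionable features; a lower rank
    # means the feature should be acted on first.
    ranks: Dict[str, int] = field(default_factory=dict)

    def validate(self) -> None:
        if self.scores and abs(sum(self.scores.values()) - 1.0) > 1e-6:
            raise ValueError("Type I scores must sum to 1.")

alice = UserPreferences(
    scores={"LoanDuration": 0.8, "LoanAmount": 0.2},
    bounds={"LoanAmount": (0.0, 8072.0)},
    ranks={"CriticalAccountOrLoansElsewhere": 1, "HasGuarantor": 2,
           "HasCoapplicant": 3},
)
alice.validate()
```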

Cognitive simplicity of preference scores.
The proposed user preferences are highly beneficial for guiding the recourse generation process. Note that in the absence of these preferences, the recourse procedure falls back to default values set by a domain expert. Additionally, users can first be presented with the default preferences and asked to adjust them to their individual preferences. A simple user interface can help them interact with the system intuitively; for example, adjusting a feature score automatically adjusts the corresponding preference type scores.

Proposed optimization
We depart from capturing a user's cost of feature action and instead obtain their preferences for each feature. We elicit the three forms of preferences detailed in the previous section and iteratively take steps in the action space, proposing an optimization over basic predefined steps guided by the user preferences. Let us denote the inherent hardness of the feature action $r_i$ for feature value $x_i$ using $\mathrm{cost}(\mathbf{r}, \mathbf{x})$, which can be any predefined cost function. The proposed method minimizes the cost of a recourse weighted by $\theta_i$ over all actionable features. We discuss the details of our cost function considerations in Section 3.1. The order preference of categorical feature actions can be enforced by restrictions while finding a recourse. The next section introduces UP-AR as a stochastic solution to the proposed optimization.

USER PREFERRED ACTIONABLE RECOURSE (UP-AR)
Our proposed solution, User Preferred Actionable Recourse (UP-AR), consists of two stages. The first stage generates a candidate recourse by following a connected, gradient-based iterative approach.
The second stage then improves the redundancy metric of the generated recourse for better actionability. The details of UP-AR are consolidated in Algorithm 1 and visualized in Figure 2.

Stage 1: Stochastic gradient-based approach
Poyiadzi et al. [22] identify a counterfactual by following a high-density connected path from the feature vector $\mathbf{x}$. With a similar idea, we follow a connected path guided by the user's preferences to identify a feasible recourse. We propose incrementally updating the candidate action with predefined step sizes to solve the optimization. At each step $t$, a candidate intervention is generated, where any feature $i$ is updated by a predefined step $\delta_i^{(t)}$ based on a Bernoulli trial with probability $p_i^{(t)}$, computed as follows. With precomputed costs for each step, a weighted inverse cost is computed for each feature, and these values are mapped to a probability distribution using a function such as softmax, which converts the $\theta_i$ scores into probabilities:
$$p_i^{(t)} = \frac{\exp\big(\theta_i / (\tau \cdot \mathrm{cost}(\delta_i^{(t)}, x_i))\big)}{\sum_{j \in A} \exp\big(\theta_j / (\tau \cdot \mathrm{cost}(\delta_j^{(t)}, x_j))\big)}, \tag{5}$$
where $\tau$ is a temperature parameter discussed in Section 3.1.1. We leverage the idea of the log-percentile shift from AR to determine the cost of an action, since it is easier to communicate with users in terms of percentile shifts. Specifically, we follow the formulation in [30] to define the cost:
$$\mathrm{cost}(r_i, x_i) = \left|\log\frac{1 - Q_i(x_i + r_i)}{1 - Q_i(x_i)}\right|,$$
where $Q_i(x_i)$, the percentile of feature $i$ with value $x_i$, is the score below which $Q_i(x_i)$ percent of values fall in the frequency distribution of feature values in the target population. We adapt and extend the idea that counterfactual explanations and adversarial examples [29] have a similar goal but contrasting intentions [19]. A popular approach to generating adversarial examples [10] is to use a gradient-based method. We employ this learning from adversarial example generation to determine the direction of feature modification in UP-AR: the Jacobian matrix is used to measure the local sensitivity of outputs with respect to each input feature. Consider $f : \mathbb{R}^D \to \mathbb{R}^K$ mapping a $D$-dimensional feature vector to a $K$-dimensional vector, such that each of the partial derivatives exists. For a given $\mathbf{x} = [x_1, \ldots, x_i, \ldots, x_D]$ and $f(\mathbf{x}) = [f^{[1]}(\mathbf{x}), \ldots, f^{[j]}(\mathbf{x}), \ldots, f^{[K]}(\mathbf{x})]$, the Jacobian matrix of $f$ is defined to be the $K \times D$ matrix $\mathbf{J}$ whose $(j, i)$ entry is $\partial f^{[j]}(\mathbf{x}) / \partial x_i$. For a neural network (NN) with at least one hidden layer, $J_{j,i}$ is obtained using the chain rule during backpropagation. For an NN with one hidden layer represented by weights $\{w\}$, we have:
$$J_{j,i} = \sum_{l} \frac{\partial f^{[j]}(\mathbf{x})}{\partial a_l} \cdot \frac{\partial a_l}{\partial x_i}, \tag{7}$$
where, in Equation 7, $a_l$ is the output (with possible activation) of the hidden layer and $w_l$ is the weight of node $l$. Line 4 in Algorithm 1 updates the candidate action for a feature $i$ at step $t$ as:
$$r_i^{(t)} = r_i^{(t-1)} + \eta_i^{(t)} \, \delta_i^{(t)},$$
where, following the traditional notation of a binary classification problem and with a slight abuse of notation, $\eta_i^{(t)} \in \{-1, +1\}$ captures the direction of the feature change at step $t$, given by the sign of the corresponding Jacobian entry. This direction is iteratively calculated, and additional constraints, such as non-increasing or non-decreasing features, can be placed at this stage.
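The following is a simplified sketch of one Stage 1 iteration under the formulation above, assuming the softmax weighting of Equation 5, log-percentile step costs, and Jacobian-sign directions. It is a reconstruction for illustration, not the authors' released code; function names and signatures are our own.

```python
import numpy as np

def log_percentile_cost(q_before, q_after, eps=1e-9):
    """Log-percentile shift cost in the style of Ustun et al. [30]."""
    return abs(np.log((1.0 - q_after + eps) / (1.0 - q_before + eps)))

def step_probabilities(theta, step_costs, tau=1.0):
    """Map preference-weighted inverse step costs to probabilities (cf. Eq. 5)."""
    theta = np.asarray(theta, dtype=float)
    costs = np.asarray(step_costs, dtype=float)
    z = theta / (tau * (costs + 1e-12))   # weighted inverse cost with temperature
    e = np.exp(z - z.max())               # numerically stable softmax
    return e / e.sum()

def stage1_step(r, jac_row, theta, step_costs, deltas, tau=1.0, rng=None):
    """One candidate update (cf. Algorithm 1, line 4): per-feature Bernoulli
    trials decide which features move; the Jacobian sign sets the direction."""
    rng = rng if rng is not None else np.random.default_rng()
    p = step_probabilities(theta, step_costs, tau)
    act = rng.random(p.shape[0]) < p      # Bernoulli trial per feature
    eta = np.sign(jac_row)                # direction of change toward +1
    return r + act * eta * np.asarray(deltas, dtype=float)
```

In practice this step would be iterated until the classifier flips to $+1$ or a step budget is exhausted, with Type II bounds and Type III ranking overrides applied after each update.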
3.1.1 Calibrating frequency of categorical actions. We employ the temperature scaling [11] parameter $\tau$ in Equation 5 to calibrate UP-AR's recourse generation cost. Updates on categorical features with fixed step sizes are expensive, especially for binary categorical values. Hence, tuning the frequency of categorical suggestions can significantly impact the overall cost of a recourse. $\tau$ controls the frequency with which categorical actions are suggested. Additionally, if a user prefers updates on categorical features over continuous features, UP-AR has the flexibility to address this with a smaller $\tau$.
To study the effect of $\tau$ on overall cost, we train a Logistic Regression (LR) model on a processed version of the German [4] dataset and generate recourses for the 155 individuals who were denied credit. The cost gradually decreases with decreasing $\tau$, since the marginal probability of suggesting a categorical feature change is diminished; the corresponding experiment is deferred to the Appendix. Hence, without affecting the success rate of recourse generation, the overall cost of generating recourses can be brought down by decreasing $\tau$. In simple terms, with a higher $\tau$, UP-AR frequently suggests recourses with expensive categorical actions. We note that $\tau$ can also be informed by a user upon seeing an initial recourse. After the strategic generation of an intervention, we implement a cost correction to improve upon the potential redundancy of actions in a recourse option.

Stage 2: Redundancy & Cost Correction (CC)
In our experiments, we observe that once an expensive action is recommended for a categorical feature, some of the previous action steps might become redundant. Consider an LR model trained on the processed German dataset. Let $A$ = {LoanDuration, LoanAmount, HasGuarantor} out of all 26 features, where HasGuarantor is a binary feature representing the user's ability to get a guarantor for the loan. Stage 1 takes several steps over LoanAmount and LoanDuration before recommending an update to HasGuarantor. These steps are based on the feature action probability from Equation 5. Since categorical feature updates are expensive and occur with relatively low probability, Stage 1 finds a low-cost recourse by suggesting low-cost steps more frequently than high-cost steps. Once an update to a categorical feature is recommended, some of the previous low-cost steps may be redundant, which can be rectified by tracing back the previous continuous steps. Consider a scenario where $\exists i \in A_{cat} : r_i > 0$ for a recourse obtained after $T$ steps in Stage 1. The CC procedure updates all the intermediary recourse candidates to reflect the categorical changes, i.e., $r_i^{(t)} \leftarrow r_i^{(T)} \ \forall i \in A_{cat}, \ \forall t \in \{1, 2, \ldots, T-1\}$, and then performs a linear retracing procedure to return the earliest candidate $\mathbf{r}^{(t)}$ such that $f(\mathbf{x} + \mathbf{r}^{(t)}) = +1$.
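A sketch of the retracing idea, assuming access to the sequence of intermediate candidates $\mathbf{r}^{(1)}, \ldots, \mathbf{r}^{(T)}$ from Stage 1; the function name and interfaces are hypothetical.

```python
import numpy as np

def cost_correct(f, x, candidates, cat_idx):
    """Stage 2 sketch: copy the final categorical actions into every
    intermediate candidate, then retrace to the earliest valid one.

    f          : classifier returning +1 / -1
    x          : original feature vector (np.ndarray)
    candidates : list of intermediate actions r^(1), ..., r^(T)
    cat_idx    : indices of categorical actionable features
    """
    final = candidates[-1]
    for r in candidates[:-1]:
        r[cat_idx] = final[cat_idx]   # reflect categorical changes everywhere
    # Linear retracing: the earliest candidate that already flips the model
    # makes the later (now redundant) continuous steps unnecessary.
    for r in candidates:
        if f(x + r) == +1:
            return r
    return final
```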

DISCUSSION AND ANALYSIS
In this section, we analyze the user preference performance of UP-AR. For simplicity, a user understands cost in terms of the log-percentile shift from her initial feature vector described in Section 3. Let $\hat{\theta}_i$ be the observed fractional cost for feature $i$, formally defined in Equation 11. Any cost function can be plugged into UP-AR with no restrictions. A user prefers to have a $\theta_i$ fraction of the total desired percentile shift come from feature $i$. Lemma 4.1 implies that the expected cost $\mathbb{E}[\mathrm{cost}(r_i, x_i)]$, specifically for a continuous feature action, is positively correlated with the probabilistic interpretation of the user preference scores. Hence $\mathbf{r}$ satisfies the user's critical Type I constraints in expectation. Recall that Type II and III constraints are also applied at each step $t$. Lemma 4.1 signifies that UP-AR adheres to user preferences and thereby increases the actionability of a suggested recourse. Corollary 4.2 characterizes the total expected cost after $T^*$ steps for UP-AR with a linear $\sigma(\cdot)$, predefined steps with equal costs, and $\mathrm{cost}(\mathbf{r}, \mathbf{x}) = \sum_{i \in A} \mathrm{cost}(r_i, x_i)$; it states that with a strategic selection of $\sigma(\cdot)$, $\delta$, and $\mathrm{cost}(\cdot, \cdot)$, UP-AR can also tune the total cost of suggested actions. In the next section, we compare multiple recourses based on individual user preferences for a randomly selected adversely affected individual.
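For concreteness, the observed fractional cost $\hat{\theta}_i$ can be computed as feature $i$'s share of the total incurred cost. Since Equation 11 is not reproduced here, the sketch below is an assumption consistent with the surrounding text rather than the paper's exact definition.

```python
import numpy as np

def observed_fractional_costs(costs):
    """theta_hat_i = cost_i / sum_j cost_j over continuous actionable features."""
    costs = np.asarray(costs, dtype=float)
    return costs / costs.sum()

# Example: Alice prefers theta = (0.8, 0.2) over (LoanDuration, LoanAmount);
# a recourse incurring costs (0.42, 0.11) yields theta_hat ~ (0.79, 0.21).
print(observed_fractional_costs([0.42, 0.11]))
```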

Case study of individuals with similar features but disparate preferences
Consider an LR model trained on the German dataset, and let Alice, Bob, and Chris be three adversely affected individuals. Let $A$ = {LoanDuration, LoanAmount, HasGuarantor}, with the corresponding user preferences provided by the users. In Table 3, we consolidate the recourses generated for the specified disparate sets of preferences.
Table 3 highlights the ability of UP-AR to generate a variety of user-preferred recourses based on individual preferences, whereas AR always provides the same low-cost recourse for all individuals. The customizability of feature actions for individual users is visible in the table: when the Type I score for LoanAmount is 0.8, UP-AR prefers decreasing the loan amount over the loan duration. Hence, the suggested loan amount is much lower for Chris than for Alice and Bob.

EMPIRICAL EVALUATION
In this section, we demonstrate empirically: 1) that UP-AR respects $\theta_i$-fractional user preferences at the population level, and 2) that UP-AR also performs favorably on traditional evaluation metrics drawn from CARLA [20]. We used the native CARLA catalog for the Give Me Some Credit (GMSC) [12], Adult Income (Adult) [9], and Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) [2] datasets, as well as the pre-trained models (both the Neural Network (NN) and Logistic Regression (LR)). The NN has three hidden layers of size [18, 9, 3], and the LR is a single input layer leading to a Softmax function. Although AR is proposed for linear models, it can be extended to nonlinear models via the local linear decision boundary approximation method LIME [24] (referred to as AR-LIME).
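As a sketch of this setup, the datasets, pre-trained models, and negatively classified factuals can be pulled from the CARLA catalog roughly as follows; the class and function names follow the carla-recourse package's documented API, but the exact signatures should be verified against the installed version.

```python
# Assumes the carla-recourse package; names per its documented catalog API.
from carla.data.catalog import OnlineCatalog
from carla.models.catalog import MLModelCatalog
from carla.models.negative_instances import predict_negative_instances

dataset = OnlineCatalog("adult")   # also "give_me_some_credit", "compas"
model = MLModelCatalog(dataset, model_type="ann", backend="pytorch")

# Individuals currently receiving the unfavorable decision.
factuals = predict_negative_instances(model, dataset.df)
```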
PERFORMANCE METRICS: For UP-AR, we evaluate:
(1) Success Rate (Succ. Rate): The percentage of adversely affected individuals for whom recourse was found.
(2) Average Time Taken (Avg. Tim.): The average time (in seconds) to generate recourse for a single individual.
(3) Constraint Violations (Con. Vio.): The average number of non-actionable features modified.
(4) Redundancy (Red.): A metric that tracks superfluous feature changes. For each successful recourse, on a univariate basis, features are flipped back to their original value; the redundancy of a recourse is the number of flips that do not change the model's classification decision.
(5) Proximity (Pro.): The normalized $\ell_2$ distance of a recourse to its original point.
(6) Sparsity (Spa.): The average number of features modified.
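The redundancy metric (4) can be implemented with univariate flips back to the original value; a minimal sketch, assuming a classifier `f` that maps a feature vector to a label:

```python
import numpy as np

def redundancy(f, x, r, tol=1e-9):
    """Count univariate flips back to the original value that do not
    change the model's favorable decision (metric 4)."""
    x_cf = x + r
    count = 0
    for i in np.flatnonzero(np.abs(r) > tol):
        probe = x_cf.copy()
        probe[i] = x[i]               # undo feature i only
        if f(probe) == f(x_cf):       # decision unchanged -> step i was redundant
            count += 1
    return count
```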
We provide comparative results for UP-AR against state-of-the-art counterfactual/recourse generation techniques such as GS, Wachter, AR(-LIME), CCHVAE, and FACE. These methods were selected based on their popularity and their representation of both independence- and dependence-based methods, as defined in CARLA. In addition to the traditional performance metrics, we also measure the Preference Root Mean Squared Error (pRMSE) between the user preference scores and the fractional costs of the suggested recourses. We calculate $pRMSE_i$ for a randomly selected continuous-valued feature $i$ using:
$$pRMSE_i = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \big(\theta_i^{(j)} - \hat{\theta}_i^{(j)}\big)^2},$$
where $\theta_i^{(j)}$ and $\hat{\theta}_i^{(j)}$ are the user-provided and observed preference scores of feature $i$ for an individual $j$. In Table 4, we summarize $pRMSE$, the average error across continuous features, such that:
$$pRMSE = \frac{1}{|A_{con}|} \sum_{i \in A_{con}} pRMSE_i.$$
DATASETS: We train an LR model on the processed version of the German [4] credit dataset using sklearn's linear_model module. We replicate Ustun et al. [30]'s model training and recourse generation on German. The dataset contains 1000 data points with 26 features for a loan application. The model decides whether an applicant's credit request should be approved. Consider $A_{con}$ = {LoanDuration, LoanAmount} and $A_{cat}$ = {CriticalAccountOrLoansElsewhere, HasGuarantor, HasCoapplicant}. Let the user scores for $A_{con}$ be $\theta$ = {0.8, 0.2} and the ranking for $A_{cat}$ be {3, 1, 2} for all the denied individuals. For this experiment, we set $\tau^{-1} = 4$. Out of 155 individuals with denied credit, AR and UP-AR provided recourses to 135 individuals.
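A direct translation of the pRMSE definitions above, assuming a matrix of preference scores with one row per individual and one column per continuous feature:

```python
import numpy as np

def prmse_feature(theta, theta_hat):
    """pRMSE_i over individuals j for a single continuous feature i."""
    theta, theta_hat = np.asarray(theta), np.asarray(theta_hat)
    return float(np.sqrt(np.mean((theta - theta_hat) ** 2)))

def prmse(theta_matrix, theta_hat_matrix):
    """Average pRMSE across continuous features (individuals x features)."""
    per_feature = [prmse_feature(theta_matrix[:, i], theta_hat_matrix[:, i])
                   for i in range(theta_matrix.shape[1])]
    return float(np.mean(per_feature))
```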
Cost Correction: Out of all the denied individuals for whom categorical actions were suggested, an average of ~$400 in LoanAmount was recovered by cost correction.
For the following datasets, user preferences were set to be uniform across all actionable features for the traditional metrics, so as not to bias the results toward one feature preference over another: (1) GMSC: This dataset, from the 2011 Kaggle competition, is a credit underwriting dataset with 11 features where the target is the presence of delinquency. Here, we measure what feature changes would lower the likelihood of delinquency. We again used the default protected features (age and number of dependents). The baseline accuracy for the NN model is 81%, while the baseline accuracy for the LR is 76%. The baseline accuracy for the NN is 78%, while the baseline accuracy for the LR is 71%.
PERFORMANCE ANALYSIS OF UP-AR: We find that UP-AR holistically performs favorably compared to its counterparts. Critically, it respects feature constraints (which we believe is fundamental to actionable recourse) while maintaining significantly low redundancy and sparsity. This indicates that it tends to change fewer unnecessary features. Its speed makes it tractable for real-world use, while its proximity values show that it recovers relatively low-cost recourse. These results highlight the promise of UP-AR as a performant, low-cost option for calculating recourse when user preferences are paramount. UP-AR shows consistent improvements over all the performance metrics. The occasional lower success rate for an NN model is attributed to enforcing zero constraint violations.
pRMSE: We analyze user preference performance in terms of pRMSE. From Table 4, we observe that UP-AR's pRMSE is consistently better than that of the state-of-the-art recourse methods. The corresponding experimental details and a visual representation of the distribution of pRMSE are deferred to Appendix 5.1.

Random user preference study
We performed an experiment with increasing step sizes on the German dataset. We observed that, with increasing step sizes, $pRMSE_i$ increased from 0.09 to 0.13, whereas it was consistent for AR.
With these experiments, we conclude that the deviation of UP-AR's $\hat{\theta}$ from the user's $\theta$ is consistently lower than that of existing recourse generation methodologies. We observe that AR is unaffected by varying user preferences because AR and other state-of-the-art recourse methodologies lack the capability to capture such idiosyncrasies. UP-AR, on the other hand, is driven by those preferences and has a significantly better pRMSE in comparison to AR.

Cost Correction analysis
In Table 5, we explore the effect of UP-AR's cost correction procedure on the Adult and COMPAS datasets. We do not include the GMSC dataset, as it does not include binary features and therefore does not utilize the cost correction procedure. In Table 5, we show the number of factuals, the percentage of factuals for which recourse was found, the percentage of found recourses containing at least one binary action, the percentage of found recourses which underwent cost correction, the average percentage of steps saved by the cost correction procedure, and the average percentage of cost savings, measured as the percent reduction in continuous cost ($\ell_2$ distance) between a factual and its recourse before and after the cost correction procedure.

CONCLUDING REMARKS
In this study, we propose to capture different forms of user preferences and propose an optimization function to generate actionable recourse adhering to such constraints. We further provide an approach to generate a connected [15] recourse guided by the user.
We show how UP-AR adheres to soft constraints by evaluating user satisfaction via the fractional cost ratio. We emphasize the need to capture various user preferences and to communicate with the user in a comprehensible form. This work motivates further research on how truthful reporting of preferences can help improve overall user satisfaction.

USER ACCEPTANCE SURVEY
We surveyed 40 random students and employees from a mailing list. The goal of this survey was to establish whether people prefer to provide specific preferences over other mechanisms. The survey included one question with four options, as follows: If you are denied a loan application, what do you expect from the bank to get your loan approved?
(1) A single list of suggestions for your profile. Ex: increase income by $100 & reduce loan duration by 1 year.
(2) A set with multiple lists of suggestions for your profile. Ex: i) increase income by $100 and reduce loan duration by 1 year, OR ii) increase income by $500, OR iii) reduce loan duration by 3 years, OR iv) bring a co-applicant.
(3) Influence the bank's suggestions by providing preferential scores for actions you can take. Ex: preferring to increase loan duration more than loan amount by 8:2, or preferring to bring a guarantor before a co-applicant.
(4) Any other form of preferences.
Every individual in the survey was asked to select one of the four choices provided. The survey identified that a majority of 60% of individuals preferred influencing the bank's decision by providing preference scores for individual features, followed by 30% of individuals who wanted multiple recourses from the bank. The remaining 10% preferred a single recourse or another form of preference.

APPENDIX
8.1 Analysis
Interpretable and incremental steps: In this study, each step $\delta_i^{(t)}$ is a predefined minimal feature modification inherently derived from the feature vector $\mathbf{x}$. A recourse suggested by UP-AR can be broken down into interpretable actions. Suppose Alice was denied a loan application, and her suggested recourse is to decrease the loan amount from $8072 to $6472 and decrease the loan duration from 30 years to 10 years. Here, the reduction of the loan amount breaks down into 16 steps of $100 each, implying that the new loan amount is connected to the original feature value by 16 steps. Such steps increase the comprehensibility of a recourse.
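A toy sketch of this bookkeeping; the $100 step size follows the example above, and the helper name is illustrative:

```python
def decompose(initial, target, step):
    """Break a continuous action into equal, interpretable steps."""
    n_steps = round(abs(target - initial) / step)
    delta = step if target > initial else -step
    return [f"change by {delta}"] * n_steps

# $8072 -> $6472 in $100 steps: 16 interpretable reductions of $100 each.
print(len(decompose(8072, 6472, 100)))   # 16
```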

ETHICS STATEMENT
We proposed a recourse generation method for machine learning models that directly impact human lives. For practical purposes, we considered publicly available datasets for our experiments. Due care was taken not to induce any bias in this research. We further evaluated the primary performance metric for two groups (males and females) on the German dataset.
This study reflects our efforts to bring human subjects within the framework of recourse generation. Comprehensible discussion with users about the process improves trust in, and the explainability of, the steps taken during the entire mechanism. With machine learning models being deployed in high-impact societal applications, considering human inputs (in the form of preferences) for decision-making is a highly significant factor for improved trustworthiness. Our study motivates further research on capturing individual idiosyncrasies.
Gathering preferences from an individual could be another potential source of bias for UP-AR recourses, which needs to be evaluated in further research with human subjects. Preferential recourses will have a significant positive impact on humans, conditioned on truthful reporting of various preferences. Preference scores are subject to various background factors affecting an individual, some of which can be sensitive. Additional care must be taken to keep these background factors confidential while collecting individual preference scores, as they have the potential to be exploited.

ABLATION STUDIES
In this section, we perform multiple experiments to understand several properties of UP-AR. First, we run an experiment to measure the disparities in pRMSE between the two gender groups. Second, we run experiments to understand the effects of the temperature parameter $\tau$ on UP-AR. Third, we examine the relation, if any, between $T^*$ and $\hat{\theta}$.

UP-AR user preference disparities
UP-AR satisfies Type I user preferences, as observed in Section 4. For the following experiment, we consider a setup similar to that in Section 4 and evaluate performance for males and females separately in terms of pRMSE. Figure 9 shows the distribution of cost between the two gender groups. The observed $pRMSE_{LoanDuration}$ for males is 0.09, whereas for females it is 0.11. With this simple experiment, we conclude that UP-AR does not show any significant disparities in terms of adhering to user preferences.

Ablation study on $\tau$
For the following experiment, we again consider a setup similar to that in Section 4. Each data point in the plot represents the mean total cost of recourses for the target population over 20 independent runs of UP-AR, and the shaded region represents ±1 standard deviation over the 20 runs. We observe: (1) the effect of calibrating the overall cost for the target population using $\tau$, which controls the frequency of categorical actions as detailed in Section 3.1.1; and (2) that $\hat{\theta}_{LoanDuration}$ is not affected by any setting of $\tau$, as observed in Figure 11.

Relation between $\hat{\theta}$ and $T^*$
Again considering a setup similar to that in Section 4, Figure 12 visualizes the relation between the observed $\hat{\theta}_{LoanDuration}$ and the number of steps $T^*$ taken to identify a recourse. We conclude that $\hat{\theta}_{LoanDuration}$ is not affected by the number of steps UP-AR takes to identify a recourse.

Real cost vs Expected cost
In this experiment, we compare the expected cost and the actual observed cost of the generated recourses. Figure 13 visualizes the expected and observed costs for actionable features. We observe that with increasing $\tau$, the total cost of recourses increases, suggesting that more categorical actions are included in the generated recourses. Additionally, we notice the consistency of $\hat{\theta}_{LoanDuration}$ for varying $\tau$. Note that careful calibration of $\tau$ can help individuals who prefer categorical feature actions over continuous ones.

Ablation study on Actionable Feature Set
We conducted an experiment on the average computational cost (measured by execution time) of UP-AR and GS across a varying number of actionable features to explore how their performance changes as the actionable set grows. Figures 14 and 15 show the performance trends for an LR model and an NN model on the Adult Income dataset, while Figures 16 and 17 show the corresponding trends on the German Credit dataset. We observe that UP-AR's average time increases as the actionable feature dimension increases, whereas sampling-based GS remains relatively consistent. This can be attributed to the additional user scoring and ranking preference constraints applied while identifying a recourse, as well as to the cost correction procedure as the number of binary changes increases.

Figure 1: Illustration of UP-AR. Similar individuals Alice and Bob with contrasting preferences can have different regions of desired feature space for a recourse.

Figure 2: Candidate action steps $\delta_i^{(t)}$ derived from user preference scores and the cost of taking an action.

Figure 3: AR's and UP-AR's distributions of $\hat{\theta}_{LoanDuration}$ for a Logistic Regression model trained on German.

Figure 5: Logistic Regression model.
Figure 6: Neural Network model.
Figure 7: Distribution of the average pRMSE of UP-AR and other recourse methodologies.

Figure 8: Snapshot of the human acceptance survey.

Figure 9: Comparison of UP-AR's distribution of $\hat{\theta}_{LoanDuration}$ between males and females for a Logistic Regression model trained on German.

Figure 10: Total cost of the recourses generated for the target population for varying $\tau$. The user preference scores are fixed for the individuals.

Figure 11: Mean fractional feature cost ratio of LoanDuration for varying $\tau$. For this experiment, $\theta_{LoanDuration}$ is set to 0.8 for the target population.

Figure 13: Expected and observed cost of modifications on $A_{con}$ for all the recourses generated on the adversely affected target population.

Figure 14: Average time to find recourse for an LR model on the Adult dataset with a variable number of actionable features.

Figure 15: Average time to find recourse for an NN on the Adult dataset with a variable number of actionable features.

10.6 Additional proofs of results discussed in Section 4
10.6.1 Proof of Lemma 4.1. Consider that recourse r was suggested by UP-AR for Alice, represented by a feature vector x.

Figure 16: Average time to find recourse for an LR model on the Credit dataset with a variable number of actionable features.

Figure 17: Average time to find recourse for an NN on the Credit dataset with a variable number of actionable features.

Figure 18: An example interface to capture user preferences.

Table 1: A hypothetical actionable feature set of adversely affected individuals sharing similar features and the corresponding suggested actions by AR and UP-AR. UP-AR provides personalized recourses based on individual user preferences.

The action steps can be discretized into pre-specified step sizes $\Delta_i = \{s : s \in [\underline{\delta}_i, \overline{\delta}_i]\}$. For categorical features, steps are defined as the feasible values a feature can take: for all categorical features we define $\Delta_i = \{\delta_i^1, \ldots, \delta_i^k\} \ \forall i \in A_{cat}$, representing the possible values of categorical feature $i$.
$\sum_{i \in A_{con}} \theta_i = 1$. Such soft constraints capture the user's preference without omitting the feature from the actionable feature set. $\theta_i$ refers to the fractional cost of action that Alice prefers to incur from a continuous feature $i$. For example, consider $A_{con}$ = {LoanDuration, LoanAmount} with corresponding user-provided scores $\theta$ = {0.8, 0.2}, implying that Alice prefers to incur 80% of the fractional feature cost from taking action on LoanDuration and only 20% from taking action on LoanAmount. Here, Alice prefers reducing LoanDuration over LoanAmount, and providing recourse accordingly improves actionability.

Table 2: Redundancy-corrected recourse for a hypothetical individual.

Table 3: Recourses generated by UP-AR for similar individuals with a variety of preferences.

Table 4: Summary of performance evaluation of UP-AR. Top performers are highlighted in green.

Table 5: The frequency and effect of cost correction.