Iterative Partial Fulfillment of Counterfactual Explanations: Benefits and Risks

Counterfactual (CF) explanations, also known as contrastive explanations and algorithmic recourses, are popular for explaining machine learning models in high-stakes domains. For a subject that receives a negative model prediction (e.g., mortgage application denial), the CF explanations are similar instances but with positive predictions, which inform the subject of ways to improve. While various properties of CF explanations, such as validity and stability, have been studied, we contribute a novel one: their behavior under iterative partial fulfillment (IPF). Specifically, upon receiving a CF explanation, the subject may only partially fulfill it before requesting a new prediction with a new explanation, and repeat until the prediction is positive. Such partial fulfillment could be due to the subject's limited capability (e.g., can only pay down two out of four credit card accounts at this moment) or an attempt to take a chance (e.g., betting that a monthly salary increase of \$800 is enough even though \$1,000 is recommended). Does such iterative partial fulfillment increase or decrease the total cost of improvement incurred by the subject? We mathematically formalize IPF and demonstrate, both theoretically and empirically, that different CF algorithms exhibit vastly different behaviors under IPF. We discuss implications of our observations, advocate for this factor to be carefully considered in the development and study of CF algorithms, and give several directions for future work.


INTRODUCTION
Recently, machine learning models have been increasingly deployed in high-stakes domains in finance, law and medicine, performing tasks such as loan approval [8], recidivism prediction [21] and medical diagnosis [13]. For these domains, the reason why a particular prediction is made is often as important as the prediction itself, especially since most high-performing models, such as neural networks and random forests, are black-box in nature. In some jurisdictions, a "right to explanation" is even legally mandated, so that people receiving adverse model predictions (e.g., mortgage application denial) can understand the reason and the available recourses.
For these purposes, counterfactual (CF) explanations, also known in the literature as contrastive explanations or algorithmic recourses, have been a popular choice due to their desirable properties in human psychology and cognition theories [26]. For a particular input $x$ with a certain model prediction $\tilde{y}$, its CF explanation is another input similar to $x$ but with a prediction $\tilde{y}'$ different from $\tilde{y}$. Thus, this explanation indicates how the input would need to change in order for the model prediction to also change, as shown in Fig. 1 (left). When $\tilde{y}$ is a negative prediction (e.g., mortgage application denial) and $\tilde{y}'$ is a positive one (e.g., application approval), this CF explanation essentially gives a direction for the applicant to improve their situation and get the application approved the next time, given that it is feasible (e.g., not changing immutable features such as gender and race). Such an explanation automatically avoids the unfaithfulness problem of many feature attribution explanations [23,33,36], where the salient features are not really important for the model's decision making [1,47,48].
Over the past few years, there have been many investigations into various properties of these counterfactual explanations, such as their validity, action feasibility and cost, stability with respect to input perturbations and model updates, and agreement with the underlying causal mechanism, which are reviewed in Sec. 2. Taken as a whole, they generate quality profiles for various CF explanation algorithms and establish their relative strengths and weaknesses.
In this paper, we propose a novel aspect of these explanations, which, to the best of our knowledge, has not been studied before. Specifically, in real-life scenarios, the subject of the prediction (e.g., the mortgage applicant) may not completely fulfill the CF explanation for various reasons. First, the subject may not be able to do so. For example, the CF explanation requires the subject to pay down all four credit card accounts, but they can only pay down two of them with their current savings. Second, the subject may want to take their chance and get a favorable outcome with less effort. For example, the CF explanation recommends that the subject increase their monthly salary by $1,000, but they wonder if an increase of just $800 would be sufficient. Last, the subject may misunderstand the wording of a CF explanation (e.g., a bullet list in the notice of denial) as taking any subset of actions rather than all of them. In all cases, we have a partial fulfillment of the CF explanation followed by a re-query for the model prediction and CF explanation, repeating until a positive prediction is obtained. We term this process the iterative partial fulfillment (IPF) of CF explanations.
How does IPF, compared to a one-shot complete fulfillment, affect the subject's welfare? In particular, does the subject need to make more total improvement under IPF? Intuitively, the effect can be positive, negative or neutral.
A positive effect could result from CF algorithms that generate instances landing well into the positive model prediction region. A less complete fulfillment (e.g., a salary increase of $800 rather than the recommended $1,000) is still sufficient, allowing the subject to incur a lower cost of improvement, as shown in Fig. 1 (middle).
By contrast, if the initial partial fulfillment (e.g., paying down two out of the four recommended credit card accounts) is unsuccessful, the new CF explanation may suggest some other factors to change and effectively "reset" the progress, resulting in a larger total cost of improvement, as shown in Fig. 1 (right). In the most extreme case, an oscillation may even occur among the series of explanations, leaving the subject stuck in an infinite loop.
Last, if, for an input $x$ and its CF explanation $x'$, the CF explanations for all partially fulfilled input states along the trajectory $x \to x'$ are also the same counterfactual explanation $x'$, and no partial fulfillment results in a positive prediction, then the total improvement cost under IPF is the same as that under one-shot complete fulfillment, since both will lead to the subject achieving $x'$ in the end. Such a scenario is possible if the CF algorithm works by either finding the exact closest input with a different target prediction or returning a CF instance from a small set of candidates.
In practice, the specifics of a CF algorithm, such as the optimization method and considerations for stochasticity and diversity, determine the net effect of these three possibilities, making certain CF algorithms preferable to others from the IPF welfare perspective.
In this paper, we investigate this problem by first formalizing the notion and implementation of IPF in Sec. 3. Then in Sec. 4 we theoretically prove that certain CF algorithms can exhibit positive, negative and neutral effects on IPF welfare (i.e., total improvement cost compared to the one-shot complete fulfillment), as conceptually explained above. Empirically, in Sec. 5, using two financial datasets, three CF algorithms and four CF generation configurations per algorithm (for a total of 24 setups), with the widely used DiCE-ML package, we confirm that the generated explanations indeed possess different IPF characteristics. Thus, from these two pieces of evidence, we argue that an IPF assessment should be part of a comprehensive evaluation of CF algorithms. Finally, in Sec. 6, we discuss the broader technical and societal implications of IPF and its future directions.

RELATED WORK
In this section, we discuss works on the algorithms and properties of counterfactual explanations, with a focus on impact, recency and relevance to IPF. For much more detailed discussions, we refer the readers to standalone surveys, such as those by Byrne [3], Guidotti [11], Keane et al. [18] and Verma et al. [42].
CF explanations were popularized by Wachter et al. [46], who proposed a gradient ascent algorithm to search for a counterfactual instance that both achieves the target model prediction and is close to the original input being explained. Subsequent works have extended the basic idea to make CF explanations diverse [27,34], in-distribution [30,41], aware of the causal mechanism [17], and less susceptible to gaming [4]. In addition, different optimization methods have also been proposed, such as those that do not require model differentiability or gradient access [27,29], as well as those based on integer programming [34,40], constraint satisfaction [16] and optimal transport [6].
A recent line of work aims to generate a sequence of instances from the input to the final CF explanation [15,28,31,43]. This sequence provides an explicit path for the subject to follow, and is argued to be more user-friendly and actionable. While our proposed IPF setup also results in a sequence of explanations, it is fundamentally different in both goals and constraints. In sequential generation, the explanation algorithm has full control of the generated sequence and only the last sequence element needs to be the CF explanation (i.e., inducing the target prediction). By comparison, in IPF, the algorithm needs to work with whatever partially fulfilled instance the subject provides and generate a valid CF explanation for it.
Kenny and Keane [19] and Aryal and Keane [2] proposed to generate semi-factual explanations, defined as data instances which move towards the decision boundary but have not crossed it. These explanations could be used to construct "even if" explanations: e.g., even if the down payment is $10,000 more, the mortgage would still not be approved. A partially fulfilled CF in IPF and a semi-factual instance are both on the way to some CF instance, but they are otherwise distinct concepts: one is an intermediate state of the subject and the other is an explanation.
On the evaluation and analysis side, various properties of CF explanations have been proposed. The two core desiderata of CF explanations are validity and feasibility. The former is just the success rate of the CF generation, while the latter, conceptually defined as the ease for the subject to follow the CF recommendation, is much more nuanced. Different approaches have been proposed to enforce and evaluate it, such as ensuring a close distance to the original data point [46], lying in a high-density region of the data distribution [41], satisfying causal constraints [17], and respecting custom limitations in modifying feature values [27,40].
Most relevant to our IPF proposal are two notions of stability. The first is with respect to input perturbations, where Dominguez-Olmedo et al. [7] and Virgolin and Fracaros [44] found that a given input can be adversarially but minimally perturbed into an instance with a very different CF explanation. Maragno et al. [25] proposed a robust optimization formulation to find stable CF explanations. Slack et al. [37] demonstrated that a model could be trained to make this behavior more prevalent and hide discrimination issues. The second one is with respect to the model update, where a new model is trained on an updated version of the dataset. Rawal et al. [32] found that many CF algorithms are often unstable under model update, in that very different explanations are generated for the new model and the original ones are no longer valid, and Upadhyay et al. [39] proposed an algorithm to find CF instances that are stable under model update.
From this perspective, our IPF property can be considered as a third notion of stability: the stability of CF explanations for inputs along the path of improvement. If the CF explanations are stable, then the subject will follow a mostly consistent path of improvement, while if not, the subject may be given unrelated or even contradicting recommendations after every partial fulfillment.

ITERATIVE PARTIAL FULFILLMENT

Background
In this section, we formalize the concept of iterative partial fulfillment (IPF) of CF explanations. Due to the variety of real-world human behaviors, there are many ways to formalize IPF. As we are the first to do so, we provide and analyze one canonical setup, and discuss other design choices and extensions in Sec. 6.
Let $\mathcal{X}$ be the input space with $n$ features, and $\mathcal{X}_i$ for $i \in \{1, \ldots, n\}$ be the set of values for the $i$-th feature. We consider categorical and numerical features, where $\mathcal{X}_i$ is a finite set for the former and (a subset of) $\mathbb{R}$ for the latter. Thus, an $x \in \mathcal{X}$ can be written as $(x_1, \ldots, x_n)$ with $x_i \in \mathcal{X}_i$. For notational simplicity, we restrict ourselves to binary classification tasks, and represent the model prediction function as $f : \mathcal{X} \to [0, 1]$ that returns the predicted probability of the positive class. Thus, $f(x) \ge 0.5$ indicates a positive prediction. In the ensuing discussion, we consider negatively predicted input instances with $f(x) < 0.5$ and their positively predicted counterfactuals with $f(x') \ge 0.5$.
We denote a CF explanation algorithm as $g : \mathcal{X} \to \mathcal{X}$, which takes an input instance and returns another instance. Since some algorithms are stochastic, we allow $g(x)$ to return a random $x'$ sampled from the corresponding distribution. In addition, some algorithms generate a set of diverse CF explanations, and the subject chooses one of them as the goal using some strategy, such as picking the most similar one or selecting one uniformly at random. In this case, we let $g(\cdot)$ manage this CF selection procedure so that it always returns a single (but possibly stochastic) CF explanation. To simplify notation, we write $x' = g(x)$ if $g$ is deterministic on $x$ and $x' \sim g(x)$ for a sampled value.
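For $x$ and $x'$, to represent the cost of change between the two, we define a cost metric
$$c(x, x') = \sum_{i=1}^{n} c_i(x_i, x'_i), \qquad c_i(x_i, x'_i) = \begin{cases} \mathbb{1}[x_i \ne x'_i] & \text{if feature } i \text{ is categorical}, \\ |F_i(x'_i) - F_i(x_i)| & \text{if feature } i \text{ is numerical}, \end{cases}$$
where $F_i$ is the cumulative distribution function of the values of feature $i$, following Ustun et al. [40], to account for the feature value density, with the maximal change incurring a cost of 1. Given a sequence of $K$ instances $\mathbf{x} = (x^{(1)}, \ldots, x^{(K)})$, the total cost for this sequence is the sum of pairwise neighbor costs $c(\mathbf{x}) = \sum_{t=1}^{K-1} c(x^{(t)}, x^{(t+1)})$.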
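As a rough illustration of this cost metric, the following Python sketch uses an empirical CDF for numerical features and a 0/1 indicator for categorical ones. The helper names (`empirical_cdf`, `cost`, `sequence_cost`) and the toy data are hypothetical, and summing the feature-wise costs (rather than, say, averaging them) is an assumption.

```python
from bisect import bisect_right

def empirical_cdf(values):
    """Return F(v) = fraction of observed values <= v (an assumed CDF estimate)."""
    ordered = sorted(values)
    return lambda v: bisect_right(ordered, v) / len(ordered)

def cost(x, x_cf, cdfs):
    """Sum of feature-wise costs; cdfs[i] is None for a categorical feature."""
    total = 0.0
    for xi, ci, F in zip(x, x_cf, cdfs):
        if F is None:                      # categorical: 1 if the value changes
            total += float(xi != ci)
        else:                              # numerical: absolute CDF difference
            total += abs(F(ci) - F(xi))
    return total

def sequence_cost(states, cdfs):
    """Total cost of a trajectory: sum of costs between neighboring states."""
    return sum(cost(a, b, cdfs) for a, b in zip(states, states[1:]))

# Toy data: feature 0 is a monthly salary, feature 1 is a categorical flag.
salaries = [2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500]
cdfs = [empirical_cdf(salaries), None]
x, x_cf = (3000, "renter"), (4000, "owner")
one_shot = cost(x, x_cf, cdfs)
two_step = sequence_cost([x, (3500, "renter"), x_cf], cdfs)
```

In this toy case the two-step path costs the same as the one-shot change, because both steps move the salary in the same direction and the categorical feature changes only once.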

Partial Fulfillment
We now formalize the partial fulfillment as follows.
Definition 1 ($u$-partial fulfillment). For current state $x$ and goal state $x'$ (e.g., as generated by the CF algorithm), the $u$-partial fulfillment $z \in \mathcal{X}$, with $u \in [0, 1]$, is generated by the following operation on each feature:
• if the feature is continuous, the new feature value $z_i$ is an interpolation between the two feature values: $z_i = (1 - u)\, x_i + u\, x'_i$;
• if the feature is categorical, the new feature value $z_i$ takes $x_i$ with probability $1 - u$ and $x'_i$ with probability $u$.
Since categorical feature values are generated stochastically, we use $P(x, x', u)$ to denote the distribution of the partial fulfillment $z$.
Conceptually, from the subject's perspective, when partially fulfilling $x'$ from $x$ at an effort level $u$, for every continuous feature, they will move an amount proportional to $u$ towards the goal feature value, and for every categorical feature, they will choose to make the change with probability $u$. Thus, the partial fulfillment result is stochastic as long as there is at least one categorical feature value change required. A technical exception is made for continuous features, where the partially fulfilled value is set to the CF value if the value difference is small. This ensures the success of IPF when the CF instance lies exactly on the decision boundary.
Given this partial fulfillment definition, we model the iterative partial fulfillment (IPF) process in Alg. 1. The subject starts with an input $x$, and repeatedly requests a counterfactual explanation to partially fulfill, until receiving a positive prediction or reaching a maximum number of iterations. The algorithm returns $\mathbf{x}$, the sequence of states that the subject has been in. For effort level $u$, maximum number of iterations $T$, model $f$ and CF algorithm $g$, we use $I(x, u, T, f, g)$ to represent the distribution of realized state trajectories $\mathbf{x}$. When it is clear from the context, we omit some of the input arguments, such as $f$.

Algorithm 1: The iterative partial fulfillment (IPF) process.

The most direct measure of subject welfare under IPF is the total improvement cost $c(\mathbf{x})$. Other metrics include final success rate and number of steps. If we are interested in the fairness implications of IPF (i.e., whether one demographic group is disproportionately affected by IPF), we can also compute these metrics separately for each group, as we do in Sec. 5.
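The partial fulfillment operation described above can be sketched in Python as follows. This is a minimal illustration rather than the paper's implementation: the function name `partial_fulfill`, the `is_categorical` flags and the toy instance are hypothetical, and the small-difference exception for continuous features is omitted.

```python
import random

def partial_fulfill(x, x_cf, u, is_categorical, rng=random):
    """Sample a u-partial fulfillment z of the goal x_cf from the state x."""
    z = []
    for xi, ci, cat in zip(x, x_cf, is_categorical):
        if cat:
            # Categorical: adopt the goal value with probability u.
            z.append(ci if rng.random() < u else xi)
        else:
            # Continuous: move a fraction u of the way towards the goal.
            z.append((1 - u) * xi + u * ci)
    return tuple(z)

rng = random.Random(0)
x = (3000.0, "renter")
x_cf = (4000.0, "owner")
z = partial_fulfill(x, x_cf, u=0.8, is_categorical=(False, True), rng=rng)
```

With an effort level of 0.8, the salary moves 80% of the way from 3000 to 4000, while the categorical feature either stays at its current value or jumps to the goal value.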
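The IPF loop described in this paragraph can be sketched as below. It is reconstructed from the prose description only, so the function signature, the stopping check at the top of each iteration and the toy one-dimensional setup are all assumptions rather than the original Alg. 1.

```python
def ipf(x, model, cf_algorithm, u, max_iters, partial_fulfill):
    """Run IPF from state x; return the list of visited states."""
    states = [x]
    for _ in range(max_iters):
        if model(x) >= 0.5:          # positive prediction: stop
            break
        x_cf = cf_algorithm(x)       # request a (possibly stochastic) CF explanation
        x = partial_fulfill(x, x_cf, u)
        states.append(x)
    return states

# Toy 1-D setup (all hypothetical): approve when the single feature is >= 10,
# and always recommend the conservative goal 12.
model = lambda x: 1.0 if x[0] >= 10 else 0.0
cf_algorithm = lambda x: (12.0,)
interpolate = lambda x, x_cf, u: tuple((1 - u) * a + u * b for a, b in zip(x, x_cf))

trajectory = ipf((0.0,), model, cf_algorithm, u=0.5, max_iters=30,
                 partial_fulfill=interpolate)
```

In this toy run the subject stops at 10.5, short of the recommended 12, which is the cost-saving effect of a conservative CF algorithm discussed in the introduction.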

THEORETICAL ANALYSIS

IPF Stability
Does IPF always increase or decrease the total improvement costs? As we demonstrate in this section, its effects on different CF algorithms are different. First, we formally define the concept of IPF stability discussed at the end of Sec. 2, which is a sufficient condition for cost preservation (i.e., IPF does not increase the total cost).
For IPF stable CF algorithms, we are assured that IPF never makes the total cost higher, compared to one-shot complete fulfillment.
Theorem 3. If a CF algorithm $g$ is IPF stable at $x$, then for all $u$ and $T$, $\mathbb{E}_{\mathbf{x} \sim I(x, u, T, g)}[c(\mathbf{x})] \le c(x, g(x))$.
The proof is straightforward. At every iteration of IPF, the same CF explanation is given. Thus, the total improvement cost is upper bounded by $c(x, g(x))$. If the model gives a positive prediction in some intermediate step (or $T$ is not large enough to achieve $g(x)$ or a positive prediction), the total improvement cost is strictly less, which could happen when $g$ is configured to be "conservative" and gives a CF instance of high model confidence.
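The upper bound can be spelled out for numerical features as follows. This is a sketch under stated assumptions: the cost is a sum of feature-wise CDF differences with monotone $F_i$ over $n$ features, the visited states are written $x^{(1)} = x, \ldots, x^{(K)}$, and every iteration returns the same goal $x' = g(x)$. Each $x^{(t+1)}_i$ then lies between $x^{(t)}_i$ and $x'_i$, so all steps move in the same direction and the per-step costs telescope:

$$\sum_{t=1}^{K-1} c\bigl(x^{(t)}, x^{(t+1)}\bigr) = \sum_{i=1}^{n} \sum_{t=1}^{K-1} \bigl| F_i(x^{(t+1)}_i) - F_i(x^{(t)}_i) \bigr| = \sum_{i=1}^{n} \bigl| F_i(x^{(K)}_i) - F_i(x^{(1)}_i) \bigr| \le \sum_{i=1}^{n} \bigl| F_i(x'_i) - F_i(x_i) \bigr| = c\bigl(x, g(x)\bigr).$$

A categorical feature changes at most once along the trajectory (from $x_i$ to $x'_i$), so it contributes a cost of at most 1, which is also its cost under one-shot fulfillment.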

Cost-Preserving/Decreasing CF Under IPF
Do IPF stable CF algorithms exist? Obviously, a constant-valued CF algorithm that always produces the same CF instance, $g(\cdot) = x'$, is stable, but this serves as a terrible CF explanation for most of the dissimilar input instances. More usefully, we show that the optimal cost CF algorithm is also stable.
Theorem 4. For $\tau \ge 0.5$, the optimal cost CF algorithm, which gives the instance closest to $x$ with model prediction at least $\tau$ (using deterministic tie-breaking if necessary), is IPF stable globally.
Proof. We first consider the case of all numerical features and no categorical features. Recognizing the feature-wise absolute value CDF distance function $c_i = |F_i(x'_i) - F_i(x_i)|$, we define the sign flag $s_i = \operatorname{sgn}(x'_i - x_i)$, and have
$$c(x, x') = \sum_{i=1}^{n} s_i \bigl( F_i(x'_i) - F_i(x_i) \bigr).$$
Therefore, the search over the best CF can be reduced to that in $2^n$ "quadrants," with one value of $s = (s_1, \ldots, s_n)$ specifying one quadrant $Q_s$. Denote the globally optimal CF as $x^*$. We need to show that IPF preserves the optimality of $x^*$ within the quadrant and across different quadrants.
The last line establishes the within-quadrant optimality of $x^*$ for $z$. For across-quadrant optimality, consider $s^*$ for the quadrant of $x^*$ and $s'$ for that of a CF instance $x'$ in a different quadrant.
For a feature $i$ such that $s^*_i = s'_i$, $z_i$ makes the same amount of improvement towards both CFs (except when $z_i$ overshoots with respect to $x'_i$, which offsets the improvement on $x'_i$), while for $i$ such that $s^*_i \ne s'_i$ (which must exist because $s^* \ne s'$), the improvement towards $x^*_i$ strictly makes the distance to $x'_i$ worse. Thus, if $x^*$ is optimal for $x$ across all quadrants, it is still optimal for $z$.
Combining within-quadrant and across-quadrant optimality together, we see that IPF preserves the optimality of $g_{\mathrm{OC}}(x)$ (for inputs of all numerical features).
For a categorical feature $i$ that needs change (i.e., $x^*_i \ne x_i$), if $z_i = x_i$, it does not change the feature-wise cost $c_i$ for any target instance (including $x^*$), while if $z_i = x^*_i$, it reduces the cost for $x^*$ by 1, and it reduces that for any other $x'$ by at most 1 (namely, when $x'_i = x^*_i$). Thus, the cost reduction for $x^*$ is at least as fast as that for any other $x'$, so adding the cost for categorical features to the overall cost $c$ does not affect the optimality of $x^*$. This completes the proof. □
With similar proofs, the theorem also applies to $\ell_1$ distance functions, including the case of different features scaled differently (e.g., by respective mean absolute deviation as used by Wachter et al. [46]), or with $F_i$ being arbitrary monotonic functions. These variations greatly increase the generality of the theorem. The optimal cost algorithm is considered the gold standard by many works that, due to the intractability of the exact optimization, propose approximate procedures such as local gradient ascent [46] or randomized search [27,29]. Hence, we see that IPF is not a concern in the ideal case. In fact, for conservative $g_{\mathrm{OC}}$ with $\tau > 0.5$, it is likely that the total cost of IPF is smaller due to early stopping.
Moreover, the result extends easily to look-up based CF algorithms, as defined below.
Theorem 5. A look-up based CF algorithm, which selects the instance closest to $x$ from a (finite) set of candidates $C$ (using deterministic tie-breaking if necessary), is IPF stable globally.
The proof is analogous. A natural choice of $C$ (for a negatively predicted instance $x$) is the set of correctly predicted positive training instances. Indeed, using the training set as a constraint or regularization is a common ingredient in many CF algorithms [30,41], often to make the CF explanations more realistic and thus feasible, while this theorem demonstrates an added benefit of it.
Putting everything together, we reiterate the central results of this section with the following corollary:
Corollary 6. Both optimal cost and look-up based CF algorithms ($g_{\mathrm{OC}}$ and $g_{\mathrm{LU}}$) are IPF stable.
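A look-up based CF algorithm and a numerical check of its stability can be sketched as follows. The sketch assumes all-continuous features and a plain $\ell_1$ distance (one of the variants the theorem is stated to cover) instead of the CDF-based cost; the function names and the random candidate set are hypothetical. The check passes because, by the triangle inequality, moving towards the closest candidate cannot make another candidate strictly closer.

```python
import random

def l1(a, b):
    """Plain L1 distance between two points."""
    return sum(abs(p - q) for p, q in zip(a, b))

def lookup_cf(x, candidates):
    """Return the candidate closest to x; ties go to the earliest candidate."""
    return min(candidates, key=lambda c: l1(x, c))

def interpolate(x, x_cf, u):
    """u-partial fulfillment for all-continuous features."""
    return tuple((1 - u) * a + u * b for a, b in zip(x, x_cf))

rng = random.Random(0)
candidates = [tuple(rng.uniform(0, 1) for _ in range(3)) for _ in range(20)]

stable = True
for _ in range(200):
    x = tuple(rng.uniform(0, 1) for _ in range(3))
    goal = lookup_cf(x, candidates)
    z = interpolate(x, goal, u=rng.choice([0.1, 0.3, 0.5, 0.7, 0.9]))
    # After a partial fulfillment, the same candidate should still be returned.
    stable = stable and lookup_cf(z, candidates) == goal
```

Using `min` with a fixed candidate order gives the deterministic tie-breaking that the theorem requires.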

Cost-Increasing CF Under IPF
Next, we demonstrate that two popular approximation methods, gradient ascent and randomized search, are prone to increasing the total improvement cost, possibly without limit.
For differentiable models, gradient ascent is often used from the current input to find a CF instance that offers a good trade-off between the model prediction and distance, sometimes with other considerations. Different works have proposed different objective functions, with the earliest one proposed by Wachter et al. [46], which trades off a prediction term against $d_{\mathrm{MAD}}$, the $\ell_1$ distance weighted by the inverse median absolute deviation (MAD) per feature, with a coefficient $\lambda$ controlling the trade-off. We define a gradient ascent CF function $g_{\mathrm{GA}}$ as the one that follows the gradient of this objective from $x$ to the local optimum or the boundary of $\mathcal{X}$. If this end state does not achieve the required model prediction $\tau$, we return a default positive instance (which can be a fixed correctly classified positive training instance). It turns out that $g_{\mathrm{GA}}$ could lead to arbitrarily bad IPF behaviors due to an oscillation phenomenon.
Theorem 7. There exists a model $f$, input instances $x^{(1)}, x^{(2)}$ with all continuous features, and effort level $u$, such that $P(x^{(1)}, g_{\mathrm{GA}}(x^{(1)}), u) = x^{(2)}$ and $P(x^{(2)}, g_{\mathrm{GA}}(x^{(2)}), u) = x^{(1)}$.
In this case, starting at $x^{(1)}$ and making partial fulfillment with an effort level of $u$ results in an oscillation of $x^{(1)} \to x^{(2)} \to x^{(1)} \to x^{(2)} \to \cdots$ for $T$ steps. A concrete example is illustrated in Fig. 2, which plots the gradient field of the 2-dimensional objective function as gray arrows pointing in the ascent direction. A "valley" (blue dashed line) separates the inputs into two regions. We have two instances, represented by orange and green circles. For each instance, the gradient ascent yields the red trajectory to the star marker of the same color. However, starting from the orange circle, a 0.5-partial fulfillment towards the orange star lands exactly on the green circle, whose counterfactual explanation is the green star, but a 0.5-partial fulfillment towards the green star goes back to the orange circle again.
The root cause for this issue is the non-optimality of the gradient ascent algorithm, in that it may only find farther local optima by following the gradient, such that a partial fulfillment (which moves along the straight-line path, not along the gradient) could reset the progress. While the above example can be easily solved by caching the CF explanations found so far and returning the closest one if the gradient ascent cannot do better, models trained on real-world datasets with a large number of features may have many local optima in the high-dimensional input space, as evidenced by the prevalence of adversarial examples [10,22], rendering such caching effort mostly futile.
Another popular approach, especially for non-differentiable models, is based on randomized search. Generally speaking, a randomized search algorithm $g_{\mathrm{RS}}$ draws samples from the input space $\mathcal{X}$ using some strategy (e.g., uniformly at random or weighted towards the input instance $x$), and returns the best sampled instance according to some objective function (e.g., Eq. 11). However, this approach is also prone to increasing the total improvement cost under IPF.
Theorem 8. There exists a model $f$, an input instance $x$, and an effort level $u$, such that the expected total improvement cost under IPF with $g_{\mathrm{RS}}$ is higher than that under one-shot complete fulfillment.
Intuitively, this theorem should not be surprising: at step $t$ and state $x^{(t)}$, when a new CF goal $x^{(\mathrm{CF}, t+1)}$ is set, some of the effort expended during the previous round of partial fulfillment becomes wasted if the new goal requires a different fulfillment operation from the previous state $x^{(t-1)}$; i.e., $x^{(t)} \notin \Phi(x^{(t-1)}, x^{(\mathrm{CF}, t+1)})$.
As a simple example, consider a probabilistic CF algorithm $g$ that gives one of two CF explanations, $x^{(\mathrm{CF},1)}$ and $x^{(\mathrm{CF},2)}$. For an input $x$, let $d_1$ and $d_2$ be the Euclidean distances to them respectively (assuming all continuous features). We have
$$g(x) = \begin{cases} x^{(\mathrm{CF},1)} & \text{with (unnormalized) probability } d_1^{-1}, \\ x^{(\mathrm{CF},2)} & \text{with (unnormalized) probability } d_2^{-1}. \end{cases} \tag{13}$$
Thus, if we have the initial state starting at the middle of these two CF states, $x = (x^{(\mathrm{CF},1)} + x^{(\mathrm{CF},2)})/2$, with $u = 0.5$ (i.e., fulfilling halfway through the CF explanation), then the probability of the CF algorithm always recommending the same counterfactual is
$$\frac{3}{4} \cdot \frac{7}{8} \cdot \frac{15}{16} \cdot \frac{31}{32} \cdots \approx 0.58, \tag{14}$$
meaning that 42% of the time, there is at least one step that erases the effort of an earlier step. In our earlier mortgage approval example, these two states could represent the two ways of getting approved (high saving and high debt, or low saving and low debt), and a partially fulfilling applicant risks receiving contradictory feedback every time they make an application. Using a Monte Carlo simulation, Fig. 3 (blue line) shows the total improvement cost at different effort levels $u$ relative to that under single-shot complete fulfillment. Analogous results for the same setup but with three to five counterfactual states arranged on a regular polygon (with the initial state $x$ at the center) are also presented in different colors.
As we can see, both a smaller value of $u$ and a larger number of candidate CF instances exacerbate the total improvement cost under IPF. In particular, with just five CF instances and an effort level of 0.5, the total improvement cost increases 10-fold relative to the one-shot complete fulfillment ($u = 1$). An effort level of 0.1 increases the cost more than 40,000 times!
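A randomized search CF algorithm of this kind can be sketched as below. The sketch makes several assumptions that are not from the paper: samples are drawn uniformly within box bounds, "best" means the closest positively predicted sample under a plain $\ell_1$ distance, and the model and all names are hypothetical.

```python
import random

def random_search_cf(x, model, bounds, n_samples=1000, rng=random):
    """Sample uniformly within bounds; return the closest positively predicted sample."""
    best, best_dist = None, float("inf")
    for _ in range(n_samples):
        cand = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        if model(cand) < 0.5:
            continue                      # not a valid counterfactual
        dist = sum(abs(a - b) for a, b in zip(x, cand))
        if dist < best_dist:
            best, best_dist = cand, dist
    return best                           # None if no sample is valid

# Hypothetical model: positive when the two features sum to at least 1.
model = lambda p: 1.0 if p[0] + p[1] >= 1.0 else 0.0
rng = random.Random(0)
x = (0.2, 0.2)
cf_a = random_search_cf(x, model, bounds=[(0, 1), (0, 1)], rng=rng)
cf_b = random_search_cf(x, model, bounds=[(0, 1), (0, 1)], rng=rng)
```

Two calls on the same input generally return different goals, which is the source of the wasted effort described in the intuition above: a subject who re-queries after a partial fulfillment may be pointed somewhere else.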
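The product in the two-CF example can be checked numerically. The sketch below tracks the distances to the chosen CF and to the other CF along the segment connecting them (normalized so that the initial distance to each is 0.5) and multiplies the probabilities of re-drawing the same CF; the function name and the truncation at 60 steps are choices made for illustration.

```python
def prob_same_cf(u=0.5, n_steps=60):
    """P(the same CF is drawn at every step after the first), two-CF example."""
    d_near, d_far = 0.5, 0.5          # distances to the chosen CF and the other CF
    prob = 1.0
    for _ in range(n_steps):
        # Partial fulfillment towards the chosen (near) CF on the connecting segment.
        step = u * d_near
        d_near, d_far = d_near - step, d_far + step
        # The next draw picks each CF with probability proportional to 1/distance.
        w_near, w_far = 1.0 / d_near, 1.0 / d_far
        prob *= w_near / (w_near + w_far)
    return prob

p = prob_same_cf()
first_factor = prob_same_cf(n_steps=1)
```

The first draw at the midpoint is free (either CF can be chosen), so the product starts with the factor 3/4 after the first partial fulfillment.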
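A Monte Carlo simulation in the spirit of this setup can be sketched as follows. It is not the paper's simulation code: the unit-radius polygon, the Euclidean path length as cost, and in particular the stopping rule (success once the state is within a tolerance `tol` of a CF state) are assumptions, so the resulting numbers need not match Fig. 3.

```python
import math
import random

def simulate_cost_ratio(n_cfs, u, rng, tol=1e-3, max_steps=100_000):
    """One IPF run from the center of a regular polygon of CF states (radius 1).

    Returns the total Euclidean path length; the one-shot cost is 1.0.
    Assumption: the run succeeds once the state is within `tol` of a CF state.
    """
    cfs = [(math.cos(2 * math.pi * k / n_cfs), math.sin(2 * math.pi * k / n_cfs))
           for k in range(n_cfs)]
    x, total = (0.0, 0.0), 0.0
    for _ in range(max_steps):
        dists = [math.dist(x, c) for c in cfs]
        if min(dists) <= tol:
            break
        # Draw the next goal with probability proportional to 1 / distance.
        goal = rng.choices(cfs, weights=[1.0 / d for d in dists])[0]
        z = tuple((1 - u) * a + u * b for a, b in zip(x, goal))
        total += math.dist(x, z)
        x = z
    return total

rng = random.Random(0)
ratios = {u: sum(simulate_cost_ratio(5, u, rng) for _ in range(200)) / 200
          for u in (0.5, 1.0)}
```

With complete fulfillment the ratio is 1 by construction, while at an effort level of 0.5 the average path is longer because some steps are spent moving towards a goal that is later abandoned.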

Summary
In this section, we characterize four basic algorithmic approaches to generating CF explanations by their IPF cost property. On the positive side, the optimal algorithm that performs an exhaustive search and its finite search space variant are both IPF stable and thus cost preserving. In addition, if these algorithms are configured to be conservative, in that they only return instances with model prediction over a threshold $\tau > 0.5$, it is likely that IPF can save total cost by rewarding subjects who take a chance.
On the contrary, algorithms based on gradient ascent and randomized search risk increasing the total cost under IPF. The issue can be attributed to the same underlying reason: since these algorithms are not guaranteed to always return the closest CF instance, partial fulfillments in the earlier iterations may be "cancelled" by later ones, resulting in increased total costs. In addition, many CF algorithms [27,34] aim to generate multiple CF instances at the same time in order to provide more diversity and options to the subject. In this case, if the choice made by the subject is not consistent across iterations, the net effect is similar to a randomized search CF algorithm, with higher total improvement cost.

EMPIRICAL ANALYSIS

Experiment Setup
In this section, we empirically study the IPF behaviors of CF algorithms. We use two datasets, Adult Income [20] and German Credit [8]. The first dataset is about predicting whether the annual salary is above $50k or not from demographic information collected in the 1994 Census. The second dataset is about predicting whether a person is likely to repay a loan or not from information about the person's finances and about the loan.
For each dataset, we use an 80%/20% train/test split, apply one-hot encoding to the categorical features, and train a random forest classifier as the model. To compute counterfactual explanations, we use the DiCE-ML package, which is one of the most popular Python packages for tabular data and non-differentiable classifiers. For all the experiments, we focus on correctly classified negative instances and generate positively predicted CF explanations for them. This scenario is the most common use case of CF explanations as recourses, but our analysis applies to any model input and prediction. Tab. 1 gives summary statistics about the datasets and model performance, and Tab. 2 presents some inputs and CF instances.
The DiCE-ML package (Mothilal et al. [27]) searches for diverse CF explanations in various ways. Since the random forest classifier is not meaningfully differentiable (zero gradient almost everywhere), we study three search methods: random search (the default method), genetic search (based on the method by Schleich et al. [35]), and prototype-guided search with a KD tree (based on the method by Van Looveren and Klaise [41]). The generated CF explanations are post-processed for sparsity with a feature selection procedure.
In addition, DiCE-ML can generate multiple CF explanations. We study two setups, a single CF explanation (which is still stochastic for random and genetic algorithm search), and 20 CF explanations. In the latter case, we consider three CF selection strategies carried out by the subject: (1) closest: select the closest CF instance, (2) weighted: sample a CF instance from a softmax function (with temperature 1) on the negative distance, and (3) uniform: select one CF instance uniformly at random. In other words, closest and uniform selections are equivalent to weighted selection with temperature approaching 0 and ∞ respectively. We call the setup where only one CF is generated (and hence no selection is necessary) "single CF."
For IPF, we use a maximum number of $T = 30$ iterations and evaluate effort level $u$ from the set $\{0.1, 0.3, 0.5, 0.7, 0.9\}$ along with one-shot complete fulfillment $u = 1$. At the lowest effort level of $u = 0.1$, if the counterfactual explanations were consistent each round, after 30 rounds the input would have moved more than 95% of the way towards the CF ($1 - 0.9^{30} = 95.8\%$). We do not employ the small-difference exception for continuous features, as none of the algorithms return CF instances exactly on the boundary.
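The three selection strategies and the 95.8% figure can be illustrated with the following sketch. It does not use DiCE-ML; the function `select_cf`, the plain $\ell_1$ distance and the toy CF set are hypothetical, and the softmax is shifted by the minimum distance purely for numerical stability.

```python
import math
import random

def select_cf(x, cfs, strategy, distance, rng=random, temperature=1.0):
    """Pick one CF from a diverse set according to the subject's strategy."""
    dists = [distance(x, c) for c in cfs]
    if strategy == "closest":
        return cfs[dists.index(min(dists))]
    if strategy == "uniform":
        return rng.choice(cfs)
    if strategy == "weighted":
        # Softmax over negative distances (shifted by the minimum for stability).
        d_min = min(dists)
        weights = [math.exp(-(d - d_min) / temperature) for d in dists]
        return rng.choices(cfs, weights=weights)[0]
    raise ValueError(f"unknown strategy: {strategy}")

l1 = lambda a, b: sum(abs(p - q) for p, q in zip(a, b))
cfs = [(1.0, 0.0), (0.0, 3.0), (2.0, 2.0)]
closest = select_cf((0.0, 0.0), cfs, "closest", l1)

# Progress towards a fixed CF after 30 consistent rounds at effort level 0.1.
progress = 1 - 0.9 ** 30
```

Lowering `temperature` towards 0 makes the weighted strategy behave like the closest strategy, and raising it makes the weights approach uniform, matching the equivalence noted in the text.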

Results
We first answer the most fundamental question. Can CF algorithms lead to positive predictions in the face of IPF? Fig. 4 shows the success rate of IPF (up to the maximum number of 30 iterations). Most runs with effort level $u \ge 0.3$ succeed eventually without any issue (i.e., getting a positive model prediction). For $u = 0.1$, Genetic and Prototype algorithms struggle the most, especially when the final CF is stochastically selected from a diverse set with weighted or uniform distributions.
Focusing on the input instances for which all methods succeed (to ensure a fair comparison), we study the main quantity of interest, total improvement cost under IPF, relative to the one-shot baseline, as plotted in Fig. 5. We observe a variety of behaviors across different setups. The trade-off between cost decrease due to conservatism of CF algorithms (i.e., outputting instances far from the decision boundary) and cost increase due to their non-optimality is best shown on the Adult Income dataset by Genetic and Prototype algorithms, under uniform and weighted selection strategies. In these cases, taking very small steps of $u = 0.1$ results in lower total improvement cost than taking medium steps of $u = 0.3$ and $0.5$, which may incur an 80% higher cost, because the small steps in the former help stop closer to the decision boundary. Yet all three choices are inferior to even larger $u$ values, where the inconsistency in different iterations of the search is largely avoided. By comparison, the total improvement costs in other setups of Adult Income are not too sensitive to IPF, although IPF can have both mildly negative (for Random search with Closest selection) and mildly positive effects (for the remaining setups). On German Credit, the Genetic and Prototype algorithms exhibit the opposite effect, showing a monotonic cost decrease with lower effort level $u$. One possible reason for this phenomenon is the higher dimensionality of the input space of German Credit vs. Adult Income (24 vs. 13), with more than twice as many categorical features. Thus, it is more likely for some categorical features to be changed in German Credit, which, in conjunction with conservative CF explanations, results in smaller total cost under low effort levels. The performance of the Random CF algorithm is similar to that in Adult Income, though with slightly higher variance.
Fig. 6 plots the average number of steps until success (for runs that do succeed). As expected, the number increases with decreasing $u$, but the speed of increase varies a lot, with those for Genetic and Prototype algorithms on Adult Income being the largest. Interestingly, the closest selection strategy for the Random search algorithm (orange bar on the leftmost plot) performs markedly worse than the rest, suggesting that such strictly greedy selection from a random sample may be especially suboptimal under IPF. For German Credit, the profiles across different algorithms are largely similar, confirming again that properties of the dataset can be influential in the IPF behaviors of the CF algorithms.
Overall, the three analyses above demonstrate a variety of behaviors of algorithms under IPF, and hence we advocate for them to be included in a standard suite of evaluations for CF algorithms, as well as considered when developing new CF algorithms. From a human perspective, it may also be necessary for model users (e.g., banks) to provide explicit guidelines to subjects (e.g., mortgage applicants) to calibrate their expectations on this aspect, which may require new policies to be established. We provide more discussion in Sec. 6.
For the rest of this section, we demonstrate how IPF can be incorporated into other, existing aspects of analysis. In particular, as a preliminary investigation, we study the fairness of CF algorithms under IPF. Additional ideas are again discussed in Sec. 6. At a high level, the fairness property requires that different demographic groups (e.g., male vs. female, white vs. other race, etc.) should be treated "equally," with different criteria implementing this notion differently. The criterion that we use is demographic parity [9,14,24], one of the simplest and most popular, which asserts that for a fair metric (e.g., mortgage application approval), its average value is the same across different demographic groups. Viewed from this angle, we study the fairness of the total improvement cost and the number of steps under the concept of demographic parity. We consider four demographic group splits in the fairness evaluation, commonly used in the literature [5,37,38]. For Adult Income, we study gender with a male/female split, and race with a white/non-white split. For German Credit, we use the same gender split, along with marital status with a married/single split. For each split, we take the second value (e.g., female) as the potentially disadvantaged group and study the ratio of the target of investigation in the disadvantaged group to that in the advantaged group.
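As a minimal sketch of the ratio described above, the following computes the mean of a per-subject quantity in the disadvantaged group divided by the mean in the advantaged group. The function name `parity_ratio` and the sample numbers are hypothetical and are not taken from the experiments; a ratio of 1 would indicate demographic parity for that quantity.

```python
from statistics import mean


def parity_ratio(values, groups, disadvantaged, advantaged):
    """Mean of `values` in the disadvantaged group over the mean in the advantaged group."""
    dis = [v for v, g in zip(values, groups) if g == disadvantaged]
    adv = [v for v, g in zip(values, groups) if g == advantaged]
    if not dis or not adv:
        raise ValueError("both groups need at least one subject")
    return mean(dis) / mean(adv)


# Hypothetical per-subject relative total costs (IPF cost / one-shot cost).
relative_cost = [1.2, 0.8, 1.5, 1.0]
gender = ["male", "male", "female", "female"]

# A value above 1 means the disadvantaged group pays relatively more under IPF.
ratio = parity_ratio(relative_cost, gender, "female", "male")
```

The same helper could be applied to the number of steps until success by passing step counts instead of relative costs.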
We first study the total improvement cost. Note that we compute the ratio of the relative total cost (relative to $\alpha = 1$), the same target as in Fig. 5, to assess whether IPF further exacerbates the fairness issue on top of what is already observed in the literature for vanilla CF explanations [45]. The ratios for these four groups are plotted in Fig. 7. While we could not identify any clear and consistent trend, IPF could increase the fairness issue as measured by demographic parity by as much as 30% for the German Credit model in some settings.
On the number of steps to achieve success, Fig. 8 plots the ratio. The trend is more pronounced. In most setups, the ratio increases as $\alpha$ gets smaller, indicating that IPF has a disproportionately higher impact on the disadvantaged group. Given that the total cost does

DISCUSSION
In this paper, we propose the concept of iterative partial fulfillment, which, to the best of our knowledge, is the first formal study of the situation where the subject of a negative model prediction (e.g., denied mortgage application) does not completely fulfill the given counterfactual (CF) explanation before asking for an updated prediction. This can happen for many reasons. First, the subject may intentionally decide to take a chance (e.g., betting that a monthly salary increase of \$800 is enough even though the CF instance requires \$1,000), hoping that a state less qualified than the given CF state is sufficient to secure a positive prediction. Second, the subject may not be able to fully satisfy the CF state (e.g., can only pay down two out of four credit card accounts), especially if given a time limit on the CF validity guarantee (e.g., within the next six months). Furthermore, the subject may misinterpret the CF explanation, such as fulfilling any one of the action items rather than all of them when it is presented as a bullet list. When the partial fulfillment does not result in a positive model prediction, the subject receives a new CF state as part of the rejection and performs an improvement towards the new state. This process repeats until the model prediction is positive, and we call it iterative partial fulfillment (IPF).
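The loop described above can be sketched as follows. This is an illustrative toy, not the experimental setup: the one-feature model `approve`, the explainer `conservative_explainer`, the L1 cost, and the use of a single effort level `alpha` for every feature are all assumptions made for the example; only the cap of 30 iterations mirrors the experiments.

```python
def ipf(x, predict, explain, alpha, max_iters=30):
    """Run iterative partial fulfillment; return (success, total_cost, steps)."""
    total_cost, steps = 0.0, 0
    while not predict(x) and steps < max_iters:
        target = explain(x)  # memoryless: depends only on the current state
        # Move a fraction alpha of the way from the current state to the CF state.
        new_x = [(1 - alpha) * xi + alpha * ti for xi, ti in zip(x, target)]
        total_cost += sum(abs(a - b) for a, b in zip(new_x, x))  # L1 cost (assumed)
        x = new_x
        steps += 1
    return predict(x), total_cost, steps


def approve(x):
    """Toy model: positive prediction once the single feature reaches 10."""
    return x[0] >= 10.0


def conservative_explainer(x):
    """Toy CF algorithm: always recommends a state 2 units past the boundary."""
    return [12.0]


one_shot = ipf([0.0], approve, conservative_explainer, alpha=1.0)
partial = ipf([0.0], approve, conservative_explainer, alpha=0.5)
```

In this toy case the explainer is conservative and consistent across rounds, so partial fulfillment with `alpha=0.5` stops at 10.5 after three steps and costs less than the one-shot move to 12. An explainer that changes its recommendation between rounds could produce the opposite outcome.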
Given that virtually all CF algorithms are memoryless (i.e., the CF explanation is generated from only the current input), and most employ local gradient-based or randomized search, it is possible that the CF explanation for a (still negative) partially fulfilled state is different from that of the original input, guiding the subject on a different path of improvement. As a result, the net effect of IPF on the welfare of the subject, most directly measured by final success rate and total improvement cost, could be positive or negative.
A positive effect could occur when the generated CF instance is conservative, i.e., lying far into the positive prediction region. Such a CF algorithm configuration could be preferred if the model user (e.g., the bank) wants to ensure that the subject is likely to get a positive prediction even if they cannot perfectly follow the CF recommendation. The exact same reasoning allows the subject to engage proactively in partial fulfillment and save on the improvement cost. By contrast, a negative effect could occur when the CF explanation provides different and conflicting advice at different rounds of partial fulfillment.
In our theoretical analysis, we prove that the optimal cost CF algorithm and its finite search approximation version are guaranteed not to increase the total cost under IPF. However, the same cannot be said for two popular practical algorithms, gradient ascent and randomized search, both of which can worsen subject welfare, sometimes significantly and even potentially unboundedly in theory.
In our experimental investigation on two datasets, Adult Income and German Credit, totalling 24 CF explainer configurations, we identified both positive and negative effects of IPF, suggesting that IPF is sensitive to properties of the dataset and explainer. As a result, we recommend that IPF analysis be included as part of a standard evaluation suite of CF algorithms.
For future work, one direction is to consider alternative formulations of IPF. We use a deterministic, fixed-proportion model for continuous features (i.e., for a current feature value of $x_i$ and a target value of $x'_i$, partial fulfillment results in $(1 - \alpha) \cdot x_i + \alpha \cdot x'_i$), but this step could be made stochastic by sampling from some distribution centered on $(1 - \alpha) \cdot x_i + \alpha \cdot x'_i$, or a fixed-magnitude model could be used where the amount of improvement $\Delta_i$ on each feature is specified. Alternative models on categorical features could also be developed. Last, improvements on some features may be correlated, due to the underlying causal relationships (e.g., change in job title → change in salary), so incorporating causal information, potentially in the form of a causal graph, could be explored. Moreover, finding CF algorithms that are stable with respect to more than one IPF notion would be desirable, as different subjects are likely to employ different IPF approaches.
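To make the three continuous-feature step models mentioned above concrete, here is a hedged sketch for a single feature. The function names are hypothetical, and the Gaussian noise in the stochastic variant and the clipping at the target in the fixed-magnitude variant are assumptions chosen for illustration; the text leaves both the distribution and the overshoot behavior open.

```python
import random


def fixed_proportion(x, target, alpha):
    """Deterministic step: move a fraction alpha of the way toward the target."""
    return (1 - alpha) * x + alpha * target


def stochastic_proportion(x, target, alpha, sigma, rng):
    """Stochastic step: sample around the fixed-proportion point (Gaussian assumed)."""
    return rng.gauss(fixed_proportion(x, target, alpha), sigma)


def fixed_magnitude(x, target, delta):
    """Fixed-magnitude step: move by at most delta, never past the target (assumed)."""
    gap = target - x
    if abs(gap) <= delta:
        return target
    return x + delta if gap > 0 else x - delta


rng = random.Random(0)  # seeded so that repeated runs give the same sample
halfway = fixed_proportion(2000.0, 3000.0, alpha=0.5)
noisy = stochastic_proportion(2000.0, 3000.0, alpha=0.5, sigma=50.0, rng=rng)
capped = fixed_magnitude(2000.0, 3000.0, delta=800.0)
```

With these inputs, `halfway` is the midpoint between the current and target values, while `capped` advances by the full 800 because the gap of 1,000 exceeds `delta`.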
Additionally, the temporal aspect of IPF could be studied with more real-world elements. As time goes by in the IPF process, some feature values, such as age, would change in certain manners, which is ignored in the current formulation. Moreover, the very act of querying for a new model prediction may have an impact on some features, such as the bank account balance due to the payment of an application fee, or the credit score due to the bank pulling the credit report, which, at least in the United States, results in a small decrease of the credit score.
Another direction to extend the IPF analysis is to integrate it with other aspects of evaluation. We give a demonstration for the case of fairness, and future work could focus on aspects such as stability to input perturbations [7,44] and model shifts [32].
In addition, we focus on IPF analyses of existing CF algorithms, but as a recurring theme of research, the other side of the coin naturally follows: developing new CF algorithms or regularizing existing ones to behave well under IPF scenarios, following analogous works for other CF properties such as robustness [38] and fairness [12].
Finally, given the diverse and potentially discriminatory effects exhibited by IPF, society needs to be better informed and aware of it, especially as some subjects have already been engaging in such behaviors. For example, when the rejection letter of a mortgage application provides some CF explanations as recommendations, the bank may want to, or even be required to, include information about possible outcomes of a re-application with only partial fulfillment. In addition, the application process could allow the applicant to voluntarily reveal their previous applications, so that more stable and consistent CF explanations can be computed, in order to minimize the possibility of conflicting improvement recommendations given to the applicant. All of these changes require not only technical innovations but also policy discussions, and we hope that this paper serves as a good starting point for such conversations.

Figure 2: An example illustrating the oscillation behavior of gradient-ascent CF algorithms under partial fulfillment.

Figure 3: Total improvement cost as a multiple of the one-shot complete improvement for different effort levels $\alpha$.

Figure 4: Final success rate of IPF.

Figure 5: Average total cost at each effort level $\alpha$ relative to that of the one-shot fulfillment $\alpha = 1$ for different setups.

Figure 6: Average number of steps (for successful runs) incurred under different levels of partial fulfillment effort $\alpha$.

Figure 7: Demographic parity ratio for relative total cost.

Table 1: Statistics about the dataset and the model.

Table 2: One sample input instance and two counterfactual explanations for Adult Income (top) and German Credit (bottom). Non-changed feature values are marked with "-". Some non-changed features are omitted for presentation.
Figure 8: Demographic parity ratio for number of steps.

not demonstrate a clear trend, this means that the per-step improvement cost is smaller for the disadvantaged group, which in turn means that the generated CF instances are closer to the queried inputs in the first place. Nonetheless, we leave a definitive verification and further exploration of the implications to future work.