Flexible and Robust Counterfactual Explanations with Minimal Satisfiable Perturbations

Counterfactual explanations (CFEs) exemplify how to minimally modify a feature vector to achieve a different prediction for an instance. CFEs can enhance informational fairness and trustworthiness, and provide suggestions for users who receive adverse predictions. However, recent research has shown that multiple CFEs can be offered for the same instance or for instances with slight differences. Multiple CFEs provide flexible choices and cover diverse desiderata for user selection. At the same time, individual fairness and model reliability are damaged if unstable CFEs with different costs are returned. Existing methods fail to exploit flexibility and address the concerns of non-robustness simultaneously. To address these issues, we propose a conceptually simple yet effective solution named Counterfactual Explanations with Minimal Satisfiable Perturbations (CEMSP). Specifically, CEMSP constrains the changed values of abnormal features with the help of their semantically meaningful normal ranges. For efficiency, we model the problem as a Boolean satisfiability problem to modify as few features as possible. Additionally, CEMSP is a general framework and can easily accommodate more practical requirements, e.g., causality and actionability. We conduct comprehensive experiments on both synthetic and real-world datasets to demonstrate that, compared to existing methods, our method provides more robust explanations while preserving flexibility.


INTRODUCTION
Understanding the internal mechanisms behind model predictions is difficult due to the large volume of parameters in machine learning models. This problem is particularly significant in high-stakes domains such as healthcare and finance, where incorrect predictions are disastrous [8]. Counterfactual explanation (CFE) [44] aims to identify minimal changes required to modify the input to achieve a desired prediction and provides insights into why a model produces a certain prediction instead of the desired one. CFEs can help understand the underlying logic of certain predictions [31], detect the inherent model bias for fairness [19], and provide suggestions to users who receive adverse predictions [17,39]. Therefore, CFEs can be adopted in broad applications of healthcare, finance, education, justice, and other domains.
Despite the valuable insights provided by CFEs, recent studies [21,31,35,36,42,44] have shown that multiple CFEs can exist with equivalent evaluation metrics (e.g., validity, proximity, sparsity, plausibility), yet differ significantly in feature values for an input or for seemingly indistinguishable inputs. For example, Wachter et al. [44] pointed out "multiple counterfactuals are possible, ... " and "multiple outcomes based on changes to multiple variables may be possible". Laugel et al. [21] demonstrated that instances that are close to each other can have different CFEs. Moreover, Virgolin and Fracaros [42] have argued that CFEs lack robustness to adverse perturbations if not deliberately designed. While multiple CFEs can lead to the same desired prediction, each CFE tells a different story to reach the target.
It is important to note that counterfactual multiplicity has both advantages and disadvantages. On one hand, multiple CFEs can be beneficial because they afford more flexibility and freedom to select user-friendly CFEs when a single CFE may be overly restrictive for users. Specifically, diverse CFEs [25,30] are offered to potentially cover broad user preferences; interactive human-computer interfaces [7,45] are designed on multiple CFEs to obtain more satisfactory ones. On the other hand, users who have the same feature values or seemingly inconsequential differences may receive inconsistent CFEs (e.g., two different diverse sets), as the CFE method itself neither stores historical CFEs nor guarantees optimal solutions. Such inconsistency inevitably raises fairness issues [1,42] and undermines users' trust [31] in CFEs. For example, suppose two financially similar individuals are rejected when they apply for a loan, yet their CFEs are quite different: one needs to increase the salary slightly, while the other is required to get a higher education degree and a better job. Another negative example is when users make some efforts towards previous CFEs but then receive a significantly different CFE, rendering their previous efforts futile. Therefore, it is crucial to take advantage of the flexibility of multiple CFEs while maintaining consistency for users having the same feature values or slight differences.
Recent research [4,12,27,38,43] on robustness mainly studies CFEs with consistent predictions under slight model updates (by restricting CFEs to preserve causal constraints, follow the data distribution, etc.), rather than generating CFEs with consistent feature values. Therefore, these studies fail to address fairness concerns and ignore the freedom of user selection. Generally, the models to be explained are highly complex and non-convex, e.g., DNN models. Even when constrained by consistent predictions, heuristic search strategies [22,32,37,44] can still converge to different non-optimal solutions due to the huge search space. Meanwhile, these works do not explicitly exploit flexibility to meet user preferences. As the number of possible CFEs can be huge, existing methods on flexibility or robustness can be interpreted as different selection strategies from a solution pool (without optimizing diversity and robustness in advance), i.e., selecting CFEs that are diverse [25], follow the data distribution [27], or uphold causal constraints [43]. Motivated by this, we aim to design a novel method that obtains a diverse and robust set of CFEs simultaneously.
To overcome the above limitations, we propose to incorporate task priors (normal ranges, a.k.a. reference intervals) to stabilize valid search regions, while ensuring that CFEs are diverse to meet various user requirements. It should be noted that robustness measures the differences between two sets of CFEs in different trials, while diversity measures the inherent discrepancy within a set of CFEs. Normal ranges in our approach commonly exist in broad domains and are easy to obtain from prior knowledge. For example, the normal range of heart rate per minute is between 60 and 100; the IELTS score should be greater than 6.5 and the minimal GPA is 3.5 for Ph.D. admissions. We assume that the undesired prediction results from certain features lying outside their normal ranges, and thus we attempt to move abnormal features into normal ranges to generate CFEs with the desired prediction. Specifically, we replace an abnormal feature with the closest endpoint of its normal range. As the endpoints are stationary, CFEs after feature replacement tend to have the same feature values in different trials for the same or similar inputs. In practice, it may be unnecessary to move all abnormal features into normal ranges for the desired prediction. Therefore, we aim to select minimal subsets of abnormal features to replace, where each subset corresponds to a CFE. CFEs determined by all minimal subsets are diverse, as no minimal subset is contained in another.
As mentioned earlier, the problem of finding CFEs boils down to selecting minimal subsets of abnormal features to replace, which can be formulated as either the Maximally Satisfiable Subsets (MSS) or Minimal Unsatisfiable Subsets (MUS) problem [23,24]. However, finding all minimal sets for satisfiable CFEs is an NP-complete problem, as an exponential number of subsets should be checked. To enhance efficiency, we convert the enumeration of minimal subsets to the Boolean satisfiability problem (SAT), which finds satisfying Boolean assignments over a series of Boolean logic formulas and can be solved with efficient modern solvers. As for commonly mentioned constraints (e.g., actionability, correlation, and causality), we can conveniently write them as Boolean logic formulas, which can be conjoined with the current clauses in conjunctive normal form (CNF). Therefore, our framework is flexible enough to provide feasible counterfactual recommendations.
The main contributions of this paper are summarized as follows.
• We reformulate the counterfactual explanation problem to satisfy both flexibility and robustness by replacing a minimal subset of abnormal features with the closest endpoints of normal ranges.
• We convert this problem into checking the satisfiability of Boolean logic formulas for a Boolean assignment, which can be solved by modern SAT solvers efficiently. In addition, common constraints can be easily incorporated into the current Boolean logic formulas and solved together.
• We conduct intensive experiments on both synthetic and real-world datasets to demonstrate that our approach produces more consistent and diverse CFEs than state-of-the-art methods.

RELATED WORK
Counterfactual explanations [44] refer to perturbed instances with the minimum cost that result in a different prediction from a pretrained model given an input instance. These explanations provide ways to comprehend the model's prediction logic and offer advice to users receiving adverse predictions. Most existing algorithms focus on modeling practical requirements and user preferences with proper constraints. Typical constraints include actionability [39], which freezes immutable features such as race and gender; plausibility [15,40], which requires CFEs to follow the data distribution; diversity [25,30], which generates a diverse set of explanations at a time; sparsity [9], which favors fewer changed features; and causality [15,18], which restricts CFEs to meet specific causal relations. However, recent studies [21,30,31,36,42,44] have revealed that there often exist multiple CFEs with equivalent performance but different feature values for an input or for seemingly indistinguishable inputs. Next, we review research that takes advantage of counterfactual multiplicity and addresses concerns regarding non-robustness.
Multiple CFEs provide users with more flexibility to prioritize their preferences without compromising the validity and proximity of CFEs. When a single CFE is inadequate to meet users' requirements, employing a diverse set of CFEs is an effective and straightforward strategy to overcome this limitation. For example, Wachter et al. [44] generate a diverse set by running multiple times with different initializations; Russell [30] prohibits the transition to previous CFEs in each run; while Mothilal et al. [25] add a DPP (Determinantal Point Processes) term to ensure that the CFEs are far apart from each other. In addition, multiple CFEs also enable researchers to develop Human-Computer Interaction (HCI) tools for interactively satisfying user requirements [7,45]. However, such a diverse set can be inconsistent for two inputs with no or slight differences. In our paper, we aim to generate a consistent and diverse set of CFEs for an input or for seemingly indistinguishable inputs, to enhance the robustness and reliability of CFEs.
The non-robustness issue of CFEs has garnered significant attention recently. As introduced in [21,35], even a slight perturbation to the input can result in drastically different CFEs. To verify this phenomenon for neural network models, Slack et al. [35] train an adversarial model that is sensitive to trivial input changes. Some relevant works [4,12,27,38] propose to generate CFEs that yield consistent predictions when the model is retrained. For example, [27] proves that adhering to the data manifold ensures stable predictions for CFEs; [38] incorporates adversarial training to produce robust models for generating explanations; Black et al. [4] state that closeness to the data manifold is insufficient to indicate counterfactual stability, and they propose Stable Neighbor Search (SNS) to find an explanation with a lower model Lipschitz constant and higher confidence. However, constraining CFEs to have consistent predictions does not necessarily ensure CFEs with the same or similar feature values, and still fails to address the unfairness issue. Moreover, these robustness methods lack the flexibility to meet user requirements, while our work considers both flexibility and robustness simultaneously.

PRELIMINARY

Counterfactual Explanations
Let us consider a pretrained model $f : \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X} \subseteq \mathbb{R}^d$ denotes the feature space and $\mathcal{Y}$ is the prediction space. For simplicity, let $\mathcal{Y} = \{0, 1\}$, where 0/1 denotes the unfavorable/favorable prediction, respectively. Given an input instance $x \in \mathcal{X}$ that is predicted to be the unfavorable outcome ($f(x) = 0$), a counterfactual explanation (CFE) $c$ is a data point that leads to a favorable prediction, i.e., $f(c) = 1$, with minimal perturbations of $x$. Formally, a counterfactual explanation method $g : \mathcal{F} \times \mathcal{X} \to \mathcal{X}$ can be mathematically defined as follows:
$$g(f, x) = \arg\min_{c \in \mathcal{X}} \mathrm{cost}(x, c) \quad \text{s.t.} \quad f(c) = 1 \tag{1}$$
where $\mathrm{cost}(\cdot, \cdot) : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_{+}$ is a distance or cost metric that quantifies the effort required to change from an input $x$ to its CFE $c$. In practice, the commonly used cost functions include $\ell_1$/MAD [30,44], total log-percentile shift [39], and the $\ell_2$ norm on a latent space [26]. To optimize Eqn. (1), it can be further transformed to the Lagrangian form [44], as shown below:
$$g(f, x) = \arg\min_{c \in \mathcal{X}} \ell(f(c), 1) + \lambda \, \mathrm{cost}(x, c) \tag{2}$$
where $\ell(\cdot, \cdot)$ is a differentiable function that measures the gap between $f(c)$ and the favorable prediction 1, and $\lambda$ is a positive trade-off factor. By optimizing the above objective, a CFE method $g(f, x)$ returns a single CFE or a set of CFEs for an input $x$.
The definition in Eqn. (1) captures the most basic form of counterfactual explanations. Additional constraints are often required to ensure that the produced CFEs are useful and actionable for specific applications [41].
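As a minimal sketch of how a Lagrangian objective of this kind might be optimized by gradient descent, the following example assumes a hypothetical logistic-regression model (weights `w`, bias `b`), the negative log-likelihood of the favorable class as the prediction loss, and the squared Euclidean distance as the cost. None of these choices are prescribed by the text; they only make the trade-off between the two terms concrete.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(c, w, b):
    """Probability of the favorable class under a toy logistic model (assumption)."""
    return sigmoid(sum(wi * ci for wi, ci in zip(w, c)) + b)

def plain_cf(x, w, b, lam=0.1, lr=0.1, steps=500):
    """Gradient descent on  -log f(c) + lam * ||c - x||_2^2,  starting from c = x."""
    c = list(x)
    for _ in range(steps):
        p = predict_proba(c, w, b)
        for i in range(len(c)):
            grad_loss = -(1.0 - p) * w[i]          # gradient of -log(sigmoid(w.c + b))
            grad_cost = 2.0 * lam * (c[i] - x[i])  # gradient of the squared distance
            c[i] -= lr * (grad_loss + grad_cost)
    return c

# Hypothetical model and input; the input receives the unfavorable prediction.
w, b = [1.0, 2.0], -3.0
x = [0.5, 0.5]
c = plain_cf(x, w, b)
print(predict_proba(x, w, b) < 0.5, predict_proba(c, w, b) > 0.5)
```

A larger `lam` keeps the result closer to `x` at the price of a less confident favorable prediction, which is one reason different configurations can yield different CFEs.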

Robustness of Counterfactual Explanations
Motivated by the formalization in [1], we formally define the robustness of CFEs in more general cases that include slight input perturbations and model changes. However, before presenting technical details, one critical question that needs to be answered is "Do we want CFEs to remain consistent after a series of slight changes, or should they vary to reflect such changes?". The answer depends on practical scenarios. In certain applications, one may expect CFEs to be sensitive to such tiny changes. For example, in the study of the effects of climate change on sea turtles [5], one may expect CFEs to be sensitive to temperature changes. In this paper, we assume that such trivial changes are either irrelevant or less important to the generation of CFEs. As such, we aim to produce CFEs that are robust to trivial changes, such as inputs with added random noise and model retraining on new data from the same distribution.
Let $\tilde{x}$ represent a slightly perturbed sample that is close to $x$, meaning $\tilde{x} \sim p(x)$, where $p(x)$ is the density of perturbed samples that yield the same prediction as the input $x$. Similarly, let $\tilde{f} \in \mathcal{F}$ denote a retrained model belonging to the class $\mathcal{F}$, which consists of potential models that perform as well as the original one.

Definition 1 (Robustness of Counterfactual Explanations).
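Given a function $d(\cdot, \cdot)$ computing the dissimilarity between two sets of CFEs, we quantify the robustness of the explanations $g(f, x)$ by the expected dissimilarity between the current set of CFEs and a new set of CFEs obtained after potential input perturbations or model changes:
$$\mathbb{E}_{\tilde{x} \sim p(x),\, \tilde{f} \in \mathcal{F}} \left[ d\big(g(f, x),\, g(\tilde{f}, \tilde{x})\big) \right] \tag{3}$$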
A lower value indicates higher robustness. By minimizing the above expectation, we can generate robust CFEs. However, in real life, the perturbation density $p(x)$ and the model class $\mathcal{F}$ are typically unknown. Intuitively, they can be determined based on the specific changes that users desire to be robust against. For instance, one can consider adding Gaussian noise to the input, masking certain features, or retraining the models on data from the same distribution, to decide $p(x)$ and $\mathcal{F}$.
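The expectation in Definition 1 can be approximated by sampling. The sketch below is only an illustration under strong assumptions: the perturbation is Gaussian noise on the input, the model is kept fixed, the explainer is a toy function `explain` that returns a single CFE by moving every out-of-range feature to the closest endpoint of a hypothetical normal range, and the dissimilarity is the Euclidean distance between the two CFEs.

```python
import math
import random

def explain(x, ranges):
    """Toy explainer (assumption): clamp each feature to its normal range [a, b]."""
    return [min(max(v, a), b) for v, (a, b) in zip(x, ranges)]

def expected_inconsistency(x, ranges, sigma=0.01, trials=200, seed=0):
    """Monte Carlo estimate of E[d(g(x), g(x_tilde))] under Gaussian input noise."""
    rng = random.Random(seed)
    base = explain(x, ranges)
    total = 0.0
    for _ in range(trials):
        x_tilde = [v + rng.gauss(0.0, sigma) for v in x]
        total += math.dist(base, explain(x_tilde, ranges))
    return total / trials

# Hypothetical input whose two features both lie above their normal ranges.
ranges = [(18.5, 24.9), (60.0, 100.0)]
x = [40.0, 110.0]
print(expected_inconsistency(x, ranges))
```

In this toy setting the estimate is zero, because small noise never moves an abnormal feature across its range endpoint, so the returned endpoint values do not change.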

Causes of Non-robustness
Here, we explain the root causes of non-robustness. The total loss in Eqn. (2) is usually non-convex due to the non-convex decision surface of probabilistic models and other constraints. As shown in Figure 1, multiple local minima can be found, but current methods often select a single CFE or $k$ CFEs from them. Such selected CFEs can be different in each trial. Next, we discuss several influential factors that result in non-robust CFEs.
• Input perturbations. Input instances can be perturbed by adding some noise or masking random features. Due to the local sensitivity of large models, such trivial perturbations can significantly influence model predictions, leading to different CFEs [35].
• Model updates. The predictive model $f$ in Eqn. (1) is typically retrained periodically in deployment. The updated model $f'$ may exhibit slightly different behavior compared to the previous model and thus may have a great impact on the cost of the desired prediction.
• Random factors. Heuristic search methods $g(f, x)$ for Eqn. (2) often involve random factors, e.g., random initial points in gradient descent [44], random samples in Growing Sphere [20,26], and random selection in genetic algorithms [32]. Algorithm randomness often causes different solutions in non-convex situations.
• Algorithmic configurations. Different configurations of $g(f, x)$ also affect the search process, e.g., the trade-off factor $\lambda$, the step size in gradient descent, the stop condition, the guide point [40], the radius of the sphere [20,26], and the variational autoencoders used to approximate the data distribution [15,26].

PROPOSED METHOD
In this section, we propose a method offering a diverse set of CFEs that remain stable when the input or model being explained is changed subtly by incorporating domain knowledge into the search.
Let $x_i$, $i \in \mathcal{N}$, denote the $i$-th feature value of an input instance $x$, and let $[a_i, b_i]$ denote the normal range of the $i$-th feature. We assume the normal range of each feature can be acquired from domain knowledge. For example, the normal range of BMI [6] is [18.5, 24.9], and the normal heart rate [34] is between 60 and 100. For a given input $x$, we partition $\mathcal{N}$ into two disjoint sets by checking whether each feature is in its normal range: $\mathcal{N}_0$ (with size $d_0$) for abnormal features and $\mathcal{N}_1$ (with size $d_1$) for features within the normal range. We further assume that undesired predictions are attributed to abnormal features. Our intuition is that bringing an abnormal feature within its normal range increases the probability of the desired prediction. To ensure consistency, we replace an abnormal feature with the closest endpoint of the corresponding normal range. For instance, if the BMI of an obese person is 40, the suggested BMI is 24.9 (BMI normal range [18.5, 24.9]). We aim to convert only a subset of abnormal features into their respective normal ranges, to achieve lower cost and higher sparsity. Let $\mathcal{A} \subseteq \mathcal{N}_0$ represent the subset of abnormal features to be replaced. We employ the function $h(\cdot, \cdot) \in \mathbb{R}^d$ to calculate the values after replacement for the abnormal features, where the $i$-th element is computed as follows:
$$h(x, \mathcal{A})_i = \begin{cases} a_i, & \text{if } i \in \mathcal{A} \text{ and } x_i < a_i, \\ b_i, & \text{if } i \in \mathcal{A} \text{ and } x_i > b_i, \\ x_i, & \text{otherwise.} \end{cases} \tag{4}$$
In applications where normal ranges are not available (we set $\mathcal{N}_0 = \mathcal{N}$), we replace the input feature values with the corresponding values of a guide point or prototype that has the desired prediction. Correspondingly, given a guide point or prototype $\hat{x}$, $h(\cdot, \cdot)_i$ can be expressed as follows:
$$h(x, \mathcal{A})_i = \begin{cases} \hat{x}_i, & \text{if } i \in \mathcal{A}, \\ x_i, & \text{otherwise.} \end{cases} \tag{5}$$
We use the binary vector $m(\mathcal{A}) \in \{0, 1\}^d$ (abbreviated as $m$) to represent the presence or absence of each feature in subset $\mathcal{A}$.
Specifically, each element $m_i$ indicates whether the $i$-th feature is included in $\mathcal{A}$, i.e., $m_i = \mathbb{1}[i \in \mathcal{A}]$. The feature replacement operation for an input $x$ on subset $\mathcal{A}$ is defined as follows:
$$r(x, \mathcal{A}) = (1 - m) \odot x + m \odot h(x, \mathcal{A}) \tag{6}$$
where $\odot$ is the Hadamard (element-wise) product of vectors. If the $i$-th feature is not included in $\mathcal{A}$, we keep its original value; otherwise, we change it to the closest endpoint of its normal range. As there could be a large number of subsets $\mathcal{A}$ that achieve the desired prediction after feature replacement, we aim to find the minimal subsets $\mathcal{A}^*$. Note that a minimal subset does not necessarily have minimal cardinality, due to the difficulty of comparing the costs of different features. For example, increases in "academic degree", "credit score", and "salary" cannot be measured concisely by a single mathematical cost. As such, we treat the subset (on "academic degree", "credit score") and the subset (on "salary") as two different solutions, although the latter subset has the smaller cardinality, as in [45]. Based on the above discussion, we formulate the problem of generating nondominated sets of CFEs as follows:
Definition 2 (Minimal Satisfiable Counterfactual Explanations). Given a pretrained model $f$, an input $x$, and feature normal ranges, the goal is to find all minimal satisfiable subsets, where for each subset $\mathcal{A}^* \subseteq \mathcal{N}_0$, $f(r(x, \mathcal{A}^*))$ achieves the desired prediction and $r(x, \mathcal{A}^*)$ represents a counterfactual explanation $c$.
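The partition into abnormal and normal features and the replacement operation can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: features are indexed from 0, the helper names `abnormal_features` and `replace` are hypothetical, and the third feature and its range are made up for the example.

```python
def abnormal_features(x, ranges):
    """Indices whose value lies outside its normal range [a_i, b_i] (the set N_0)."""
    return {i for i, (v, (a, b)) in enumerate(zip(x, ranges)) if v < a or v > b}

def replace(x, ranges, subset):
    """Move every feature in `subset` to the closest endpoint of its normal range
    and keep all other features at their original values."""
    c = list(x)
    for i in subset:
        a, b = ranges[i]
        c[i] = min(max(x[i], a), b)  # clamping yields the closest endpoint
    return c

# Hypothetical patient: BMI 40, heart rate 110, and a third feature already normal.
ranges = [(18.5, 24.9), (60, 100), (0.0, 1.0)]
x = [40.0, 110, 0.5]
n0 = abnormal_features(x, ranges)
c = replace(x, ranges, {0})
print(sorted(n0), c)
```

Because the replacement values depend only on the range endpoints, two inputs with BMI 40 and BMI 40.3 receive the same suggested value 24.9 for that feature.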
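Diversity Analysis: In [25], the authors propose a diversity metric over a set of $k$ CFEs, named count-diversity, to measure the inner discrepancy, written as
$$\text{count-diversity} = \frac{1}{\binom{k}{2}\, d} \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \sum_{l=1}^{d} \mathbb{1}\left[c_i^l \neq c_j^l\right] \tag{7}$$
Theorem 1. The lower bound of count-diversity defined in Eqn. (7) over solutions of our problem definition is $\frac{2}{d}$.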
Proof. Let us denote the inner loop term $\sum_{l=1}^{d} \mathbb{1}[c_i^l \neq c_j^l]$ as $\delta(c_i, c_j)$ for brevity. We aim to prove that for any two arbitrary CFEs $c_i$ and $c_j$, the pairwise distance $\delta(c_i, c_j)$ is always greater than or equal to 2. To demonstrate this, we derive a contradiction in each of the two cases allowed by assuming $0 \le \delta(c_i, c_j) < 2$. If $\delta(c_i, c_j) = 0$, the two CFEs replace exactly the same features, so they are the same solution rather than two distinct ones. If $\delta(c_i, c_j) = 1$, one subset equals the other plus a single extra feature, so the larger subset contains a satisfiable subset and is not minimal. Hence $\delta(c_i, c_j) \ge 2$, and averaging over all pairs yields the lower bound $\frac{2}{d}$. □
Robustness Analysis: Let $z = c - x$ represent the recommended actions for a user. In our method, $z$ consistently applies to a slightly perturbed instance $\tilde{x}$, except in the following two situations: (1) $f(\tilde{x} + z)$ is no longer valid, which occurs when slight perturbations have a negative impact on the desired prediction. For example, normal features may be turned into abnormal ones, and more effort than $z$ is needed to achieve the desired prediction. (2) Changing fewer abnormal features is sufficient to achieve the desired prediction, indicating that slight perturbations are beneficial. In this case, $z$ is omitted as there exist more cost-efficient solutions. As both model continuity and perturbation strategies can influence $z$, we leave the determination of the maximal bound of perturbation, to which our method remains robust, for future work.
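The pairwise argument can be checked numerically on a small example. The sketch below assumes that count-diversity is the number of differing features averaged over all pairs of CFEs and normalized by the number of features $d$; the input and the three CFEs are hypothetical and correspond to replacing the feature subsets {0, 1}, {0, 2}, and {1, 2}, none of which contains another.

```python
from itertools import combinations

def pairwise_delta(ci, cj):
    """Number of features on which two CFEs differ."""
    return sum(u != v for u, v in zip(ci, cj))

def count_diversity(cfes):
    """Average fraction of differing features over all pairs of CFEs."""
    d = len(cfes[0])
    pairs = list(combinations(cfes, 2))
    return sum(pairwise_delta(ci, cj) for ci, cj in pairs) / (len(pairs) * d)

x = [40.0, 110, 9.0, 0.5]          # hypothetical input, features 0..2 abnormal
cfes = [
    [24.9, 100, 9.0, 0.5],         # replaces features {0, 1}
    [24.9, 110, 5.0, 0.5],         # replaces features {0, 2}
    [40.0, 100, 5.0, 0.5],         # replaces features {1, 2}
]
score = count_diversity(cfes)
print(score, 2 / len(x))
```

Every pair differs on exactly two features here, so the example attains the bound of 2 divided by the number of features.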

Problem Solving
The brute-force method that evaluates all possible subsets is exponentially complex with respect to the number of abnormal features. Next, we propose a technique, Counterfactual Explanations with Minimal Satisfiable Perturbations (CEMSP), to speed up the search process. Our method starts with finding the binary vectors $m$ that satisfy the desired prediction after feature replacement. This can be converted into the Boolean satisfiability problem, which checks whether there exists a Boolean value assignment on $d_0$ variables (features in abnormal ranges) such that the conjunction of Boolean formulas evaluates to True. For better efficiency, we introduce the following proposition from domain knowledge.
Proposition 1 (Monotonicity). For any $\mathcal{A} \subseteq \mathcal{B} \subseteq \mathcal{N}_0$, the predicted probability of the desired class satisfies $f(r(x, \mathcal{B})) \ge f(r(x, \mathcal{A}))$.
This proposition aligns with common sense in practical applications.It is important to note that the undesired prediction arises from specific abnormal features according to our assumption.Intuitively, moving an abnormal feature into the normal range should never decrease the desired probability.Additionally, we assume that the predictive model  has learned the relationship between feature normal ranges and the predicted classes.
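Based on the monotonicity of the function $f(r(x, \cdot))$, we derive the following two theorems for any $\mathcal{A} \subseteq \mathcal{B} \subseteq \mathcal{N}_0$, where $\tau$ is the confidence threshold of the desired prediction.
Theorem 2. If $f(r(x, \mathcal{A})) \ge \tau$, then $f(r(x, \mathcal{B})) \ge \tau$.
Theorem 3. If $f(r(x, \mathcal{B})) < \tau$, then $f(r(x, \mathcal{A})) < \tau$.
Proof. We first prove Theorem 2. If $f(r(x, \mathcal{A})) \ge \tau$, we can deduce that $f(r(x, \mathcal{B})) \ge \tau$, as $f(r(x, \mathcal{B})) \ge f(r(x, \mathcal{A}))$ holds for $\mathcal{A} \subseteq \mathcal{B}$. Theorem 3 can be proved similarly. □
Theorem 2 shows that if we can achieve the desired prediction by replacing the abnormal features in $\mathcal{A}$, there is no need to change more abnormal features. The theorem thus favors sparser results at a lower cost. Theorem 3 shows that if we cannot achieve the desired prediction by changing the abnormal features in $\mathcal{B}$, there is no need to check the satisfiability of any subset of $\mathcal{B}$.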
To prune as many subsets as possible at a time with these two theorems, we need to find the minimal satisfiable subset (MSS) and the maximal unsatisfiable subset (MUS), shown as boxes filled with green/red background in Figure 2. Next, we introduce two algorithms to achieve this: Grow(•) in Algorithm 1 and Shrink(•) in Algorithm 2. The Grow(•) algorithm starts with an arbitrary unsatisfiable subset $\hat{\mathcal{A}}$ and iteratively attempts to change other abnormal features until a maximal unsatisfiable subset is found. The Shrink(•) algorithm starts with an arbitrary satisfiable subset and iteratively attempts to restore some features to their original values until a minimal satisfiable subset is found. Note that the Grow(•) and Shrink(•) algorithms serve as two plug-ins in our method, which can be replaced by any advanced algorithm with the same purpose.
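Since Algorithms 1 and 2 are not reproduced here, the following is only a generic single-pass sketch of what Grow and Shrink procedures could look like under the monotonicity assumption. The oracle `is_valid` is a hypothetical stand-in for "replace the features in the subset and check whether the model returns the desired prediction"; in the toy oracle below, a subset is valid if it contains features {0, 1} or features {2, 3}.

```python
def grow(seed, abnormal, is_valid):
    """Extend an unsatisfiable subset until adding any remaining feature makes it valid."""
    current = set(seed)
    for i in sorted(abnormal - current):
        if not is_valid(current | {i}):
            current.add(i)          # still unsatisfiable, so keep the feature
    return current

def shrink(seed, is_valid):
    """Drop features from a satisfiable subset while the subset stays valid."""
    current = set(seed)
    for i in sorted(seed):
        if is_valid(current - {i}):
            current.discard(i)      # the feature is not needed for validity
    return current

def is_valid(subset):
    """Toy monotone oracle (assumption), standing in for a model call."""
    return {0, 1} <= subset or {2, 3} <= subset

abnormal = {0, 1, 2, 3}
mus = grow({0}, abnormal, is_valid)
mss = shrink(abnormal, is_valid)
print(sorted(mus), sorted(mss))
```

Each procedure calls the oracle at most once per abnormal feature, and monotonicity is what makes a single pass sufficient: a feature rejected early would also be rejected for any later, larger (Grow) or smaller (Shrink) subset.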
Further, we describe how to solve the problem as a Boolean satisfiability problem [2,23]. In particular, any subset can be converted to a satisfying Boolean assignment under a set of propositional logic formulas in conjunctive normal form (CNF), i.e., $\mathcal{A} \iff m : \mathrm{CNF} = \mathrm{True}$. For example, for a subset $\mathcal{A} = \{1, 2\}$ in Figure 2, we can write the CNF as
$$m_1 \wedge m_2 \wedge \neg m_3 \wedge \neg m_4$$
and [1, 1, 0, 0] is the corresponding satisfying assignment.
By employing this approach, the explicit materialization of all subsets can be avoided, thereby mitigating the exponential space complexity.The crux lies in devising the appropriate propositional logic formulas.
Our complete algorithm is shown in Algorithm 3. Initially, in line 1, we merely forbid changes on normal features, so any binary assignment on abnormal features satisfies the CNF. In line 3, getMask(CNF) uses a SAT solver to return a solution satisfying the CNF. In our paper, we adopt the Z3 package. In line 7, we convert the binary vector $m$ to the indices of a subset $\mathcal{A}$. Next, we check whether replacing the features in this subset achieves the desired prediction. If it does not, we call the Grow(•) algorithm to find a maximal unsatisfiable subset and then prune all subsets of it. The prune operation is achieved by the following propositional Boolean formula, which is conjoined with the existing CNF and requires at least one feature outside the maximal unsatisfiable subset $\mathcal{A}$ to be changed:
$$\bigvee_{i \in \mathcal{N}_0 \setminus \mathcal{A}} m_i$$
Similarly, if we find a subset satisfying the desired prediction, we call the Shrink(•) function to return a minimal satisfiable subset $\mathcal{A}^*$, which induces a CFE with minimal perturbations of features. Then, we prune all supersets of it with the following logic formula, which requires at least one feature of $\mathcal{A}^*$ to remain unchanged:
$$\bigvee_{i \in \mathcal{A}^*} \neg m_i$$
If no solution satisfies the CNF in the current iteration, we stop our algorithm in lines 4 and 5.
Algorithm 3 (CEMSP), partial listing:
4: if not m then ⊲ No assignment m returned.
7: $\mathcal{A} \leftarrow \{i \in \mathcal{N}_0 : m_i = 1\}$
8: if $f(r(x, \mathcal{A})) == 0$ then
15: end if
16: end while

Correctness. In our algorithm, each subset is either evaluated to be an MSS/MUS or pruned by an MSS/MUS. Hence, our algorithm will return all minimal CFEs, providing the same solutions as the brute-force search.
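The overall search loop can be sketched as follows. This is an illustrative reconstruction of the described procedure, not the authors' code: the SAT solver is replaced by a tiny brute-force stand-in `get_mask` so that the example has no external dependencies (a real implementation would call a solver such as Z3 instead), clauses are represented as lists of hypothetical `(index, wanted_value)` literals, and `is_valid` is a toy monotone oracle standing in for the model call.

```python
from itertools import product

def get_mask(n, clauses):
    """Brute-force stand-in for a SAT solver: return any assignment that satisfies
    every clause, or None. A clause holds if any of its literals matches."""
    for m in product((1, 0), repeat=n):
        if all(any(m[i] == v for i, v in clause) for clause in clauses):
            return m
    return None

def cemsp(n, abnormal, is_valid):
    clauses = [[(i, 0)] for i in set(range(n)) - abnormal]  # never change normal features
    solutions = []
    while True:
        m = get_mask(n, clauses)
        if m is None:                                       # search space exhausted
            break
        subset = {i for i in abnormal if m[i] == 1}
        if not is_valid(subset):
            for i in sorted(abnormal - subset):             # Grow to a maximal unsat subset
                if not is_valid(subset | {i}):
                    subset.add(i)
            # Block all subsets: some feature outside the subset must be changed.
            clauses.append([(i, 1) for i in abnormal - subset])
        else:
            for i in sorted(subset):                        # Shrink to a minimal sat subset
                if is_valid(subset - {i}):
                    subset.discard(i)
            solutions.append(frozenset(subset))
            # Block all supersets: some feature of the subset must stay unchanged.
            clauses.append([(i, 0) for i in subset])
    return solutions

def is_valid(subset):
    """Toy monotone oracle (assumption): valid if it contains {0, 1} or {2, 3}."""
    return {0, 1} <= subset or {2, 3} <= subset

solutions = cemsp(5, {0, 1, 2, 3}, is_valid)
print(sorted(sorted(s) for s in solutions))
```

On this toy instance the loop finds both minimal subsets with a handful of oracle calls and then terminates because every remaining assignment is blocked by a pruning clause.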
Space Complexity. The space complexity depends on how many minimal CFEs are returned. The worst case is $O\left(\binom{d_0}{\lfloor d_0/2 \rfloor}\right)$, which corresponds to the case where feature replacement on every subset of abnormal features of size $\lfloor d_0/2 \rfloor$ is satisfiable.
Time Complexity. The runtime of our method primarily depends on two parts: (1) a solver (line 3) that takes a set of constraints and returns a mask $m$, which is considerably faster than calling the pretrained model; and (2) evaluating the prediction on a subset. Compared with brute-force search, which calls the deep model $2^{d_0}$ times to check all $2^{d_0}$ possible sets, our method reduces the number of calls on the model by pruning subsets and supersets. Therefore, the empirical running time decreases.
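To give a sense of scale, the short sketch below compares the worst-case number of minimal subsets, the central binomial coefficient (the largest family of subsets in which none contains another, by Sperner's theorem), with the number of subsets a brute-force search would evaluate. The function name is hypothetical.

```python
from math import comb

def worst_case_minimal_subsets(d0):
    """Largest possible number of pairwise non-nested subsets of d0 features."""
    return comb(d0, d0 // 2)

for d0 in (4, 10, 20):
    print(d0, worst_case_minimal_subsets(d0), 2 ** d0)
```

Even in the worst case the number of returned solutions is noticeably smaller than the number of candidate subsets, although both grow exponentially with the number of abnormal features.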

Compatibility with Other Constraints
A major theme of recent research is to model various constraints in CFE generation. Here, we show how to write these constraints as propositional logic formulas that can be conjoined with the CNF in line 1 of Algorithm 3.
Immutable features. Since some features are immutable (e.g., race, birthplace), CFEs should avoid perturbations on these features. To achieve this, we add the following Boolean logic formula for a set of immutable features $\mathcal{I}$:
$$\bigwedge_{i \in \mathcal{I}} \neg m_i$$
Accordingly, these features should be ignored in the Grow(•) algorithm when it searches for the maximal unsatisfiable subset. Alternatively, we can directly treat immutable features as normal features to avoid any changes to them.
Conditional immutable features. These features must change in one direction only, e.g., education degree. We can examine whether moving a feature value into its normal range follows the valid direction. If it violates the valid direction, we treat this feature as an immutable feature; otherwise, we put no restriction on this feature.
Causality. In practice, changing one feature may cause a change in other features. Such causal relations among features are generally written as a triplet in the structural causal model (SCM) [28], that is, $\mathcal{M} = \langle U, V, F \rangle$, where $U$ are exogenous features, $V$ are endogenous features, and $F : U \to V$ is a set of functions that describe how endogenous features are quantitatively affected by exogenous features. To adapt causality to our method, we only keep the relations in which normal exogenous features lead to normal endogenous features, as our method merely considers discrete feature changes (from abnormal to normal). For example, suppose feature $x_1$ is an exogenous feature that affects two endogenous features $x_2$ and $x_3$, and $x_2$ and $x_3$ become normal as a consequence of their ancestor feature $x_1$ becoming normal. In this example, we can add the following two material conditionals, which restrict the feature changes of CFEs to follow the causal relations:
$$(m_1 \rightarrow m_2) \wedge (m_1 \rightarrow m_3)$$
At the same time, Grow(•) and Shrink(•) algorithms should be updated to satisfy such causal relations when they attempt to add/remove a feature.This can be easily implemented by storing these causal relations by an inverted index where an entry is an exogenous feature and the inverted list contains all its endogenous features.
Correlation. Correlation can be regarded as a bidirectional causal relation. For example, if features $x_1$ and $x_2$ are correlated and enter the normal range simultaneously, we can write the correlation between $x_1$ and $x_2$ as
$$(m_1 \rightarrow m_2) \wedge (m_2 \rightarrow m_1)$$
The great advantage of our framework is that it allows us to insert these constraints gradually and flexibly, as complete relation graphs (e.g., the full causal graph) are often difficult to derive in the beginning.
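The constraints in this section all reduce to a few clause patterns. The sketch below encodes them with a hypothetical clause representation (a clause is a list of `(index, wanted_value)` literals, and an assignment satisfies a clause if any literal matches); features are indexed from 0, and with a real solver the same formulas would be passed through its own API instead.

```python
def satisfies(m, clauses):
    """True if the 0/1 assignment m satisfies every clause of the CNF."""
    return all(any(m[i] == v for i, v in clause) for clause in clauses)

def immutable(features):
    """One unit clause (not m_i) per immutable feature."""
    return [[(i, 0)] for i in features]

def implies(i, j):
    """Material conditional m_i -> m_j, written as the clause (not m_i or m_j)."""
    return [[(i, 0), (j, 1)]]

def correlated(i, j):
    """Bidirectional relation m_i <-> m_j as two conditionals."""
    return implies(i, j) + implies(j, i)

# Hypothetical setup: feature 0 is immutable, feature 1 causally drives features 2 and 3.
clauses = immutable([0]) + implies(1, 2) + implies(1, 3)
print(satisfies((0, 1, 1, 1), clauses))   # changes 1 together with its descendants
print(satisfies((0, 1, 0, 1), clauses))   # changes 1 but leaves descendant 2 untouched
```

Because each constraint is just a list of clauses, new constraints can be appended to the existing CNF at any time without touching the search loop.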

EXPERIMENTS
In this section, we undertake a quantitative comparison between our proposed method CEMSP and state-of-the-art approaches. Additionally, we demonstrate empirical examples of counterfactual explanations that effectively integrate practical constraints. The source code is available at the GitHub repository (see footnote 2).
Datasets.We conducted a comprehensive series of experiments involving a synthetic dataset and two real-world UCI medical datasets.Notably, the medical datasets encompass diagnostic features with well-defined and clinically significant normal ranges.
• Synthetic Dataset is a binary class dataset consisting of 20,000 samples with 4 features. Each feature is sampled from the normal distribution independently. Regarding label balance, the binary label $y$ is assigned a value of 1 when the following equation is satisfied; otherwise, $y$ is set to 0:
We set the lower values of normal ranges of four features as [0.55, 0.45, 0.05, 0.55] for a higher confidence prediction.
• UCI HCV Dataset [10]. This dataset contains 615 instances. Following [3], we convert 5 categories of diagnosis into binary classes. After label conversion, the dataset consists of 75 individuals diagnosed with HCV and 540 individuals labeled as healthy. Next, we remove the "Age" and "Sex" features and keep the other 10 medical features with normal ranges. We adopt the tight normal ranges from laboratory tests in [13], as certain normal ranges depend on "Sex" and we remove the "Sex" attribute in preprocessing.
• UCI Thyroid Dataset [29]. The raw dataset contains 3,772 instances, where each instance is described by 15 features and labeled as either hypothyroid or normal class. We retain the most discriminative features "FTI", "TSH", "T3", "TT4" that have meaningful normal ranges and remove the other features. Subsequently, we drop rows with missing values. The final dataset consists of 223 patients and 2530 healthy users. Normal ranges of "TSH", "T3", "TT4" are from laboratory tests in [16]. As we do not find a normal range of "FTI" that matches the "FTI" values in this dataset, we simply choose the 1-sigma interval of "FTI" of the normal group.

2 https://github.com/wangyongjie-ntu/CEMSP
Evaluation Metrics. To comprehensively compare CFEs across approaches, we employ the following evaluation metrics.
• Inconsistency. We adopt a modified Hausdorff distance [11,14] to measure the inconsistency between two sets of CFEs $\mathcal{C}$ and $\mathcal{C}'$; a lower score is better.
• Average Percentile Shift (APS) [26] measures the relative cost of the perturbations of CFEs, where $Q_j(\cdot)$ denotes the percentile of the $j$-th feature value relative to all values of that feature in the whole dataset. A lower score is favored.
• Sparsity. It measures the percentage of features that remain unchanged; higher sparsity is preferred.
• Diversity. We consider two diversity metrics to measure the discrepancy within the returned solutions: Diversity, which is introduced in [25], and count-diversity (abbreviated as C-Diversity), which is defined in Eqn. (7) in Section 4.1. Here, dist(·,·) represents the $\ell_1/\ell_2$ distance and $K$ is the number of CFEs.
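The displayed formulas for these metrics did not survive in this text, so the following is only a minimal sketch under stated assumptions. It uses the common form of the modified Hausdorff distance (the larger of the two directed mean nearest-neighbor distances), an empirical-CDF percentile for APS, and an exact-match test for sparsity. The function names and the Euclidean default distance are illustrative choices, not taken from the paper.

```python
import math
from bisect import bisect_right


def l2(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def modified_hausdorff(set_a, set_b, dist=l2):
    """Larger of the two directed mean nearest-neighbor distances (assumed form)."""
    def directed(src, dst):
        return sum(min(dist(s, t) for t in dst) for s in src) / len(src)
    return max(directed(set_a, set_b), directed(set_b, set_a))


def sparsity(x, cfe, tol=1e-12):
    """Fraction of features that the counterfactual leaves unchanged."""
    return sum(1 for p, q in zip(x, cfe) if abs(p - q) <= tol) / len(x)


def percentile(sorted_column, value):
    """Empirical percentile in [0, 1]: share of column values <= value."""
    return bisect_right(sorted_column, value) / len(sorted_column)


def average_percentile_shift(x, cfe, data):
    """Mean absolute per-feature percentile change between x and its CFE."""
    shifts = []
    for j in range(len(x)):
        column = sorted(row[j] for row in data)
        shifts.append(abs(percentile(column, cfe[j]) - percentile(column, x[j])))
    return sum(shifts) / len(shifts)


# Toy usage: one of two features is changed.
data = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
x, cfe = [0.0, 1.0], [2.0, 1.0]
print(sparsity(x, cfe))                          # 0.5
print(average_percentile_shift(x, cfe, data))    # 0.25
print(modified_hausdorff([[0.0, 0.0]], [[3.0, 4.0]]))  # 5.0
```

Because APS works on percentiles rather than raw distances, a change that crosses a dense region of the population costs more than an equally large change in a sparse region.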
Baselines.We compare our method with the following baseline methods.
• GrowingSphere (GS) [20]. This algorithm searches for CFEs among random samples in a spherical neighborhood of the input. The radius of the sphere grows until a CFE is found. It post-processes the returned CFEs to obtain sparser solutions.
• PlainCF [44]. It minimizes the objective in Eqn. (2) with gradient descent. We run this algorithm from a random initial point and stop when an iteration threshold is reached or the loss difference falls below a specified threshold.
• CFProto [40]. It adds a prototype term that encourages CFEs to resemble the prototype of the desired class. In our experiment, we set the prototype to the closest endpoints of the normal ranges of the abnormal features, that is, the point determined by $(\mathbf{x}, \mathcal{N}_0)$.
• DiCE [25]. Compared with PlainCF, it adds a diversity constraint that is modeled by a determinantal point process (DPP) term over a set of CFEs.
• SNS [4]. It finds a CFE with higher confidence and a lower Lipschitz constant in the neighborhood of a given CFE, to produce consistent predictions under model updates.
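To make the sampling-based baseline concrete, the following is a simplified sketch of a growing-sphere search, not the reference implementation of [20]. It samples points in successive spherical shells around the input and returns the closest sample that receives the target label. The sparsifying post-processing step is omitted, and `predict`, `step`, `n_samples`, and the other parameter names are hypothetical.

```python
import math
import random


def sample_in_shell(center, r_in, r_out, rng):
    """Draw one point whose distance from center lies in [r_in, r_out].

    The direction is uniform on the sphere; the radius is drawn uniformly,
    which is a simplification (it is not uniform in volume).
    """
    direction = [rng.gauss(0.0, 1.0) for _ in center]
    norm = math.sqrt(sum(v * v for v in direction)) or 1.0
    radius = rng.uniform(r_in, r_out)
    return [c + radius * v / norm for c, v in zip(center, direction)]


def growing_sphere(predict, x, target, step=0.1, n_samples=200, max_iter=100, seed=0):
    """Grow a shell around x until some sample is classified as target."""
    rng = random.Random(seed)
    r_in = 0.0
    for _ in range(max_iter):
        r_out = r_in + step
        candidates = [sample_in_shell(x, r_in, r_out, rng) for _ in range(n_samples)]
        hits = [c for c in candidates if predict(c) == target]
        if hits:
            # Keep the hit that is closest to the original input.
            return min(hits, key=lambda c: math.dist(c, x))
        r_in = r_out
    return None  # no counterfactual found within max_iter shells


# Toy usage with a hypothetical linear classifier.
toy_predict = lambda p: int(p[0] + p[1] > 1.0)
cfe = growing_sphere(toy_predict, [0.0, 0.0], target=1)
print(cfe, toy_predict(cfe))
```

Since the result depends on random samples, repeated runs with different seeds generally return different CFEs, which is the kind of instability the inconsistency metric is meant to capture.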
Experiment configurations. We first randomly split each dataset into train/test sets at a ratio of 7:3 and normalize all features with a standard scaler on the two UCI datasets (no feature normalization is applied to the synthetic dataset). Then, we train a 3-layer multilayer perceptron (MLP) model $f$ with the Adam optimizer. The test accuracies are 99%, 96%, and 98% on the three datasets, respectively. As we intend to convert unhealthy patients to healthy ones, we produce CFEs for all correctly classified patients in the test sets of the two UCI datasets. To save time, we produce CFEs for only 100 random true negative samples in the synthetic dataset.
Our method produces a set of CFEs of varying size $K$, while GS, PlainCF, CFProto, and SNS generate one CFE at a time. For a fair comparison, we run GS, PlainCF, and CFProto $K$ times to generate the same number of CFEs as ours. For DiCE, we directly set the size of the diverse set to match ours. To measure algorithmic robustness, we evaluate all CFE methods, each taking a model $f$ and an input $\mathbf{x}$, under the following two kinds of slight updates.
• Inputs are fixed, and we produce two sets of CFEs from two models that are trained on the same dataset with different initializations.
• The model is fixed, and we produce two sets of CFEs from an input $\mathbf{x}$ and its perturbed instance $\mathbf{x}'$, where $\mathbf{x}' = \mathbf{x} + \epsilon$ and $\epsilon$ is random noise sampled from a Gaussian distribution $\mathcal{N}(0, \sigma)$. In our experiments, $\sigma \in \{0.0001, 0.001, 0.01, 0.1\}$.
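The input-perturbation setting can be sketched as follows. This is an illustrative harness rather than the authors' evaluation code: `generate_cfes` and `inconsistency` are hypothetical callables supplied by the user, and the noise level is treated as a standard deviation, which is an assumption because the text does not state whether it denotes a variance.

```python
import random

SIGMAS = [0.0001, 0.001, 0.01, 0.1]  # noise levels listed in the text


def perturb(x, sigma, rng):
    """Return x + eps, with eps drawn i.i.d. from a zero-mean Gaussian.

    sigma is used as the standard deviation (an assumption).
    """
    return [v + rng.gauss(0.0, sigma) for v in x]


def robustness_under_noise(generate_cfes, inconsistency, x, sigmas=SIGMAS, seed=0):
    """Map each noise level to the inconsistency between the CFE set of x
    and the CFE set of a perturbed copy of x, with the model held fixed."""
    rng = random.Random(seed)
    base = generate_cfes(x)
    return {s: inconsistency(base, generate_cfes(perturb(x, s, rng))) for s in sigmas}


# Toy usage: a hypothetical generator that snaps every feature below 1.0
# up to the endpoint 1.0, and a max-coordinate gap as the inconsistency.
toy_generate = lambda x: [[max(v, 1.0) for v in x]]
toy_gap = lambda a, b: max(abs(p - q) for p, q in zip(a[0], b[0]))
print(robustness_under_noise(toy_generate, toy_gap, [0.2, 0.5]))
```

In the toy usage the generator maps every slightly perturbed input to the same endpoint, so the score stays at zero for all noise levels; a gradient-based generator would typically not have this property. The model-retraining setting is analogous, with two differently initialized models in place of the two inputs.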

Quantitative Evaluation
We first report the quantitative comparison of sparsity, APS, Diversity, and C-Diversity in Figure 3, where the x-axis denotes the counterfactual explanation method and the y-axis represents the average score of each metric over all evaluated instances. Our method achieves sparsity competitive with GS. GS achieves sparsity through post-processing, whereas CEMSP makes minimal modifications to subsets of abnormal features. In contrast, the other methods do not explicitly optimize for sparsity and consequently fall behind in this respect. For APS, CEMSP is slightly better. This result rests on two design choices: first, we replace an abnormal feature with the closest endpoint of its normal range, and second, we aim to change the minimal number of abnormal features. It is worth noting that although PlainCF minimizes the $\ell_1/\ell_2$ distance in its objective, this does not equate to minimizing APS, since APS is a density-aware metric over the population. As expected, any two CFEs returned by our method differ in at least two features, which lower-bounds its C-Diversity. In contrast to sparsity, C-Diversity sums up the fraction of features that differ between any two CFEs. Therefore, a method with higher sparsity is often associated with lower C-Diversity. As a result, CEMSP appears less competitive in C-Diversity than methods that change many features simultaneously. However, CEMSP has a competitive Diversity score as defined in [25].
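The tension between sparsity and C-Diversity can be illustrated with a small sketch. Eqn. (7) is not reproduced in this section, so the normalization below (averaging the fraction of differing features over all pairs) is an assumption, and `count_diversity` is a hypothetical name.

```python
from itertools import combinations


def count_diversity(cfes, tol=1e-12):
    """Average, over all pairs of CFEs, of the fraction of features on which
    the two CFEs differ (assumed normalization)."""
    n_features = len(cfes[0])
    pairs = list(combinations(cfes, 2))
    if not pairs:
        return 0.0  # a single CFE has no pairwise diversity
    total = 0.0
    for a, b in pairs:
        differing = sum(1 for p, q in zip(a, b) if abs(p - q) > tol)
        total += differing / n_features
    return total / len(pairs)


# Sparse CFEs that each change one of four features differ in two positions.
sparse = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
# Dense CFEs that change every feature differ in all four positions.
dense = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
print(count_diversity(sparse))  # 0.5
print(count_diversity(dense))   # 1.0
```

Under this count-based measure, a method that alters every feature trivially scores high, which is why C-Diversity should be read together with sparsity rather than in isolation.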
Figures 4 and 5 report inconsistency scores under model retraining and input perturbations. CEMSP outperforms the baseline methods. Specifically, GS, PlainCF, and DiCE yield the poorest results, as they enforce no consistency restrictions. CFProto, which incorporates a prototype term, achieves a better inconsistency score than PlainCF by directing all CFEs towards the prototype. As discussed earlier, our findings demonstrate that SNS does not perform well in generating CFEs with consistent feature values, despite having CFEs that yield consistent model predictions. In summary, CEMSP outperforms the baseline methods in consistency while remaining competitive on the other metrics.

Use-Case Evaluation
Next, we present a use case in Figure 6 to demonstrate the compatibility of our method with practical constraints. The input instance is a patient in the HCV dataset who receives the undesired prediction. Without any constraint, our method generates the CFEs shown in the bottom-left table. By introducing additional constraints, our method can readily generate new CFEs that meet the desired criteria. For example, to keep the original value of the feature indexed by $m_5$, we add the clause $(\neg m_5)$, leading to CFEs that modify only the remaining features. Furthermore, domain knowledge reveals that the features indexed by $m_3$ and $m_4$ are correlated [33]. Consequently, we can incorporate a correlation constraint that limits simultaneous changes to both features. This is achieved by including the formula $(\neg m_3 \land m_4) \lor (m_3 \land \neg m_4)$, or equivalently the CNF $(m_3 \lor m_4) \land (\neg m_3 \lor \neg m_4)$, in our method. Although this use-case evaluation may not yet be fully applicable to real-life scenarios, it demonstrates the potential to accommodate more practical considerations.
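The effect of such constraints can be sketched on a toy problem. The paper solves the mask search with a SAT solver; the sketch below swaps in exhaustive enumeration over all binary masks, which is only practical for a handful of features. The toy model, the normal ranges, and all function names are hypothetical, and minimality is taken relative to the masks that satisfy the constraints, which is an assumption about the intended semantics.

```python
from itertools import product


def closest_endpoint(value, low, high):
    """Project a value onto its normal range [low, high]."""
    return min(max(value, low), high)


def apply_mask(x, mask, ranges):
    """Replace every masked feature by the closest endpoint of its range."""
    return [closest_endpoint(v, *ranges[i]) if m else v
            for i, (v, m) in enumerate(zip(x, mask))]


def minimal_masks(x, ranges, predict, target, constraints=()):
    """Masks that reach the target class, satisfy all constraints, and have
    no proper subset that does the same (brute force, not a SAT solver)."""
    valid = [m for m in product((0, 1), repeat=len(x))
             if all(c(m) for c in constraints)
             and predict(apply_mask(x, m, ranges)) == target]

    def is_proper_subset(a, b):
        return a != b and all(p <= q for p, q in zip(a, b))

    return [m for m in valid if not any(is_proper_subset(o, m) for o in valid)]


# Toy setting: five abnormal features; the hypothetical model predicts the
# desired class once at least two features sit inside their normal range.
x = [0.0] * 5
ranges = [(1.0, 2.0)] * 5
predict = lambda z: int(sum(z) >= 2.0)

not_m5 = lambda m: not m[4]                                       # (not m5)
xor_m3_m4 = lambda m: (m[2] or m[3]) and not (m[2] and m[3])      # CNF of the XOR

print(len(minimal_masks(x, ranges, predict, 1)))                          # 10
print(len(minimal_masks(x, ranges, predict, 1, [not_m5])))                # 6
print(minimal_masks(x, ranges, predict, 1, [not_m5, xor_m3_m4]))
```

Each constraint is just an extra predicate on the mask, so further requirements can be added without touching the search itself; a SAT-based implementation would add the corresponding clauses instead.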

CONCLUSION
A lack of robustness in counterfactual explanations can undermine both individual fairness and model reliability. In this work, we present a novel framework to generate robust and diverse counterfactual explanations (CFEs). Our work leverages feature normal ranges from domain knowledge and generates CFEs that replace the minimal number of abnormal features with the closest endpoints of their normal ranges. We convert this problem into a Boolean satisfiability problem and solve it with modern SAT solvers. Experiments on both synthetic and real-life datasets demonstrate that our generated CFEs are more consistent than those of the baselines while preserving flexibility for user preferences.

LIMITATIONS AND FUTURE WORK
While our work offers the potential to address the non-robustness issue through the utilization of domain knowledge, such as the normal ranges in healthcare and finance, some limitations hinder its applicability in broader contexts. Firstly, the scalability of the proposed method remains underexplored. SAT solvers exhibit exponential complexity in the worst case. When dealing with a large number of features, the time required to find a binary mask with a SAT solver may exceed that of a forward pass of the DNN model. This concern can be examined through empirical comparisons of the execution time of the SAT solver and of DNN model invocation. Secondly, our approach is not directly applicable to scenarios where some of the normal ranges are unknown. It might be necessary to incorporate additional information to determine the appropriate replacement values for these features. Thirdly, our study is established on binary classification tasks, and directly adapting our method to multi-class classification or regression tasks remains challenging. Normal ranges are typically contingent upon the target prediction. In multi-class classification or regression, the target predictions can become intricate, rendering the normal ranges unattainable. In future work, our ultimate goal is to investigate robust and flexible counterfactual explanations in more general situations without any assumption about normal ranges. In addition, we intend to develop a sustainable system that offers users actionable recommendations and gathers valuable feedback to support continual improvement.

Figure 1: A toy example illustrating the non-robustness issue caused by input perturbations for an input (0.3, 0). The figures in the top row depict the decision surface, the $\ell_2$ cost, and the total loss in the Lagrangian form ($\lambda = 5$ for simplicity). The bottom row shows the contour lines corresponding to each figure above. It is evident that even a slight perturbation of this input can result in different local minima.

Case 1 ($d(c_i, c_j) = 0$): This case implies $c_i = c_j$, which contradicts the fact that the two CFEs are distinct members of the solution set. Case 2 ($d(c_i, c_j) = 1$): In this case, the two CFEs differ in only one feature. Let $A_i$ and $A_j$ denote the index sets of abnormal features of the two solutions. Then $A_i \subseteq A_j$ or $A_j \subseteq A_i$ must hold, which contradicts the minimality of the sets returned under our problem definition: one CFE should be excluded because it costs more than the other. This proof by contradiction shows that $d(c_i, c_j) \geq 2$ holds. Summing up all $\frac{K(K-1)}{2}$

Figure 2: All subsets of a toy example with 4 abnormal features. The bit vectors denote the binary vector $\mathbf{m}$. Boxes with red/green borders represent unsatisfiable/satisfiable subsets, respectively. The minimal subsets for CFEs are filled with a green background, and the maximal unsatisfiable subsets are filled with a red background.

Figure 3: Evaluation of sparsity, APS, Diversity, and C-Diversity over three datasets. The ↑/↓ means that a higher/lower score is better. Diversity and C-Diversity on the Thyroid dataset are missing, as our method CEMSP only produces a single counterfactual explanation.

Figure 6: Use-case evaluation of a patient in the HCV dataset. The top table presents the original feature values of the patient, while the shaded features represent the altered features of CFEs and their closest endpoints. The tables below display CFEs before/after incorporating constraints. To incorporate actionability and correlation, we introduced the expressions $\neg m_5$ and $(\neg m_3 \land m_4) \lor (m_3 \land \neg m_4)$, respectively. Table columns: ALB, ALP, ALT, AST, BIL, CHE, CHOL, CREA, GGT, PROT.