Learning for Counterfactual Fairness from Observational Data

Fairness-aware machine learning has attracted a surge of attention in many domains, such as online advertising, personalized recommendation, and social media analysis in web applications. Fairness-aware machine learning aims to eliminate the biases of learning models against certain subgroups described by protected (sensitive) attributes such as race, gender, and age. Among many existing fairness notions, counterfactual fairness is a popular notion defined from a causal perspective. It measures the fairness of a predictor by comparing the prediction for each individual in the original world with that in counterfactual worlds in which the value of the sensitive attribute is modified. A prerequisite for existing methods to achieve counterfactual fairness is prior human knowledge of the causal model for the data. However, in real-world scenarios, the underlying causal model is often unknown, and acquiring such human knowledge can be very difficult. In these scenarios, it is risky to directly trust causal models obtained from information sources with unknown reliability, or even from causal discovery methods, as incorrect causal models can introduce biases into the predictor and lead to unfair predictions. In this work, we address the problem of counterfactually fair prediction from observational data without given causal models by proposing a novel framework, CLAIRE. Specifically, under certain general assumptions, CLAIRE effectively mitigates the biases from the sensitive attribute with a representation learning framework based on counterfactual data augmentation and an invariant penalty. Experiments conducted on both synthetic and real-world datasets validate the superiority of CLAIRE in both counterfactual fairness and prediction performance.


INTRODUCTION
Recent years have witnessed a rapid development of machine learning based prediction [10,14,44] in various high-impact applications such as personalized recommendation [36,51], ranking in searches [17,40], and social media analysis [1,32]. Recent literature [7] has shown that predictions based on traditional machine learning often exhibit biases against certain demographic subgroups described by protected attributes (a.k.a. sensitive attributes) such as race, gender, age, and sexual orientation. Thus, how to develop a fair predictor has attracted a surge of attention [5,9,20,22,49,54,55]. Among existing notions, the seminal work of counterfactual fairness [30] makes use of the causal mechanism to model how discrimination is exhibited, and eliminates it at the individual level based on Pearl's structural causal models [39]. The intuition of counterfactual fairness is to encourage the predictions made for different versions of the same individual to be equal. For example, the predictions for "in an online talent search, how would a certain candidate be ranked if this candidate had been a male/female?" should be identical to achieve the notion of counterfactual fairness.
A prerequisite of existing methods to achieve counterfactual fairness is prior human knowledge of causal models. A causal model [38,39] typically consists of a causal graph and the corresponding structural equations that describe the causal relationships among different variables. Existing works on counterfactual fairness [30,42,52,53] overwhelmingly rely on the assumption that the underlying causal model is (at least partially) known and correct, in order to mitigate the biases across different sensitive subgroups. However, existing work often suffers from the following major limitation: in the real world, the underlying causal model is often unknown, especially when the data is high-dimensional [6,50]. The construction of a trustworthy causal model often requires knowledge from domain experts, which is expensive in both time and labor. In addition, it is extremely challenging to validate the correctness of an obtained causal model. Without the external guidance of human knowledge, other existing works mostly rely on causal discovery techniques [23,26,31,38,46,47] to learn the causal model from observational data, but these methods can make various mistakes in discovering the causal relations, and thus lead the predictor to pick up biased information from the sensitive attribute [37]. The toy example in Fig. 1 intuitively explains two scenarios with incorrect causal models.
Fig. 1(a) shows an example of a true causal model M (often determined by domain experts) in which we aim to predict the salary (prediction target Y) of people in different races (described by the sensitive attribute S). We assume that the level of education (observed feature X1) of each person is a cause of salary, and the salary also influences the type of car each person would like to purchase (observed feature X2). Unobserved variables U (e.g., geographic location) could also have a causal effect on the observed variables. To learn a counterfactually fair predictor, most existing works [30,42] utilize a given causal model, and only use those variables which are not causally influenced by the sensitive attribute (i.e., non-descendants of S) for prediction. We now consider two cases where the given causal model is incorrect: 1) Consider an incorrect causal model M1 in Fig. 1(b), where the direction of the causal relation Y → X2 is reversed (highlighted in red). Note that X2 is causally influenced by S in the true causal model M. If a predictor is based on M1, X2 would be directly used in prediction, and thus it violates counterfactual fairness with biases from the sensitive attribute. 2) Consider another incorrect causal model M2, where an existing causal relation S → X1 in the true causal model M is ignored. Predictors based on M2 would directly use X1 in prediction, which results in biases. Unfortunately, it is quite common for causal models to be incorrectly assumed or discovered [26,31,38,46].
To address the aforementioned issue of insufficient human knowledge of the causal model, we study a novel problem of learning a counterfactually fair predictor with unknown causal models. Although it is in principle impossible to achieve counterfactual fairness without any causal model [30], we take initial explorations to mitigate the unfairness based on certain general assumptions, and circumvent the prerequisite of explicit prior knowledge. However, this studied problem remains a daunting task mainly due to the following challenges: 1) In order to achieve counterfactual fairness, the causal effect from the sensitive attribute S to the prediction must be removed [30,42], but an unknown causal model makes it challenging to track the influence of the sensitive attribute and eliminate the biases; 2) There might exist unobserved variables which can be used to predict the target (e.g., "geographic location" in the salary prediction example), but without a correct causal model, it is harder to capture these unobserved variables for prediction due to the lack of prior knowledge regarding these variables; 3) Many factors (e.g., failure in obtaining correct causal relations) may lead to unfair predictions, but it is difficult to exclude their influence without a correct causal model. In a nutshell, all of these challenges essentially stem from the lack of counterfactual data.
To tackle these challenges, we propose a novel framework, CounterfactuaLly fAIr and invariant pREdictor (CLAIRE), which learns counterfactually fair representations for target prediction. To remove the biases from sensitive attributes without any given causal model (challenge 1), we develop a counterfactual data augmentation module to implicitly capture the causal relations in data, and generate counterfactuals for each individual with different sensitive attribute values. In this way, CLAIRE can learn fair representations by using a counterfactual fairness constraint to minimize the difference between the predictions made on the original data and on its counterfactuals. To capture the unobserved variables which can help counterfactually fair prediction (challenge 2), CLAIRE maps the observed variables to a latent representation space to encode the unobserved variables that can facilitate the prediction. The aforementioned counterfactual fairness constraint can preserve those unobserved variables which are not biased. To further reduce the factors which potentially impede counterfactual fairness (challenge 3), we exclude the variables with spurious correlations to the target (i.e., variables that appear to be causal to the target but are not, e.g., X2 in Fig. 1(a)) from the learned representations. Spurious correlations can easily lead to incorrect causal models. Besides, removing these variables can often benefit model prediction performance, as shown in [3]. We summarize our main contributions as follows: • Problem: We study an important problem of learning counterfactually fair predictors from observational data. We analyze its importance, challenges, and impacts.

PRELIMINARIES

Notations
In this paper, we use upper-cased letters, e.g., X, to denote random variables, and lower-cased letters, e.g., x, to denote specific values. P(X) refers to the probabilistic function of X. We use X, S, U, Y to represent the observed non-sensitive features/attributes, the sensitive attribute, the unobserved variables, and the prediction label/target for any instance, respectively. Specifically, we use X_s, Y_s to denote the corresponding features and target of any instance with the observation of a specific sensitive attribute value S = s, where s ∈ S, and S is the space of sensitive attribute values. Ŷ denotes the predicted label (for classification tasks) or target (for regression tasks).

Counterfactual Fairness
Counterfactual fairness [30] is an individual-level fairness notion based on the causal mechanism. It is built upon Pearl's causal framework [39], in which a causal model is defined as a triple (U, V, F) such that: • U is the set of latent variables, which are often assumed to be exogenous and consequently independent of each other; • V is a set of observed variables, which are endogenous and determined by variables in U ∪ V; • F is a set of structural equations {f_1, ..., f_n}, one for each variable V_i ∈ V, which determine the value of each V_i from the values of its parents in U ∪ V. Based on a given causal model, a predictor uses a function Ŷ = f(X, S) to make the prediction for each instance. The predictor is counterfactually fair [30] if, under any context X = x and S = s,

P(Ŷ_{S←s}(U) = y | X = x, S = s) = P(Ŷ_{S←s'}(U) = y | X = x, S = s),

for all y and s' ≠ s. Here Ŷ_{S←s} = f(X_{S←s}, s) denotes the prediction made on the counterfactuals when the value of S had been set to s.
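To make the definition concrete, the following minimal Python sketch checks counterfactual fairness for a single individual under a toy, fully known SCM. The SCM, the two predictors, and all names here are illustrative assumptions for this sketch, not part of the paper's method:

```python
def is_counterfactually_fair(predictor, scm_sample, u, s_values, tol=1e-8):
    """Check the definition for one individual: abduct U, intervene S <- s,
    and require the predictions on all counterfactuals to agree."""
    preds = []
    for s in s_values:
        x_cf = scm_sample(u, s)      # counterfactual features X_{S<-s}
        preds.append(predictor(x_cf, s))
    return max(preds) - min(preds) <= tol

# Toy SCM (hypothetical): X1 = U + S (a descendant of S), X0 = U (non-descendant).
def scm_sample(u, s):
    return {"X0": u, "X1": u + s}

fair_pred = lambda x, s: 2.0 * x["X0"]   # uses only non-descendants of S
unfair_pred = lambda x, s: x["X1"]       # uses a descendant of S

assert is_counterfactually_fair(fair_pred, scm_sample, u=0.7, s_values=[0, 1])
assert not is_counterfactually_fair(unfair_pred, scm_sample, u=0.7, s_values=[0, 1])
```

The fair predictor's output is invariant to the intervention on S, while the predictor that reads X1 changes its output across counterfactual worlds.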

Biases under Incorrect Causal Models
To achieve the notion of counterfactual fairness, existing works [30,42] often follow a two-step process: 1) First, they use the observed data to fit the causal model and infer the posterior distribution P(U | x, s) of the unobserved variables U; 2) Second, they train a counterfactually fair predictor based on the fitted causal model. In particular, this second step can be achieved in different ways: an initial work [30] trains the predictor with only the unobserved variables U and the non-descendants of S as input. We refer to this method as CFP-U. Another work [42] considers a counterfactual fairness objective |f(X_{S←s}, s) − f(X_{S←s'}, s')| for each instance, aiming to minimize the difference between the predictions made on different counterfactuals of the sensitive attribute. We refer to this method as CFP-O. In this subsection, we use some simple examples to show the biases in the predictions of these existing counterfactual fairness methods when the given causal model is incorrect. Example 1. First, we consider the case where the counterfactual fairness methods are given an incorrect causal model as shown in Fig. 1(b). In the aforementioned salary prediction example, the ground truth causal model M is shown in Fig. 1(a). It indicates that people's salary can causally influence their choices of cars to purchase. In this example, X2 is correlated with Y because it is Y's child node, but this correlation may lead the model to incorrectly take X2 as one of Y's parent nodes, as in the incorrect causal model M1 shown in Fig. 1(b). Then the goal of counterfactual fairness defined on M1 deviates from what is defined on the true causal model M.
Based on the incorrect causal model M1, CFP-U will take X2 as an input to the predictor, but X2 contains biased information because it is actually a descendant of the sensitive attribute; thus it will bring bias into the prediction. For CFP-O, if we assume a linear predictor Ŷ = θ1·X1 + θ2·X2 + θu·U, then the fairness penalty on the incorrect causal model would be |θ1·(X1,S←s − X1,S←s')| (since M1 treats X2 as a non-descendant of S, its counterfactual value is unchanged), while the fairness penalty based on the true causal model would be |θ1·(X1,S←s − X1,S←s') + θ2·(X2,S←s − X2,S←s')|. Such a difference can lead to inappropriate learning results for the parameters of the predictor. As the fairness penalty based on the incorrect causal model has no constraint on θ2, the predictor cannot exclude the biases contained in X2. Example 2. We now consider another case of an incorrect causal model, shown in Fig. 1(c). In the salary prediction example, consider that the dataset contains a majority sensitive subgroup S = 0 (e.g., race A) and a minority sensitive subgroup S = 1 (e.g., race B), and that the ground-truth causal model M includes the structural equation X1 = U + S + ε1. As the subgroup S = 1 is underrepresented, the fitted causal model may miss the causal relation S → X1 for S = 1, i.e., the fitted causal model is biased (as the causal model M2 shown in Fig. 1(c)). Then for CFP-U, X1 and X2 will be taken as input for prediction because they are considered to be non-descendants of S, but as X1 and X2 are actually descendants of S, the predictor will also be biased consequently. Let us take the predictor Ŷ = X1 as an example. The predictor makes predictions Ŷ_{S←0} = X1,S←0 = U + ε1 and Ŷ_{S←1} = X1,S←1 = U + ε1 + 1 when S ← 0 and S ← 1, respectively, and this is obviously not counterfactually fair. For CFP-O, the fairness penalty on this biased causal model M2 vanishes (the counterfactual values of X1 and X2 are unchanged under M2), while the fairness penalty based on the true causal model M constrains the coefficients of X1 and X2. Such a difference may lead to inappropriate use of X1 and X2, and thus bring biases to the predictor.
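The bias of Example 2 can be verified numerically. The sketch below hardcodes the worked prediction values for the predictor Ŷ = X1; the structural equation X1 = U + S with the noise term set to zero is an assumption for illustration:

```python
# Example 2 in numbers (illustrative; the noise epsilon_1 is set to 0).
def x1(u, s):
    return u + s  # assumed structural equation X1 = U + S (+ noise omitted)

u = 0.3
y_hat_s0 = x1(u, 0)            # prediction Y_hat = X1 under S <- 0
y_hat_s1 = x1(u, 1)            # prediction Y_hat = X1 under S <- 1
gap = abs(y_hat_s1 - y_hat_s0)
assert gap == 1.0              # nonzero counterfactual gap: not counterfactually fair
```

A causal model that misses the S → X1 edge would report a gap of zero here, which is exactly how the biased fitted model hides the unfairness.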
In summary, existing counterfactual fairness methods heavily rely on given causal models, and would result in biases when the given causal models are incorrect.

THE PROPOSED FRAMEWORK
In this section, we introduce the proposed framework CLAIRE, which aims to achieve counterfactual fairness without relying on explicit prior knowledge about the causal model. To achieve this goal, CLAIRE learns counterfactually fair representations with counterfactual data augmentation, and then makes predictions based on the learned representations.

Assumptions and Examples
Before presenting the technical details, we first introduce the key concepts and assumptions of CLAIRE, and then use general examples of causal models (Fig. 2) to describe the information needed in CLAIRE.
Figure 2: Case studies of different kinds of variables in causal models. Each white (gray) node denotes an observed (unobserved) variable, each arrow denotes a causal relationship, and each dashed arrow denotes a possible causal relationship. S, Y, U denote the sensitive attribute, the prediction target, and the unobserved variable, respectively. X1 is a causal variable of Y and a descendant of S, X0 is a causal variable of Y and a non-descendant of S, and X2 is a variable with spurious correlations to Y.

Previous works on counterfactual fairness [30] have discussed three levels of required prior knowledge about the causal model: 1) Level 1 only requires knowing which observed features are non-descendants of the sensitive attribute, and only uses them for prediction; 2) Level 2 postulates and infers the unobserved variables with partial prior knowledge of the causal model, and also uses them for prediction; 3) Level 3 makes assumptions on the causal model (e.g., the additive noise model [24]), postulates the complete causal model, and then uses the inferred unobserved/observed non-descendants of the sensitive attribute for prediction. These three levels make increasingly stronger assumptions on the underlying causal model.
However, even the first level still requires figuring out which variables are non-descendants of the sensitive attribute. In this work, we aim to propose a principled way for counterfactually fair prediction without relying on prior knowledge of the causal model. The main assumptions in our framework are listed as follows: Assumption 1. The sensitive attribute S is not causally influenced by any other variables. This is a common assumption in most existing fairness works [7,30,42], as commonly-used sensitive attributes such as race and gender usually do not have any causes.
Assumption 2. If a variable X_i directly affects Y (i.e., an edge X_i → Y exists in the causal model), we assume P(Y | X_i) is stable across different sensitive subgroups; but for a variable X_j which does not causally affect Y, P(Y | X_j) may be unstable in different sensitive subgroups. This assumption and its variants are widely used in invariant learning [2,3].
As the ground truth causal model can be complicated, to investigate more general settings, we consider several different types of variables in the causal model, including descendant and non-descendant variables of S, causal and non-causal variables of Y, and observed and unobserved variables. Here we conduct several case studies on the causal model, each corresponding to a causal graph shown in Fig. 2. Given a ground truth causal model M, we call the variables in M which causally affect the prediction target Y (i.e., Y is a descendant of such variables) the causal variables of Y. In all the causal models in Fig. 2, X1 is a causal variable of Y, but it is also a descendant of S, so it cannot be directly used for counterfactually fair prediction. As shown in Fig. 2(b) and (c), X0 is also a causal variable of Y, and is a non-descendant of S; thus X0 is supposed to be used for fair prediction. X2 is not a causal variable of Y, but it has statistically spurious correlations to Y. The reason may be that X2 is Y's descendant, as shown in Fig. 2(b), or that X2 and Y are affected by some common variables, as shown in Fig. 2(c). As discussed in [3,11], the spurious correlations between X2 and Y often vary across different sensitive subgroups and thus degrade the model prediction performance. Besides, if these non-causal variables are also descendants of the sensitive attribute, incorporating them into prediction would also impede counterfactual fairness. Therefore, in our framework, we exclude these non-causal variables to further avoid potential biases. The above cases all concern observed variables; for those unobserved variables which are causal for Y, such as U in Fig. 2(d), we try to better capture them by utilizing the observed variables which are correlated with them.
Overall, in our framework, we learn representations to capture the causal variables which are not influenced by the sensitive attribute.

Overview of CLAIRE Framework
Existing counterfactual fairness works [30,42] involve counterfactual inference for predictor training, but this is often infeasible in real-world applications due to the lack of a correct causal model, especially when the data is noisy and high-dimensional [6]. Without enough knowledge about the causal model, inferring the unobserved variables and learning a fair predictor can be quite challenging. Here, we define the goal of our framework with respect to counterfactual fairness, and show an overview of the methodology.
Based on the aforementioned preliminaries, we know that the key point of this problem is to capture the information which elicits a fair predictor, such as the causal variables that are non-descendants of S. In our framework, we use the observed features to learn a representation Z = Φ(X) which captures the fair information, and then build a predictor Ŷ = f(Z) on top of it. In the implementation, we learn the representations Z in the following ways: (1) To capture the causal variables of Y, we leverage the invariant risk minimization loss [3] to exclude those non-causal variables with unstable spurious correlations to Y. (2) To avoid taking the biases from the sensitive attribute into prediction, we develop a counterfactual data augmentation module, and encourage the learned representation to achieve the following goal: for any s ≠ s' and any x, P(Φ(x_{S←s})) = P(Φ(x_{S←s'})). Intuitively, this means that for each individual with observed features x and sensitive attribute value s, the distributions of the representations learned from its original version and from its counterfactuals should be the same.
Algorithm 1 shows an overview of our framework, including counterfactual data augmentation and fair representation learning. Detailed techniques will be introduced in the following subsections.

Counterfactual Data Augmentation
The lack of counterfactual data is the essential challenge in achieving counterfactual fairness. Thus, we pretrain a counterfactual data augmentation module to generate counterfactuals for each instance by manipulating its sensitive attribute. Then, the augmented counterfactuals together with the original data are utilized to learn fair representations. The counterfactual data augmentation module is based on a variational auto-encoder (VAE) [28] with an encoder-decoder structure. Specifically, the encoder in the VAE takes {x, y} as input and encodes them into a latent embedding space, and then the decoder reconstructs the original data {x, y} from the embeddings E (note that the embedding E is different from the representation Z introduced in the previous subsection: E is the output of the bottleneck layer of the VAE used in counterfactual data augmentation to generate counterfactuals) and the sensitive attribute s. Note that s is only used as an input of the decoder to enable counterfactual generation in later steps. The reconstruction loss L_r is:

L_r = E_{q(E | x, y)}[−log p(x, y | E, s)] + KL(q(E | x, y) ∥ p(E)),

where p(E) is a prior distribution, e.g., the standard normal distribution. To generate counterfactuals with the embeddings E and a manipulated sensitive attribute value later, we need to capture more "fair" generative factors (i.e., those generative factors which are not causally influenced by S) in the embeddings; that is, in the encoder, we remove the causal influence of the sensitive attribute on the embedding E. Based on Assumption 1, if there is no dependency between the embeddings and the sensitive attribute, then the embeddings encode no descendants of the sensitive attribute. Now, we introduce two different implementations that remove the causal effect of S on E by minimizing the dependency between them: the distribution matching based CLAIRE (CLAIRE-M) and the adversarial learning based CLAIRE (CLAIRE-A). Distribution matching based CLAIRE. To remove the influence of the sensitive attribute, we use the distribution matching technique [33,45] on the embeddings for different sensitive subgroups. We refer to this implementation as CLAIRE-M. In particular, we minimize the Maximum Mean Discrepancy (MMD) [33,45] among the embedding distributions of different sensitive subgroups.
The loss function for training the counterfactual data augmentation model with distribution matching is:

L_M = L_r + (α / n_p) Σ_{s ≠ s'} MMD(P(E | S = s), P(E | S = s')),

where n_p = |S|(|S| − 1)/2 is the number of pairs of different sensitive attribute values, and |S| is the number of different sensitive attribute values. The second term is the distribution matching penalty, which aims to achieve P(E | S = s) = P(E | S = s') for all pairs of different sensitive subgroups (s, s'). Here α ≥ 0 is a hyperparameter which controls the importance of the distribution balancing term. Adversarial learning based CLAIRE. We also propose an adversarial learning based implementation, referred to as CLAIRE-A. In this implementation, we train a discriminator h(·) which uses the embeddings to distinguish instances that bear different values of the sensitive attribute. The objective function is:

min_Ψ max_h L_r + α' · E[log P_h(s | Ψ(x, y))],

where Ψ(·) is the encoder. The first term is the aforementioned reconstruction loss. The second term is the (log-)probability that the discriminator correctly predicts each instance's sensitive attribute. Therefore, the sensitive attribute discriminator h(·) plays an adversarial game with the encoder Ψ(·). In this way, the embeddings are encouraged to exclude the information related to the sensitive attribute. Here α' ≥ 0 is a hyperparameter to control the weight of the sensitive attribute discriminator. The minimax problem is optimized with an alternating gradient descent process.
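As a concrete reference for the distribution matching penalty, the following sketch computes a (biased) RBF-kernel MMD estimate between embedding samples of two sensitive subgroups. The kernel choice, bandwidth, and sample data are assumptions for illustration; the paper's exact estimator may differ:

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Squared MMD between samples X and Y with an RBF kernel (biased estimate)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
E_s0 = rng.normal(0.0, 1.0, size=(500, 2))   # embeddings of subgroup s = 0
E_s1 = rng.normal(0.0, 1.0, size=(500, 2))   # same distribution -> small MMD
E_s2 = rng.normal(3.0, 1.0, size=(500, 2))   # shifted distribution -> large MMD
assert mmd_rbf(E_s0, E_s1) < mmd_rbf(E_s0, E_s2)
```

Minimizing such a penalty over encoder parameters pushes the per-subgroup embedding distributions together, which is the CLAIRE-M mechanism described above.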
To learn counterfactually fair representations Z, we add a counterfactual fairness constraint to mitigate the discrepancy between the representations learned from the original data and from its corresponding counterfactuals. The constraint is formulated as:

L_c = Σ_{s' ≠ s} d(Φ(x), Φ(x^{s'})),

where x^{s'} is the counterfactual generated in counterfactual data augmentation corresponding to S ← s', and d(·, ·) is a distance metric, such as the cosine distance, which measures the discrepancy between two representations.
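A minimal sketch of this constraint with cosine distance follows; the representation function `phi` and the inputs are illustrative placeholders, not the paper's learned encoder:

```python
import numpy as np

def cosine_distance(a, b):
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def fairness_constraint(phi, x, counterfactuals):
    """Average discrepancy d(Phi(x), Phi(x_cf)) over the generated counterfactuals."""
    z = phi(x)
    return sum(cosine_distance(z, phi(x_cf)) for x_cf in counterfactuals) / len(counterfactuals)

phi = lambda v: v[:2]                 # toy representation: keep the first two features
x = np.array([1.0, 2.0, 3.0])
x_cf = np.array([1.0, 2.0, 9.0])      # counterfactual differing only in a feature
                                      # that phi ignores
# Identical representations give a (near-)zero constraint value.
assert abs(fairness_constraint(phi, x, [x_cf])) < 1e-9
```

During training the constraint is added to the loss, so gradients push Φ toward representations that are invariant across an instance's counterfactuals.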

Invariant Representations.
As mentioned above, the non-causal variables which have spurious correlations to the target Y are likely to degrade the model prediction performance, and may also incorporate potential biases from the sensitive attribute into the prediction. It has been shown in [3] that the relationships from these variables to Y often vary across different domains, e.g., different sensitive subgroups. Therefore, to exclude the influence of such non-causal variables on the learned representations and capture the causal variables of Y, we leverage the invariant risk minimization (IRM) loss [3] for the sensitive subgroup s as below:

L_IRM^s = R^s(f ∘ Φ) + λ ‖∇_{w | w=1.0} R^s(w · (f ∘ Φ))‖²,

where L_IRM^s is the IRM loss in the sensitive subgroup s, the first term R^s(f ∘ Φ) = E[L(f(Φ(x_s)), y_s)] is the prediction loss under sensitive subgroup s, and w is a scalar fixed as w = 1.0. According to [3], the gradient of R^s(w · (f ∘ Φ)) w.r.t. w can reflect the "invariance" of the learned representations. Therefore, in the above formulation, the second term measures the invariance of the relationship between the representations and the target across different sensitive subgroups. Here, λ is a hyperparameter for the tradeoff between the prediction performance and the level of invariance. The IRM loss aims to ensure that the predictor can be optimal in all the different sensitive subgroups, thus excluding the unstable spurious correlations that vary across sensitive subgroups.
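For intuition, under squared loss the IRMv1 gradient penalty with a scalar dummy multiplier w has a closed form: d/dw E[(w·f(x) − y)²] evaluated at w = 1 equals 2·E[(f(x) − y)·f(x)]. The sketch below uses this closed form on made-up data (the predictors and data are illustrative, not the paper's experiments):

```python
import numpy as np

def irm_penalty(f_vals, y):
    """IRMv1 penalty for squared loss: (d/dw R(w * f) at w = 1)^2,
    where the gradient is 2 * E[(f - y) * f] for scalar w."""
    grad = 2.0 * np.mean((f_vals - y) * f_vals)
    return grad ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(scale=0.1, size=1000)
# A predictor near-optimal in this subgroup (residuals uncorrelated with its
# output) gets near-zero penalty; a mis-scaled predictor does not.
good = irm_penalty(2.0 * x, y)
bad = irm_penalty(0.5 * x, y)
assert good < bad
```

Summing such penalties over sensitive subgroups rewards representations whose relationship to the target is stable in every subgroup, which is the stated purpose of the IRM term.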
Putting it all together, the overall loss function for fair representation learning is:

L = Σ_{s ∈ S} L_IRM^s + β L_c,

where β is the weight of the counterfactual fairness constraint. More implementation details can be found in Appendix A.

EXPERIMENTAL EVALUATIONS
In this section, we conduct extensive experiments to evaluate the proposed framework CLAIRE on two real-world datasets and one synthetic dataset. Before showing the detailed results, we first present the datasets used and the experimental settings.

Datasets
Law School. This dataset contains academic information of students in 163 law schools. Our goal is to predict each student's first-year average grade (FYA); this is a regression task. We take race as the sensitive attribute, and take grade-point average (GPA) and entrance exam scores (LSAT) as two observed features. Here, we select persons in the races of white, black, and Asian. The dataset contains 20,412 instances. We use the level-2 causal model in [30] as the true causal model, with the causal graph shown in Fig. 3(a).
Adult. The UCI Adult income dataset (https://archive.ics.uci.edu/ml/datasets/adult) contains census data for different adults, and the target here is to predict whether their income exceeds 50K/yr. We take race as the sensitive attribute S, and income as the prediction label Y. This is a binary classification task. We select persons in the races of white, black, and Asian-Pac-Islander. In addition to the sensitive attribute of race, we use 5 other attributes for prediction. The dataset contains 31,979 instances. Here, we follow [52] and consider the causal model used by them as the ground truth. The causal graph is shown in Fig. 3(b). Synthetic Dataset. Here, we use a ground truth causal model to generate the synthetic data. The true causal graph is shown in Fig. 4(a), containing a sensitive attribute S with four different categorical values {0, 1, 2, 3}, an unobserved variable U, a causal variable X0 which is a non-descendant of S, a causal variable X1 which is a descendant of S, and a variable X2 which is a descendant of Y. The structural equations are as follows: the categorical distribution of S is {0.5, 0.4, 0.05, 0.05}, the noise scales of Y and X0 are set to 1, and two sets of per-subgroup coefficients are set as {0.5, 1.0, 1.5, 2.0} and {0.1, 0.2, 1.0, 2.0}, respectively, for the four values of the sensitive attribute. In this dataset, the spurious correlation between X2 and Y and the imbalanced distribution of sensitive subgroups may lead to incorrect causal models, as shown in [37]. We will further investigate the impact of these two situations in Section 4.4.
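To illustrate a data-generating process of this shape, the sketch below samples from a hypothetical SCM matching the causal graph of Fig. 4(a). The concrete equations, noise scales, and coefficient assignments here are our own assumptions, not the paper's exact specification:

```python
import numpy as np

def generate(n, seed=0):
    """Sample from a hypothetical SCM with the shape of Fig. 4(a):
    S -> X1, U -> {X0, X1}, {X0, X1} -> Y, Y -> X2 (coefficients assumed)."""
    rng = np.random.default_rng(seed)
    s = rng.choice(4, size=n, p=[0.5, 0.4, 0.05, 0.05])  # imbalanced subgroups
    gamma = np.array([0.1, 0.2, 1.0, 2.0])               # per-subgroup S -> X1 strength
    u = rng.normal(size=n)                               # unobserved variable U
    x0 = u + rng.normal(scale=0.5, size=n)               # causal, non-descendant of S
    x1 = u + gamma[s] + rng.normal(scale=0.5, size=n)    # causal, descendant of S
    y = x0 + x1 + rng.normal(size=n)                     # prediction target
    x2 = y + rng.normal(scale=0.5, size=n)               # descendant of Y (spurious)
    return s, x0, x1, x2, y

s, x0, x1, x2, y = generate(10000)
assert abs((s == 0).mean() - 0.5) < 0.05   # majority subgroup dominates as specified
assert np.corrcoef(x2, y)[0, 1] > 0.8      # X2 is strongly (spuriously) correlated with Y
```

The two asserted properties, subgroup imbalance and a strong non-causal correlation between X2 and Y, are exactly the conditions the section names as sources of incorrect fitted causal models.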

Experimental Settings
Baselines. To investigate the effectiveness of our framework in learning counterfactually fair predictors from observational data, we compare the proposed framework with multiple state-of-the-art methods. First, we briefly introduce all the compared baseline methods and their settings: • Constant Predictor: A predictor which has constant output for any input. We obtain this constant predictor by finding a constant which minimizes the mean squared error (MSE) loss on the training data. • Full Predictor: The full predictor takes all the observed attributes (except the attribute used as the label) as input for prediction. • Unaware Predictor: The unaware predictor is based on the notion of fairness through unawareness [20]. It takes all features except the sensitive attribute as input to predict the label. • Counterfactual Fairness Predictors: We use two different counterfactual fairness predictors here, CFP-U [30] and CFP-O [42]. These methods require a given causal model. For the full, unaware, and counterfactual fairness predictors, we use linear regression for regression tasks and logistic regression for classification tasks. More details of the baselines can be found in Appendix B.
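As a side note on the constant predictor baseline: under MSE, the optimal constant is the mean of the training labels, which a simple grid search confirms (the labels here are made up for illustration):

```python
import numpy as np

# The constant minimizing MSE on the training data is the mean of the labels.
y_train = np.array([1.0, 2.0, 3.0, 6.0])
grid = np.linspace(0.0, 10.0, 10001)                      # candidate constants
mse = ((y_train[:, None] - grid[None, :]) ** 2).mean(axis=0)
best = grid[np.argmin(mse)]
assert abs(best - y_train.mean()) < 1e-3
```

(For classification with accuracy, the analogous constant baseline is the majority class.)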
Evaluation Metrics. Generally, the evaluation metrics cover two different aspects: prediction performance and counterfactual fairness. To measure model prediction performance, we employ the commonly used metrics: root mean square error (RMSE) and mean absolute error (MAE) for regression tasks, and accuracy for classification tasks. To evaluate different methods with respect to counterfactual fairness, we compare the distribution divergence of the predictions made on different counterfactuals generated by the true causal model, using the Wasserstein-1 distance (Wass) [41] and Maximum Mean Discrepancy (MMD) [33,45] to measure the distribution divergence. We compute the divergence of the prediction distributions for every pair of counterfactuals (S ← s and S ← s' for any s ≠ s'), and then take the average value as the final result. The smaller the average values of MMD and Wass are, the better a predictor performs in counterfactual fairness. For the synthetic data, the ground truth causal model is known, while for the real-world datasets, we adopt the widely accepted causal models mentioned in Section 4.1.
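The fairness metric computation can be sketched as follows, using the 1-D Wasserstein-1 distance between equal-size samples and averaging over all counterfactual pairs (the function names and toy predictions are ours):

```python
import numpy as np

def wasserstein1(a, b):
    """1-D Wasserstein-1 distance via sorted samples (equal sample sizes)."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def avg_pairwise_divergence(preds_by_s, dist=wasserstein1):
    """Average divergence of prediction distributions over all counterfactual
    pairs (S <- s, S <- s'), as in the fairness metrics above (sketch)."""
    keys = sorted(preds_by_s)
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    return sum(dist(preds_by_s[a], preds_by_s[b]) for a, b in pairs) / len(pairs)

preds = {0: np.zeros(100), 1: np.zeros(100), 2: np.ones(100)}
# pairs (0,1), (0,2), (1,2) have distances 0, 1, 1 -> average 2/3
assert abs(avg_pairwise_divergence(preds) - 2 / 3) < 1e-9
```

The same pairwise-average scheme applies with MMD substituted for `dist`.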

Experimental Results on Real-world Data
To assess the superiority of the proposed framework CLAIRE, we compare its two implementations, CLAIRE-M and CLAIRE-A, against other predictors on two real-world datasets, Law School and Adult. We show the ground truth causal models of these two datasets in Fig. 3, although our proposed framework and its variants do not rely on the causal model. Table 1 presents the performance of different methods regarding prediction and counterfactual fairness. The best results are shown in bold, and the runner-up results are underlined. Generally speaking, existing methods which are not designed for counterfactual fairness have higher MMD and Wass, although they can use the biased features to achieve better prediction performance. We make the following observations from Table 1: • Among all the compared methods, the constant predictor has the worst prediction performance, as it lacks the capability to distinguish different instances. However, it always satisfies counterfactual fairness because it has constant output. • The full predictor performs well in prediction, as it utilizes all the features (both sensitive and non-sensitive). But the use of the sensitive attribute also brings biases to the prediction, as demonstrated by its high values on the fairness metrics.

Experimental Results on Synthetic Data
The above experiments on real-world datasets have demonstrated the superiority of CLAIRE. Here, we perform further studies on the synthetic dataset to show the impact of incorrect causal models.
Incorrect causal model M1. In this experiment, we use the synthetic data to showcase the impact of the incorrect causal model shown in Fig. 4(b). The true causal model of the synthetic data is shown in Fig. 4(a); in M1, the causal relations regarding X2 are reversed. As all the baselines (except CFP-U and CFP-O) do not rely on the causal model for prediction, their results are not influenced by the correctness of the causal model. We therefore investigate the influence of the incorrect causal model on CFP-U and CFP-O and compare their performance with that of our proposed framework. From the results shown in Table 2, we find that the fairness of CFP-U and CFP-O is clearly affected by the incorrect causal model. Although CFP-U and CFP-O with the incorrect causal model achieve slightly better prediction performance, this is because, based on the incorrect causal model, they may take X2 into account in prediction, which in turn brings biases to the prediction. Our proposed framework does not assume any given causal model for prediction. The counterfactual data augmentation enables us to eliminate the influence of the sensitive attribute on the prediction. Furthermore, the learned invariant representations in CLAIRE exclude the adverse impacts of non-causal variables with spurious correlations and leverage the causal variables to learn representations; thus, X2 is encouraged to be excluded from prediction.
Incorrect causal model M2. Now, we use the synthetic data to showcase the impact of another incorrect causal model, shown in Fig. 4(c). As described in Section 4.1, we set the parameter in Eq. (8), which determines the relation s → X1, to be small on the majority sensitive subgroups (s = 0, 1) but relatively large on the minority sensitive subgroups (s = 2, 3). The incorrect causal model M2 misses the causal relation s → X1 (as shown in Fig. 4(c)). We compare the prediction differences between pairs of counterfactuals generated by the true causal model shown in Fig. 4(a). The results are shown in Table 3, where we select two pairs of counterfactuals: (s ← 0, s ← 1) and (s ← 0, s ← 2). As the effect of s on X1 is small when s = 0 and s = 1, the biased causal model does not bring much bias from the sensitive attribute to the prediction in the two counterfactuals (s ← 0 and s ← 1), so the discrepancy for this pair is relatively lower than for the other pair. But for the counterfactuals of s ← 2 (and also s ← 3), CFP-U and CFP-O suffer more from the biased causal model. As observed in Table 3, when CFP-U and CFP-O are under the biased causal model, the prediction discrepancy between the pair of counterfactuals (s ← 0 and s ← 2) becomes larger than when they are under the true causal model. Similar observations can be found for the pair (s ← 2 and s ← 3), as shown in Appendix C. Our framework outperforms the baselines due to the following key factors: the fair generative factors captured in counterfactual data augmentation remove the influence of the observed sensitive attribute on the generated counterfactuals; therefore, the counterfactual fairness constraint mitigates the influence of the sensitive attribute on the learned representations, and makes our framework suffer less from imbalanced sensitive subgroups.
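The effect of a subgroup-dependent relation s → X1 on counterfactual discrepancies can be illustrated with a toy linear SCM. This is purely illustrative; the coefficients below are invented and are not the paper's Eq. (8), and the predictor is a stand-in.

```python
import numpy as np

# invented coefficients: the effect of s on X1 is weak for the majority
# subgroups (s = 0, 1) and strong for the minority subgroups (s = 2, 3)
coef = {0: 0.1, 1: 0.1, 2: 2.0, 3: 2.0}

def x1_counterfactual(u, s_value):
    """Value of X1 under the intervention s <- s_value in a toy linear SCM."""
    return coef[s_value] * s_value + u

rng = np.random.default_rng(1)
u = rng.normal(size=1000)           # exogenous noise, shared across counterfactuals
pred = lambda x1: 0.5 * x1          # a toy predictor that leans on X1

gap_01 = np.mean(np.abs(pred(x1_counterfactual(u, 0)) - pred(x1_counterfactual(u, 1))))
gap_02 = np.mean(np.abs(pred(x1_counterfactual(u, 0)) - pred(x1_counterfactual(u, 2))))
```

Because the s → X1 coefficient is large only for the minority subgroups, the discrepancy between s ← 0 and s ← 2 dwarfs that between s ← 0 and s ← 1, mirroring why a causal model that drops s → X1 hurts the minority counterfactuals most.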

Ablation Study
To evaluate the effectiveness of each component in our method, we conduct an ablation study with the following variants: 1) Empirical Risk Minimization (ERM): ERM can be considered as a variant of our proposed framework CLAIRE in which we only use the empirical risk minimization loss (the first term of Eq. (6)) for prediction, without the counterfactual fairness constraint and the invariant penalty, i.e., both of their weights are set to zero. 2) Invariant Risk Minimization (IRM) [3]: here, we remove the counterfactual fairness constraint from our framework by setting its weight to zero. 3) CLAIRE-NI: as the third variant of our proposed framework, we remove the invariant penalty from CLAIRE by setting its weight to zero. From the results shown in Fig. 5, the counterfactual data augmentation and the invariant penalty both contribute to the overall fairness performance.
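The variants above amount to zeroing out terms of a weighted objective. A minimal sketch, assuming an objective of the form "empirical risk + lam * fairness constraint + mu * invariant penalty"; the weight names lam and mu are ours, not the paper's.

```python
def claire_style_loss(pred_loss, cf_loss, inv_loss, lam=1.0, mu=1.0):
    """Sketch of a CLAIRE-style objective: empirical risk plus a counterfactual
    fairness constraint (weight lam) and an invariant penalty (weight mu)."""
    return pred_loss + lam * cf_loss + mu * inv_loss

# the ablation variants correspond to weight settings
erm       = claire_style_loss(0.3, 0.2, 0.1, lam=0.0, mu=0.0)  # ERM: risk only
irm       = claire_style_loss(0.3, 0.2, 0.1, lam=0.0, mu=1.0)  # IRM: no fairness constraint
claire_ni = claire_style_loss(0.3, 0.2, 0.1, lam=1.0, mu=0.0)  # CLAIRE-NI: no invariant penalty
claire    = claire_style_loss(0.3, 0.2, 0.1)                   # full objective
```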

Parameter Study
We vary the weight of the fairness term in counterfactual data augmentation over {0.01, 0.1, 1.0, 10, 100}, the sampling number over {1, 5, 10, 20, 100}, the weight of the counterfactual fairness constraint over {0.01, 0.1, 1.0, 10, 100}, and the weight of the invariant penalty over {0.01, 0.1, 1.0, 10, 100}, and compare the performance of our proposed framework in Fig. 6. Here we only show the results of CLAIRE-M on the Law School dataset, as similar patterns are observed for CLAIRE-A and the other datasets. As observed in Fig. 6(a), the first hyperparameter controls the "fairness" of the embedding in counterfactual data augmentation: larger values improve the counterfactual fairness of the framework and have no obvious impact on prediction performance. With a larger sampling number in Fig. 6(b), the counterfactual fairness performance also improves, because more samples are generated in counterfactual data augmentation. The third hyperparameter controls the importance of the counterfactual fairness constraint, and the fourth controls the invariance penalty on the representations. As shown in Fig. 6(c), as the weight of the fairness constraint increases, the framework focuses more on removing the biases from the sensitive attribute, which may sacrifice some information for predicting the target and thus results in a higher RMSE, but achieves better fairness. As shown in Fig. 6(d), as the weight of the invariant penalty increases, the framework may exclude more variables whose relationships to the target are unstable across sensitive subgroups; it may thus lose some information specific to each sensitive subgroup, but this also contributes to better fairness. Overall, the framework achieves a good trade-off between prediction performance and counterfactual fairness with proper parameter settings.
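A study like Fig. 6 varies one hyperparameter at a time while holding the others fixed. A small sketch of that enumeration, with invented hyperparameter names and default values:

```python
def one_at_a_time(grid, defaults):
    """Yield (varied_name, config) pairs that vary one hyperparameter at a
    time, holding the others at their defaults (a Fig. 6-style sweep)."""
    for name, values in grid.items():
        for v in values:
            cfg = dict(defaults)
            cfg[name] = v
            yield name, cfg

grid = {"aug_fair_weight": [0.01, 0.1, 1.0, 10, 100],  # fairness weight in augmentation
        "n_samples":       [1, 5, 10, 20, 100],        # counterfactual samples
        "cf_weight":       [0.01, 0.1, 1.0, 10, 100],  # fairness constraint weight
        "inv_weight":      [0.01, 0.1, 1.0, 10, 100]}  # invariant penalty weight
defaults = {"aug_fair_weight": 1.0, "n_samples": 10, "cf_weight": 1.0, "inv_weight": 1.0}
configs = list(one_at_a_time(grid, defaults))  # 20 runs instead of a full 625-point grid
```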

RELATED WORK
Counterfactual Fairness. Recently, aside from traditional statistical fairness notions [4,12,13,16,22,54,55], causal-based fairness notions [30,35,42] have attracted a surge of attention because of their strong capability of modeling how discrimination is exhibited. Among them, the notion of counterfactual fairness [30] assesses fairness at the individual level. Most existing counterfactual fairness studies [18,30,53] are based on a given ground-truth causal model or rely on causal discovery methods [26,38,46]. Multi-world fairness [42] considers the situation where the ground-truth causal model cannot be determined, but it still requires a candidate set of causal models that may be true, and proposes an optimization-based method to achieve counterfactual fairness on average over the causal models in the candidate set. Many methods based on traditional causal discovery are limited to certain scenarios, such as low-dimensional and linear settings. Recent studies [19,27,56] provide more discussion of counterfactual fairness under different assumptions and scenarios. In conclusion, most of the above methods require substantial explicit prior knowledge of the causal model to remove the influence of the sensitive attribute on the prediction, and lack discussion of the impact of incorrect causal models.
Invariant Risk Minimization. Invariant risk minimization (IRM) [3] and its variants [2,11,21,25,29,34] were originally proposed for out-of-distribution (OOD) generalization [29,43]. IRM is based on the principle that the representations of causal features admit an optimal predictor shared across different domains. From a causal perspective, IRM identifies these causal features and excludes features with spurious correlations, as such correlations are not robust across domains. The connections between fairness and IRM are discussed in [3,15,48]. IRM can learn representations that capture causal features with invariant relationships to the prediction target. However, the representations may still contain domain information (e.g., different sensitive attributes), which may bring biases to the prediction. Our work investigates how to bridge this gap between IRM and counterfactual fairness.
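The IRM idea can be made concrete with the IRMv1-style penalty: the squared gradient, with respect to a scalar dummy classifier fixed at 1.0, of each environment's risk. A minimal NumPy sketch for a scalar representation and squared loss; treating sensitive subgroups as environments is our framing here, not a quote of the paper's formulation.

```python
import numpy as np

def irm_penalty(phi, y):
    """IRMv1-style penalty for one environment under squared loss: the
    squared gradient, w.r.t. a scalar dummy classifier w at w = 1.0, of
    the risk mean((w * phi - y)^2)."""
    grad = np.mean(2.0 * (phi - y) * phi)
    return grad ** 2

def irm_objective(envs, lam=1.0):
    """envs: list of (phi, y) arrays per environment (e.g., per sensitive
    subgroup). Average risk plus lam times the summed per-env penalty."""
    risk = np.mean([np.mean((phi - y) ** 2) for phi, y in envs])
    pen = sum(irm_penalty(phi, y) for phi, y in envs)
    return risk + lam * pen
```

A representation whose relationship to the target shifts across environments incurs a positive penalty, which is what pushes spurious features out of the representation.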

CONCLUSION
In this work, we study a novel problem: learning counterfactually fair predictors from observational data with unknown causal models. We propose a principled framework, CLAIRE, which learns counterfactually fair representations for each instance and makes predictions based on these representations. To learn fair representations, a variational auto-encoder based counterfactual data augmentation module is developed to generate counterfactual data with different values of the sensitive attribute for each instance. We further reduce potential biases by applying an invariant penalty over the sensitive subgroups to exclude variables with spurious correlations to the target. We evaluate the proposed framework on both real-world benchmark datasets and synthetic data. Extensive experimental results validate the superiority of the proposed framework over existing fairness-aware predictors in different aspects. Overall, this paper provides insights for promoting counterfactual fairness in a more realistic scenario without given correct causal models, and also shows the impact of incorrect causal models. In the future, more research on counterfactual fairness in real-world cases, such as with missing and noisy data, is worth further exploration.

Figure 1: An illustrative example of incorrect causal models.

Figure 3: The ground truth causal models of two real-world datasets Law School and Adult.

Figure 4: The true causal model (M) and two incorrect causal models (M1 and M2) of the synthetic dataset.

Figure 6: Performance of CLAIRE with different settings of hyperparameters.
F is a set of functions (referred to as structural equations) which describe the causal relationships among the above variables. For each variable V_i ∈ V, V_i = f_i(pa_i, U_{pa_i}), where pa_i ⊆ V \ {V_i} denotes the variables that directly determine V_i and U_{pa_i} ⊆ U denotes the corresponding exogenous variables. A causal model is associated with a causal graph, which is a directed acyclic graph (DAG). Each node in the causal graph corresponds to a variable in the causal model, and each directed edge represents a causal relationship. For example, for observed variables X and Y, the value of the counterfactual "what would Y have been if X had been set to x?" is denoted by Y_{X←x}.
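Computing such a counterfactual follows Pearl's abduction-action-prediction procedure, which can be sketched on a toy linear SCM. The model below (X = U_x, Y = 2X + U_y) is illustrative only, not the paper's causal model.

```python
# Toy linear SCM (illustrative): X = U_x, Y = 2 * X + U_y
def counterfactual_y(x_obs, y_obs, x_new):
    """Compute Y_{X <- x_new} by abduction-action-prediction:
    1. Abduction: recover the exogenous noise U_y from the observation.
    2. Action: replace the structural equation for X with X = x_new.
    3. Prediction: re-evaluate Y under the modified model."""
    u_y = y_obs - 2.0 * x_obs   # abduction: U_y implied by (x_obs, y_obs)
    return 2.0 * x_new + u_y    # action + prediction
```

For instance, observing (x, y) = (1, 2.5) implies U_y = 0.5, so the counterfactual "what would Y have been had X been set to 3?" evaluates to 6.5.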

Table 1: Results comparison of different predictors on two real-world datasets. Our method CLAIRE achieves the best performance in counterfactual fairness with competitive prediction performance.

Table 2: Study on synthetic data about the adverse effects of incorrect causal model M1.

Table 3: Study on synthetic data regarding the adverse effects of incorrect causal model M2.
