DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision

Algorithmic fairness has become an important machine learning problem, especially for mission-critical Web applications. This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations. Unlike existing models that target a single type of fairness, our model jointly optimizes for two fairness criteria, group fairness and counterfactual fairness, and hence makes fairer predictions at both the group and individual levels. Our model uses contrastive loss to generate embeddings that are indistinguishable across protected groups, while forcing the embeddings of counterfactual pairs to be similar. It then uses a self-knowledge distillation method to maintain the quality of representation for downstream tasks. Extensive analysis over multiple datasets confirms the model's validity and further shows the synergy of jointly addressing two fairness criteria, suggesting the model's potential value in fair intelligent Web applications.


INTRODUCTION
Machine learning techniques are being used in many mission-critical real-world Web applications, such as recommendation [1], hiring [25], and advertisement of profiles [41]. Many tasks are subject to stereotypes or societal biases embedded in data, leading models to treat individuals with particular attributes unfairly. Notable examples include facial recognition models that fail to recognize people with darker skin [9] or recruiting models that favor male candidates over female candidates with comparable work experience [29]. Lacking fairness in these tasks aggravates social inequity and even harms individuals and society. In response, a growing number of studies are dedicated to algorithmic fairness [17, 48].
Fairness is also important in representation learning. Fair representation learning is useful when data needs to be shared without any prior knowledge of the downstream task. Previous research has focused on finding data representations that apply to diverse tasks while considering fairness. For example, LAFTR [35] and VFAE [34] introduce an adversarial concept, and studies such as [40] and [43] maximize conditional mutual information under fairness constraints to reduce potential bias. These approaches generate fairer embeddings, albeit at the cost of performance degradation in downstream tasks. They also consider only group-level fairness, missing out on individual-level fairness [6].
We propose DualFair, a self-supervised learning method that debiases sensitive information through fairness-aware contrastive learning while preserving rich expressiveness (i.e., representation quality) through self-knowledge distillation. DualFair can learn data representations satisfying two types of fairness: group fairness and counterfactual fairness. The former (a.k.a. demographic parity) requires that every protected group be treated in the same way as any advantaged group, and the latter requires the model to treat individuals in a counterfactual relationship (i.e., those who share similar traits except for the sensitive attribute) alike [32]. Counterfactual fairness is a type of fairness defined at the individual level. It removes bias from sensitive information by comparing against synthetic individuals from the counterfactual world.
Because the model is unaware of downstream tasks during training, adding fairness-related regularization terms (e.g., minimizing demographic parity of model predictions [26]) or fairness constraints (e.g., limiting the prediction difference among groups below a threshold [51]), as in other research, is not feasible. Instead, we propose the following alternative goals for our loss design, illustrated in Figure 1: (1) Group fairness: data points from every sensitive group have the same distribution across the embedding space, so their group membership cannot be identified. (2) Counterfactual fairness: data points in a counterfactual relationship are located close together in the embedding space; that is, the embedding distance between an individual and its counterfactual version should be minimized.
These goals apply universally to any downstream task. By making sensitive attributes indistinguishable in the embedding space, a downstream classifier cannot determine which data points belong to the protected group and consequently produces unbiased predictions. The same applies to embeddings of counterfactual pairs. To implement these fairness objectives, we first propose a cyclic variational autoencoder (C-VAE) model to generate counterfactual samples (Sec. 3.2). This sample generation task is non-trivial due to the high correlation between data features and sensitive attributes. The next step is to create fairness-aware embeddings from the input data. We employ contrastive learning, which learns representations based on the similarity between instances, and modify the selection of positive and negative samples: negative samples are drawn from the same protected group, and positive samples are counterfactual versions (Sec. 3.3). In addition, we use self-knowledge distillation to extract semantic features and enforce consistent embeddings between the original instance and its augmentation, maintaining high representation quality (Sec. 3.4). Experiments demonstrate that the proposed framework can generate data embeddings that satisfy both fairness criteria while preserving prediction accuracy for a variety of downstream tasks. The main contributions of this paper are summarized below.
• We propose a self-supervised representation learning framework (DualFair) that simultaneously debiases sensitive attributes at both group and individual levels.
• We introduce the C-VAE model to generate counterfactual samples and propose a fairness-aware contrastive loss to meet the two fairness criteria jointly.
• We design a self-knowledge distillation loss that maintains representation quality by minimizing the embedding discrepancy between original and perturbed instances.
• Experimental results on six real-world datasets confirm that DualFair generates a fair embedding for sensitive attributes while maintaining high representation quality. The ablation study further shows a synergistic effect of the two fairness criteria. Code is released in a GitHub repository.¹

RELATED WORK

2.1 Fairness in Machine Learning
Fairness is a conceptual term and cannot be measured in a single straightforward manner. Instead, several criteria observe it from different perspectives: unawareness, group fairness, and individual fairness [20, 44]. Removing sensitive attributes from data is a simple way to achieve fairness through unawareness [12]. However, it can be brittle if some hidden features are highly correlated with sensitive attributes. Group fairness states that subjects in different groups (e.g., by gender) should have an equal probability of being assigned to the predicted class [15, 20]. Demographic parity and equalized odds are two measures of group fairness [23, 50]. Individual fairness is a fine-grained criterion that treats similar individuals as similarly as possible [18]. Counterfactual fairness is one form of individual fairness that constructs a counterfactual sample by flipping the sensitive attribute and requires it to be treated similarly to the original [39].
Researchers also focus on diverse learning steps to ensure fairness. Elazar et al. [19] focus on fairness of the input dataset, noting that a fair decision is made by a model trained on an unbiased dataset. However, Wang et al. [46] demonstrate that input-level fairness does not entirely support fair decisions on large-scale datasets. Therefore, post-processing techniques have been proposed to secure fairness, such as hiding sensitive attribute information in trained representations by null-space projection [37] or identifying the subspace of sensitive attributes [7].
With the advent of representation learning, recent self-supervised learning approaches aim to produce fair representations of individual instances without knowing downstream tasks (i.e., treatment-level fairness) [31]. These fair representation learning approaches add fairness-related objectives to training, or apply adversarial learning objectives to obtain fair representations [5, 22, 33, 52]. For example, LAFTR [35] employs a discriminator that detects sensitive attribute information, while the generator makes representations indistinguishable to the discriminator. VFAE [34] proposes a variational autoencoder regularized with Maximum Mean Discrepancy (MMD) to learn fair representations.

Figure 2: The overall architecture of DualFair, where f, g, and (h1, h2) denote the backbone network, the projection head, and the two prediction heads, respectively. DualFair aims to achieve both fairness and representation quality by jointly optimizing the fairness-aware contrastive loss and the self-knowledge distillation loss.

Song et al. [40] propose a user-centric approach that lets users control the level of fairness and maximizes the expressiveness of representations in the form of conditional mutual information under a preset fairness threshold. Tsai et al. [43] aim to optimize the same objective but maximize a lower bound of the conditional mutual information via the InfoNCE objective instead. However, these methods consider only the single group fairness objective and often lose expressiveness during training [10, 11]. Specifically, group fairness helps achieve anti-discrimination for protected groups, but individual justice is not guaranteed [6]. Our research objective is to implement multiple algorithmic fairness concepts in a single deep model, a topic less explored in the literature. We seek to achieve both group fairness at a coarse-grained level and counterfactual fairness at the individual level while preserving high representational quality.

2.2 Contrastive Self-Supervised Learning
The fundamental idea of contrastive learning is to minimize the distance between similar (i.e., positive) instances while maximizing the distance among dissimilar (i.e., negative) instances [13, 24]. SimCLR [13] utilizes augmented images as positives and the other images in the same batch as negatives. Maintaining a similar contrastive concept, MoCo [24] exploits a momentum encoder and proposes a dynamic dictionary with a queue to handle negative samples efficiently in terms of both performance and memory. The InfoNCE loss [36] is often used in contrastive learning. Minimizing this loss increases the mutual information between positive pairs so that the model can extract features consistent between the original and augmented samples.
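As a concrete illustration, the InfoNCE loss for a single query with one positive and several negatives can be sketched as follows. This is a minimal NumPy version with cosine similarity; the temperature value is an arbitrary choice for illustration, not one taken from any cited method:

```python
import numpy as np

def info_nce(q, pos, negs, tau=0.5):
    """InfoNCE loss for one query q, one positive pos, and negatives negs (k, d)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Similarities of the positive pair and of all negative pairs, scaled by tau.
    logits = np.array([cos(q, pos)] + [cos(q, n) for n in negs]) / tau
    # Cross-entropy with the positive pair as the "correct class"
    # (numerically stable log-sum-exp minus the positive logit).
    m = logits.max()
    return float(m + np.log(np.exp(logits - m).sum()) - logits[0])
```

Minimizing this value pulls the positive pair together relative to the negatives; the loss shrinks as the query-positive similarity grows.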

METHODOLOGY

3.1 Overview
We present DualFair, a self-supervised learning method that ensures both group and counterfactual fairness while maintaining high representation quality through two specially designed losses. We use a pictorial sketch in Figure 2 to describe them.
First is the fairness-aware contrastive loss, which treats individuals in counterfactual relationships alike (i.e., counterfactual fairness) and ensures embeddings that are indistinguishable over sensitive attributes (i.e., group fairness). A key to this loss is the design of a sample generator that produces a counterfactual version x′_cnt of an input item x by maintaining its latent characteristics but flipping the sensitive attribute s → s′ (Sec. 3.2). In a job interview, for instance, this corresponds to the hypothetical decision if the applicant's gender were to change while all other latent traits, such as education and work experience, remain unchanged [32]. Given a dataset D with a sensitive attribute such as gender, the fairness-aware contrastive loss is defined on a batch B_s whose data instances share the same sensitive attribute value s ∈ S (e.g., gender is female). The loss maximizes agreement between the original item and its counterfactual version generated by our generator (ensuring counterfactual fairness) and minimizes agreement between items with the same sensitive attribute (ensuring group fairness) (Sec. 3.3).
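The sampling scheme above can be sketched in a few lines. This is an illustrative NumPy fragment, not the authors' code: `flip_fn` is a hypothetical stand-in for the C-VAE generator, and the sensitive attribute is assumed to be stored as a column of the data matrix:

```python
import numpy as np

def build_contrastive_batch(X, s_col, s_value, flip_fn):
    """Form negatives and positives for the fairness-aware contrastive loss.

    Negatives: the batch B_s of instances sharing sensitive value s_value.
    Positives: the counterfactual version of each instance, from flip_fn.
    """
    batch = X[X[:, s_col] == s_value]                  # B_s, same protected group
    positives = np.stack([flip_fn(x) for x in batch])  # x'_cnt for each x
    return batch, positives
```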
Second is the self-knowledge distillation loss for extracting semantic information from each instance and maintaining performance on downstream tasks (Sec. 3.4). Consider a Siamese network with two heads: one for the student branch and the other for the teacher branch. We force the perturbed instance's embedding from the student branch to be similar to the original instance's embedding from the teacher branch. The rationale behind this treatment is that a perturbation that does not largely deform the original content should not change the learned representation. We introduce a new perturbation module, TabMix, for an efficient self-knowledge distillation process.

3.2 Counterfactual Sample Generation
Our sample generator creates data following the definition of a counterfactual relationship in [32]. Let us denote three sets of variables (V, U, F), where V is the set of observable variables, including the sensitive attribute S and the other attributes X; U is the set of latent variables, independent of S or X; and F is the set of functions representing causal relationships from U to V or among variables in V. Given a causal model (V, U, F), counterfactual inference works in three steps: (1) calculate the posterior distribution of the latent variables U from the input instance; (2) choose a target sensitive attribute value s′ ∈ S (e.g., gender) and reformulate the set of functions F assuming the sensitive attribute is fixed to s′ in the causal graph; (3) infer V from U and s′ using the reformulated F.
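The three steps can be made concrete on a toy structural causal model. The linear equation and weights below are illustrative assumptions, not the model used in the paper:

```python
def counterfactual(x, s, s_prime, w_s=2.0, w_u=1.0):
    """Abduction-action-prediction on a toy SCM with x = w_s * s + w_u * u."""
    # (1) Abduction: recover the latent u consistent with the observation.
    u = (x - w_s * s) / w_u
    # (2) Action: fix the sensitive attribute to the target value s_prime.
    # (3) Prediction: re-apply the structural equation with u unchanged.
    return w_s * s_prime + w_u * u
```

Note that flipping the attribute twice recovers the original observation, the property the cyclic consistency loss below enforces for the learned generator.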
Extracting the latent variables U is non-trivial because they should sufficiently explain the observable variables without revealing information about the sensitive attribute.
For this task, we introduce the cyclic variational autoencoder (C-VAE) depicted in Figure 3. Our counterfactual sample generator is trained with the factorization p(x, u, s) = p(u) p(s) p(x | u, s), where u and s represent the latent variable and the sensitive attribute of the input x. p(u) and p(s) are the prior distributions of the latent variable u and the sensitive attribute s, and p(u) is assumed to follow a standard normal distribution [28]. p(x | u, s) is the likelihood function of the decoder that reconstructs the input sample x from u and s. The encoder and decoder networks are denoted as f_enc and f_dec. We formulate L_vae(x) from the variational lower bound of log p(x) as follows:

L_vae(x) = E_{q(u|x)} [ log p(x | u, s) ] − D_KL( q(u | x) ∥ p(u) ).   (1)

The latent variable u from the encoder f_enc and the sensitive attribute s are not guaranteed to be independent, even though they correspond to U and S in the causal model, which are independent. To ensure independence, we introduce an additional adversarial objective that leads the model to eliminate unnecessary sensitive information from u. A discriminator f_disc is trained with a cross-entropy loss to predict the sensitive attribute s from the latent representation u:

L_disc(x) = CE( f_disc(u), s ).   (2)

We negate this cross-entropy loss when training the sample generator so that the discriminator f_disc cannot predict the sensitive attribute. We further introduce the property of cyclic consistency to enhance training stability and improve generation quality. This loss ensures that the counterfactual-counterfactual sample (i.e., the double-flipped sample created by our generator) is similar to the original item. We produce a counterfactual sample x′_cnt of the input instance x by repeating the three steps below: (1) compute the latent representation u from the encoder f_enc; (2) flip the sensitive attribute to the target value s′; (3) reconstruct the counterfactual sample from the decoder f_dec given the latent variable u and the target sensitive attribute s′.

The reconstructed sample x′_cnt goes back through the generator to produce a counterfactual-counterfactual (or double-flipped) sample x_cyc that shares the identical sensitive attribute with the original input. Our cyclic consistency loss in C-VAE maximizes the likelihood between the two instances x and x_cyc (Eq. 3).

The total loss for training the sample generator sums the three objectives: the VAE loss, the (negated) adversarial loss, and the cyclic consistency loss.

3.3 Fairness-aware Contrastive Learning
Given the learned sample generator, the next objective is to train an encoder that satisfies both group and counterfactual fairness. We use contrastive learning for this task. Let us denote an input query as x ∈ R^d, and a positive sample and a set of negative samples for contrastive learning as x+ and X−, respectively. Typically, positive and negative samples are determined by predefined rules. For example, a rule can designate a data instance and its augmented versions as positive and all others as negative (see the description in §2.2). In this work, we define the InfoNCE loss for training the model F (i.e., F := h1 ∘ g ∘ f in Figure 2) as follows:

L_InfoNCE(x) = −log [ exp(sim(F(x), F(x+)) / τ) / ( exp(sim(F(x), F(x+)) / τ) + Σ_{x− ∈ X−} exp(sim(F(x), F(x−)) / τ) ) ],   (7)

where sim(·) is the function measuring the similarity of two embeddings and τ is the temperature parameter.
This contrastive loss in Eq. 7 comprises two terms. First, the alignment loss (L_align) encourages the embedding positions of positive pairs to be placed closer. Second, the distribution loss (L_distribution) matches the embeddings of all instances to a prior distribution with a high entropy value. Our model uses the generalized contrastive objective proposed in [45], which supports diverse choices of prior distribution by introducing optimal transport theory [8] and replacing the distribution loss with the Sliced Wasserstein Distance (SWD) [30]. Given the prior distribution Z_p over the embedding space and the set of embeddings Z = { F(x) | x ∈ X− }, the loss is formulated as:

L(x) = L_align(x, x+) + SWD(Z, Z_p).   (8)

We modify this contrastive loss to jointly meet the two fairness criteria. Assume that all instances in X− are sampled to have the same sensitive attribute s (e.g., gender is female). Minimizing the distribution loss on X− (i.e., SWD(Z, Z_p) in Eq. 8) then causes the embeddings of the sensitive group s to match the predefined prior distribution Z_p. By iterating this process for every sensitive group (e.g., female and male), our model produces group embeddings that all follow the same distribution Z_p. As a result, embeddings are no longer distinguishable by sensitive attributes. Given a batched set of instances B_s and an input x, the negative samples are defined as:

X−_s = B_s \ {x}.   (9)

To ensure counterfactual fairness, our model also treats the counterfactual version of a sample as the positive in contrastive learning. Given an input sample x with sensitive attribute s, we flip the sensitive attribute s → s′ (e.g., female to male) and generate the counterfactual sample x′_cnt with our C-VAE-based sample generator. The alignment loss in Eq. 8 then minimizes the embedding discrepancy between the original and counterfactual data instances. With the positive sample and the set of negative samples (x′_cnt, X−_s), the fairness-aware contrastive loss for the input x is defined as:

L_fair-cl(x) = L_align(x, x′_cnt) + SWD(Z_s, Z_p).   (10)

We train the embedding in Euclidean space with a Gaussian prior Z_p. This differs from other contrastive approaches, where embeddings are learned on the L2-normalized space [13, 24]. The L2-normalized space used in those approaches does not account for the norm of embeddings during training, which can also be an important clue to sensitive attributes in downstream tasks. Our choice of Euclidean space instead regularizes the norm distribution and removes its dependency on sensitive attributes. The alignment loss L_align is then defined with the negative Euclidean distance as the similarity measure between the original and positive instances.
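The SWD term can be approximated with random one-dimensional projections. A minimal Monte-Carlo sketch; the projection count and the squared-distance form are our assumptions, not details from the paper:

```python
import numpy as np

def sliced_wasserstein(z, z_prior, n_proj=64, rng=None):
    """Monte-Carlo sliced Wasserstein distance between two equal-size point sets.

    z, z_prior: (n, d) arrays; z_prior would hold samples from the prior Z_p.
    """
    rng = np.random.default_rng(rng)
    d = z.shape[1]
    # Random unit directions on the sphere.
    theta = rng.normal(size=(d, n_proj))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)
    # Project both sets to 1-D; sorting gives the optimal 1-D coupling.
    pz = np.sort(z @ theta, axis=0)
    pp = np.sort(z_prior @ theta, axis=0)
    return float(np.mean((pz - pp) ** 2))
```

In practice the prior samples would be drawn fresh from the Gaussian Z_p each step; the distance vanishes when the two point sets coincide.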

3.4 Self-knowledge Distillation
The final objective is to maintain representation quality. We design a self-knowledge distillation loss that reduces the embedding discrepancy between the original and perturbed instances. Inspired by the original literature [21], we present a Siamese network with two different heads, which serve as the student and teacher branches. The model is then trained through a prediction task so that the original instance from the teacher branch is highly predictive of the perturbed instance from the student branch. This process is called self-knowledge distillation, since the knowledge extracted from the teacher branch is progressively transferred back to the student branch [27, 42]. To prevent the model from collapsing into a trivial solution (e.g., the representation becoming constant for every instance), we make the student and teacher architectures asymmetric and restrict the gradient flow through the teacher branch [14, 21] (see the bottom part of Figure 2). Let f, g, and h2 denote the backbone network, the projection head, and the prediction head, respectively. The self-knowledge distillation loss is defined as:

L_self-kd(x) = − sim( h2(g(f(x_pert))), sg(g(f(x))) ),

where x_pert is a perturbed version of the original instance x and sg represents the stop-gradient operation.
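One common instantiation of this loss uses the cosine similarity of normalized embeddings. The sketch below uses plain arrays, so the stop-gradient is only indicated by a comment; in an autograd framework the teacher side would be detached:

```python
import numpy as np

def self_kd_loss(student_pred, teacher_proj):
    """Distillation loss: 1 - cosine similarity, averaged over the batch.

    student_pred: h2(g(f(x_pert))) embeddings, shape (n, d).
    teacher_proj: g(f(x)) embeddings, shape (n, d).
    """
    # Stop-gradient sg(.): the teacher embedding is treated as a constant
    # target; no gradient would flow through it during training.
    t = teacher_proj / np.linalg.norm(teacher_proj, axis=-1, keepdims=True)
    s = student_pred / np.linalg.norm(student_pred, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))
```

The loss is zero when student and teacher embeddings point in the same direction and reaches its maximum of two when they are opposed.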
To perturb the instance, we propose TabMix as our augmentation strategy. Given an input sample x, the generator randomly masks k features and replaces their values with those of other instances, as shown in Figure 4. Let ⊙ denote element-wise multiplication and m ∈ {0, 1}^d a binary mask vector indicating which features to replace. The mixing operation is defined as:

TabMix(x, x′) = m ⊙ x + (1 − m) ⊙ x′.

TabMix does not need to consider the scale of numeric variables; hence it is easily applicable to various datasets. The perturbed instance is obtained as x_pert = TabMix(x, x′).
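A minimal NumPy sketch of the operation; the feature count k and the random source are illustrative choices:

```python
import numpy as np

def tabmix(x, x_other, k, rng=None):
    """x_pert = m * x + (1 - m) * x_other with k randomly zeroed mask entries."""
    rng = np.random.default_rng(rng)
    m = np.ones(x.shape[-1])
    # Choose k distinct features whose values are taken from x_other.
    m[rng.choice(x.shape[-1], size=k, replace=False)] = 0.0
    return m * x + (1.0 - m) * x_other
```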
Finally, the loss function of the entire process is the sum of the two losses, the fairness-aware contrastive loss (L_fair-cl) and the self-knowledge distillation loss (L_self-kd):

L_total = L_fair-cl + L_self-kd.   (15)

EXPERIMENT
We compared DualFair with the latest models on multiple fairness-aware tabular datasets. Component analyses were performed to test the effect of each module. We also present qualitative analyses and case studies on how the model handles sensitive attributes.

Performance Evaluation
Datasets: We use the following datasets: (1) UCI Adult [3] contains 48,842 samples along with a label indicating whether a given individual makes over 50K per year, used as the downstream task; (2) UCI German Credit [3].

Evaluation: The evaluation uses embeddings learned from the last ten epochs and the averaged results of the five metrics below:
• AUC, the area under the receiver operating characteristic curve, measures the prediction performance of the downstream classification task (i.e., the performance of the binary classifier in distinguishing between cases and non-cases). If this value is 1, the model distinguishes the target variable from input instances with absolute precision.
• RMSE, the root mean squared error, measures the deviance between the prediction and the ground truth for the regression task.
• Demographic Parity Distance (Δ_DP) is a group fairness metric defined as the expected absolute difference between the predictions for protected groups. Given a set of sensitive attributes S, Δ_DP is defined as follows (s ≠ s′):

Δ_DP = E_{s,s′ ∈ S} | P(ŷ = 1 | S = s) − P(ŷ = 1 | S = s′) |.

• Equalized Odds (Δ_EO), also known as equality of opportunity, is a group fairness metric based on the expected difference between the estimated positive rates of two protected groups.
Given the set of sensitive attributes S, Δ_EO is defined as follows (y ∈ {0, 1}, s ≠ s′):

Δ_EO = E_{y,s,s′} | P(ŷ = 1 | S = s, Y = y) − P(ŷ = 1 | S = s′, Y = y) |.

• Counterfactual Parity Distance (Δ_CP) is a metric for counterfactual fairness and measures the prediction parity between two instances in a counterfactual pair relationship. Assume that (x, x_cnt) are drawn from a set of counterfactual pairs. Then Δ_CP is defined as follows:

Δ_CP = E_{(x, x_cnt)} | ŷ(x) − ŷ(x_cnt) |.

Baselines: A total of nine baselines are employed. The first two directly utilize the raw datasets to classify target variables, given that the downstream task is already known. They differ in their input structure: (1) original data (LR) and (2) original data concatenated with the synthetic counterfactual samples (C-LR). The next one is (3) SCARF [4], an unsupervised contrastive learning method that does not consider any fairness requirement during training. The remaining six learn fair representations from data without any information on downstream tasks, utilizing different unsupervised representation learning techniques with fairness-aware objectives. (4) VFAE [34] introduces a maximum mean discrepancy term on top of a variational autoencoder to produce fair representations. (5) LAFTR [35] adopts an adversarial approach to avoid unfair predictions from embeddings. (6, 7) MIFR and L-MIFR [40] learn controllable fair representations through mutual information; the two differ in the use of the Lagrangian dual optimization method. (8, 9) C-InfoNCE and WeaC-InfoNCE [43] maximize the conditional mutual information within representations; the two differ in how they introduce the sensitive variable into the InfoNCE objective. For all methods, we use a logistic regression model as the base predictor for the downstream classification task and a Random Forest regression model for the downstream regression task. The predictor is learned on top of either the raw dataset (1-2) or the learned embeddings (3-9).
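The three fairness metrics above can be computed directly from model predictions. A NumPy sketch for a binary sensitive attribute; averaging Δ_EO over the two label values is our reading of the definition:

```python
import numpy as np

def dp_distance(y_pred, s):
    """Demographic parity distance |E[y_hat | s=a] - E[y_hat | s=b]| for binary s."""
    groups = np.unique(s)
    rates = [y_pred[s == g].mean() for g in groups]
    return float(abs(rates[0] - rates[1]))

def eo_distance(y_pred, y_true, s):
    """Equalized-odds distance, averaged over the two ground-truth label values."""
    diffs = [dp_distance(y_pred[y_true == y], s[y_true == y]) for y in (0, 1)]
    return float(np.mean(diffs))

def cp_distance(y_pred, y_pred_cnt):
    """Counterfactual parity distance between predictions on counterfactual pairs."""
    return float(np.mean(np.abs(y_pred - y_pred_cnt)))
```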
Implementation details: We trained multi-layer perceptron (MLP) models. A three-layer MLP was used for the backbone network f, and two-layer MLPs for the projection and prediction heads g and (h1, h2). The ReLU activation function was used in all architectures, with training for 200 epochs and a batch size of 128. The Adam optimizer was used with a learning rate of 1e-3 and a weight decay factor of 1e-6. For the counterfactual sample generator, three-layer, two-layer, and two-layer MLPs were utilized for the encoder, decoder, and discriminator, respectively. The sample generator was trained for 600 epochs with the Adam optimizer. Mode-specific normalization [49] was used for preprocessing continuous variables.

Results: DualFair outperforms the other baselines in terms of the averaged rank on each evaluation metric. Table 2 shows that our model's embedding maintains its prediction performance (i.e., AUC and RMSE) while successfully removing bias related to sensitive information from the embedding. Tables 3 and 4 report the detailed performance of DualFair and other baselines on downstream classification and regression tasks. According to the results, naively adding the counterfactual samples (i.e., C-LR) is insufficient to handle bias from sensitive information. Similarly, fair embedding learning baselines such as VFAE, L-MIFR, and WeaC-InfoNCE fail to remove sensitive information entirely or to generate a fair embedding without losing critical information for the downstream task. These findings suggest that DualFair achieves both the performance and fairness requirements in a single training.
Table 3: Detailed results on downstream classification tasks. Across the four datasets, gender is considered a sensitive attribute to protect. An up-arrow (↑) denotes that a higher value is better, and a down-arrow (↓) vice versa. The best results among fair representation learning baselines are highlighted in bold.

Component Analyses
We performed an ablation study by repeatedly assessing and comparing the models after removing each component. We also examined the quality of the counterfactual samples.
Ablation study: DualFair utilizes two learning objectives: the fairness-aware contrastive loss to ensure both group and counterfactual fairness, and the self-knowledge distillation loss to ensure representation quality. The ablations remove each loss objective from the full model to assess its unique contribution. Table 5 reports the results on the UCI Adult dataset. The full model achieves the best balance between fairness and prediction performance, implying that each loss plays a unique role in designing fair representations. The result without self-knowledge distillation shows that our fairness-aware contrastive loss effectively improves fairness at a trade-off in prediction performance. Conversely, the experiment with only the self-knowledge distillation loss (i.e., without L_align and L_distribution) produces the opposite result. The result without only the alignment loss, which is in charge of counterfactual fairness, contrasts the roles of the two losses in the fairness-aware contrastive objective: counterfactual fairness shows no improvement, while group fairness is reduced. Additionally, we experimented with Gaussian noise and dropout as alternative augmentation strategies to TabMix. These ablations reduced the prediction accuracy (AUC: 0.80 → 0.78 and 0.75, respectively) while maintaining fairness (DP: 0.03 → 0.03 and 0.02).
Counterfactual samples: To test the quality of the generated counterfactual samples, we examined whether the target variable remains predictable even when the model is trained only on the synthetic counterfactual samples. Table 6 reports the logistic regression performance when the training set is changed from the original data to the counterfactual samples. The latter model is on par with the model trained on the original dataset.
We next examined whether feature correlations are maintained in the counterfactual samples. Figure 5 shows the correlation matrices of the features in the original UCI Adult dataset and the counterfactual dataset. The Pearson correlation is used between continuous variables, and Cramér's V is used between categorical variables. To measure the correlation between categorical and continuous variables, we label-encode the categorical variables into continuous counterparts and compute the Pearson correlation with the original continuous values. The two matrices show a remarkable resemblance, indicating that the relationships between features are well maintained in the counterfactual samples.
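Cramér's V for two categorical columns can be computed from the chi-squared statistic of their contingency table; a self-contained sketch:

```python
import numpy as np

def cramers_v(a, b):
    """Cramér's V association between two categorical arrays (0 = none, 1 = perfect)."""
    cats_a, ia = np.unique(a, return_inverse=True)
    cats_b, ib = np.unique(b, return_inverse=True)
    # Build the contingency table of joint category counts.
    table = np.zeros((len(cats_a), len(cats_b)))
    np.add.at(table, (ia, ib), 1)
    n = table.sum()
    # Chi-squared statistic against the independence hypothesis.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(table.shape) - 1))))
```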

Qualitative Analysis
We visually inspect the learned representations to investigate how well our model handles sensitive information. Figure 6 shows how debiasing achieves group fairness in downstream predictions. It compares the prediction results on UCI Adult for two models: standard logistic regression applied to the raw dataset and the same regression model using the DualFair embeddings. After debiasing, there is almost no difference in the distribution of predicted income between the male and female groups, as depicted in Figure 6b. In contrast, the two probability density functions in Figure 6a, from the standard model, appear substantially different. These results confirm the strong debiasing potential of the proposed model. We also investigate how well DualFair achieves counterfactual fairness by examining the difference in the Δ_CP values of the proposed model and its ablation GroupFair, which omits the alignment loss between counterfactual pairs in the fairness-aware contrastive loss (Eq. 10). The experimental comparison on UCI Adult in Figure 7a shows a smaller prediction difference for the full model, indicating that the omitted loss is effective in debiasing sensitive attributes at the individual level. One of the counterfactual pairs from the model is illustrated in Figure 7b. Note that in generating counterfactual examples, we do not simply flip the sensitive attribute (i.e., gender); the observable variables (e.g., age, relationship) change together. When we compute Δ_CP for the example case with the two models, we confirm that the proposed DualFair satisfies the fairness concerns to some extent (prediction probability for the original: 0.22 vs. the counterfactual counterpart: 0.24). Meanwhile, GroupFair fails to debias gender information and gives a higher score to the male counterpart (0.24 vs. 0.45).

CONCLUSION
We presented DualFair, a self-supervised embedding learning model that debiases sensitive data attributes without any prior information on downstream tasks. Its design includes a unique fairness-aware contrastive loss and a self-knowledge distillation technique. Experiments confirm that DualFair generates rich data representations that ensure both group fairness and counterfactual fairness. Our model is applicable to various Web applications, including classification, ranking, recommendation, and text generation tasks.
The algorithmic bias observed in search engine results and social media platforms has reinforced the need for a clear policy for protecting sensitive attributes. However, the problem is more complex because bias can also exist in other domains, such as natural language processing (e.g., Q&A generation, chatbots). The recently released 'AI Bill of Rights' blueprint from the White House states that "you should not face discrimination by algorithms and systems should be used and designed in an equitable way." We believe that our self-supervised learning with debiasing techniques can serve as a building block for advancing the performance and fairness requirements of real-world Web applications.

A.1 Implementation Details

Main model: The main model consists of the backbone network (f), the projection head (g), and two prediction heads (h1, h2). All components are ReLU networks with a hidden dimension of 256. The backbone network has three layers, and all heads have two layers. DualFair is trained for 200 epochs with a batch size of 128. The Adam optimizer with a learning rate of 1e-3 and a weight decay factor of 1e-6 is adopted. For input preprocessing, we apply one-hot encoding to discrete variables and z-score scaling to continuous variables.
Counterfactual sample generator: We introduce the C-VAE model to generate counterfactual samples. When preprocessing the input data, discrete values are encoded into one-hot vectors, and every continuous value is converted to a vector of probability densities via mode-specific normalization [49]. Mode-specific normalization uses a variational Gaussian mixture model (VGM) to estimate the number of modes and fits the Gaussian mixture model on top of the target distribution. Then, the probability density of each mode is computed for the normalization. We found that mode-specific normalization improves counterfactual sample quality more than naive min-max normalization. The counterfactual sample generator is trained for 600 epochs with a batch size of 256. The weight between the reconstruction loss and the probability distribution loss is set to 2:1.
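A minimal sketch of mode-specific normalization for a single continuous column, using scikit-learn's BayesianGaussianMixture as the variational GMM. The component cap, the (x − μ)/4σ scaling, and the one-hot mode indicator follow our reading of the recipe in [49] and should be treated as assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def mode_specific_normalize(col, max_modes=10, seed=0):
    """Normalize one continuous column per estimated mode.
    Returns a scalar in [-1, 1] per row plus a one-hot mode indicator."""
    vgm = BayesianGaussianMixture(
        n_components=max_modes,
        weight_concentration_prior=1e-3,  # prunes unused components
        random_state=seed,
    ).fit(col.reshape(-1, 1))
    resp = vgm.predict_proba(col.reshape(-1, 1))
    mode = resp.argmax(axis=1)                    # most likely mode per value
    mu = vgm.means_.ravel()[mode]
    std = np.sqrt(vgm.covariances_.ravel())[mode]
    scalar = np.clip((col - mu) / (4.0 * std), -1.0, 1.0)
    onehot = np.eye(max_modes)[mode]
    return scalar, onehot

# Bimodal toy column standing in for, e.g., an income-like feature.
rng = np.random.default_rng(0)
col = np.concatenate([rng.normal(20, 2, 500), rng.normal(80, 5, 500)])
scalar, onehot = mode_specific_normalize(col)
```

The per-mode scaling keeps each mode's values in a bounded range, which naive min-max normalization cannot do for multimodal columns.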
Computational complexity: We used four A100 GPUs for all experiments. Our model took 20% more training time than WeaC-InfoNCE (11 vs. 9 minutes for 200 epochs), and the counterfactual VAE training took less than 5 minutes.

A.2 Further Results on Performance Evaluation
In the main manuscript, we present the performance of DualFair and other baselines on downstream classification tasks with gender as the sensitive attribute. Here we present extra results on race to support DualFair's generalizability across sensitive attributes. For the race attribute, UCI Adult, COMPAS, and LSAC have five, five, and six classes, respectively. The UCI German Credit dataset does not include race information and is hence skipped.
The evaluation results in Table 7 demonstrate that DualFair learns the data distribution of critical features while minimizing spurious information from the multi-class sensitive attribute.

A.3 Further Results on Ablation study
To confirm the effectiveness of the proposed counterfactual sample generator, we further assessed two ablations on the loss objectives: (1) without the cyclic consistency loss L_cyc (Eq. 3) and (2) without the reconstruction loss in L_vae (Eq. 1). Table 8 reports the test-set performance of a logistic regression model fitted on synthetic counterfactual samples from each baseline. Across the four datasets, our model with all components outperforms the baselines in terms of both AUC and F1-score.

1 https://github.com/Sungwon-Han/DualFair

Figure 1 panel captions: (a) Group fairness: Embeddings of different member groups (e.g., by gender) should not be distinguishable. This makes unbiased predictions possible at the group level. (b) Counterfactual fairness (i.e., individual-level fairness): Individuals with similar traits should have similar embeddings irrespective of their sensitive attributes. This makes unbiased predictions possible at the individual level.

Figure 1: Illustration of two fairness criteria.

Figure 2: The overall architecture of DualFair, where f, g, and (h1, h2) denote the backbone network, the projection head, and the two prediction heads, respectively. DualFair aims to achieve both fairness and representation quality by jointly optimizing the fairness-aware contrastive loss and the self-knowledge distillation loss.

Figure 3: Design of our counterfactual sample generator. Adversarial training over the VAE eliminates the sensitive information s from u. The cyclic consistency loss makes training more stable by ensuring the double-flipped sample (x_cyc) is the same as the original (x).

Figure 4:

Figure 5: Correlation heatmaps show that (a) the original UCI Adult dataset and (b) the synthetic dataset from our counterfactual sample generator follow similar patterns.

Figure 6: Income prediction results for the male and female groups are similar when we use DualFair's embeddings. We compare two models on the UCI Adult dataset: (a) standard logistic regression on top of the original data and (b) logistic regression on top of DualFair embeddings.

Figure 7: Additional findings on the counterfactual model: (a) comparison of the Δ value between DualFair and its ablation GroupFair, which omits the alignment loss between counterfactual pairs; (b) observable variables affected by sensitive attributes also change in the counterfactual example.

Figure 8 visualizes the embeddings of the UCI Adult dataset from four models: SCARF and DualFair at three points of training (epochs 1, 10, and 200). SCARF does not consider any fairness requirement, so the learned representations of the same protected group (i.e., gender) lie nearby, forming local clusters in the embedding space (see Fig. 8a). When we fit a logistic regression model to predict the sensitive attribute on top of the embeddings, it reports very high AUC values. Meanwhile, for DualFair, the sensitive information is debiased across the epochs, and the sensitive attribute becomes indistinguishable in the embedding space.
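The sensitive-attribute probe described here can be sketched as follows. The embeddings below are synthetic stand-ins for SCARF/DualFair outputs; an AUC near 0.5 indicates the attribute is indistinguishable in the embedding space.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def leakage_auc(emb, sensitive, seed=0):
    """Probe: fit logistic regression to predict the sensitive attribute
    from embeddings; AUC near 0.5 means the attribute has been debiased."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(emb))
    cut = len(emb) // 2
    tr, te = idx[:cut], idx[cut:]
    clf = LogisticRegression(max_iter=1000).fit(emb[tr], sensitive[tr])
    return roc_auc_score(sensitive[te], clf.predict_proba(emb[te])[:, 1])

rng = np.random.default_rng(0)
s = rng.integers(0, 2, 2000)                               # sensitive labels
biased   = rng.normal(size=(2000, 16)) + s[:, None] * 2.0  # leaks s (SCARF-like)
debiased = rng.normal(size=(2000, 16))                     # independent of s
```

On the biased embeddings the probe's AUC is close to 1; on the debiased ones it hovers around 0.5.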

Table 1: Data descriptions of the six datasets. We split each dataset into disjoint training and test sets. Embedding learning and counterfactual sample generation use only the training set; all evaluations use the test set. Table 1 summarizes the statistics of each dataset.

Table 2: Performance comparison between fairness-aware baselines and DualFair. The average rank for each evaluation metric across the six datasets is reported.

Table 4: Detailed results on downstream regression tasks. The best results among the fair representation learning baselines are highlighted in bold.

Table 5: Ablation results of DualFair on the UCI Adult dataset. L_align and L_distribution are from the fairness-aware contrastive loss, while L_self-kd is the self-knowledge distillation loss.

Table 6: The model trained with counterfactual samples shows good classification performance on the original data, indicating that the counterfactual samples augment the data well.

Table 7: Additional results on downstream classification tasks in which race is set as the sensitive attribute to protect. Performances are reported with AUROC and three Δ fairness metrics over three datasets.
DualFair: DualFair uses a multi-layer perceptron architecture for both the main model and the counterfactual sample generator.

Table 8: Ablation study of the counterfactual sample generator. The proposed model with all components shows the best classification performance on the original data.

A.4 Further Results on Qualitative Analysis