Automatic Feature Fairness in Recommendation via Adversaries

Fairness is a widely discussed topic in recommender systems, but its practical implementation faces challenges in defining sensitive features while maintaining recommendation accuracy. We propose feature fairness as the foundation to achieve equitable treatment across diverse groups defined by various feature combinations. This improves overall accuracy through balanced feature generalizability. We introduce unbiased feature learning through adversarial training, using adversarial perturbation to enhance feature representation. The adversaries improve model generalization for under-represented features. We adapt adversaries automatically based on two forms of feature biases: the frequency and the combination variety of feature values. This allows us to dynamically adjust perturbation strengths and adversarial training weights. Stronger perturbations are applied to feature values with fewer combination varieties to improve generalization, while higher weights for low-frequency features address training imbalances. We build Adaptive Adversarial perturbation on top of the widely applied Factorization Machine (AAFM) as our backbone model. In experiments, AAFM surpasses strong baselines in both fairness and accuracy measures. AAFM excels at providing item- and user-fairness for single- and multi-feature tasks, showcasing its versatility and scalability. To maintain good accuracy, we find that adversarial perturbation must be well-managed: during training, perturbations should not overly persist and their strengths should decay.


INTRODUCTION
Fairness in Recommendation Systems (RS) has garnered considerable attention. Various techniques have been employed, such as outcome re-ranking [9,19,29] and unbiased learning [2,9,27,38,40] (mitigating biases in the training process directly). However, the specific fairness requirements vary depending on the stakeholders and the specific needs of the application. The definition of fairness in user-centric or item-centric recommendation relies on the chosen sensitive features [34]. Prior studies [9,19,21,41] only consider sensitive features chosen from either users or items, which poses challenges for fairness scalability when the other aspect must also be considered. Furthermore, imposing constraints to achieve fairness often compromises overall recommendation accuracy, further constraining real-world applicability.
To enable flexible selection of sensitive features, we introduce generic feature fairness as our core guiding principle. It centers on the features themselves, agnostic to whether they come from users or items. In this work, we examine two statistical biases of feature values (e.g., student or male): feature frequency, indicating the occurrence rate within its feature domain (e.g., user occupation, or user gender); and feature combination variety, representing the diversity of co-occurring samples with other features.
To investigate the biased outcomes resulting from skewed features, consider a case from MovieLens. Our preliminary analysis focuses on two feature-defined user groups: male+students and male+homemakers, modeled with a Factorization Machine [25]. Figure 1 illustrates that the majority group of male students (21.09% of users) consistently outperforms the average, while the minority group of male homemakers (0.1%) exhibits below-average performance with significant fluctuations during training. This disparity in accuracy and stability results in unfairness. There are two causes: (1) The limited co-occurrence frequency of the male and homemaker features in training hinders the model's ability to capture their interactions. (2) When the student feature co-occurs with a greater variety of gender values than the homemaker feature, the model struggles to recognize interactions between the homemaker feature and different gender values, resulting in poorer generalization.
In this work, we aim to utilize the two forms of feature biases to automatically (1) incorporate fairness considerations across diverse feature domains; and (2) ensure similar generalizability for different combinations of feature values.
Adversarial training [11] is a technique for augmenting model generalization [15], where the generalization derives from robustness to unseen inputs. We thus adopt adversarial training to accommodate a variety of feature combinations. By integrating adversarial training into our regular training iterations, we enhance feature representations by perturbing them. However, applying this approach directly still poses issues.
First, existing approaches assume consistent perturbation intensities [8,15] for all feature representations, but sample outcomes vary significantly across features. Our method uses combination variety to determine the intensity of adversarial perturbation, employing a formula that maps lower variety values to higher adversarial intensity, thereby enhancing the stability of the targeted groups. To prevent excessive perturbation that would overly distort the original data representation, we map variety inversely into the range 0 to 1. Second, conventional adversarial unbiased learning approaches often view accuracy and fairness as conflicting objectives [34]. As features often follow a long-tailed distribution, low-frequency features make up the majority of features. Low-frequency features are therefore important, so we prioritize their appropriate representation during training by assigning them higher adversarial training weights. This balancing yields enhanced performance.
We instantiate the above-mentioned Adaptive Adversaries on the FM backbone, or AAFM for short. Extensive experiments show that our method improves accuracy by 1.9% over baselines, while balancing group standard deviation by 7–10 on fairness metrics. AAFM further demonstrates scalability, tackling fairness concerns for both users and items simultaneously. Additionally, as the number of feature domains in the data increases, our approach consistently addresses fairness at finer granularity among diverse groups. This serves as a bridge between group and individual fairness, spanning datasets with a single feature domain to those with a broader range of three feature domains. Our method's universal applicability to fairness issues offers a win-win outcome, promoting both fairness and accuracy.
In summary, our contributions are as follows: (i) Compared to user fairness and item fairness, we define our task as a more fundamental feature fairness objective. The feature fairness task aims to develop a parameter-efficient framework that flexibly provides feature-specific fairness for various combinations of user or item features. (ii) We introduce AAFM, an adversarial training method that leverages statistical feature bias for unbiased learning, combining the benefits of fairness and accuracy. (iii) Through experiments on datasets with varying numbers of features, in both user- and item-centric settings, we validate the scalability and practicality of AAFM in real scenarios. The code is available at: https://github.com/HoldenHu/AdvFM

METHODOLOGY
In what follows, we first outline our task and delve into the issue of feature fairness, which arises from two biases. We then present our solution, the Adversarial Factorization Machine, which applies the fast gradient method to construct perturbations over feature representations. We further propose an adaptive perturbation based on feature biases, which re-scales the adversarial perturbation strengths and adversarial training weights.

Preliminaries of Feature Fairness
Problem Formulation. The recommendation task aims to predict the probability of unobserved user-item interactions ŷ(x) given the user and item features x [16]. We represent the input of one sample as the combination of these features, denoted as x = {f_1, f_2, ..., f_m}. Here, f_j represents the j-th feature domain F_j, encompassing user features (e.g., user occupation) and item features (e.g., item color). For the i-th sample x^(i) ∈ X, f_j^(i) indicates its specific feature value (e.g., student or red) in feature domain F_j. In our work, the feature domains include the user/item ID and the categorical attributes of the user/item. For a specific feature value v in domain F_j, we denote its corresponding subset of samples as X_{F_j:v} := {x^(i) | f_j^(i) = v}. The overall prediction error on this subset is denoted as E_{F_j:v} := Σ_{x ∈ X_{F_j:v}} E(ŷ(x), y), where E indicates the metric (e.g., Logloss) measuring the error between the prediction ŷ and the ground truth y ∈ {0, 1}.
To achieve feature fairness, we expect a small difference between the errors E_{F_j:v_1} and E_{F_j:v_2} for each feature domain F_j and each value pair (v_1, v_2) within F_j. In neural models, the precise representation of each value is vital, as it directly affects the errors on the corresponding samples. The quality of a feature value's representation depends on the statistical bias (e.g., popularity bias [24]) of feature values in the data.
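As a toy illustration of this objective, the per-domain fairness gap can be read as the largest difference in mean error between value groups. The per-sample losses below are made-up numbers, not from the paper:

```python
import numpy as np

def fairness_gap(errors_by_value: dict) -> float:
    """Largest difference in mean per-sample error across values of one domain."""
    means = {v: float(np.mean(e)) for v, e in errors_by_value.items()}
    return max(means.values()) - min(means.values())

# hypothetical per-sample Logloss values for two occupation values
gap = fairness_gap({
    "student":   [0.40, 0.35, 0.42],
    "homemaker": [0.80, 0.95],
})
```

A perfectly fair domain would have a gap of zero; feature fairness asks that this gap be small for every domain.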
Two Forms of Feature Biases. Feature values in the data distribution have the following statistical properties. To aid understanding, we illustrate with a feature value v in feature domain F_j.
• Frequency α_v indicates the occurrence rate of the value v within its feature domain.
• Combination variety β_v indicates the number of diverse samples in which value v co-occurs with other features in combination.
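Both statistics can be computed directly from the raw interaction table. The sketch below, with made-up column names and rows, is one straightforward way to do so with pandas (not the paper's implementation):

```python
import pandas as pd

# toy sample table; "occupation" and "gender" are illustrative feature domains
samples = pd.DataFrame({
    "occupation": ["student", "student", "student", "homemaker", "student"],
    "gender":     ["male",    "female",  "male",    "male",      "male"],
})

def feature_biases(df: pd.DataFrame, domain: str):
    """Return (frequency, combination variety) per value of one feature domain."""
    # frequency: occurrence rate of each value within its domain
    freq = df[domain].value_counts(normalize=True)
    # combination variety: number of distinct co-occurring feature combinations
    others = [c for c in df.columns if c != domain]
    variety = df.groupby(domain)[others].apply(lambda g: len(g.drop_duplicates()))
    return freq, variety

freq, variety = feature_biases(samples, "occupation")
```

Here "student" has both higher frequency and higher variety than "homemaker", but as discussed next, the two biases need not be aligned in real data.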
The frequency α_v measures how many times a feature value has been seen by the model, while the combination variety β_v better reflects the degree of isolation of the feature-based data group. The more isolated a group is, the more sensitive it is to model perturbation. Under a normal distribution, combination variety and frequency can be viewed as equivalent: as frequency increases, combination variety increases as well. In real-world cases, however, this may not hold, since feature values do not always follow a strict joint-probability dependence. Take the feature domain gender as an example: given that female has fewer combinations with occupation than male, this does not mean that female's frequency is necessarily lower than male's. In the results depicted in Figure 2, data samples were grouped into 5 bins based on the product of frequency or combination variety across all feature domains (user features + item features). While both biases contribute significantly to performance imbalances, they are not aligned, highlighting the interdependence between features in real-world data. Therefore, we treat them as separate statistical biases for utilization.

(b) Representation Learning. Our key insight is that the interdependencies among low-level feature groups play a critical role in robustness and fairness. For this reason, we use Factorization Machines (FM) [25] as the backbone of our methodology. FM takes a set of vector inputs, each consisting of m feature values, and performs recommendation through their cross-products. An FM model of degree 2 estimates the rating behavior ŷ as:

ŷ(x) = w_0 + Σ_{j=1}^{m} ⟨w_j, e_j⟩ + Σ_{j=1}^{m} Σ_{k=j+1}^{m} ⟨v_j ⊙ e_j, v_k ⊙ e_k⟩,

where the parameter w_j ∈ R^{1×d} models the linear, first-order interactions, and v_j ∈ R^{1×d} models the second-order interactions for each low-dimensional vector e_j. ⟨•,•⟩ indicates the dot product operation and v_j ⊙ e_j indicates the element-wise product. To be concise, we use the notation ŷ(x|Θ) = f(e) to represent the model's processing of input x with the embedding parameters Θ. The model is trained with the cross-entropy loss L [10], which measures the difference between the predicted and true values.

Adversarial Factorization Machine (AdvFM)

Adversarial Perturbation.
Inspired by previous work [28] which observed that users with rare interactions would benefit more from robustness, we adopt gradient-based adversarial noise [11] as the perturbation mechanism to improve balanced robustness from the feature perspective.
As shown in Figure 3, the normal representation learning of the FM module uses the original embedding e. Adversarial training adds noise to each feature's embedding by perturbing FM's parameters:

ŷ_adv(x) = ŷ(x | Θ + Δ_Θ),

where Δ_{e_j} is the parameter noise providing the maximum perturbation on the embedding of feature j, and Δ_Θ = {Δ_{e_1}, ..., Δ_{e_m}} denotes the overall perturbation on the embedding layer.
To efficiently perturb normal training, we estimate the optimal adversarial perturbation by maximizing the loss incurred during training:

Δ_Θ = argmax_{∥Δ∥ ≤ ε} L(ŷ(x | Θ + Δ), y),

where the hyper-parameter ε controls the strength level of the perturbation, and ∥•∥ denotes the L2 norm. Our adversarial noise uses the backward-propagated fast gradient [11] of each feature's embedding parameters as its most effective perturbing direction. Specifically, to perturb the embedding e_j, we calculate the partial derivative of the normal training loss:

Δ_{e_j} = ε · g_j / ∥g_j∥, with g_j = ∂L(ŷ(x|Θ), y) / ∂e_j,

where the normalized term on the right-hand side gives the fast-gradient direction of feature j's embedding parameters.
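A minimal numpy sketch of this perturbation step: the gradient direction is L2-normalized and scaled by ε. The gradient vector here is a placeholder, not a real backward pass:

```python
import numpy as np

def fast_gradient_perturbation(g: np.ndarray, eps: float) -> np.ndarray:
    """Scale the L2-normalized gradient direction by the strength eps."""
    norm = np.linalg.norm(g)
    if norm == 0.0:  # no gradient signal: nothing to perturb
        return np.zeros_like(g)
    return eps * g / norm

g = np.array([3.0, 4.0])               # stand-in for dL/de_j
delta = fast_gradient_perturbation(g, eps=0.5)
```

By construction the resulting perturbation always has L2 norm exactly ε (for a non-zero gradient), so ε alone controls the perturbation strength.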
Training objective. In each epoch, we first conduct training as normal, then introduce the adversarial perturbations in a following training session, round by round. We define the final optimization objective of AdvFM as a min-max game:

Θ* = argmin_Θ max_{∥Δ∥ ≤ ε} [ L(ŷ(x|Θ), y) + λ · L(ŷ(x|Θ + Δ), y) ],

where Δ provides the maximum perturbation and Θ is trained to provide a robust defense that minimizes the overall loss. Here, λ is a hyper-parameter controlling the adversarial training weight.
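To make one round of this objective concrete, here is a hedged numpy sketch on a toy logistic model standing in for FM: the normal loss plus a λ-weighted loss at the fast-gradient-perturbed parameters. All numbers are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# toy parameter vector standing in for an embedding, and one positive sample
w = np.array([0.5, -0.2])
x = np.array([1.0, 1.0])
y = 1.0

def loss(w_):
    return bce(sigmoid(np.dot(w_, x)), y)

# fast-gradient perturbation of w (analytic BCE-through-sigmoid gradient: (p - y) * x)
p = sigmoid(np.dot(w, x))
g = (p - y) * x
eps, lam = 0.5, 1.0
delta = eps * g / np.linalg.norm(g)

# one round's surrogate objective: normal loss + lambda * adversarial loss
total = loss(w) + lam * loss(w + delta)
```

Since delta follows the loss gradient, the adversarial term is larger than the normal loss, so minimizing the sum pushes Θ toward parameters that are robust to the worst-case perturbation.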

Automatic Adaptation on AdvFM
The approach described so far has a key drawback: it introduces a single, uniform perturbation strength ε over all features, and a uniform adversary weight λ over all samples. This makes the method inflexible and unable to model nuanced weighting. To further balance and improve accuracy, we propose an Adaptive version of AdvFM (AAFM). It auto-strengthens the adversarial perturbations on the feature embedding parameters and re-weights the samples in adversarial training. Our adaptive version leverages the two forms of feature biases introduced previously (Figure 3, right).
• Auto-Strengthening. For each feature domain F_j with corresponding value v, a smaller combination variety β_v indicates a more sensitive representation. It therefore needs to be trained with stronger perturbations on its embedding parameters to improve its robustness. We estimate the feature-specific ε_j on an inversely proportional basis:

ε_j = SoftPlus(θ_j) / β_v,

where θ_j is a learnable parameter with respect to the feature domain F_j. We adopt the SoftPlus activation function for θ_j, as it does not change the sign of the gradient, and the SoftPlus unit has a non-zero gradient over all real inputs.
• Re-Weighting. Unlike previous work [15], which uses a fixed adversarial training weight λ for all samples, we use sample-specific weights. Given a sample x^(i), the sample-specific adversary weight λ_i is defined as:

λ_i = Φ( (Π_{v∈x^(i)} α_v)^{-1}, s ).

For a sample x^(i) with a low overall feature frequency Π_{v∈x^(i)} α_v in training, we increase the weight of its adversarial loss by increasing its associated λ_i. The function Φ(•, s) scales the values into the range between 1 and s. If we used the earlier design of a trainable parameter to scale λ, it would easily be eliminated by the overarching optimization objective (Equation 6); hence we apply manually-controlled scaling via s.
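The two adaptations can be sketched numerically as follows. The SoftPlus-over-variety form follows the text, while the exact min-max form of the scaling function Φ is our assumption:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def auto_strength(theta: float, variety: np.ndarray) -> np.ndarray:
    """Smaller combination variety -> larger perturbation strength eps."""
    return softplus(theta) / variety

def reweight(inv_freq: np.ndarray, s: float) -> np.ndarray:
    """Map inverse sample frequencies into [1, s]: rarer samples get larger
    adversarial training weights (this min-max form of Phi is an assumption)."""
    lo, hi = inv_freq.min(), inv_freq.max()
    if hi == lo:
        return np.ones_like(inv_freq)
    return 1.0 + (s - 1.0) * (inv_freq - lo) / (hi - lo)

variety = np.array([1.0, 2.0, 10.0])
eps = auto_strength(theta=0.0, variety=variety)        # softplus(0) = ln 2
weights = reweight(np.array([1.0, 5.0, 100.0]), s=100.0)
```

The rarest, most isolated feature value receives both the largest perturbation strength and the largest adversarial weight, matching the intended behavior of the two adaptations.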
Optimization of Decaying Adversarial Perturbation. When the model adaptively adjusts the adversarial perturbation (noise) level ε, we observe that optimization may simply set ε to zero, which best meets the normal training objective by reaching a local optimum. However, this thwarts the benefit of introducing adversarial perturbation, canceling it prematurely.
To mitigate this, we envision a slow decline in the effect of adversarial perturbation, proportional to the time already trained.
To this end, we design a regulation term for ε by defining an additional loss L_ε = (γ · t · ∥ε∥)^{-1}, where t represents the number of epochs trained so far, and γ is an annealing hyper-parameter controlling the regulation strength. As such, the change of ε is more marked during early training, where a small ε would make L_ε large. As training proceeds and the model stabilizes, the sensitivity of ε gradually decays as t increases.
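A quick numeric sketch of such a regulation term, taking L_ε = (γ · t · ∥ε∥)^{-1} as the assumed concrete form: the penalty for shrinking ε is large early on (small epoch t) and fades as t grows. All values are illustrative:

```python
import numpy as np

def regulation_loss(eps: np.ndarray, t: int, gamma: float) -> float:
    """Penalty that discourages eps from collapsing to zero early in training."""
    return 1.0 / (gamma * t * np.linalg.norm(eps))

eps = np.array([0.5])
# penalty at epochs 1, 5, and 50 for the same eps
losses = [regulation_loss(eps, t, gamma=2.0) for t in (1, 5, 50)]
```

Because the penalty shrinks with t, the model can let ε decay later in training without incurring a large regulation loss, which is exactly the desired annealing behavior.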

EXPERIMENTS
Datasets. We experiment on three public datasets to examine our model's debiasing effect on both user and item groups. User-feature-enriched datasets include the movie dataset MovieLens-100K (user gender, occupation, and zip code) and the image dataset Pinterest (user preference categories). Item-feature-enriched datasets include MovieLens-100K (movie category and release timestamp) and the business dataset Yelp (business city and star rating). Following previous work [13], to reduce excessive cost we filtered out users with more than 20 interactions in Yelp and randomly selected 6,000 users to construct our Pinterest dataset. We convert all continuous feature values into categorical values (e.g., by binning user age into appropriate brackets), and consider the user and item IDs as additional features.
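The continuous-to-categorical conversion can be done with simple binning. The bracket edges below are illustrative, not the paper's:

```python
import pandas as pd

# toy user table; ages and bracket edges are made up for illustration
users = pd.DataFrame({"user_id": [1, 2, 3, 4], "age": [15, 27, 44, 68]})
users["age_bracket"] = pd.cut(
    users["age"],
    bins=[0, 18, 35, 55, 120],           # right-inclusive intervals
    labels=["<18", "18-34", "35-54", "55+"],
)
```

Each resulting bracket then behaves like any other categorical feature value, with its own frequency and combination variety.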
Evaluation Protocols. For the train-test split, we employ the standard leave-one-out protocol [15]. To evaluate accuracy, we adopt AUC (Area Under the Curve) and Logloss (cross-entropy). To assess fairness with respect to imbalanced features, we split the data into buckets for evaluation, following previous work [23,31]. We first rank the data samples x^(i) by the joint feature statistic Π_{v∈x^(i)} (α_v · β_v), and divide the ranked samples into 6 buckets. We propose two quantitative metrics as follows.
• EFGD (extreme feature-based group difference). Following the previous practice [23] that uses the difference between the two extreme data groups as the indicator, we take EFGD as the AUC difference between the first 10% of samples and the last 10%.
• STD (overall groups' standard deviation). STD measures more fine-grained fairness (as in [31]); it is the standard deviation of the buckets' AUC values.
In Table 1, the upper/lower two datasets correspond to item-centric/user-centric fairness.

Recommendation Accuracy Comparison
The introduction of the adaptive λ significantly enhances overall performance. This indicates that our proposed adversarial training re-weighting is promising and can be optimized well, instead of locking the fairness model within performance-compromising constraints. However, introducing only the adaptive ε worsens overall performance on several datasets. By considering both aspects together, synthesizing them into AAFM, and adding the decaying perturbation regularization loss, we get D-AAFM. One of these two performs best across all datasets. In most cases, D-AAFM performs better, demonstrating that persistent adversarial perturbations can severely impact model accuracy.

Superior Fairness Against Baselines.
Feature fairness is another aspect of concern in our study. As depicted in Table 2, all fairness baselines improve over FM on the metrics measuring bias reduction (EFGD and STD). We observe that feature unfairness does exist, and that current fairness models do alleviate it. Among the baselines, MACR performs best; it considers the popularity bias of both users and items, taking into account the impact of skewed occurrences of user or item IDs. Our AdvFM also provides fairer results than FM, though not as good as the aforementioned debiasing baselines. This corroborates that although adversarial training has recently shown promise in promoting fairness, it requires further detailed investigation. Through careful design of adversarial perturbations, our AAFM and D-AAFM achieve better fairness, with respect to either user features or item features.

Ablation Study.
To determine how the adversaries improve fairness, we conduct an additional ablation study, shown in the right columns of Table 2. Compared to AdvFM, the inclusion of the adaptive ε and the adaptive λ both contribute significantly to improving fairness. When both are utilized (i.e., AAFM), the effect on feature fairness is further enhanced. This demonstrates that the two proposed automatic adaptations are complementary and indispensable. Features with smaller combination variety require a larger ε to improve generalization ability. Even though we encourage this by using the reciprocal of the bias, ε can still easily shrink during training (thereby reverting to normal training). To forcefully encourage adversarial training, samples with less frequent features must receive more adversarial training weight, enabling the adversaries to truly play their role. Similar to the finding from the accuracy comparison, D-AAFM and AAFM alternately rank best, suggesting different dataset sensitivities to long-term perturbations.

Robustness of AdvFM
Driven by the premise that adversarial training enhances robustness to perturbed parameters, we investigate this improvement. To probe the robustness of groups under feature-representation perturbations, we adopt the methodology from [15], which injects external noise into the model parameters at levels spanning 0.5 to 2.0. As shown in Table 3, AdvFM is less sensitive to adversarial perturbations than FM. For instance, on the Yelp dataset, a noise level of 0.5 results in a decrease of 6.14% for FM, whereas AdvFM only experiences a decrease of 3.38%. Moreover, AAFM demonstrates even greater stability, with a decrease of only 1.41%. These improvements in robustness reflect the model's ability to generalize to unseen inputs, giving indicative evidence for why rare features are handled well by our proposed methods.
Case Study. The benefits of such robustness improvement are particularly pronounced for small groups characterized by less frequent features and unstable performance during training. To illustrate this, we select the male entertainment group, which accounts for only 0.2% of the total users, as a case study (Figure 5). The figure shows that normal FM training exhibits significant fluctuations, indicating the sensitivity of this data group to model updates. In contrast, by incorporating annealed adaptive noise, AAFM's performance gradually converges while improving overall AUC in the later stages of training. This notable improvement in stability further confirms the enhanced robustness for small groups.

Trade-off Between Fairness and Accuracy
Fairness and accuracy often involve a trade-off, and sometimes their objectives can even be contradictory [31]. However, we argue that fairness and accuracy can find common ground with appropriate adaptive adversarial weights. In Figure 4, we adjust the hyper-parameter s that controls the scale of λ_i. As s increases, fairness achieves its best results when s takes the values 100, 200, and 100 for MovieLens, Yelp, and Pinterest, respectively, while accuracy peaks when s is set to 50, 200, and 100. Notably, these two objectives are mostly aligned, suggesting that the improvement in fairness mainly stems from the enhanced accuracy of small groups rather than from compromising the performance of larger groups (which would significantly reduce overall accuracy). The exception occurs on MovieLens, where there is a trade-off between the best accuracy (s = 50) and the best fairness (s = 100). MovieLens contains more feature domains than the other two datasets, implying a finer feature granularity and more similar joint feature statistics across samples. A larger s magnifies the differences in adversarial weights of samples that were originally similar, leading to a rapidly increasing number of samples with low training weights and a more prominent overall performance drop.

RELATED WORK
Fairness in recommendation is a nascent but growing topic of interest [4], but hardly has a single, unique definition. The concept has been extended to cover multiple stakeholders [1,29] and implies different trade-offs in utility. From a stakeholder perspective, fairness can be considered from both the item and user aspects. User fairness [9,19] expects equal recommendation quality for individual users or demographic groups, while item fairness [21,41] indicates fair exposure among specific items or item groups. From an architectural perspective, there are mainly two approaches to address fairness. One is to post-process model predictions (i.e., re-ranking) to alleviate unfairness [9,19,29]. The other, unbiased learning, directly debiases the training process. The latter methods come from two origins. Causal Embedding [5] is one way to control embedding learning from bias-free uniform data (e.g., by re-sampling [9]). Re-weighting [6,38] is another method to balance the impact of unevenly distributed data during training, where Inverse Propensity Scoring [27,40] is a common means to measure the difference between the actual and expected distributions. In this work, we generalize the problem to address both user- and item-group unfairness, proposing an unbiased learning technique at the feature level.
Adversarial training in recommendation helps models pursue robustness by introducing adversarial samples. One of the most effective techniques is to perturb samples with gradient-based noise (e.g., FGSM [11], PGD [22], and C&W [7]). Previous work found such noise effective in improving recommendation accuracy, such as applying fixed FGSM to matrix factorization [15] or multiple adversarial strengths [39]. Current adversarial perturbation in recommendation systems mostly focuses on properly representing individual users [3,20,28] or items [3,18].
Adversarial training is increasingly discussed in unbiased learning approaches [33]. Recent work [28] also found that adversarial perturbation can benefit under-served users. Yu et al. [37] found a positive correlation between node-representation uniformity and debiasing ability, and added adversarial noise to each node in contrastive graph learning. However, these works lack a systematic comparison with fair recommendation baselines and overlook the flexibility of the selected features. While there have been discussions in computer vision on connecting fairness and model robustness [32,36], there is a lack of studies bridging model robustness and the co-improvement of accuracy and fairness in recommendation tasks.

CONCLUSION AND FUTURE WORK
In this work, we propose a feature-oriented fairness approach, employing feature-unbiased learning to improve fairness and accuracy simultaneously. We address imbalanced performance among feature-based groups by identifying its root causes in feature frequency and combination variety. Our proposed Adaptive Adversarial Factorization Machine (AAFM) uses adversarial perturbation to mitigate this imbalance during training, applying varied perturbation levels to different features and varied adversarial training weights to different samples. This adaptive approach effectively enhances the generalizability of feature representations. Our experimental results show that AAFM excels in fairness, accuracy, and robustness, highlighting its potential as an effective approach for further study in this field.
While AAFM introduces adversarial training to unbiased learning, many refinements remain possible. For example, AAFM defaults to random negative sampling, which biases toward the majority of user/item features. How to balance the impact of such biased negative sampling across different groups deserves future study. It would also be valuable to investigate the effectiveness of different adversaries (e.g., PGD [22] or C&W [7]) on more complex neural recommendation backbones.

A DERIVATION OF ADVERSARIAL PERTURBATION
We present the mathematical derivation of the adversarial perturbation for feature embedding   , and explain the reasoning behind utilizing combination variety as the bias parameter to achieve balance.
By applying the chain rule, we express the adversarial feature perturbation Δ_{e_j} in the following manner:

Δ_{e_j} = ε · g_j / ∥g_j∥, with g_j = (∂L/∂ŷ) · (∂ŷ/∂e_j).

Since y can take only the values 0 or 1, the scalar factor ∂L/∂ŷ simplifies the expression: it scales the perturbation but does not change its direction. Given that we have chosen FM as our prediction model, the partial derivative of ŷ with respect to the feature embedding e_j is:

∂ŷ/∂e_j = w_j + Σ_{k≠j} v_j ⊙ v_k ⊙ e_k.

The addition of this adversarial perturbation to the original embedding e_j thus utilizes the interacted feature embeddings e_k, weighted by the pair-wise interaction weights v_j, v_k, to enhance the representation of e_j.
Hence, the perturbation on e_j is controlled by the strength ε, and the perturbation direction is influenced by the v and e parameters. There is a direct relationship between e_k and the perturbation direction. As for v_j and v_k, being the second-order interaction parameters, their pairwise combinations determine the impact of the other e_k values on the perturbation direction. When ε is held constant, a larger variety of feature combinations results in a more diverse range of perturbation directions. Consequently, in our work we assign a smaller perturbation strength to such features, balancing the influence of the adversaries against their impact.
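The derivative described above can be sanity-checked numerically. For a tiny random degree-2 FM with made-up sizes and weights (assuming the form ŷ = w_0 + Σ⟨w_j, e_j⟩ + Σ⟨v_j ⊙ e_j, v_k ⊙ e_k⟩), the analytic gradient w_j + Σ_{k≠j} v_j ⊙ v_k ⊙ e_k matches a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 3, 2                      # 3 feature domains, 2-dim embeddings
w0 = 0.1
W = rng.normal(size=(m, d))      # first-order weights w_j
V = rng.normal(size=(m, d))      # second-order weights v_j
E = rng.normal(size=(m, d))      # feature embeddings e_j

def fm(E_):
    """Degree-2 FM: w0 + sum_j <w_j, e_j> + sum_{j<k} <v_j*e_j, v_k*e_k>."""
    y = w0 + np.sum(W * E_)
    for j in range(m):
        for k in range(j + 1, m):
            y += np.dot(V[j] * E_[j], V[k] * E_[k])
    return y

def grad_ej(E_, j):
    """Analytic dy/de_j = w_j + sum_{k != j} v_j * v_k * e_k (element-wise)."""
    g = W[j].copy()
    for k in range(m):
        if k != j:
            g += V[j] * V[k] * E_[k]
    return g

# central finite difference with respect to e_0
h = 1e-6
num = np.zeros(d)
for t in range(d):
    Ep, Em = E.copy(), E.copy()
    Ep[0, t] += h
    Em[0, t] -= h
    num[t] = (fm(Ep) - fm(Em)) / (2 * h)
assert np.allclose(num, grad_ej(E, 0), atol=1e-5)
```

The check confirms that the perturbation direction for e_j is indeed assembled from the co-occurring embeddings e_k weighted by v_j ⊙ v_k, which is the basis for tying perturbation strength to combination variety.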

Figure 1 :
Figure 1: (left) Unfairness between two groups with sensitive features. (right) Biased validation accuracy between two data groups during the training of a Factorization Machine.

Figure 2 :
Figure 2: Unbalanced results regarding two forms of feature biases. The x-axis indicates the indices of sample groups sorted by overall feature frequency/combination variety. The results are from FM applied to the Yelp/MovieLens datasets.

2.2.1 Base Model
Our framework consists of three stages (Figure 3): Embedding, Representation Learning, and Prediction. (a) Embedding Initialization. To improve the representative ability of features, we first map each original discrete feature value f_j into a d-dimensional continuous vector e_j = M(f_j|Θ) through the embedding layer M. The concatenated feature embeddings are denoted as e = [e_1; ...; e_m].

Figure 3 :
Figure 3: The training process of the adversarial factorization machine on sample x^(i). (c) Prediction & Model Training. The training objective function is defined as the cross-entropy loss L [10] between the predicted and true values.

Figure 4 :
Figure 4: Trade-off between accuracy and user-group fairness via control of the re-weighting parameter s. Smaller STD (↓) indicates better fairness, and larger AUC (↑) indicates better accuracy.

Figure 5 :
Figure 5: Case study: validation accuracy (AUC) on the small group (male entertainment) in the MovieLens dataset.

Table 1 :
Overall accuracy performance comparison. Smaller LL (Logloss) or larger AUC indicates better accuracy. ML_I and ML_U indicate the partial MovieLens dataset with only item or only user features, respectively. AAFM_ε and AAFM_λ adaptively adjust only ε (with fixed λ = 0.5) and only λ (with fixed ε = 1), respectively. D-AAFM indicates AAFM incorporating the decaying perturbation regularization.

Table 2 :
Feature fairness effect comparison. The smaller the STD or EFGD, the fairer the results. The abbreviations are the same as in Table 1.

Table 3 :
Performance drop ratio (%) in AUC of models in the presence of external adversarial perturbation.