Fairly Adaptive Negative Sampling for Recommendations

Pairwise learning strategies are prevalent for optimizing recommendation models on implicit feedback data; they usually learn user preference by discriminating between positive items (i.e., those clicked by a user) and negative items (i.e., those obtained by negative sampling). However, the sizes of different item groups (specified by an item attribute) are usually unevenly distributed. We empirically find that the commonly used uniform negative sampling strategy for pairwise algorithms (e.g., BPR) can inherit such data bias and oversample the majority item group as negative instances, severely countering group fairness on the item side. In this paper, we propose a Fairly adaptive Negative sampling approach (FairNeg), which improves item group fairness by adaptively adjusting the group-level negative sampling distribution during training. In particular, it first perceives the model's unfairness status at each step and then adjusts the group-wise sampling distribution with an adaptive momentum update strategy to better facilitate fairness optimization. Moreover, a negative sampling distribution Mixup mechanism is proposed, which gracefully incorporates existing importance-aware sampling techniques intended for mining informative negative samples, thus allowing multiple optimization objectives to be pursued jointly. Extensive experiments on four public datasets show our proposed method's superiority in group fairness enhancement and fairness-utility tradeoff.


INTRODUCTION
Personalized recommender systems have been widely deployed to alleviate information overload [14,15] by providing users with relevant items in various online services [8,13,15,50], such as online shopping, job matching, and online education. As the most representative technique in recommender systems, Collaborative Filtering (CF) methods [3,10] aim to learn recommendation models by encoding user-item implicit feedback (e.g., user clicks and purchases) with a pairwise learning paradigm, e.g., Bayesian Personalized Ranking (BPR) [7,38]. In general, such a mainstream learning paradigm first conducts random negative item sampling to construct training data [12,22], pairing a positive item (from observed user interactions) with a negative item (from unobserved interactions), and then encourages the predicted score of the positive item to be higher than that of the negative item.
However, such a mainstream learning paradigm is vulnerable to data bias [2,28] on the item side, considering that the sizes of different item groups (determined by a specific item attribute) are highly imbalanced in the real world. For example, low-paying jobs (the majority group) occupy a larger portion than high-paying jobs (the minority group) in job matching systems, and the number of comedy movies surpasses that of horror movies in movie recommendations; both can be viewed as inherent data bias on the item side [18]. Moreover, when optimizing recommendation models via the BPR loss, the prevalent negative sampling technique, i.e., Uniform Negative Sampling (UNS), is more likely to treat unobserved items in the majority group as negative.
For instance, as illustrated in Figure 1, items in the majority group (in green) gain a higher sampling probability as negative instances than those in the minority group (in red) when learning user u_2's preference. As a result, during training, the majority item group is pushed to obtain low (biased) prediction scores via the BPR loss, resulting in disparate recommendation performance (e.g., Recall) across item groups. In other words, such a uniform negative sampling strategy naturally inherits the group-wise data bias on the item side, which misleads recommendation models into learning biased user preferences and dampens the recommendation quality for items in the majority group.
To address the undesired bias and unfairness caused by the negative sampling procedure, a straightforward solution is to adjust the negative sampling distribution by equalizing each item group's chance of being sampled as negative and positive instances, such that items in the majority group are not oversampled as negative instances during BPR optimization. We refer to this bias mitigation method as FairStatic. To verify the aforementioned unfair treatment of different item groups under UNS and its corresponding solution (i.e., FairStatic), we conduct preliminary studies examining how these two sampling strategies (UNS vs. FairStatic) perform on two recommendation methods: a classic Matrix Factorization model (MF) [38] and an advanced Graph Neural Network-based model (LightGCN) [11,22]. For simplicity, we only consider the two-group scenario and work on the ML1M-2 dataset, whose movie items are categorized into two genres (i.e., Sci-Fi and Horror). The detailed data description is given in Section 4, and the experimental results are shown in Figure 2.
It can be observed that the MF/LightGCN+UNS methods exhibit a significant disparity in recommendation performance between the two item groups under the Recall@20 metric, while MF/LightGCN+FairStatic reduces the group-wise disparity at the cost of sacrificing overall recall performance. From these observations, we conclude that adjusting the negative item sampling distribution can alleviate group unfairness on the item side. However, such a naive adjustment based on a static distribution (i.e., fixed throughout training) greedily pursues the fairness objective and easily leads to sub-optimal recommendation performance. Therefore, it is desirable to design an adaptive negative sampling method that achieves a better tradeoff between item group fairness and overall recommendation accuracy (utility).
In this paper, we seek to address item-side group performance unfairness in recommendations from a novel perspective. Existing negative sampling strategies used in the pairwise learning paradigm usually neglect the item's group attribute [4,6], which makes them vulnerable to item-side data bias and leads to disparate group performance. Correspondingly, we propose a novel fairly adaptive negative sampling method (FairNeg), which dynamically adjusts the group-level negative sampling distribution based on the perceived performance disparity during model training. Moreover, FairNeg can be gracefully merged with importance-aware sampling techniques to yield a better utility-fairness tradeoff. Our major contributions are summarized as follows:
• To the best of our knowledge, we are the first to investigate bias and unfairness issues in recommendations from the perspective of negative sampling. We empirically show a principled connection between item-oriented group performance fairness and the group-level negative sampling distribution.

PRELIMINARIES
In this section, we introduce preliminary knowledge concerning the problem under study. We first present the notions related to the top-k item recommendation task [51] and the formulation of the BPR loss. Then we introduce the definition and evaluation metric of item-oriented group performance fairness, which is the focus of this paper.

Personalized Ranking on Implicit feedback
Let U = {u_1, ..., u_m} and V = {v_1, ..., v_n} be the sets of users and items, respectively, and let Y ∈ {0, 1}^{m×n} denote the user-item historical (implicit) interaction matrix, where y_{u,v} = 1 represents an observed interaction between user u and item v, and y_{u,v} = 0 otherwise. In addition, we use V_u^+ to denote the interacted (positive) item set of user u, and V_u^- = V \ V_u^+ to represent the un-interacted (negative) item set. In this paper, we focus on the top-k item recommendation task based on the implicit interactions Y, which is generally optimized via the pairwise BPR loss L_BPR to learn the model's parameters Θ. The main idea is to encourage the predicted scores of observed (positive) interactions to be higher than those of unobserved (negative) items, which can be formalized as follows:

L_BPR = - Σ_{u ∈ U} Σ_{i ∈ V_u^+} Σ_{j ∈ V_u^-} ln σ(ŷ_{u,i} - ŷ_{u,j}),    (1)

where the negative item j is sampled from the unobserved item set V_u^- of user u, ŷ_{u,i} and ŷ_{u,j} denote the predicted ranking scores generated by the recommendation model f_Θ(·) with parameters Θ, and σ is the sigmoid function. Here we ignore the regularization term for simplicity.
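As a minimal sketch of Equation 1 (assuming a simple embedding dot-product scorer; the array and function names here are ours, not the paper's), the BPR loss over a batch of sampled (u, i, j) triples can be computed as:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_loss(user_emb, item_emb, users, pos_items, neg_items):
    """Mean BPR loss over (user, positive item, negative item) triples.

    Scores are inner products of user/item embeddings; the loss pushes
    each positive score above the paired negative score.
    """
    y_pos = np.sum(user_emb[users] * item_emb[pos_items], axis=1)
    y_neg = np.sum(user_emb[users] * item_emb[neg_items], axis=1)
    # small epsilon guards against log(0) for extreme score gaps
    return float(-np.mean(np.log(sigmoid(y_pos - y_neg) + 1e-12)))
```

The loss is near zero when positives already outscore their sampled negatives, and grows as the ordering is violated.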

Problem Definition
We assume a set of A item groups Z = {z_1, z_2, ..., z_A}, specified by an item attribute (e.g., job payment, ethnicity, or movie genre). These groups are non-overlapping, since each item belongs to exactly one group. In this paper, we focus on enhancing the Item-oriented Group Performance Fairness of recommendation models.
In particular, to evaluate whether recommendation models treat different item groups equally, we can measure each group's recommendation performance in the test stage using metrics such as Recall@k, an essential metric that reflects the coverage of truly relevant items in the top-k recommendation results for one item group. Then, the disparity of group-wise Recall@k performance (abbreviated as Recall-Disp@k) can be calculated following the definition of previous work [52]:

Recall-Disp@k = std({Recall@k(z_a)}_{z_a ∈ Z}) / mean({Recall@k(z_a)}_{z_a ∈ Z}),    (2)

where std(·) is the standard deviation function and mean(·) calculates the mean value. This relative standard deviation reflects the group-wise performance disparity. Item-oriented fairness (indicated by a lower Recall-Disp@k value) should be fulfilled in general item recommendation systems, so that all items, regardless of their group attributes, have the same chance of being exposed to users who truly like them, and no user needs are ignored.
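Equation 2 is simply the relative standard deviation of the group-wise recalls; a one-function sketch (taking the per-group recalls as a plain array):

```python
import numpy as np

def recall_disp(group_recalls):
    """Relative standard deviation of group-wise Recall@k:
    std over groups divided by the mean over groups (lower is fairer)."""
    r = np.asarray(group_recalls, dtype=float)
    return float(np.std(r) / np.mean(r))
```

Equal group recalls give a disparity of 0; the metric is scale-free, so groups with uniformly low recall are not penalized by the disparity term itself.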

THE PROPOSED METHOD
In this section, we will first give an overview of the proposed method FairNeg, then describe each of its components in detail and finally illustrate the designed optimization algorithm for item group fairness.

An Overview of the Proposed Framework
Since the empirical risk minimization (i.e., BPR loss) does not consider the fairness criterion, the recommendation model usually exhibits severe unfairness issues. To address such limitations without explicitly changing the model architecture or training data, we design a fairly adaptive negative sampling framework, FairNeg, as shown in Figure 3, which adjusts each item group's negative sampling probability for fairness considerations and incorporates an importance-aware technique for enhancing negative samples' informativeness. The proposed FairNeg method consists of three main components: (i) Item Group Fairness Perception, (ii) Adaptive Momentum Update of Group Sampling Distribution, and (iii) Negative Sampling Distribution Mixup. The first component, Item Group Fairness Perception, perceives unfairness (i.e., recall disparity) among item groups during the training process.
Next, with the perceived group performance disparity, the second component (i.e., Adaptive Momentum Update of Group Sampling Distribution) adjusts each group's negative sampling probability accordingly, intending to equalize all groups' recall performance. Finally, the third component (i.e., Negative Sampling Distribution Mixup) further incorporates an importance-aware sampling probability via a mixup mechanism. In this way, group fairness optimization and the informativeness of negative samples can be jointly considered.

Item Group Fairness Perception
To achieve fair training from the perspective of adaptive negative sampling, we first need to perceive the group-wise performance disparity [19,32,39], which provides guidance for adjusting the group-level negative sampling distribution in the next step. However, the fairness evaluation metric (i.e., Recall-Disp@k) mentioned above cannot be used directly in the training phase since it is non-differentiable. We hereby propose a Group-wise Binary Cross-Entropy (G-BCE) loss as a proxy to measure each item group's recall performance. It only considers observed interactions (viewed as positive) in the training set and calculates the group-wise average classification loss as an approximation of group recall [19]. Next, we introduce the details of the G-BCE loss, followed by an analysis revealing that mitigating the disparity of the groups' G-BCE losses at the training stage actually optimizes item-oriented group fairness.
More specifically, the G-BCE loss measures the average difference between the predicted preference scores ŷ_{u,i} and the true observations y_{u,i} in a specific item group z_a:

L_{z_a}^+ = - (1 / |D_{z_a}^+|) Σ_{(u,i) ∈ D_{z_a}^+} [ y_{u,i} log σ(ŷ_{u,i}) + (1 - y_{u,i}) log(1 - σ(ŷ_{u,i})) ],    (3)

where D_{z_a}^+ denotes the set of all observed interactions for item group z_a, ŷ_{u,i} denotes the predicted preference score of user u for item i, and σ is the sigmoid function that normalizes the preference score into (0, 1). Note that the negative part in Equation 3 can be eliminated, since we only consider observed entries (where y_{u,i} equals 1) and log σ(ŷ_{u,i}) is equivalent to log P(y_{u,i} = 1). As a result, L_{z_a}^+ = - (1 / |D_{z_a}^+|) Σ_{(u,i) ∈ D_{z_a}^+} log σ(ŷ_{u,i}). Taking the two-group scenario as an example, if L_{z_1}^+ > L_{z_2}^+ in the training phase, it indicates that item group z_1 (the disadvantaged group) obtains lower recall performance. In other words, the recommendation model assigns lower recommendation probabilities to items in group z_1, even though users truly like these items. Next, we focus on minimizing the disparity of the G-BCE losses (group-wise recall) during training, which is equivalent to optimizing toward item-oriented group fairness (as previously defined).
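Under the positives-only reduction above, a group's G-BCE loss is just the mean negative log-sigmoid of the predicted scores on that group's observed interactions; a minimal sketch (function name is ours):

```python
import numpy as np

def g_bce(pred_scores):
    """Group-wise BCE over observed (positive) interactions only:
    L+_g = -mean(log sigmoid(score)). A higher loss serves as a proxy
    for lower recall performance of the group."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(pred_scores, dtype=float)))
    return float(-np.mean(np.log(s + 1e-12)))
```

A group whose positive items receive higher scores gets a strictly lower G-BCE loss, which is what lets the loss stand in for group recall during training.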

Adaptive Momentum Update of Group Sampling Distribution
Figure 4: Gradient analysis on positive and negative samples of the two item groups on the ML1M-2 dataset (described in Table 1), with a Matrix Factorization backbone. We collect the averaged L2 norm of the weight gradients [43] in the item embedding layer at each epoch until model convergence.
To verify this, we conduct an empirical analysis showing the overwhelming negative gradients received by the oversampled item group during optimization, as illustrated in Figure 4. It is clear that the disadvantaged group (i.e., the Horror group, with worse recall performance) receives larger gradients from negative samples due to this group's high negative sampling probability.
Based on the above analysis, we design the adaptive negative sampling strategy in two consecutive steps: the (dis)advantaged groups are first identified based on the perceived G-BCE losses (as illustrated in Section 3.2), and then each group's negative sampling probability is updated so that the disadvantaged group is less overwhelmed by negative gradients.
Specifically, we assign each item group a negative sampling probability and update it via (1) Group-wise Gradient Calculation, which calculates the current step's gradient of each group's sampling probability, and (2) Adaptive Momentum Update, which utilizes auxiliary knowledge from historical gradients to alleviate optimization oscillations.

Group-wise Gradient Calculation. Considering items with a multi-valued sensitive attribute (i.e., A ≥ 2), we define the group-wise negative sampling probability distribution as P = {p_{z_1}, p_{z_2}, ..., p_{z_A}}, with Σ_{a=1}^{A} p_{z_a} = 1. To narrow the performance gap among different groups, as illustrated in the previous analysis, we consistently adjust each group's sampling probability p_{z_a} according to its deviation from the averaged G-BCE loss at each step:

g_{z_a}^{(t)} = (1 / |Z|) Σ_{z ∈ Z} L_z^{+(t)} - L_{z_a}^{+(t)},    (4)

where g_{z_a}^{(t)} denotes the gradient of p_{z_a} at step t, L_{z_a}^{+(t)} denotes the G-BCE loss of group z_a at step t, and (1 / |Z|) Σ_{z ∈ Z} L_z^{+(t)} denotes step t's averaged G-BCE loss. Intuitively, a disadvantaged item group with a relatively large G-BCE loss (low recall performance) will receive a negative-valued update vector, which down-weights its negative sampling probability during the next step's training. Such an adaptive down-weighting scheme prevents the disadvantaged group from being overwhelmed by negative gradients and is therefore expected to boost the group's recall performance.
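Equation 4 amounts to each group's (signed) deviation from the mean G-BCE loss; a one-line sketch (group losses passed as a plain array):

```python
import numpy as np

def group_gradients(g_losses):
    """Deviation of each group's G-BCE loss from the group average.
    A group with above-average loss (disadvantaged) gets a negative
    gradient, so its negative-sampling probability will be lowered."""
    L = np.asarray(g_losses, dtype=float)
    return L.mean() - L
```

Note that the gradients sum to zero by construction, so the total probability mass shifted away from disadvantaged groups equals the mass given to advantaged ones.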
Adaptive Momentum Update. Furthermore, to alleviate the adverse influence of gradient noise [41] and the instability of the gradient direction, we design an adaptive momentum update strategy to produce the sampling probability of each item group at step t + 1, which can be summarized in two steps:

m_{z_a}^{(t)} = β · m_{z_a}^{(t-1)} + (1 - β) · g_{z_a}^{(t)},
p_{z_a}^{(t+1)} = p_{z_a}^{(t)} + η · m_{z_a}^{(t)},    (5)

where m_{z_a}^{(t)} is the current step's momentum update vector, β ∈ [0, 1] is the momentum coefficient, and η is the learning rate; the group-wise sampling probabilities are then normalized so that Σ_a p_{z_a}^{(t+1)} = 1. Here we maintain the momentum bank [21] as a queue of momentum update vectors: all groups' corresponding update vectors at the current step are enqueued, and the previous ones are dequeued. Intuitively, with the adaptive momentum update strategy, if the current epoch's gradient points in the same direction as the previous epoch's, the update value becomes larger, which helps to achieve faster convergence. By contrast, if the gradient directions conflict, the update value is reduced, which smooths out the variations and stabilizes learning [5].
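The two-step momentum update of Equation 5 can be sketched as follows; the clipping before renormalization is our own assumption to keep the probabilities a valid distribution (the paper's exact normalization may differ):

```python
import numpy as np

def momentum_update(p, m_prev, grad, beta=0.9, lr=0.1):
    """One adaptive momentum step on the group sampling distribution:
    blend the current gradient with the previous update vector, apply
    it to the probabilities, then clip and renormalize to a simplex."""
    m = beta * np.asarray(m_prev) + (1.0 - beta) * np.asarray(grad)
    p_new = np.clip(np.asarray(p, dtype=float) + lr * m, 1e-8, None)
    return p_new / p_new.sum(), m
```

Consecutive gradients in the same direction accumulate in m and accelerate the shift; sign flips partially cancel, damping oscillations.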

Negative Sampling Distribution Mixup
Though updating the group sampling distribution adaptively ensures that item group fairness is jointly optimized during training, the informativeness difference among items within the same group is neglected due to the equal share of sampling probability. To be more specific, when drawing a negative sample from the unobserved item set V_u^- of user u following only the updated group sampling distribution, the fairness-aware sampling probability of a candidate item i from group z_a can be computed as:

p_fair(i|u) = p_{z_a} · 1_{z_a}(i) / Σ_{j ∈ V_u^-} 1_{z_a}(j),

where p_{z_a} is the group-wise sampling probability, and 1_{z_a}(·) is an indicator function identifying whether an item in the negative candidate (unobserved) item set belongs to group z_a: it returns 1 when the item's group attribute is z_a, and 0 otherwise. It can be seen that, in the candidate item set of user u, items from the same sensitive group share an equal sampling probability. Nonetheless, such a sampling strategy tends to sample negative items that are easy to distinguish from positive samples, which inhibits the improvement of user and item feature discriminativeness.
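A sketch of the fairness-aware per-item probability, spreading each group's mass evenly over that group's candidates for one user (function and variable names are ours; we assume group ids 0..A-1 and renormalize in case a group has no candidates):

```python
import numpy as np

def fair_item_probs(group_probs, item_groups, candidates):
    """Per-item fairness-aware sampling probability for one user:
    each group's probability p_{z_a} is split equally among that
    group's candidate (unobserved) items."""
    groups = np.asarray([item_groups[i] for i in candidates])
    probs = np.zeros(len(candidates))
    for a, p in enumerate(group_probs):
        mask = groups == a
        if mask.any():
            probs[mask] = p / mask.sum()
    # renormalize: groups absent from this user's candidates carry no mass
    return probs / probs.sum()
```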
To address this limitation, we propose a Negative Sampling Distribution Mixup mechanism, which incorporates an importance-aware negative sampling strategy built upon the dynamic negative sampling method for BPR [49]. The importance of a candidate item refers to its relevance to the user. Formally, the importance-aware negative sampling probability of candidate item i for user u can be calculated as follows:

p_imp(i|u) = exp(ŷ_{u,i} / τ) / Σ_{j ∈ V_u^-} exp(ŷ_{u,j} / τ),

where ŷ_{u,i} is the user-item relevance score calculated by the recommendation model, and τ is a temperature parameter in the softmax function. The goal is to pick high-scored items with higher probability for user u, which facilitates learning more discriminative feature representations. The importance-aware negative sampling distribution (p_imp) implicitly improves recommendation accuracy by strengthening feature representations, while the fairness-aware negative sampling distribution (p_fair) aids item-oriented group fairness. To incorporate them simultaneously, we introduce a mixup hyper-parameter α, which controls the strength of fairness optimization. The final sampling probability of candidate item i for user u can be formulated as:

p(i|u) = α · p_fair(i|u) + (1 - α) · p_imp(i|u).

With the new mixup negative sampling distribution computed for each user, the recommendation model training considers both item group fairness and the user's preference learning.
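The mixup of the two distributions can be sketched as follows (using a numerically stable softmax for p_imp; the helper name is ours):

```python
import numpy as np

def mixup_probs(p_fair, scores, alpha=0.5, tau=1.0):
    """Convex combination of the fairness-aware distribution and a
    temperature-scaled softmax over the model's predicted scores:
    p = alpha * p_fair + (1 - alpha) * p_imp."""
    s = np.asarray(scores, dtype=float) / tau
    e = np.exp(s - s.max())          # subtract max for numerical stability
    p_imp = e / e.sum()
    return alpha * np.asarray(p_fair, dtype=float) + (1.0 - alpha) * p_imp
```

Since both inputs are valid distributions, the convex combination is one as well; alpha = 1 recovers pure fairness-aware sampling and alpha = 0 recovers pure importance-aware sampling.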

Optimization for Item Group Fairness
We formulate the optimization of FairNeg as a bi-level problem, where the optimization of the group-wise negative sampling distribution is nested within the optimization of the recommendation model parameters. This can be solved by alternately updating the fairness-aware negative group sampling distribution and the recommendation model parameters [42]. More specifically, the negative sampling distribution is learned with the outer optimizer, while a personalized ranking algorithm (i.e., BPR) serves as the inner optimizer for updating user and item representations.
Mathematically, such bi-level optimization can be formulated as follows:

min_P Recall-Disp(Θ*(P)),  s.t.  Θ*(P) = argmin_Θ L_BPR(Θ; P),

where Θ is the recommendation model parameters and P is the group-wise negative sampling distribution. Recall-Disp here represents the disparity of the G-BCE losses, i.e., the sum of each item group's deviation from the macro-average performance.
The optimization algorithm is presented in Algorithm 1. First, we initialize the group-wise negative sampling distribution P in the same way as FairStatic, where each group's sampling probability equals its proportion of the overall user-item positive interactions; the recommendation model parameters are initialized with Xavier initialization [17] (line 1). Next, with P fixed, we conduct pairwise learning and iteratively update the recommendation model parameters Θ using the BPR loss for one epoch, where the negative items in the training pairs are sampled from the mixup distribution that considers both item importance and group fairness (lines 2-7). Then, we fix the model parameters Θ and adjust each group's sampling probability to reduce the recall disparity (lines 8-9). The iteration stops early when the recommendation performance does not improve for a preset number of consecutive epochs (line 10).
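The alternating loop of Algorithm 1 can be sketched end-to-end on toy data. All hyper-parameter values, names, and the plain-SGD inner optimizer below are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_fairneg(Y, item_group, n_groups, dim=8, epochs=30,
                  lr=0.05, beta=0.9, eta=0.5, alpha=0.7, tau=1.0):
    """Toy alternating optimization: an inner BPR epoch on embeddings,
    then an outer momentum update of the group-wise negative-sampling
    distribution, with mixup (fairness + importance) negative sampling."""
    n_users, n_items = Y.shape
    U = rng.normal(0, 0.1, (n_users, dim))
    V = rng.normal(0, 0.1, (n_items, dim))
    # FairStatic init: group prob = its share of positive interactions
    p = np.array([Y[:, item_group == a].sum() for a in range(n_groups)], float)
    p /= p.sum()
    m = np.zeros(n_groups)
    for _ in range(epochs):
        # ---- inner: BPR updates with mixup negative sampling ----
        for u in range(n_users):
            pos = np.flatnonzero(Y[u]); neg = np.flatnonzero(Y[u] == 0)
            if len(pos) == 0 or len(neg) == 0:
                continue
            # fairness part: split each group's mass among its candidates
            w = np.array([p[item_group[j]] for j in neg])
            for a in range(n_groups):
                cnt = (item_group[neg] == a).sum()
                if cnt:
                    w[item_group[neg] == a] /= cnt
            w /= w.sum()
            # importance part: softmax over current predicted scores
            s = V[neg] @ U[u] / tau
            imp = np.exp(s - s.max()); imp /= imp.sum()
            q = alpha * w + (1 - alpha) * imp
            i = rng.choice(pos)
            j = rng.choice(neg, p=q)
            x = U[u] @ (V[i] - V[j])
            g = sigmoid(-x)  # gradient scale of -log sigmoid(x)
            U[u] += lr * g * (V[i] - V[j])
            V[i] += lr * g * U[u]
            V[j] -= lr * g * U[u]
        # ---- outer: momentum update of group sampling distribution ----
        scores = U @ V.T
        L = np.zeros(n_groups)
        for a in range(n_groups):
            mask = Y.astype(bool) & (item_group == a)[None, :]
            L[a] = -np.log(sigmoid(scores[mask]) + 1e-12).mean()
        grad = L.mean() - L          # Equation 4
        m = beta * m + (1 - beta) * grad
        p = np.clip(p + eta * m, 1e-6, None)  # Equation 5 + renormalize
        p /= p.sum()
    return U, V, p
```

The early-stopping check on a validation set (line 10 of Algorithm 1) is omitted here for brevity; in practice the epoch loop would break when validation performance stalls.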

EXPERIMENT
In this section, we conduct extensive experiments on four real-world datasets to demonstrate the effectiveness of our proposed FairNeg.

Datasets. We build our datasets from MovieLens-1M (ML1M) [20] and Amazon product reviews [31]. The ML1M dataset consists of movie ratings in different genres, and we treat all ratings as positive feedback. The Amazon dataset includes historical product reviews collected from the Amazon e-commerce platform, and we view all user purchase behaviors as positive feedback. Considering that some comparison methods only work for datasets with binary item attributes, we extract two subsets from each dataset, containing either a binary-valued or a multi-valued item attribute. Specifically, we extract ratings with four genres ('Sci-Fi', 'Adventure', 'Children', and 'Horror') to constitute our ML1M-4 dataset, and keep product reviews of four types ('Grocery', 'Office', 'Pet', and 'Tool') to constitute our Amazon-4 dataset. Moreover, we create the ML1M-2 ('Sci-Fi' vs. 'Horror') and Amazon-2 ('Grocery' vs. 'Tool') datasets by keeping the most and least popular groups (popularity measured by a group's average feedback number). The statistics of the four datasets are detailed in Table 1, and the detailed group-wise statistics (i.e., feedback and item numbers) are illustrated in Appendix Section A.1. All datasets are randomly split into 60%, 20%, and 20% to constitute the training, validation, and test sets.

Evaluation Metrics.
We evaluate both the recommendation utility and item group fairness in the experiments. For group fairness evaluation, we stratify the items into groups based on the item attribute, then compute Recall@k for each group based on the top-k recommendation results. Afterward, we report three group fairness metrics as in the literature [24]: (i) Recall-Disp: the Recall disparity among all groups, as described in Equation 2; (ii) Recall-Min: the minimum Recall among all groups; (iii) Recall-Avg: the macro-average Recall over all groups. A lower Recall-Disp value represents better group fairness, while higher values for the latter two metrics are better. For the recommendation utility evaluation on the user side, we employ standard top-k ranking metrics, including Precision (P@k), Recall (R@k), and Normalized Discounted Cumulative Gain (N@k), measured on the top-k recommendation results [9,46].

4.1.4 Baselines. We compare FairNeg with four representative negative sampling methods and three state-of-the-art item-side fairness-aware recommendation methods. The method descriptions are as follows:
• UNS [38]: a widely used negative sampling approach using a static uniform distribution in the sampling process.
• DNS [49]: a dynamic negative sampling method that assigns higher probability to samples with higher ranking order.
• FairStatic: a simplified variant of FairNeg based on a static sampling distribution, as described in the introduction.
• Reg [23]: a typical debiasing method for datasets with a binary-valued attribute, which penalizes the squared difference between the average scores of the two groups over all positive user-item pairs.
• DPR [52]: an advanced item group fairness method, which uses score-based adversarial learning to decouple the predicted user-item preference score from the item's attribute.
• FairGAN [26]: a GAN-based algorithm that fairly allocates exposure to each item while preserving high user utility.

Performance Comparison of Recommender Systems
The comparison experiments are conducted on the two backbone models (MF and LightGCN) separately. Each experiment is run five times with different random seeds, and we report the average result. Based on top-20 recommendation results, the item group fairness and recommendation performance on the four datasets are shown in Tables 2 and 3, respectively. The Reg method only works for binary group cases, so we report its performance only on the ML1M-2 and Amazon-2 datasets. FairGAN has a fairness strictness hyper-parameter; we set two different values on the two backbone models separately so that its utility is close to that of the other methods, denoted as FairGAN-1 and FairGAN-2.

We have the following observations from the results (the evaluation of top-30 recommendation results is presented in Appendix Section A.3 and aligns with these observations). First, compared with FairNeg, most methods provide biased recommendation results regarding the Recall-Disp, Recall-Min, and Recall-Avg metrics. Second, compared with the methods that do not explicitly consider group fairness (e.g., NNCF and DNS), fairness-aware methods usually yield better group-wise recall parity; among them, the adversarial learning-based method usually performs best. It is worth noting that FairStatic shows competitive fairness results, which indicates that optimizing the negative sampling distribution with debiasing considerations also brings substantial fairness improvements. Third, FairNeg consistently outperforms the other methods on both backbone models over the four datasets in mitigating group-wise recall unfairness. Finally, FairNeg achieves superior fairness while preserving good recommendation utility on the user side; even though DNS provides the best recommendation utility, it ignores the huge performance disparity among item groups. To conclude, these results validate that FairNeg can effectively improve fairness in group-wise recall performance with minor recommendation utility loss.

Table 2: Fairness and recommendation utility of different methods on two backbone models over datasets with a binary-valued item attribute (evaluated on top-20 recommendation results). RI denotes the relative improvement of FairNeg over UNS. We highlight the best results in bold and the second best results with underline.

Ablation Study
Next, we conduct ablation studies to verify the effectiveness of each component in FairNeg. We compare FairNeg with its variants, where one component is removed at a time: dynamic fair negative sampling (FairNeg-Dynamic), importance-aware sampling (FairNeg-Imp), and the adaptive momentum update of the group-wise negative sampling probability (FairNeg-Momentum). The uniform negative sampling (UNS) method serves as the baseline. All methods are evaluated on both the MF and LightGCN backbones.
For brevity, we report each variant's fairness performance (Recall-Disp@20) and the recommendation performance (F1@20) over the ML1M-4 dataset, as presented in Figure 5 (results on Amazon-4 dataset are presented in the Appendix Section A.4).
We have the following findings from the ablation studies. First, the dynamic sampling mechanism (whose removal reduces FairNeg to a FairStatic-style static distribution) is vital for fairness improvement and utility preservation, indicating the importance of adjusting the sampling distribution adaptively. Second, incorporating the importance-aware negative sampling probability distribution is vital for utility preservation, since it enhances the informativeness of negative items and thereby contributes to feature representation learning. Third, removing the momentum mechanism when updating the group-level sampling probability dampens fairness performance, which indicates that the fairness optimization process is unstable and gradient noise needs to be considered. Fourth, FairNeg achieves the best utility-fairness tradeoff on both backbones, showing that our method's effectiveness is not limited to a specific model architecture.

RELATED WORK
In this section, we briefly review related work on fair recommendations and negative sampling.

Fair Recommendations. Fair recommendations have drawn tremendous attention in recent years. Fairness notions can be defined on the user side [15,25,27], the item side [16], or both [1], which makes fair recommendation quite challenging. On the item side, many existing works focus on exposure fairness in the ranking results [34,36,40]; for instance, in [34], a configurable allocation-of-exposure scheme is proposed to overcome rich-get-richer dynamics in the ranking system. However, these methods only study the allocation of exposure opportunities among items, without considering true user preferences. When items in certain groups have lower recommendation probabilities even though users truly like them, the recommendation model suffers from item under-recommendation bias. A series of works have been proposed to achieve performance (i.e., recall) parity among different item groups. These methods can be divided into two types: (i) fairness-constrained regularization methods [23], where the average score difference between item groups forms a regularizer that penalizes the group-wise performance disparity; and (ii) adversarial learning methods [30,48,52], where a discriminator classifies the group attribute based on the predicted user-item relevance scores, and the recommendation model prevents the discriminator from correctly classifying the groups. Unlike these methods, our approach proposes a fairly adaptive negative sampling technique, which addresses group unfairness from the negative sampling perspective.
Negative Sampling.Negative sampling techniques [33] are frequently used in pairwise algorithms [47], and we can group them into static negative sampling and hard negative sampling.The former (i.e., uniform sampling) utilizes a fixed sampling distribution and provides high efficiency.The latter [6,45,49] selects the negative instances that are similar to positive instances in the embedding space in a dynamic way.For example, in [35], they adopt adversarial learning to mine hard examples, which brings more information to model training.In [6], they leverage a high-variance based method to choose high-quality negative instances in a reliable way.However, these methods are mainly designed to maximize the overall utility [44], while neglecting the data bias (group-wise) on the item side.Our work makes the first attempt to build the connection between item group fairness optimization and negative sampling.

CONCLUSION
Item-oriented group performance fairness is an essential factor for building trustworthy recommender systems [15,29]. In this work, we propose a novel Fairly adaptive Negative sampling framework (FairNeg) to alleviate the adverse impact of negative sampling on training recommendation models. Based on the pairwise training paradigm, we introduce a fairness perception module to measure the recall performance disparity and then adjust the group-wise sampling probability with an adaptive momentum mechanism. Furthermore, we introduce a mixup mechanism for combining the fairness-aware and importance-related sampling distributions, which jointly considers representation learning and group fairness optimization. Extensive experiments show that our method outperforms state-of-the-art debiasing approaches in fairness performance by significant margins and yields better fairness-utility tradeoffs. In the future, we plan to extend the framework to be compatible with more group fairness metrics in recommendations.

ACKNOWLEDGMENT
The research described in this paper has been partly supported by NSFC (Project No. 62102335), a General Research Fund from the Hong Kong Research Grants Council (Project No. PolyU 15200021 and PolyU 15207322), and internal research funds from The Hong Kong Polytechnic University (Project No. P0036200, P0042693, and P0043302). This research was also supported by the InnoHK project. Dr. Zitao Liu was partly supported by the Key Laboratory of Smart Education of Guangdong Higher Education Institutes, Jinan University (2022LSYS003).
Table 5: Fairness and recommendation utility of different methods on two backbone models over datasets with a binary-valued item attribute (evaluated on top-30 recommendation results). RI denotes the relative improvement of FairNeg over UNS. We highlight the best results in bold and the second best results with underline.

Figure 1: Illustration of bias in Uniform Negative Item Sampling. There are two groups of items (determined by sensitive attributes, such as job payment or movie genres). Unobserved items in the majority group (in green) have a higher probability of being sampled as negative items than those in the minority group (in red).

Figure 2: (a) Distribution of the two item groups on the ML1M-2 dataset. In (b) and (c), the Recall@20 performance of the Sci-Fi and Horror movie groups is reported for two recommendation models with two different negative sampling strategies.

Figure 3: Illustration of the proposed FairNeg framework. It consists of three components: (i) Item Group Fairness Perception, (ii) Adaptive Momentum Update of Group Sampling Distribution, and (iii) Negative Sampling (NS) Probability Distribution Mixup.

4.1.3 Parameter Settings. Due to the limited space, more implementation details are presented in Appendix Section A.2.
Algorithm 1 (excerpt): (8) calculate the G-BCE loss for each item group z_a; (9) update each group's sampling probability p_{z_a} based on Equations 4 and 5; (10) stop early when the model performance on the validation set does not improve for a preset number of consecutive epochs; (11) return.

Table 3: Fairness and recommendation utility of different methods on two backbone models over datasets with a multi-valued item attribute (evaluated on top-20 recommendation results). RI denotes the relative improvement of FairNeg over UNS. We highlight the best results in bold and the second best results with underline.