Dual-interest Factorization-heads Attention for Sequential Recommendation

Accurate user interest modeling is vital for recommendation scenarios. One effective solution is sequential recommendation, which relies on click behaviors, but this is not well suited to video feed recommendation, where users passively receive streaming content and respond with skip or no-skip behaviors. Here skip and no-skip behaviors can be treated as negative and positive feedback, respectively. With the mixture of positive and negative feedback, it is challenging to capture the transition pattern of the behavioral sequence. To do so, FeedRec has exploited a shared vanilla Transformer, which may be suboptimal because the head interaction of multi-head attention does not distinguish between different types of feedback. In this paper, we propose Dual-interest Factorization-heads Attention for Sequential Recommendation (DFAR for short), consisting of a feedback-aware encoding layer, a dual-interest disentangling layer and a prediction layer. In the feedback-aware encoding layer, we first suppose each head of multi-head attention can capture specific feedback relations. Then we further propose factorization-heads attention, which can mask specific head interactions and inject feedback information so as to factorize the relation between different types of feedback. Additionally, we propose a dual-interest disentangling layer to decouple positive and negative interests before performing disentanglement on their representations. Finally, we evolve the positive and negative interests with corresponding towers whose outputs are contrasted with a BPR loss. Experiments on two real-world datasets show the superiority of our proposed method against state-of-the-art baselines. A further ablation study and visualization also sustain its effectiveness. We release the source code here: https://github.com/tsinghua-fib-lab/WWW2023-DFAR.


INTRODUCTION
Online sequential recommendation [32] has achieved great success for its time-aware personalized modeling and has been widely applied on Web platforms, including micro-video, news, e-commerce, etc. Especially in today's video feed recommendation, users are immensely attracted by video streaming, which can be treated as a sequence of items. Formally speaking, sequential recommendation is defined as predicting the next interacted item by calculating the matching probability between historical items and the target item. As shown in Figure 1 (a), existing sequential recommendation models often exploit click behaviors of users to infer their dynamic interests [11, 14, 31, 41, 42], and their optimization samples un-clicked items as negative feedback. However, such an approach only inputs positive items into the sequential model, with negative items sampled only as target items, ignoring the transition pattern between historical positive and negative items.
In video feed recommendation, where a single item is exposed each time, users either skip or do not skip the recommended items, as illustrated in Figure 1 (b). Skip can be treated as a kind of negative feedback, meaning users do not want to receive certain items, while no-skip can be treated as a kind of positive feedback. That is to say, users are passive in receiving the recommended items without providing active click behaviors [10, 18, 22] in such video feed recommendations. However, the existing click-based sequential recommendation does not consider the transition pattern between positive and negative items. Indeed, there are two key challenges when modeling positive and negative feedback in one sequence.
• Complex transition between positive and negative feedback. The transition pattern among interacted items becomes far more complex due to negative feedback. A user may provide negative feedback only because she has consumed a very similar item before, which makes accurate modeling of the transition essential and challenging.
• Mixed interests in one behavioral sequence. The negative feedback in the behavioral sequence brings significant challenges to interest learning. Traditional sequential recommendation methods often conduct a pooling operation on the user sequence to obtain the user's current interest, which fails when the sequence is hybrid with positive and negative signals.
To address the above challenges, in this work we propose a model named Dual-interest Factorization-heads Attention for Sequential Recommendation (DFAR for short), which extracts the transition pattern and the pair-wise relation between positive and negative interests. To address the first challenge, in the feedback-aware encoding layer, we assume each head of multi-head attention [28] tends to capture specific relations of certain feedback [30]. As different heads of multi-head attention [28] are independent, it may fail to capture the transition pattern between different feedback when positive and negative feedback are in fact not independent of each other. Thus we first exploit talking-heads attention [25] to implicitly extract the transition pattern between positive and negative historical items. However, talking-heads attention may mix different heads too much without sufficient prior knowledge. To explicitly extract the transition pattern between positive and negative historical items, we further propose feedback-aware factorization-heads attention, which incorporates the feedback information directly into the head interaction. To address the second challenge, we propose a dual-interest disentangling layer and a prediction layer, respectively, to disentangle and extract the pair-wise relation between positive and negative interests. Specifically, we first mask and encode the sequence hybrid with positive and negative feedback into two single-interest representations before performing disentanglement on them to repel the dissimilar interests. Then we predict with each interest through the corresponding positive or negative tower and apply a contrastive loss on them to extract their pair-wise relation.
In general, we make the following contributions in this work.
• We take a pioneering step in fully considering the modeling of negative feedback, along with its impact on transition patterns, to enhance sequential recommendation.
• We propose a feedback-aware encoding layer to capture the transition pattern, and a dual-interest disentangling layer and prediction layer to perform disentanglement and capture the pair-wise relation between positive and negative historical items.
• We conduct experiments on one benchmark dataset and one collected industrial dataset, where the results show the superiority of our proposed method. A further ablation study also sustains the effectiveness of our three components.

PROBLEM FORMULATION
Click-based Sequential Recommendation. Given an item sequence $I_u = (i_1, i_2, \dots, i_t)$ with only positive feedback, the goal of traditional click-based sequential recommendation is to accurately predict the probability that a given user $u$ will click the target item $i_{t+1}$. It can be formulated as follows.
Input: item sequence $I_u = (i_1, i_2, \dots, i_t)$ with only positive feedback for a given user $u$.
Output: the predicted score that the given user $u$ will click the target item $i_{t+1}$.
Dual-interest Sequential Recommendation. Given an item sequence $I_u = (i_1, i_2, \dots, i_t)$ with both positive and negative feedback, dual-interest sequential recommendation aims to better predict the probability that the given user $u$ will skip or not skip the target item $i_{t+1}$. It can be formulated as follows.
Input: item sequence $I_u = (i_1, i_2, \dots, i_t)$ with positive and negative feedback for a given user $u$.
Output: the predicted score that the given user $u$ will skip or not skip the target item $i_{t+1}$.
To solve the dual-interest sequential recommendation problem, our proposed DFAR consists of three layers, as illustrated in Figure 2.
• Feedback-aware Encoding Layer. We inject each historical item embedding with the corresponding feedback embedding, and capture the transition pattern between positive and negative historical items with feedback-aware factorization-heads attention.
• Dual-interest Disentangling Layer. We mask the sequence hybrid with both positive and negative feedback into two sequences with solely positive or negative feedback. After encoding the two split sequences with independent factorization-heads attention to extract the positive and negative interests, we then disentangle them to repel the dissimilar interests.
• Dual-interest Prediction Layer. We further extract the positive and negative interests with independent towers and then perform a contrastive loss on them to extract the pair-wise relation.

Feedback-aware Encoding Layer
In the feedback-aware encoding layer, we first inject each historical item embedding with the corresponding feedback embedding to incorporate the feedback information. Then we introduce talking-heads attention and propose feedback-aware factorization-heads attention to capture the transition pattern between positive and negative historical items.
3.1.1 Feedback-aware Embedding Layer. To fully distinguish positive and negative feedback, we build a label embedding matrix $\mathbf{L} \in \mathbb{R}^{2 \times d}$ besides the item embedding matrix $\mathbf{E} \in \mathbb{R}^{N \times d}$. Here $N$ denotes the number of items, and $d$ is the dimensionality of the hidden state. Then we inject the feedback information into the item embeddings and obtain the feedback-aware input embeddings as the model input. Given an item sequence $I_u = (i_1, i_2, \dots, i_t)$ with feedback labels $y_k \in \{0, 1\}$ (negative or positive), we obtain the feedback-aware item embeddings $\mathbf{E}_u \in \mathbb{R}^{t \times d}$ as
$$\mathbf{E}_u = [\,\mathbf{e}_{i_1} + \mathbf{l}_{y_1};\; \mathbf{e}_{i_2} + \mathbf{l}_{y_2};\; \dots;\; \mathbf{e}_{i_t} + \mathbf{l}_{y_t}\,],$$
where $\mathbf{e}_{i_k}$ is the embedding of item $i_k$ and $\mathbf{l}_{y_k}$ is the label embedding of its feedback $y_k$.
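As a minimal PyTorch sketch of this embedding layer, assuming additive injection of the label embedding (class and argument names are ours, not from the paper):

import torch
import torch.nn as nn

class FeedbackAwareEmbedding(nn.Module):
    def __init__(self, num_items: int, d_model: int):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model)   # E in R^{N x d}
        self.label_emb = nn.Embedding(2, d_model)          # L in R^{2 x d}

    def forward(self, items: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # items: (batch, t) item ids; labels: (batch, t), 0 = skip, 1 = no-skip
        return self.item_emb(items) + self.label_emb(labels)

emb = FeedbackAwareEmbedding(num_items=1000, d_model=32)
x = emb(torch.randint(0, 1000, (4, 20)), torch.randint(0, 2, (4, 20)))  # (4, 20, 32)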

3.1.2 Talking-Heads Attention. After obtaining the input embeddings for positive and negative historical items, we then capture the transition pattern between them. The existing work FeedRec [37] exploits a vanilla Transformer to roughly capture this transition pattern, of which multi-head attention [14] is the essential part:
$$\mathrm{MHA}(\mathbf{X}) = \mathrm{Concat}(\mathbf{head}_1, \dots, \mathbf{head}_H)\,\mathbf{W}^O, \quad \mathbf{head}_h = \mathrm{softmax}\Big(\tfrac{\mathbf{Q}_h \mathbf{K}_h^{\top}}{\sqrt{d_h}}\Big)\mathbf{V}_h,$$
where $\mathbf{Q}_h = \mathbf{X}\mathbf{W}_h^Q$, $\mathbf{K}_h = \mathbf{X}\mathbf{W}_h^K$, $\mathbf{V}_h = \mathbf{X}\mathbf{W}_h^V$, and $\mathbf{W}_h^Q, \mathbf{W}_h^K, \mathbf{W}_h^V, \mathbf{W}^O$ are parameters to be learned. MHA means multi-head attention [28]. However, different heads of multi-head attention are independent of each other, sharing no information across heads. If we assume different heads capture specific relations between different feedback, this implies there is no information sharing across different feedback. Thus we first adopt talking-heads attention [25] to address this issue, which mixes the attention logits across heads before the softmax and the attention weights after it:
$$\tilde{\mathbf{J}}_h = \sum_{h'=1}^{H} P^{l}_{h,h'}\,\tfrac{\mathbf{Q}_{h'} \mathbf{K}_{h'}^{\top}}{\sqrt{d_{h'}}}, \quad (6) \qquad \tilde{\mathbf{A}}_h = \sum_{h'=1}^{H} P^{w}_{h,h'}\,\mathrm{softmax}(\tilde{\mathbf{J}}_{h'}), \quad (7)$$
where $\mathbf{P}^{l}, \mathbf{P}^{w} \in \mathbb{R}^{H \times H}$ are parameters to be learned. Here THA refers to talking-heads attention, whose output is $\mathrm{Concat}(\tilde{\mathbf{A}}_1 \mathbf{V}_1, \dots, \tilde{\mathbf{A}}_H \mathbf{V}_H)\,\mathbf{W}^O$. However, the interaction between different heads in talking-heads attention is implicit, which may confuse the task of each head and result in overfitting. Moreover, the two additional linear transformations (i.e., Eq. (6) and Eq. (7)) of talking-heads attention increase the computation cost.
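For concreteness, a compact PyTorch sketch of talking-heads attention [25] is given below; it is a reference point for the factorization-heads variant, not the authors' code, and the identity initialization of the mixing matrices is our own choice.

import torch
import torch.nn as nn

class TalkingHeadsAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.h, self.d_h = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)
        self.pre = nn.Parameter(torch.eye(n_heads))   # P^l: mixes logits across heads
        self.post = nn.Parameter(torch.eye(n_heads))  # P^w: mixes weights across heads

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        def split(z):  # (b, t, d) -> (b, h, t, d_h)
            return z.view(b, t, self.h, self.d_h).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        logits = q @ k.transpose(-2, -1) / self.d_h ** 0.5          # (b, h, t, t)
        logits = torch.einsum('gh,bhij->bgij', self.pre, logits)    # talk before softmax
        attn = logits.softmax(dim=-1)
        attn = torch.einsum('gh,bhij->bgij', self.post, attn)       # talk after softmax
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)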

3.1.3 Feedback-aware Factorization-heads Attention. In this part, we factorize the interaction between positive and negative feedback. Traditional multi-head attention assigns similar items higher attention weights. However, in our problem with both positive and negative feedback, two similar items may deserve different attention weights due to the feedback they carry. For example, an NBA fan skips a recommended basketball video when he/she has already watched a lot of basketball videos, but engages with a basketball video when he/she has watched only a few. In the first case we should repel the representations of the historical and target basketball videos, while in the second case we should attract them. That is to say, it is necessary to inject the user's feedback into the transition pattern between different feedback. Here we suppose different heads can represent different transition patterns for different feedback [30]. To explicitly factorize the interaction across different heads, we further propose factorization-heads attention, which computes one attention map for every pair of heads:
$$\mathrm{FHA}(\mathbf{X}) = \mathrm{Concat}(\mathbf{head}_{1,1}, \dots, \mathbf{head}_{H,H})\,\mathbf{W}^O, \quad \mathbf{head}_{h_1,h_2} = \mathrm{softmax}\Big(\tfrac{\mathbf{Q}_{h_1} \mathbf{K}_{h_2}^{\top}}{\sqrt{d_h}}\Big)\mathbf{V}_{h_2}. \quad (8)$$
To further inject the prior feedback knowledge into the head interaction, we apply a label mask $\mathbf{M}_{h_1,h_2} \in \{0,1\}^{t \times t}$ to each head pair, where $\mathbf{M}_{h_1,h_2,j,k} = 1$ only if the head pair $(h_1, h_2)$ matches the feedback pair $(y_j, y_k)$ of items $j$ and $k$. For example, with $H = 2$, if item $j$ is positive and item $k$ is negative (i.e., $y_j = 1$ and $y_k = 0$), $h_1$ in the positive half and $h_2$ in the negative half will be preserved, i.e., $\mathbf{M}_{2,1,j,k} = 1$ and $\mathbf{M}_{1,1,j,k} = \mathbf{M}_{1,2,j,k} = \mathbf{M}_{2,2,j,k} = 0$.
Besides, FFHA denotes our proposed feedback-aware factorization-heads attention, i.e., factorization-heads attention with the above label mask applied to the attention logits. Apart from the advantage of explicit interaction between different heads, unlike talking-heads attention, our factorization-heads attention also improves multi-head attention without high computation cost: it represents $H \times H$ head-interaction relations with only $H$ heads, whereas talking-heads or multi-head attention would require $H$ times more parameters to represent the same relations. We feed the input embeddings into the feedback-aware factorization-heads attention module as
$$\mathbf{S} = \mathrm{FFHA}(\mathbf{E}_u),$$
where $\mathbf{S}$ are the obtained feedback-aware sequential representations. We put the pseudocode of FHA in Appendix A.1 and compare its complexity with MHA and THA in Appendix A.1.5.
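The sketch below implements our reading of FFHA: with H heads, every head pair (h1, h2) yields one attention map, and the label mask keeps only the pair matching the feedback pair (y_j, y_k). The assignment of feedback 0/1 to the head halves is an assumption on our part.

import torch
import torch.nn as nn

class FFHA(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 2):
        super().__init__()
        self.h, self.d_h = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.o_proj = nn.Linear(n_heads * n_heads * self.d_h, d_model)

    def forward(self, x: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, t, self.h, self.d_h)
        k = k.view(b, t, self.h, self.d_h)
        v = v.view(b, t, self.h, self.d_h)
        # Factorized logits: one (t x t) attention map per head pair (h1, h2).
        logits = torch.einsum('bigd,bjhd->bghij', q, k) / self.d_h ** 0.5
        # Label mask: keep head pair (h1, h2) only where it matches (y_j, y_k);
        # we assume feedback label y maps directly to the head index.
        hp = torch.arange(self.h, device=x.device)
        m = (labels[:, None, None, :, None] == hp[None, :, None, None, None]) & \
            (labels[:, None, None, None, :] == hp[None, None, :, None, None])
        logits = logits.masked_fill(~m, float('-inf'))
        attn = torch.nan_to_num(logits.softmax(dim=-1))  # fully masked rows -> 0
        out = torch.einsum('bghij,bjhd->bighd', attn, v).reshape(b, t, -1)
        return self.o_proj(out)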

Dual-interest Disentangling Layer
Though feedback-aware factorization-heads attention has factorized the transition relation between positive and negative feedback, their interest-level relations require further extraction. In this part, we decouple the positive and negative interests and then perform disentanglement on them to repel the dissimilar interests.
We first mask the hybrid sequence into two sequences with solely positive or negative feedback, obtaining the masked input embeddings $\mathbf{E}_u^{p}$ and $\mathbf{E}_u^{n}$, which are then fed into the corresponding factorization-heads attention modules to enhance the transition pattern learning for each feedback as
$$\mathbf{S}^{p} = \mathrm{FHA}(\mathbf{E}_u^{p}), \qquad \mathbf{S}^{n} = \mathrm{FHA}(\mathbf{E}_u^{n}),$$
where $\mathbf{S}^{p}$ (or $\mathbf{S}^{n}$) are the single-feedback sequential representations for positive feedback (or negative feedback). In the subsequent section, we will exploit these filtered representations to further extract the interest-level relations.
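A tiny sketch of the masking step, assuming the split is realized by zeroing the embeddings of the opposite feedback (the paper does not pin down the exact masking operator here):

import torch

def split_by_feedback(emb: torch.Tensor, labels: torch.Tensor):
    # emb: (b, t, d); labels: (b, t) with 1 = positive, 0 = negative
    pos_mask = labels.unsqueeze(-1).float()
    return emb * pos_mask, emb * (1.0 - pos_mask)  # (E^p, E^n)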

3.2.2 Dual-interest Aggregation and Disentanglement. The positive and negative interests of a given user should be distinguished from each other. Hence we aim to repel the positive and negative representations of the corresponding interests. Specifically, since the target item can turn out to be either positive or negative, we assign the target item the positive and negative label embeddings, respectively, under the two assumed cases. To calculate the attention scores of the positive and negative historical items, we fuse them with the target item under the assumed positive and negative cases as
$$\mathbf{A}^{p} = \mathrm{softmax}\big(\mathrm{MLP}([\mathbf{S}^{p};\,\mathbf{e}_{i_{t+1}} + \mathbf{L}_1])\big), \quad \mathbf{A}^{n} = \mathrm{softmax}\big(\mathrm{MLP}([\mathbf{S}^{n};\,\mathbf{e}_{i_{t+1}} + \mathbf{L}_0])\big), \quad (15)$$
where $\mathbf{A}^{p}$ and $\mathbf{A}^{n} \in \mathbb{R}^{t \times 1}$ are the positive and negative attention scores, MLP is the multi-layer perceptron, and $\mathbf{L}_1$ and $\mathbf{L}_0$ are the label embeddings for positive and negative feedback, respectively. With the attention scores calculated by Eq. (15), we can then obtain the single-feedback aggregated representations for positive and negative items, respectively, as
$$\mathbf{f}^{p} = \sum_{k=1}^{t} \mathbf{A}^{p}_{k}\,\mathbf{S}^{p}_{k}, \qquad \mathbf{f}^{n} = \sum_{k=1}^{t} \mathbf{A}^{n}_{k}\,\mathbf{S}^{n}_{k},$$
which are then further disentangled with cosine distance as
$$\mathcal{L}_{d} = \frac{\mathbf{f}^{p} \cdot \mathbf{f}^{n}}{\lVert \mathbf{f}^{p} \rVert\,\lVert \mathbf{f}^{n} \rVert},$$
where $\lVert \cdot \rVert$ is the L2-norm. By minimizing this disentangling loss, we can repel the aggregated positive and negative representations so as to capture the dissimilar characteristics between them.
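Below is a minimal PyTorch sketch of the target-aware aggregation and the disentangling loss; the concatenation-plus-MLP fusion and the softmax normalization are our assumptions where Eq. (15) leaves details open.

import torch
import torch.nn as nn
import torch.nn.functional as F

def aggregate(seq_repr: torch.Tensor, target: torch.Tensor, mlp: nn.Module) -> torch.Tensor:
    # seq_repr: (b, t, d); target: (b, d) = target item embedding + label embedding
    tgt = target.unsqueeze(1).expand_as(seq_repr)
    scores = mlp(torch.cat([seq_repr, tgt], dim=-1)).squeeze(-1)  # (b, t)
    attn = scores.softmax(dim=-1)
    return (attn.unsqueeze(-1) * seq_repr).sum(dim=1)             # (b, d)

def disentangle_loss(f_pos: torch.Tensor, f_neg: torch.Tensor) -> torch.Tensor:
    # Cosine similarity between positive and negative interests; minimizing it
    # repels the two aggregated representations.
    return F.cosine_similarity(f_pos, f_neg, dim=-1).mean()

mlp = nn.Sequential(nn.Linear(2 * 32, 32), nn.ReLU(), nn.Linear(32, 1))  # example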

Dual-interest Prediction Layer
In this section, we predict the next item under each interest with positive and negative towers. Finally, we further perform a contrastive loss on the outputs of the positive and negative towers so as to extract the pair-wise relation between them.

Dual-interest Prediction Towers. To extract the positive and negative interests, we fuse the feedback-aware sequential representations, the single-feedback sequential representations, and the single-feedback aggregated representations into the corresponding positive or negative prediction tower. Before feeding these representations into the final prediction towers, we first aggregate part of them by sum pooling as
$$\bar{\mathbf{f}}^{p} = \mathrm{sumpool}(\mathbf{S}) + \mathrm{sumpool}(\mathbf{S}^{p}), \qquad \bar{\mathbf{f}}^{n} = \mathrm{sumpool}(\mathbf{S}) + \mathrm{sumpool}(\mathbf{S}^{n}), \quad (17)$$
which are then finally fed into the positive and negative prediction towers as
$$\mathrm{logit}^{p}_{u,t} = \mathrm{Tower}^{p}\big([\bar{\mathbf{f}}^{p};\,\mathbf{f}^{p}]\big), \qquad \mathrm{logit}^{n}_{u,t} = \mathrm{Tower}^{n}\big([\bar{\mathbf{f}}^{n};\,\mathbf{f}^{n}]\big),$$
where $\mathrm{logit}^{p}_{u,t}$ and $\mathrm{logit}^{n}_{u,t}$ are the positive and negative predicted logits for user $u$ at time step $t$, aiming to capture the positive and negative interests, respectively. Here $\bar{\mathbf{f}}^{p}$ and $\bar{\mathbf{f}}^{n}$ are pooled at Eq. (17).
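A small sketch of one prediction tower is given below; the exact set and order of fused representations follows our reading above and may differ from the released code.

import torch
import torch.nn as nn

class PredictionTower(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1))

    def forward(self, s_pooled, s_single_pooled, f_agg):
        # s_pooled: sum-pooled feedback-aware reps; s_single_pooled: sum-pooled
        # single-feedback reps; f_agg: target-aware aggregated interest.
        return self.net(torch.cat([s_pooled, s_single_pooled, f_agg], dim=-1)).squeeze(-1)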

Pair-wise Contrastive Loss. When the target item is positive, the prediction logit of the positive tower should be greater than that of the negative tower, and vice versa. After obtaining the positive and negative prediction logits, we therefore perform a BPR loss [23] on them as
$$\mathcal{L}_{b} = -\sum_{(u,t) \in \mathcal{R}} \log \sigma\big((2y_{u,t} - 1)\,(\mathrm{logit}^{p}_{u,t} - \mathrm{logit}^{n}_{u,t})\big), \quad (21)$$
where $\sigma$ denotes the sigmoid function and $y_{u,t}$ is the feedback label of the target item. With this BPR loss, we can extract the pair-wise relation between the positive and negative logits.
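The following is a minimal PyTorch sketch of this pair-wise BPR loss, assuming the (2y - 1) sign convention above as our reading of "and vice versa":

import torch
import torch.nn.functional as F

def pairwise_bpr_loss(logit_pos: torch.Tensor, logit_neg: torch.Tensor,
                      y: torch.Tensor) -> torch.Tensor:
    # y = 1 if the target item received positive feedback, else 0.
    sign = 2.0 * y - 1.0
    return -F.logsigmoid(sign * (logit_pos - logit_neg)).mean()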

Joint Optimization
Though we have positive and negative towers, in the optimization step we only need to optimize the next-item prediction loss with the positive tower as
$$\mathcal{L}_{main} = -\sum_{(u,t) \in \mathcal{R}} \Big[ y_{u,t} \log \hat{y}_{u,t} + (1 - y_{u,t}) \log(1 - \hat{y}_{u,t}) \Big],$$
where $\hat{y}_{u,t} = \sigma(\mathrm{logit}^{p}_{u,t})$ and $\mathcal{R}$ is the training set. The negative prediction tower is instead self-supervised and optimized by the contrastive loss of Eq. (21). After obtaining the main loss for next-item prediction, the disentangling loss for repelling representations and the BPR loss for pair-wise learning, we can then jointly optimize them as
$$\mathcal{L} = \mathcal{L}_{main} + \lambda_{d}\,\mathcal{L}_{d} + \lambda_{b}\,\mathcal{L}_{b} + \lambda \lVert \Theta \rVert_2^2,$$
where $\lambda_{d}$ and $\lambda_{b}$ are hyper-parameters weighting each loss, $\lambda$ is the regularization parameter, and $\Theta$ denotes the model parameters to be learned.
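As a concrete illustration, the sketch below assembles the joint objective from the pieces above; it is a minimal sketch assuming binary cross-entropy for the main loss and leaving L2 regularization to the optimizer's weight decay, not the authors' exact implementation.

import torch
import torch.nn.functional as F

def joint_loss(logit_pos, logit_neg, f_pos, f_neg, y, lam_d=1e-2, lam_b=1e-2):
    main = F.binary_cross_entropy_with_logits(logit_pos, y.float())
    dis = F.cosine_similarity(f_pos, f_neg, dim=-1).mean()
    bpr = -F.logsigmoid((2.0 * y.float() - 1.0) * (logit_pos - logit_neg)).mean()
    return main + lam_d * dis + lam_b * bpr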

EXPERIMENTS
In this section, we experiment on a public dataset and an industrial dataset, aiming to answer the following research questions (RQs):
• RQ1: Is the proposed DFAR effective when compared with state-of-the-art sequential recommenders?
• RQ2: What is the effect of our proposed feedback-aware encoding layer, dual-interest disentangling layer and prediction layer?
• RQ3: How do the heads of the proposed feedback-aware factorization-heads attention capture the transition pattern between different feedback?
• RQ4: How does the proposed method perform compared with the sequential recommenders under different sequence lengths?
We also look into the question "how do the auxiliary losses for disentanglement and pair-wise contrastive learning perform under different weights?" in Appendix A.4.

Experimental Settings
4.1.1 Datasets. The data statistics of our experiments are illustrated in Table 1, where Micro-video is a collected industrial dataset and Amazon is a public benchmark dataset widely used in existing work on sequential recommendation [19]. The detailed descriptions of them are as below.
Micro-video. This is a dataset from a popular micro-video application, recorded from September 11 to September 22, 2021. On this platform, users passively receive the recommended videos, and their feedback is mostly either skip or no-skip. Skip can be treated as a form of negative feedback, and no-skip can be treated as a form of positive feedback. That is to say, we have hybrid positive and negative feedback in this sequential data, which is very rare among available datasets.
Amazon. This is the Toys domain of a widely used public e-commerce dataset in recommendation. The rating score in Amazon ranges from 1 to 5, and we treat rating scores over three as positive feedback and under two as negative feedback, following the existing work DenoisingRec [33], which is not a sequential recommendation method.
For the Micro-video dataset, interactions before and after 12 pm of the last day are split as the validation and test sets, respectively, while interactions before the last day are used as the training set. For the Amazon dataset, we take the last day as the test set and the second-to-last day as the validation set, while the remaining days form the training set.
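A short sketch of this temporal split for the Micro-video dataset (pandas assumed; the column name timestamp is ours):

import pandas as pd

def split_micro_video(df: pd.DataFrame, last_day: str = '2021-09-22'):
    ts = pd.to_datetime(df['timestamp'])
    noon = pd.Timestamp(f'{last_day} 12:00:00')
    train = df[ts < pd.Timestamp(last_day)]
    valid = df[(ts >= pd.Timestamp(last_day)) & (ts < noon)]
    test = df[ts >= noon]
    return train, valid, test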

Baselines and Evaluation Metrics
We compare our DFAR with the following state-of-the-art methods for sequential recommender systems.
• DIN [42]: It aggregates the historical items via attention scores with the target item.
• Caser [27]: It captures the transition between historical items via convolution.
• GRU4REC [11]: It captures the transition between historical items via GRU [5].
• DIEN [41]: It captures the transition between historical items via interest-extraction and interest-evolution GRUs [5].
• THA4Rec: It applies talking-heads attention [25] to sequential recommendation, which we are the first to do.
Widely used AUC and GAUC [9] are adopted as accuracy metrics, while MRR@10 and NDCG@10 [19] are used as ranking metrics for performance evaluation. Their detailed definitions are in Appendix A.2.
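For reference, a sketch of GAUC under its common definition (per-user AUC averaged with impression-count weights; the paper's exact weighting is specified in Appendix A.2 and may differ):

import numpy as np
from sklearn.metrics import roc_auc_score

def gauc(user_ids: np.ndarray, labels: np.ndarray, scores: np.ndarray) -> float:
    total, weight = 0.0, 0
    for u in np.unique(user_ids):
        idx = user_ids == u
        y = labels[idx]
        if y.min() == y.max():      # skip users with only one class
            continue
        total += idx.sum() * roc_auc_score(y, scores[idx])
        weight += idx.sum()
    return total / weight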

4.1.3 Hyper-parameter Settings. Hyper-parameters are generally set following the default settings of the baselines. We strictly follow existing work on sequential recommendation [19] and use Adam [15] with a learning rate of 0.0001. The embedding size of all models is set to 32. We use batch sizes of 20 and 200 on the Micro-video and Amazon datasets, respectively. We search the loss weight of the pair-wise contrastive loss in $\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}\}$ and finally choose $10^{-3}$ and $10^{-2}$ for the Amazon and Micro-video datasets, respectively; this may be because the rating in Amazon is a discrete value while the playing time in Micro-video is continuous, making the partition of positive and negative feedback less clear-cut and thus requiring stronger contrastive learning (see Appendix A.4).

Overall Performance Comparison (RQ1)
We compare our proposed method with eight competitive baselines, and the results are shown in Table 2, where we can observe that:
• Our method achieves the best performance. The results on the two datasets show that our DFAR model achieves the best performance against all baselines on all metrics. Specifically, GAUC is improved by about 2.0% on the Micro-video dataset and 0.5% on the Amazon dataset when comparing DFAR with the best baseline. Note that a 0.5% improvement on GAUC can be claimed as significant, as widely acknowledged by existing works [42]. Besides, the improvement is more significant on Micro-video, which has more negative feedback; this suggests that incorporating negative feedback into the historical item sequence can boost recommendation performance.
• Existing work only roughly captures the relation between positive and negative feedback. FeedRec and DFN even underperform some traditional sequential recommendation models such as GRU4REC and Caser on the Amazon dataset. Besides, though they outperform other baselines on the Micro-video dataset, the improvement is still slight. In other words, their designs fail to capture the relation between positive and negative feedback, which motivates us to further improve them from the transition and interest perspectives.

Ablation Study (RQ2)
We further study the impact of four proposed components in Table 3, where FHA represents the factorization-heads attention, MO represents the mask operation on factorized heads for factorization-heads attention, IDL means the interest disentangling loss on the positive and negative interest representations, and IBL means the interest BPR loss on the positive and negative prediction logits.
From this table, we can have the following observations.
• Factorization of heads for transition attention weights is important. Removing FHA or MO shows significant performance drops, which means these two components are complementary and both necessary. Specifically, removing FHA makes it impossible to apply the mask on the implicit head interaction of either multi-head attention or talking-heads attention. Meanwhile, removing MO from FHA causes it to fail to exploit the prior label knowledge of historical items, degenerating to performance even as poor as multi-head or talking-heads attention on the Amazon dataset.
• Pair-wise interest learning is more important than interest disentanglement. Removing either IDL or IBL degrades performance, while removing IBL is more harmful. This is because contrastive learning with the BPR loss injects more self-supervised signals, while disentanglement only tends to repel the dissimilar representations of positive and negative feedback.

Visualization for Attention Weights of Heads (RQ3)
As illustrated in Eq. (8), our proposed factorization-heads attention factorizes the relation between different feedback, which makes it possible to study the attention weights between them. Therefore, we visualize the attention weights between positive and negative heads in Figure 4, where $h_1$ and $h_2$ (defined in Eq. (11)) represent the heads for the source and target behaviors, respectively, with the corresponding feedback. From this figure, we can observe that: (1) For the collected Micro-video dataset, users are still willing to watch videos even after they receive disliked videos. This may be because negatively received videos cost users little, as they can easily skip disliked videos, leaving no significant impact on their later preferred videos. (2) For the Amazon e-commerce dataset, we discover that when the source feedback is negative, the probability of the target feedback being negative increases sharply. This may be because negative purchases are costly in e-commerce, wasting users' money and sharply increasing their dissatisfaction.

The Impact of Sequence Length (RQ4)
On large-scale online platforms, active users often browse many items and generate very long historical item sequences, while cold-start users are recorded with very short sequences. Long historical item sequences bring more information but aggravate the problem of vanishing gradients, while short historical item sequences bring limited information and tend to overfit the model. Thus, we divide historical item sequences into five groups based on their lengths and further study how DFAR performs against the attention-based models under different lengths on the Micro-video and Amazon datasets, as illustrated in Figure 5. From the visualization, we can observe that:
• DFAR is superior under different sequence lengths. There is always a significant performance gap between DFAR and the other methods. On the Amazon dataset, where sequences are relatively short, the AUC of all methods increases with the growth of sequence length, which means a longer sequence brings more information. On the Micro-video dataset, where sequences are relatively long, the performance of all methods improves with sequence length, reaches its peak at around 50-100, and then declines as the length grows further. Most importantly, our DFAR outperforms the other methods significantly across all sequence lengths.
• DFAR is stable under different sequence lengths. DFAR is more stable as the sequence length increases or decreases, even at the extremes. On the Amazon dataset, the other methods first improve with sequence length but fluctuate at lengths of 15-20, while DFAR improves steadily. On the Micro-video dataset, all methods drop sharply when the sequence is too short or too long, but our DFAR is more stable and still keeps a decent AUC of 0.8382. In summary, our DFAR is superior and robust under both long and short historical item sequences.

RELATED WORK
Sequential Recommendation. Sequential recommendation [32] predicts the next interacted item of a given user based on his/her historical items. As early work, FPMC [24] exploits the Markov chain to capture the transition pattern of the historical item sequence in recommendation. Then advanced deep learning methods such as RNNs [5, 12] and attentive networks [28] were applied to recommendation [11, 14, 41, 42] to capture the chronological transition patterns between historical items. While RNN-based methods must forward each hidden state one by one and are difficult to parallelize, attention-based methods can directly capture the transition patterns among all historical items at any time step. Furthermore, researchers have also leveraged convolutional neural networks [16] to capture union-level and point-level sequential patterns in recommendation [27]. Compared with CNN-based methods, attention-based methods are more effective due to the non-local view of self-attention [34]. However, most existing sequential recommendation is based on click behaviors. Recently, there have been methods going beyond click behaviors [20]. For example, DFN [38] captures the sequential patterns among click, unclick and dislike behaviors with an internal module for each behavior and an external module that purifies noisy feedback under the guidance of precise but sparse feedback. CPRS [36] derives reading satisfaction from users' completion of certain news to facilitate click-based modeling. Built on them, FeedRec [37] further enhances sequential modeling with a heterogeneous Transformer framework to capture the transition patterns between user feedback such as click, dislike, follow, etc. However, these works mainly exploit auxiliary feedback to enhance sequential modeling and do not consider the most important characteristic: the transition patterns between historical positive and negative feedback. Different from them, our approach factorizes the transition patterns between different feedback, achieving more accurate modeling for sequential recommendation with both positive and negative feedback. Additionally, our approach extracts the relation between positive and negative feedback at the interest level.
Explainable Attention. Attention methods are popular in many machine learning fields such as recommender systems [14, 26, 40], computer vision [7, 8, 17, 34] and natural language processing [1, 29], etc.
Attention mechanisms are often explainable and have been widely used in deep models to illustrate the learned representations by visualizing the distribution of attention scores or weights under specific inputs [4, 21, 35]. Some explainable attention methods are also generalizable and can be equipped with many backbones. For example, L2X [3] exploits Gumbel-softmax [13] for instance-wise feature selection with its hard attention design [39]. Moreover, VIBI [2] further proposes a feature score constraint with a global prior so as to simplify and purify the explainable representation learning. As self-attention is popular [6, 28], there is also work explaining what heads learn, concluding that some redundant heads can be pruned [30]. In this work, we propose feedback-aware factorization-heads attention to explicitly capture the transition pattern between positive and negative feedback. The feedback mask matrix in our attention module can be treated as hard attention based on feedback.
Listing 3: Pseudocode for Factorization-heads Attention.

Figure 1: Illustration of click-based sequential recommendation and our dual-interest sequential recommendation, which is hybrid with positive and negative feedback. (a) Click sequence; (b) Skip and no-skip sequence.

Figure 2: Illustration of DFAR. (1) The Feedback-aware Encoding Layer follows the feedback-aware embedding layer, where each historical item is injected with a label embedding according to the corresponding feedback; it consists of a linear transformation and feedback-aware factorization-heads attention. In the linear transformation, input embeddings are transformed into query, key and value matrices. In feedback-aware factorization-heads attention, the transition relation between different items is factorized into different heads, which are masked according to the positive or negative feedback. (2) The Dual-interest Disentangling Layer decouples positive and negative interests and performs disentanglement to repel the dissimilar representations of different feedback. (3) The Dual-interest Prediction Layer evolves positive and negative interests with corresponding towers and performs a BPR loss to capture the pair-wise relation.

Figure 4: Visualization of accumulated attention weights between different heads. Here $h_1$ and $h_2$ represent the heads for the source and target behaviors, respectively (i.e., if the source behavior is negative and the target behavior is positive, we have $h_1 = 0$ and $h_2 = 1$). This illustrates that our method can factorize and extract the relation between different feedback based on the proposed factorization-heads attention.
Figure 5: AUC performance comparison under different sequence lengths on the Micro-video and Amazon datasets.


Table 1: Micro-video and Amazon data statistics.

Table 2: Overall evaluation of DFAR against baselines on the Micro-video and Amazon datasets across four metrics. Improv. denotes the relative improvement. Bold marks the best result and underline the second-best result.

Table 3: Effectiveness study of our proposed components. FHA means factorization-heads attention; MO means the label mask operation on heads; IDL means the interest disentangling loss on positive and negative representations; IBL means the interest BPR loss on positive and negative logits.