Rethinking Sequential Relationships: Improving Sequential Recommenders with Inter-Sequence Data Augmentation

Predicting customer preferences for each item is a prerequisite module for most recommender systems in e-commerce. However, the sparsity of behavioral data is often a challenge to learning accurate prediction models. Given millions of items, each customer may only interact with a small subset of them over time. This sparse behavioral data is insufficient to represent item-customer and item-item relations for a machine learning model to digest, resulting in limited prediction accuracy that hinders recommendation performance. To mitigate this issue, this study introduces an inter-sequence data augmentation method, SDA_inter, that enhances data density by leveraging cross-customer behavioral patterns to enrich item relations. Tested on three public and one proprietary e-commerce dataset, SDA_inter significantly increases data density, leading to notable improvements in both evaluation and business metrics. Our findings demonstrate SDA_inter's effectiveness and its potential to complement existing data augmentation strategies in recommender systems. See https://github.com/ML-apollo/SDA_inter.


INTRODUCTION
Recommender systems play a pivotal role in guiding users to their desired items across various platforms, including e-commerce, streaming services, and social networks. Among these, Sequential Recommenders (SR) stand out by adeptly predicting user preferences through analyzing patterns in item interactions over time. Early work proposed mining frequent patterns to guide the recommendation [19]. This approach, which takes a user's interaction history to forecast future engagements, has evolved significantly with the advent of deep learning. Notably, Transformer models [2,4,10,14,16,17] have enhanced SRs by adeptly handling complex, dynamic user-item relationships (e.g., higher-order and important relations) through mechanisms like self-attention [15], enabling a nuanced understanding of both sequential [4] and bidirectional [14] item interactions. Such technological advancements not only refine recommendation accuracy but also enrich the user experience, marking a significant leap in how digital platforms anticipate and meet user needs.

Developing and refining models that accurately predict the item-item interaction transitions of each customer poses a significant challenge in recommender systems, primarily due to data sparsity in item-customer interactions. With e-commerce platforms offering hundreds of millions of items, customers typically engage with only a fraction, leading to extremely low interaction densities. For instance, the item-customer interaction density in the 'Online' e-commerce dataset is a mere 0.04%, as shown in Table 1. Such sparsity hampers the model's ability to learn from limited data, affecting the accuracy of recommendations and the system's capacity to understand nuanced user preferences.
There have been works dedicated to mitigating the data sparsity problem from different perspectives. One approach models auxiliary relationships: a dual contrastive network (DCN) [6] integrates auxiliary user sequences to capture the preferences of different user types, while MoHR and MT4SR [2,5,7] utilize heterogeneous item behavioral relationships. The second approach augments sequential data by introducing artificially generated interaction sequences that contain important but less frequent or unseen item-item transitions. Within this second approach, CL4SRec [18] proposed three random data augmentation methods (cropping, masking, and reordering), CoSeRec [8] introduced two informative augmentation operators (substitute and insert) leveraging item correlation, and CCL [1] developed a model-based data generator using users' attribute information. ASRep [9] predicts the prior items of sequences to extend short sequences. DuoRec [13] proposed a model-level augmentation based on dropout to enable better semantic representations. However, most of these existing works focus on intra-sequence augmentations, such as masking [1,8], reordering [8], cropping [8], and reversing [3,9]. Intra-sequence data augmentation only leverages item relationships within the same customer, while relationships across customers are not utilized: it can increase item-item transition diversity but cannot increase data density.
In this work, we explore a novel interchangeability rule among sequences from different customers and introduce an inter-sequence data augmentation method, SDA_inter. SDA_inter searches for subsequence pairs that meet the interchangeability rule in a set of customers' historical interactions. By exchanging the subsequences within a pair, SDA_inter builds pseudo item-customer interaction sequences from the original sequences. This approach significantly enhances the density of item-customer interactions, facilitating the learning process for sequential transformers, as evidenced by our experimental results shown in Table 1.
In summary, our major contributions are as follows:
• We propose an innovative data augmentation method called SDA_inter, which leverages cross-user sequential relations to effectively improve data density in sequential recommendation systems.
• Our experiments on four datasets show that SDA_inter markedly improves key performance indicators, including NDCG, Recall, price-weighted purchases, and the view-to-purchase conversion rate. Our findings highlight SDA_inter's effectiveness in refining sequential recommendations through enriched data.
• Our ablation studies show that SDA_inter is able to complement existing strategies for mitigating data sparsity.

METHOD

Problem Formulation
Consider a set of user sequences \{s^u\}_{u=1}^{N}, where N is the total number of users and s^u = [s^u_0, s^u_1, \ldots, s^u_{n_u}] is a sequence of items of length n_u. Let \mathcal{I} be the set of all items. Our goal is to predict the distribution of the next interacted item, p(s^u_{n_u+1} = v \mid s^u), where item v \in \mathcal{I}. The idea of sequential recommendation is to model the next-interaction distribution as

p(s^u_{n_u+1} = v \mid s^u) = f_\theta(s^u),

where f_\theta is a vector-valued function that is implemented by different machine learning methods with parameters \theta.
In the next section, we present an inter-sequence augmentation method that enriches the user sequences to mitigate the data sparsity problem and produce more accurate next-interaction predictions.

Inter-sequence Data Augmentation
In this subsection, we introduce the inter-sequence data augmentation operation, which discovers more item interactions and relations via the interchangeability between sequences. For two user sequences s^u and s^v that share an anchor item pair, we extract the interchangeable subsequences A \subset s^u and B \subset s^v bounded by the anchor items. We set an intersection-over-union (IoU) threshold \tau to ensure the interchangeability confidence, defined as c = |A \cap B| / |A \cup B|. When c \geq \tau, the interchangeability between s^u and s^v is established; \tau is set to 0.2 in our experiments. By exchanging the two interchangeable subsequences in s^u and s^v, we generate two new pseudo user sequences. Interchangeability can also exist for multiple subsequences within a user sequence. A user sequence s^u can have P anchor item pairs in total, denoted as a set \Omega_u, where each pair has the form (s^u_{i_1}, s^u_{i_2}) with 0 < i_1 < i_2 < n_u. Each item pair in \Omega_u is associated with Q candidate sequences, denoted as a set \Upsilon_u, where each candidate sequence shares an anchor item pair with s^u and has interchangeability with s^u. Hence, the total number of interchangeable candidates for user sequence s^u is P \times Q. Algorithm 1 summarizes the inter-sequence data augmentation operation.
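Algorithm 1 is not reproduced here; the following is a minimal Python sketch of the interchange operation under simplifying assumptions: sequences are lists of item IDs, an anchor pair is any pair of items occurring in order in both sequences, and only the first interchangeable pair found is swapped. All names are hypothetical.

```python
def iou(a, b):
    # Intersection-over-union of the item sets of two subsequences.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def augment_pair(seq_u, seq_v, tau=0.2):
    """Swap the first interchangeable subsequence pair found between
    seq_u and seq_v; return the two pseudo sequences, or None if no
    pair reaches the IoU confidence threshold tau."""
    shared = set(seq_u) & set(seq_v)
    for i1 in range(len(seq_u)):
        for i2 in range(i1 + 1, len(seq_u)):
            a1, a2 = seq_u[i1], seq_u[i2]   # candidate anchor item pair
            if a1 not in shared or a2 not in shared:
                continue
            j1, j2 = seq_v.index(a1), seq_v.index(a2)
            if not j1 < j2:
                continue
            sub_u = seq_u[i1:i2 + 1]        # subsequences bounded by anchors
            sub_v = seq_v[j1:j2 + 1]
            if iou(sub_u, sub_v) >= tau:    # interchangeability established
                pseudo_u = seq_u[:i1] + sub_v + seq_u[i2 + 1:]
                pseudo_v = seq_v[:j1] + sub_u + seq_v[j2 + 1:]
                return pseudo_u, pseudo_v
    return None
```

For example, sequences [1, 2, 3, 4, 5] and [9, 2, 7, 3, 4, 8] share anchors 2 and 3, so the bounded subsequences [2, 3] and [2, 7, 3] (IoU 2/3) are exchanged to form two pseudo sequences.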

Flip-flop Training of Sequential Transformer
Given a user sequence s^u = [s^u_0, \ldots, s^u_{n_u}], a sequential model trained only in the forward direction could have difficulty predicting items arriving in the future. While some works [14] propose bi-directional sequential modeling, it is infeasible to use bi-directional information to predict the next item. To address the limitations of one-directional learning, in this section we propose to train sequential transformers in a flip-flop fashion, so that sequential transformers are able to learn various types of data distribution trends. Figure 1 illustrates the flip-flop training process.
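The exact flip-flop schedule is shown in Figure 1 rather than in the text; as a rough illustration only, assuming it alternates between the forward and the reversed sequence order across epochs, a data feed might look like this (names hypothetical):

```python
def flip_flop_epochs(sequences, num_epochs):
    """Yield (epoch, sequence) pairs, alternating direction each
    epoch: even epochs feed the original (forward) order, odd epochs
    the reversed order, so the model sees both trend directions."""
    for epoch in range(num_epochs):
        for seq in sequences:
            yield epoch, seq if epoch % 2 == 0 else seq[::-1]
```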

EXPERIMENTS
In this section, we conduct extensive experiments and ablation studies to answer the following questions:
• Q1: What is the performance of sequential transformers with SDA_inter compared to the state-of-the-art baselines on the sequential recommendation task?
• Q2: How does inter-sequence data augmentation complement intra-sequence augmentation?
• Q3: What is the contribution of each component?
• Q4: What is the impact of different item popularity on the target?

Experiment Settings
1) Datasets: To demonstrate the efficacy of our method, we use three popular public datasets [11,12]: Amazon Beauty, Amazon Sports and Outdoors, and Amazon Tools and Home Improvement. The dataset details are shown in Table 1. We follow the data preprocessing approach in [2] and use the five-core setting by filtering out users with fewer than five interactions. Additionally, we construct an online e-commerce dataset to compare the performance of all models in a real industrial application. The large-scale 'Online' dataset (in Table 1) collects a year of customer purchase records from an online shopping platform. The target is to predict the latest purchase of each customer based on historical purchases.
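As a minimal sketch, the user-side five-core filtering described above (keeping only users with at least five interactions) could be implemented as follows; representing the data as (user, item) pairs is an assumption, not the paper's format:

```python
from collections import defaultdict

def five_core_filter(interactions, k=5):
    """Drop all interactions of users with fewer than k interactions.
    `interactions` is a list of (user, item) pairs."""
    counts = defaultdict(int)
    for user, _ in interactions:
        counts[user] += 1
    return [(u, i) for u, i in interactions if counts[u] >= k]
```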
2) Baselines: We compare our work with the following baselines on the sequential recommendation task. SASRec [4] and BERT4Rec [14] are classic sequential transformers for sequential recommendation. CL4SRec [18] and DuoRec [13] are state-of-the-art works that use data augmentation to boost sequential recommendation. MoHR [5] and MT4SR [2] are state-of-the-art works that model auxiliary item relationships with sequential transitions.
3) Evaluation: We follow the evaluation task in [2] and rank all items for model performance comparisons. We use the popular evaluation metrics NDCG@N and Recall@N. Recall@N evaluates the existence of the ground-truth positive in the top-N ranked items, and NDCG@N measures the ranking position of the positive in the top-N ranked items. Higher metric values mean better model performance. We report N = 5, 10, the same as in [2,4,14]. For fair comparisons, we only use the augmented pseudo user sequences in training, but keep the test samples the same as in previous works [2,13,18].
4) Implementation: To demonstrate the benefits of SDA_inter, we implement SDA_inter with the best-performing baseline on each dataset. To show the complementarity of the inter- and intra-sequence data augmentations, we also implement the intra-sequence augmentations (masking, cropping, and reordering), denoted as SDA_intra, and apply both augmentations simultaneously, denoted as the hybrid sequential data augmentation SDA_hybrid.
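For reference, Recall@N and NDCG@N with a single ground-truth item per test case reduce to the following standard formulation (this is the textbook definition, not code from the paper):

```python
import math

def recall_at_n(ranked_items, target, n):
    """Recall@N for one test case with a single ground-truth item:
    1 if the target appears among the top-N ranked items, else 0."""
    return 1.0 if target in ranked_items[:n] else 0.0

def ndcg_at_n(ranked_items, target, n):
    """NDCG@N with a single relevant item: 1 / log2(rank + 1) if the
    target is ranked within the top N (rank is 1-based), else 0."""
    if target in ranked_items[:n]:
        rank = ranked_items.index(target) + 1
        return 1.0 / math.log2(rank + 1)
    return 0.0
```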

Overall performance comparison (Q1, Q2).
In Table 2, we find the best-performing baseline model on each dataset and apply the proposed data augmentations. SDA_inter boosts performance on all three public datasets when applied on top of the best baseline. These results show that the pseudo user sequences constructed by SDA_inter improve sequential modeling with transformers. To confirm the data distribution change, we analyze the existence of test 2-grams in the training sequences. To answer Q2 and demonstrate the complementarity between inter-sequence data augmentation and other sparsity solutions, we compare using SDA_inter alone and jointly with intra-sequence data augmentation. SDA_hybrid consistently achieves the best performance. Taking the Beauty dataset as an example, compared to SDA_inter, SDA_hybrid achieves an additional 4.89% improvement on NDCG@5, which shows the complementarity of intra- and inter-sequence augmentations.
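The 2-gram analysis mentioned above can be sketched as the fraction of consecutive test item pairs that also occur somewhere in training (a hypothetical helper, not the paper's code):

```python
def bigram_coverage(train_seqs, test_seqs):
    """Fraction of consecutive item pairs (2-grams) in the test
    sequences that also occur in some training sequence."""
    train_bigrams = {
        (s[i], s[i + 1]) for s in train_seqs for i in range(len(s) - 1)
    }
    test_bigrams = [
        (s[i], s[i + 1]) for s in test_seqs for i in range(len(s) - 1)
    ]
    if not test_bigrams:
        return 0.0
    hits = sum(b in train_bigrams for b in test_bigrams)
    return hits / len(test_bigrams)
```

A higher coverage after augmentation would indicate that the pseudo sequences add item-item transitions that the model is later tested on.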

Key component ablation study (Q3).
In Table 3, we reveal the contribution of each component of the proposed method. Specifically, we evaluate the efficacy of each component by adding it to the baseline model. Taking the Beauty dataset as an example, each component contributes independently. The best performer, SDA_hybrid, combines all three components.

Improvement on cold-start items (Q4).
To study the benefit of augmentation on cold-start items, we group items into five groups based on popularity. In Figure 2, we observe that SDA_inter outperforms the best baseline in all popularity groups, with especially large improvements on the low-popularity groups. This result shows that SDA_inter can mitigate the cold-start item problem and improve recommendations for cold-start items.
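The popularity grouping could be done, for example, by bucketing items into equal-size groups by training interaction count (a sketch with hypothetical names; the paper does not specify its exact bucketing scheme):

```python
import math
from collections import Counter

def popularity_groups(train_seqs, num_groups=5):
    """Map each item to a popularity bucket: 0 = least popular,
    num_groups - 1 = most popular, with equal-size buckets."""
    counts = Counter(item for seq in train_seqs for item in seq)
    ranked = sorted(counts, key=lambda it: counts[it])  # ascending popularity
    size = max(1, math.ceil(len(ranked) / num_groups))
    return {item: idx // size for idx, item in enumerate(ranked)}
```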

Performance on online platform.
To study the method's performance in a real industrial system, we build new ranking features between customers and products using the proposed method. We report product ranking NDCG by measuring the rank of true purchases in a list of products, with the rank cutoff at 10. We also examine the business impact of the new feature with Price Weighted Purchases (PWP) and the view-to-purchase conversion rate (CR).

CONCLUSION
In this paper, we introduced an inter-sequence data augmentation method called SDA_inter to address the data sparsity problem in sequential recommendation. SDA_inter leverages the interchangeability between sequences and improves item transition density. Our experimental results on three public datasets and one online e-commerce dataset showed that SDA_inter achieved state-of-the-art performance on sequential recommendation with a significant improvement. Our ablation study of intra- and inter-sequence data augmentations showed that inter-sequence augmentation complements existing data sparsity solutions. A further analysis demonstrated that the sequential recommendation of cold-start items is largely improved by SDA_inter.

Table 1 :
Dataset density after augmentation. Beauty, Tools, and Sports are Amazon review datasets; Online is real data from an online e-commerce system.

Table 2 :
Overall performance comparison. The best results are in bold, and the best baseline is underlined. The improvement is against the best baseline.

Table 3 :
Ablation study of inter-sequence data augmentation and flip-flop training. Dataset: Amazon Beauty.

Table 4 :
Performance with features built using SASRec with different augmentations. Metric: relative feature contribution. Dataset: Online.