IncMSR: An Incremental Learning Approach for Multi-Scenario Recommendation

For better performance and less resource consumption, multi-scenario recommendation (MSR) is proposed to train a unified model to serve all scenarios by leveraging data from multiple scenarios. Current works in MSR focus on designing effective networks for better information transfer among different scenarios. However, they overlook two important issues that arise when applying MSR models in industrial settings. The first is the efficiency problem brought by mixed data, which delays model updates and further leads to performance degradation. The second is that MSR models are insensitive to changes in data distribution over time, resulting in suboptimal effectiveness on incoming data. In this paper, we propose an incremental learning approach for MSR (IncMSR), which can not only improve training efficiency but also perceive changes in distribution over time. Specifically, we first quantify the pair-wise distance between representations from the scenario, time, and time-scenario dimensions, respectively. Then, we decompose the MSR model into scenario-shared and scenario-specific parts and apply fine-grained constraints on the quantified distances with respect to the two parts. Finally, all constraints are fused in an elegant way within a metric learning framework as a supplementary penalty term to the original MSR loss function. Offline experiments on two real-world datasets demonstrate the superiority and compatibility of our proposed approach.


INTRODUCTION
Large-scale commercial platforms typically contain multiple scenarios to serve diverse user groups. For example, a video-sharing platform may have dozens of scenarios including the homepage, search discovery, guess-what-you-like, etc. Within each scenario, users explore content driven by their individual motivations, exhibiting diverse behaviors that reflect their interests. Conventional recommendation systems (RS) collect these behaviors from different scenarios and apply Single-Scenario Recommendation (SSR), which builds an individual model for each scenario with data only from the corresponding scenario. This approach is inefficient due to the large number of models to maintain and is prone to sparsity issues when data is scarce in certain scenarios. To address these issues, Multi-Scenario Recommendation (MSR) [40] has been proposed. MSR builds a unified model that leverages data from multiple scenarios, improving efficiency, mitigating sparsity issues, and utilizing transferable information from various scenarios, thereby enhancing the effectiveness of multiple scenarios simultaneously. Consequently, MSR has gained wide adoption in real-world commercial RS.
Existing works in MSR [3,25,28,33,37] focus on better information transfer among different scenarios in the Batch mode, where the model is trained on a fixed-size window of training data. However, this approach faces two problems. Firstly, utilizing all data from multiple scenarios can construct comprehensive user profiles but degrades efficiency, hindering the capture of the latest user preferences [34]. Secondly, these MSR models cannot generalize well on incoming data, because the data volumes and distributions of different scenarios change over time [18].
Incremental Learning [8,20,21,36] has been proposed as a potential solution to the aforementioned issues in SSR, updating models using only the incoming data and the previous model. However, existing incremental learning methods in SSR can hardly be applied to MSR due to their inability to capture distribution changes across different scenarios. Figure 1 illustrates the information transfer of MSR models in the Incremental mode. Specifically, when the MSR models update from step t-1 to step t, there are three dimensions of information transfer. 1) Transfer between different scenarios at the same time step (scenario dimension): different from batch mode, the impact of data imbalance between scenarios is more serious when updating models with only incoming data. 2) Transfer between different time steps of the same scenario (time dimension): as the data arrives incrementally, the distribution of the same scenario may change drastically. 3) Transfer between different scenarios at different time steps (time-scenario dimension): the incoming data of one scenario can degrade the existing shared parameters, which have been updated by the historical data from other scenarios. The coexistence of these three dimensions of information transfer makes it challenging to apply existing incremental algorithms to MSR models. Although some works address class-incremental transfer issues in computer vision [6,13,35], research on achieving effective MSR in incremental mode remains limited.
In this paper, we propose IncMSR, a practical incremental learning approach for multi-scenario recommendation, which resolves the transferability issues over three dimensions in a unified way. We first quantify the pair-wise distance between representations in the scenario, time, and time-scenario dimensions. Then, we decompose the MSR model into scenario-shared and scenario-specific parts, considering that the scenario-shared representations should contain relatively general and stable information for all scenarios while the scenario-specific representations should be more sensitive to changes of scenario and time. Next, we apply fine-grained constraints on the distances derived from the representations of these two parts. Finally, all constraints are elegantly fused within a metric learning framework as a supplementary penalty term to the original MSR loss function. The overall operations are conducted in an end-to-end manner.
The main contributions of this work are summarized as follows:
• To the best of our knowledge, this is the first work to achieve multi-scenario recommendation in incremental mode.
• We analyze the specific issues and challenges in incremental multi-scenario recommendation and propose IncMSR, a practical approach that quantifies distances between representations along three dimensions and applies fine-grained constraints on these distances over two decomposed parts. Finally, we optimize the incremental MSR model in an end-to-end manner.
• The proposed IncMSR is model-agnostic and lightweight, providing a supplementary penalty term to the original MSR loss function without increasing the number of parameters or model complexity.
• Experiments on two public datasets demonstrate the effectiveness and efficiency of our proposed approach.

RELATED WORK
Incremental Learning
Incremental learning (IL) has been proposed in SSR to improve the efficiency of model updates and to make the model adapt more quickly to the latest distribution. Existing IL methods can be broadly classified into three categories. Replay-based methods [9,22] store historical data to maintain the memory. Regularization-based methods [8] introduce regularization terms to preserve prior knowledge.
Model-based methods [20,21] extract knowledge from historical models. SPMF [32] maintains historical data and combines it with new observations to update the model and make recommendations. IncCTR [34] consists of three decoupled modules to construct training data, handle features, and fine-tune the model parameters. SML [36] transforms the old model into a new model via a neural network-based transfer learning component during training. However, the aforementioned works do not consider the multi-scenario recommendation problem, while our proposed IncMSR incorporates designs for transfer across multiple dimensions.

Multi-Scenario Recommendation
Multi-scenario recommendation optimizes all scenarios jointly by transferring information among them. It can be regarded as a special case of multi-task learning, where each scenario is viewed as a task. HMoE [15] utilizes Multi-gate Mixture-of-Experts [19] to implicitly model distinctions and commonalities among multiple scenarios in the latent space. SAR-Net [24] proposes a unified multi-scenario architecture that facilitates transfer across the scenario dimension by incorporating scenario-specific user behaviors with attention modules. STAR [25] adopts a star topology framework consisting of one central network that maintains scenario-shared commonalities and a set of scenario-specific networks that capture scenario distinctions. However, existing models suffer from the efficiency problem caused by mixed data, resulting in delayed model updates and performance degradation. Moreover, MSR models are insensitive to distribution changes over time.

Metric Learning
Metric learning aims to learn a suitable metric function to measure and compare the similarity and dissimilarity of feature distributions [26,38]. CML [10] minimizes the Euclidean distance between users and items for fine-grained user preference. LRML [29] utilizes a memory-based attentive network to explicitly induce latent relations. DML [16] exploits bidirectional latent relations between users and items to transfer information across scenarios. Owing to its capacity to capture important relationships among data, metric learning can be applied to quantify the distance between different distributions. IncMSR designs constraints from multiple dimensions for both scenario-shared and scenario-specific representations and leverages metric learning to enhance information transfer.
where W_sh, b_sh and W_sp^d, b_sp^d denote the learnable weights and bias terms of the scenario-shared and scenario-specific layers, respectively.

METHODOLOGIES
Overall Framework
IncMSR is first initialized with a batch model M_0, which is regarded as the warm-start model, and then updates subsequent models using only the incoming data together with the previous model. An overview of IncMSR is shown in Figure 3(a). When updating the model with data from day t, we first obtain the scenario-shared and scenario-specific representations from the previous model M_{t-1}, and then extract the corresponding representations from the current incremental model M_t. Inspired by the metric learning objective, we apply different constraints on the scenario-shared and scenario-specific layers to resolve the transferability issues over the three dimensions in a unified way. The idea behind this is that the scenario-shared representations should contain relatively general and stable information for all scenarios, while the scenario-specific representations should be more sensitive to changes of scenario and time. All constraints are fused into a unified formulation and act as a supplementary penalty term on the existing MSR loss function, without increasing the number of parameters or model complexity. Since calculating distributions directly with respect to scenarios and time is challenging, we quantify distances at the representation level. Eq. 2 gives the general definition of the distance function, where z^{t,d} is the representation of a specific scenario d at incremental step t. The two representations are obtained from either the scenario-shared layers or the scenario-specific layers, depending on the constraint imposed in each dimension.

D(z^{t,d_i}, z^{t',d_j}) = d(z^{t,d_i}, f(z^{t',d_j}))    (2)
Here, d represents a similarity distance function; for convenience, we employ the Euclidean distance. The summary function f is applied to z^{t',d_j} to compute summary statistics of the representation of scenario d_j at incremental step t', e.g., compressing the representation into a real number with commonly used squeeze functions such as max or mean [11,12]. Specifically, we apply mean pooling as the summary function.
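As a concrete illustration, the distance in Eq. 2 with Euclidean distance and mean pooling can be sketched in a few lines of plain Python. This is a minimal sketch: the function names and the broadcasting of the scalar summary against each component of the first representation are our assumptions, not the paper's exact implementation.

```python
import math

def mean_pool(z):
    # Summary function f: compress a representation (a vector of floats)
    # into a single real-valued summary statistic via mean pooling.
    return sum(z) / len(z)

def representation_distance(z_i, z_j):
    # Eq. 2 sketch: D(z_i, z_j) = d(z_i, f(z_j)), where d is the
    # Euclidean distance. Following the printed equation, f is applied
    # only to the second representation; broadcasting its scalar summary
    # against each component of z_i is our assumption.
    s = mean_pool(z_j)
    return math.sqrt(sum((x - s) ** 2 for x in z_i))
```

For instance, a representation whose components all equal the mean of the other representation yields a distance of zero.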

Transfer over Scenario dimension
It is common to observe imbalanced label distributions, resulting in significant differences in average CTR values between different scenarios. To enhance information transfer across multiple scenarios in incremental mode, we design constraints for the scenario-shared and scenario-specific layers. Scenario-shared layers are responsible for capturing common information across scenarios, while scenario-specific layers are dedicated to capturing scenario-specific characteristics. For the shared layers, we decrease the distance between scenario-shared representations, which improves transferability and generalization across scenarios. For the scenario-specific layers, in contrast, we increase the distance between scenario-specific representations to make them more discriminative for distinct scenarios. Both distances are computed by instantiating Eq. 2 with the representations from the corresponding layers.
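The scenario-dimension constraints can be sketched as follows: shared distances enter the loss with a positive sign (to be minimized), specific distances with a negative sign (to be maximized). The function names and the use of a plain Euclidean distance without the summary function are our simplifications.

```python
import itertools
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def scenario_dim_losses(shared, specific):
    # shared / specific: dict mapping scenario id -> representation
    # (list of floats), all taken at the same incremental step t.
    pairs = list(itertools.combinations(sorted(shared), 2))
    # Pull scenario-shared representations together (minimized as-is).
    l_shared = sum(euclid(shared[i], shared[j]) for i, j in pairs)
    # Push scenario-specific representations apart (negated so that
    # minimizing the total loss maximizes their pairwise distance).
    l_specific = -sum(euclid(specific[i], specific[j]) for i, j in pairs)
    return l_shared, l_specific
```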

Transfer over Time dimension
In incremental learning, we must effectively incorporate new user behaviors, items, and other changes that occur over time while maintaining prior knowledge from the previous incremental step. To address catastrophic forgetting [17], we minimize the distance between the scenario-shared representations at steps t-1 and t for the same scenario d (Eq. 6). This preserves the stability of shared representations over time and ensures transferability.
Scenario-specific layers extract the scenario-specific representations, which can change differently over time across scenarios. As a result, we do not apply constraints on the scenario-specific layers in this dimension.
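A minimal sketch of the time-dimension constraint, under the same simplifications as before (plain Euclidean distance, hypothetical names); the representations from the previous model M_{t-1} act as fixed targets:

```python
import math

def time_dim_loss(shared_prev, shared_curr):
    # shared_prev: scenario -> scenario-shared representation from the
    # previous model M_{t-1}, treated as a constant target (no gradient).
    # shared_curr: the same mapping from the current model M_t.
    # Minimizing this sum preserves shared knowledge over time;
    # scenario-specific layers are deliberately left unconstrained.
    return sum(
        math.sqrt(sum((p - c) ** 2
                      for p, c in zip(shared_prev[d], shared_curr[d])))
        for d in shared_curr
    )
```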

Transfer over Time-Scenario dimension
When information migrates from one scenario to another in incremental mode, we must consider not only the migration of information along the scenario or time dimension independently, but also the impact of their coexistence. To address this, we propose a constraint for cross-dimensional migration. Specifically, we quantify the distance between the scenario-specific representation of scenario d_i at step t-1 and that of scenario d_j at step t. This distance is designed to be maximized, so that IncMSR can capture the distinctions between scenarios at different time steps and is thus more sensitive to distribution changes along both time and scenarios. The scenario-shared representations are already aligned in the aforementioned dimensions and are therefore not considered in the time-scenario dimension.
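The time-scenario term can be sketched as below, again with our own names and a plain Euclidean distance; only cross-scenario pairs across the two steps contribute, matching the description above.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def time_scenario_term(specific_prev, specific_curr):
    # specific_prev: scenario -> scenario-specific representation at
    # step t-1; specific_curr: the same mapping at step t.
    # Only cross-scenario pairs (i != j) across the two steps are used.
    # The distance is to be maximized, so its negation is returned for
    # inclusion in a loss that is minimized.
    total = 0.0
    for i, z_prev in specific_prev.items():
        for j, z_curr in specific_curr.items():
            if i != j:
                total += euclid(z_prev, z_curr)
    return -total
```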

Metric Learning Based Optimization
Metric learning-based optimization aims to improve the model's discriminative power by constructing a suitable metric function, such as MMD [2], to effectively capture the similarity and dissimilarity between different distributions. In our case, we utilize metric learning to facilitate information transfer over the scenario, time and time-scenario dimensions. By employing metric learning, we minimize positive distances (scenario-shared representations) and maximize negative distances (scenario-specific representations).
Our goal is to minimize the distances involving the scenario-shared representations, which contain relatively general and stable information for all scenarios, while maximizing the distances involving the scenario-specific representations, which should be more sensitive to changes of time and scenario. We fuse these constraints into a unified formulation and denote the resulting metric-based loss as L_m. Besides, due to the different data volumes in different scenarios, scenarios with larger volumes may dominate the entire training process. We therefore rescale the original D in Eq. 2 by a factor of 1/n^{t,d}, where n^{t,d} represents the data volume of scenario d at incremental step t, and L_m is updated accordingly.
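Putting the pieces together, the fused metric-based loss with the 1/n^{t,d} rescaling might look like the following sketch; the container format and the exact point at which the rescaling is applied are our assumptions.

```python
def fused_metric_loss(shared_terms, specific_terms, volumes):
    # shared_terms / specific_terms: lists of (scenario, distance) pairs
    # collected over the scenario, time and time-scenario dimensions.
    # volumes: scenario -> data volume n_{t,d} at the current step;
    # rescaling by 1/n keeps large scenarios from dominating training.
    pos = sum(d / volumes[s] for s, d in shared_terms)    # minimized
    neg = sum(d / volumes[s] for s, d in specific_terms)  # maximized
    return pos - neg
```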
Finally, we incorporate the metric learning-based loss L_m into the cross-entropy loss L_ce to enhance information transfer over multiple dimensions for both scenario-shared and scenario-specific representations. The hyper-parameter λ balances the two losses, and the overall objective function is formulated as L = L_ce(Y, Ŷ) + λ L_m + R, where Y and Ŷ denote the ground truth and the outputs of the model, respectively, and R denotes the regularization term. The overall algorithm is summarized in Algorithm 1.
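The overall objective — cross-entropy plus the λ-weighted metric penalty and a regularization term — can be sketched as follows; the function names and any λ value are illustrative, not the paper's tuned settings.

```python
import math

def bce(y_true, y_pred, eps=1e-7):
    # Cross-entropy L_ce over ground truth Y and model outputs Y_hat
    # (predicted click probabilities); eps guards against log(0).
    return -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for y, p in zip(y_true, y_pred)
    ) / len(y_true)

def overall_objective(y_true, y_pred, metric_loss, reg, lam):
    # L = L_ce(Y, Y_hat) + lambda * L_m + R, with lam balancing the
    # supplementary metric-based penalty against the original MSR loss.
    return bce(y_true, y_pred) + lam * metric_loss + reg
```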

EXPERIMENTS
In this section, we present experiments conducted on two datasets to answer the following research questions:
• RQ1: How does IncMSR perform compared with baseline models?
• RQ2: How compatible is IncMSR with existing MSR models?
• RQ3: How does the information transfer over the three dimensions affect performance?

Dataset and Evaluation Protocols.
We conduct extensive experiments on two public datasets, both collected from real-world commercial platforms.
• KuaiRand [5]. This dataset is collected from the recommendation logs of Kuaishou, a video-sharing mobile app. Each log contains comprehensive side information including explicit user IDs, interaction timestamps, and rich features for users and items. The dataset consists of 13 days of interaction logs from 2022-04-09 to 2022-04-21. We take the attribute tab as the indicator of scenarios. Since the top four scenarios account for more than 96% of the total data volume, we select these four scenarios.
• Taobao.¹ This dataset is released by Alimama, an online advertising platform in China. Ad display/click logs from 2017-05-06 to 2017-05-13 are used. Following [4], we filter out the samples whose user profile is missing, and then divide the dataset into 4 scenarios according to the scenario indicator field.
Both datasets contain multiple scenarios and samples for several consecutive days, allowing us to perform daily incremental updates for multi-scenario recommendation. We use the click signal as the target label. The statistics of the two datasets are shown in Table 1.
¹ https://tianchi.aliyun.com/dataset/dataDetail?dataId=56
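The daily incremental-update protocol over such consecutive-day data can be sketched as follows; `warmup` (the warm-start window length) and the function name are our own illustrative choices.

```python
def incremental_schedule(num_days, warmup):
    # Warm-start: one batch-mode training run over days [0, warmup),
    # then one incremental update per subsequent day that uses only
    # that day's data together with the previous model.
    steps = [("batch", list(range(warmup)))]
    for day in range(warmup, num_days):
        steps.append(("incremental", [day]))
    return steps
```

For a KuaiRand-style setup with 13 days and a 7-day warm-start window, this yields one batch step followed by six incremental steps.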
We adopt the most commonly used AUC (Area Under the Curve) and logloss (cross-entropy) as evaluation metrics for measuring model performance. For industrial applications, even a 0.1% improvement in AUC is significant [7,39].
• Batch-multi. Batch-multi models are multi-task or multi-scenario models trained on data from all scenarios in batch mode; they jointly utilize samples from all scenarios for training. SharedBottom [23] shares parameters at the bottom layers, and a specific three-layer fully-connected network is built for each scenario on top of the shared parameters. PLE [27] is a state-of-the-art multi-task model that explicitly separates task-shared and task-specific experts and adopts a progressive routing mechanism to gradually extract and separate deeper semantic knowledge. STAR [25] is a state-of-the-art multi-scenario recommendation model with a star topology consisting of shared and scenario-specific networks.
• Incremental-single. Incremental-single models are trained on data from a specific scenario in incremental mode. Without loss of generality, we select IncCTR [34] as the baseline, a classic incremental CTR method that consists of three decoupled modules to construct data, handle features, and fine-tune model parameters.
All incremental competitors adopt warm-start: they first train a model in batch mode on the data in [0, T), then update the subsequent models with only the data from each incoming day. We use Adam [14] as the optimizer for all models and set the embedding size to 10. The settings for each dataset are as follows.
• KuaiRand settings. The warm-start window length is set to 7 and the total number of days is 13. For batch mode and incremental mode, the learning rate is tuned from [1e-

Overall Performance (RQ1)
The overall comparisons on the KuaiRand and Taobao datasets are shown in Table 2 and Table 3, respectively. All experiments were repeated 5 times and the averaged results are reported. We summarize the observations as follows:
• Firstly, the results on both datasets show that IncMSR outperforms the baseline models. Compared with Incremental-single, IncMSR performs better, which shows that introducing data from other scenarios for joint modeling brings significant improvement. Comparing IncCTR(STAR) in Table 4 with the Batch-multi(STAR) model in Table 2, we observe that IncCTR(STAR) outperforms Batch-multi(STAR), which demonstrates that incremental learning can update the model with users' latest preferences and is more sensitive to distribution changes. Compared with IncCTR, IncMSR obtains further improvement, owing to the fine-grained design for transfer in the scenario-shared and scenario-specific layers across different dimensions, especially the time and time-scenario dimensions.
• Secondly, IncMSR exhibits more notable improvements in small scenarios. Specifically, on the KuaiRand dataset, IncMSR(STAR) shows absolute AUC gains of 4.38%, 3.42% and 1.83% over the corresponding Batch-multi(STAR) in the small scenarios #0, #2 and #4. Similarly, the gains of IncMSR over Incremental-single are 0.81%, 1.55% and 2.12% in the same scenarios. Small scenarios are particularly sensitive to changes such as emerging interests, resulting in significant distribution shifts over time [30]. IncMSR improves performance in small scenarios because it provides more stability and adaptability in changing and sparse data environments.
• Moreover, the results demonstrate IncMSR's ability to adapt to the changing distribution over time as it receives new data. On the KuaiRand dataset, in the initial incremental step (Incre-1), the model is still adjusting to the new data: IncMSR may experience a slight performance decline compared to Batch-1 for SharedBottom and PLE, while achieving comparable performance to Batch-1 for STAR. However, as the model continues to receive new data, IncMSR gradually adapts to the changing distribution, leading to improved performance from Incre-2 to Incre-6 compared to the corresponding Batch-2 to Batch-6 models. These findings highlight IncMSR's capability to handle changing distributions and exploit the benefits of incremental learning, leading to enhanced performance in MSR models.

Compatibility (RQ2)
To evaluate the compatibility of IncMSR, we apply it to both the multi-scenario model STAR and the multi-task model PLE. The results in Table 2 and Table 3 show that significant improvements are achieved when these models are equipped with IncMSR. This compatibility arises from IncMSR's ability to leverage incremental learning and capture transferable information while integrating seamlessly with existing MSR models, which keeps IncMSR flexible and adaptable.

Ablation Study (RQ3)
To investigate the respective contributions of the three dimensions of information transfer in IncMSR, we conduct a series of ablation studies on the KuaiRand dataset, taking STAR as the backbone model. Models are trained with or without each of the three dimensions of transfer, and the results are compared in Table 4.
• IncCTR [34] denotes a commonly used incremental learning framework in SSR. In this case, we only minimize the cross-entropy loss to train the model in incremental mode.
Overall, the findings of the ablation studies underscore the importance of all three dimensions of information transfer in IncMSR.

Efficiency Analysis
To evaluate the efficiency of IncMSR, we compare the time consumption of IncMSR and the corresponding Batch-multi models on the Taobao dataset. All experiments are performed on an NVIDIA Tesla V100 GPU with 16G memory. The results, reported in Figure 4, demonstrate that IncMSR achieves shorter training time than the Batch-multi models.
The efficiency of IncMSR can be attributed to its design: metric learning introduces only a supplementary penalty term to the original MSR loss function. Importantly, this modification neither changes the structure of the MSR model nor increases its number of parameters or complexity. As a result, IncMSR keeps the training process efficient without compromising performance.

CONCLUSION
In this paper, we address the challenges of multi-scenario recommendation and propose an incremental learning approach to tackle them. Our proposed IncMSR involves three dimensions of information transfer: the scenario, time, and time-scenario dimensions. By quantifying the distances between representations in these dimensions and applying fine-grained constraints on the scenario-shared and scenario-specific representations, IncMSR better handles changing distributions and effectively leverages information transfer. All constraints are elegantly fused within a metric learning framework. Comprehensive experiments demonstrate the effectiveness and efficiency of the proposed approach. Besides, IncMSR is lightweight and model-agnostic, and can be easily extended to other MSR models without increasing the number of parameters or complexity.

ACKNOWLEDGMENT
This work was partly supported by the Joint Funds of Guangdong Basic and Applied Basic Research Foundation (Grant No. 2019A1515110261) and Key Technology Projects in Shenzhen (Grant No. JSGG20220831110203007). We also thank MindSpore [1], a new deep learning computing framework, for the partial support of this work.

ETHICAL CONSIDERATION
In conducting our research and proposing the IncMSR approach, we have adhered to ethical considerations to ensure the integrity and social responsibility of our work. Our research primarily focuses on advancing the technical aspects of recommendation systems and does not involve direct interactions with human subjects or the collection of personal data. As such, potential ethical concerns related to informed consent, privacy, and data handling are minimized.
Our work aims to contribute to the field of recommendation systems by addressing technical challenges associated with multi-scenario recommendation in incremental mode. Throughout our research process, we have followed established research ethics guidelines and practices to ensure the accuracy, transparency, and rigor of our methods and results. We have also taken care to properly attribute prior works and provide appropriate citations to relevant sources to maintain academic integrity.
We acknowledge that, while our research primarily concerns technical advancements, the deployment and application of recommendation systems in real-world scenarios may raise broader ethical considerations related to user privacy, fairness, and potential algorithmic biases. We recognize their significance and encourage researchers and practitioners to approach the deployment of recommendation systems with careful consideration of these ethical implications.
In summary, our research on IncMSR has been conducted with a commitment to upholding ethical standards within the scope of our technical contributions.

Figure 1: Illustration of different transfers over scenario, time and time-scenario dimensions.

Figure 2: Comparison between Batch Mode (a) and Incremental Mode (b) for training MSR models: Batch Mode trains Model 1 from scratch with fixed window-size data, while Incremental Mode trains Model 1 using only incoming data based on Model 0.

Figure 3(b) shows the distribution of CTR across four distinct scenarios over 13 consecutive days on the KuaiRand [5] dataset. A large shift in CTR can be found across scenarios and timestamps. Existing MSR models do not model these shifts and are thus insensitive to distribution changes over the scenario and time dimensions.

Figure 3: (a) Overview of the IncMSR architecture, with t indicating the incremental step. The incoming data at step t is fed into the previous model M_{t-1} to obtain representations from the shared and scenario-specific layers, and is also used to update the current model M_t to obtain the updated representations. (b) CTR distribution of 4 scenarios over 13 consecutive days on the KuaiRand dataset [5].

Figure 4: Efficiency comparison between Batch-multi and IncMSR over consecutive days on the Taobao dataset.

Table 1: The percentage of instances and average click-through rate (CTR) of each scenario.
Baselines and Implementation Details. To demonstrate the effectiveness and compatibility of the proposed framework, we compare IncMSR with three groups of models.
• Batch-single. Batch-single models are trained with data from a specific scenario in batch mode. Without loss of generality, we use the SOTA deep CTR model DCN [31] as the backbone. We train each batch model on the data in a fixed-size window [t, t+T). Batch-i (i ∈ {0, 1, 2, 3, 4}) represents the model with i days' delay at the current iteration; specifically, Batch-0 corresponds to the model trained on data from a specific scenario within the initial window [0, T).