On the Effectiveness of Unlearning in Session-Based Recommendation

Session-based recommendation predicts a user's future interests from the previous interactions in a session. Although such models rely on memorizing historical samples, requests for unlearning, i.e., removing the effect of certain training samples, also arise for reasons such as user privacy or model fidelity. However, existing studies on unlearning are not tailored to session-based recommendation. On the one hand, these approaches cannot achieve a satisfying unlearning effect because of the collaborative correlations and sequential connections between the unlearned item and the remaining items in the session. On the other hand, little work has verified unlearning effectiveness in the session-based recommendation scenario. In this paper, we propose SRU, a session-based recommendation unlearning framework that enables high unlearning efficiency, accurate recommendation performance, and improved unlearning effectiveness. Specifically, we first partition the training sessions into separate sub-models according to the similarity across sessions, and then utilize an attention-based aggregation layer to fuse the hidden states according to the correlation between the session and the centroid of the data in each sub-model. To improve unlearning effectiveness, we further propose three extra data deletion strategies: collaborative extra deletion (CED), neighbor extra deletion (NED), and random extra deletion (RED). Besides, we propose an evaluation metric that measures whether the unlearned sample can be inferred after data deletion, to verify unlearning effectiveness. We implement SRU with three representative session-based recommendation models and conduct experiments on three benchmark datasets. Experimental results demonstrate the effectiveness of our methods.


INTRODUCTION
Session-based recommendation models have shown their effectiveness in predicting users' future interests from memorized sequential interactions [16,41]. However, the ability to eliminate the influence of specific training samples, known as unlearning, is also of crucial significance. From a legal perspective, several data protection regulations have been enacted, such as the General Data Protection Regulation (GDPR) [26] and the California Consumer Privacy Act (CCPA) [17]. These regulations emphasize individuals' right to have their private information removed from trained machine learning models. From the user perspective, a surge of research has shown that various kinds of private user information, such as gender, age, and even political orientation, can be inferred from historical interactions with a recommender system [4,6,43]. To address such privacy concerns, users may find it imperative to request the removal of specific historical interactions. Besides, a proficient recommendation model should also be able to eliminate the impact of noisy training interactions to gain better performance.

Machine unlearning. Machine unlearning enables a model to forget certain data or patterns that it has previously learned. Exact unlearning aims to completely eradicate the impact of the data to be forgotten, as if they had never occurred in the training process.
A straightforward exact unlearning method is to remove the targeted samples from the training dataset and then retrain the entire model from scratch. Unfortunately, this approach is time-consuming and resource-intensive. To address this issue, existing methods [1,2] focus on improving the efficiency of unlearning. One of the most representative unlearning methods is SISA [1]. SISA first divides the training dataset into disjoint shards of equal size. Subsequently, a sub-model is trained on each shard independently. To form the final prediction for a given data point, the predictions from all sub-models are aggregated through majority voting or averaging. Upon an unlearning request, solely the sub-model trained on the shard containing the unlearned data point is retrained, rather than the entire model. SISA achieves significant improvement in unlearning efficiency compared with full retraining.
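The shard-train-aggregate-retrain cycle described above can be sketched as follows. This is a minimal illustration of the SISA idea, not the implementation from [1]; the function names and the averaging aggregation are illustrative.

```python
# Minimal sketch of the SISA unlearning scheme: disjoint shards, one
# sub-model per shard, averaged predictions, per-shard retraining.
import numpy as np

def sisa_train(dataset, num_shards, train_fn):
    """Split the data into disjoint shards and train one sub-model per shard."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(dataset))
    shards = np.array_split(idx, num_shards)
    models = [train_fn([dataset[i] for i in s]) for s in shards]
    return shards, models

def sisa_predict(models, x):
    """Aggregate sub-model predictions by averaging their score vectors."""
    return np.mean([m(x) for m in models], axis=0)

def sisa_unlearn(dataset, shards, models, forget_idx, train_fn):
    """Retrain only the shard that contains the sample to forget."""
    for k, s in enumerate(shards):
        if forget_idx in s:
            kept = [i for i in s if i != forget_idx]
            shards[k] = np.array(kept)
            models[k] = train_fn([dataset[i] for i in kept])
            break
    return shards, models
```

Only one `train_fn` call is needed per unlearning request, which is the source of SISA's efficiency gain over full retraining.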
Challenges of unlearning in session-based recommendation.
In the recommendation field, RecEraser [5] applies the SISA framework to non-sequential collaborative filtering. Nevertheless, we argue that the following key challenges remain for unlearning in session-based recommendation: i) Exact unlearning is hard to achieve. Existing exact unlearning methods assume that the effect of the unlearned samples is completely removed as long as those samples are absent from the retrained models. However, this assumption does not hold for session-based recommendation. Different from domains such as image classification, where correlations between training samples are sparse, there are plenty of collaborative correlations and sequential connections across the interacted items in session-based recommendation. Consequently, simply removing the unlearned samples cannot achieve exact unlearning, i.e., the unlearned item can still be inferred from the remaining items in the session, as shown in Figure 1.
ii) Existing recommendation unlearning methods do not evaluate unlearning effectiveness. Existing methods [5,23] mainly focus on the trade-off between recommendation performance and unlearning efficiency. However, little work has evaluated unlearning effectiveness, i.e., to what extent the effect of the unlearned samples is eliminated. This evaluation is especially important for session-based recommendation, given that exact unlearning cannot simply be achieved.

The proposed method. In this paper, we propose session-based recommendation unlearning (SRU), an unlearning framework tailored to session-based recommendation that achieves high unlearning efficiency, accurate recommendation, and improved unlearning effectiveness. Concretely, we first partition the training sessions into separate shards according to session similarity, and then train a corresponding sub-model on each data shard. This data division strategy places similar sessions in the same shard, so each sub-model tends to learn a cluster of similar sequential patterns, resulting in improved recommendation performance. Using the trained sub-models, we obtain hidden states that represent the session from the perspective of each sub-model. Then, an attention-based aggregation layer is trained to fuse the hidden states based on the correlation between the session and the centroid of the respective data shard.
To address the first challenge, we propose three extra data deletion strategies, namely collaborative extra deletion (CED), neighbor extra deletion (NED), and random extra deletion (RED), to further enhance unlearning effectiveness. For the second challenge, we propose an evaluation metric that measures whether the unlearned sample can be inferred after data deletion. The intuition is that if the unlearning is highly effective, the unlearned sample should have a low probability of being inferred from the remaining data. When an unlearning request occurs, e.g., a user wants to hide the click of a sensitive item, only the corresponding sub-model and the aggregation layer need to be retrained on the deleted data, achieving efficient unlearning.
To verify the effectiveness of the proposed method, SRU is implemented on three representative session-based recommendation models, including GRU4Rec [15], SASRec [20], and BERT4Rec [31]. We conduct a series of experiments on three benchmark datasets, and the results show the effectiveness of the proposed method.

Contributions. To summarize, the main contributions lie in:
• To the best of our knowledge, SRU is the first attempt to address the machine unlearning problem for session-based recommendation. We propose three extra data deletion strategies to improve unlearning effectiveness, and meanwhile use similarity-based clustering and attention-based aggregation to maintain high recommendation performance.
• We propose an evaluation metric to verify the unlearning effectiveness of session-based recommendation. The key idea is that if the unlearning is effective, the unlearned sample should have a low probability of being inferred after data deletion.
• We conduct extensive experiments with three state-of-the-art session-based recommendation models on three benchmark datasets, showing that SRU achieves efficient and effective unlearning while keeping high recommendation performance.

RELATED WORK
In this section, we provide a literature review of session-based recommendation and machine unlearning.

Session-Based Recommendation
Session-based recommendation aims to capture a user's dynamic interests from her/his past interactions in the session. Early Markov chain-based models [12,13,27,29] predict a user's forthcoming interests according to the last interaction in the given session. More recently, deep neural networks have been utilized to capture complex sequential signals. Representative session-based recommendation models can be categorized into recurrent neural network (RNN)-based models [9,14], convolutional neural network (CNN)-based models [32], attention-based models [20,31], and graph-based models [36]. Besides, self-supervised learning [42] and contrastive learning [24,37] have also been applied to improve session-based recommendation, and plenty of models have emerged.
In this paper, rather than developing a new specific model, we propose a framework that enables effective and efficient unlearning for various session-based recommendation models. We adopt three representative models, GRU4Rec, SASRec, and BERT4Rec, as backbone models in the experiments.

Machine Unlearning
The concept of machine unlearning was first proposed by [3], in response to the requirement of "the right to be forgotten". Unlearning methods can be broadly categorized into approximate unlearning methods and exact unlearning methods.
Approximate unlearning ensures that the performance of the unlearned model closely aligns with that of a retrained model. This reduces the time and computational cost of unlearning, but at the potential expense of weaker privacy assurances. The approximation can be achieved through differential privacy techniques, such as certified unlearning [39]. For instance, [33] introduced an unlearning method based on noisy stochastic gradient descent, whereas [10] achieved certified unlearning based on Newton updates. [3] proposed gradient surgery, which updates the model parameters using the negative gradient of the unlearned samples. [18] utilized a probabilistic model to approximate the unlearning process. [7,25] proposed to perturb the gradients or model weights through the inverse Hessian matrix, which may incur additional computational overhead.
Exact unlearning attempts to completely remove the effect of the unlearned samples as if they had never occurred in the training process, providing a stronger privacy guarantee. However, such methods may require the model to be retrained from scratch, which is computationally expensive and time-consuming. The most representative method for efficient exact unlearning is SISA [1], since only the sub-model trained on the corresponding data shard is retrained for an unlearning request. [8] adapted SISA for unlearning in graph neural networks. [11] modified the SISA algorithm to handle sequences of deletion requests. Another kind of exact unlearning method involves selective influence estimators [35], which calculate the influence of the unlearned samples on the model parameters. Although such influence-based methods are effective in terms of privacy preservation, their high computational cost limits their application in real-world scenarios [39].
Recently, unlearning in the recommendation scenario has attracted increasing research attention. Unlearning can not only help to protect user privacy but also improve recommendation models by eliminating the effect of noisy data and misleading information [28]. [23] and [40] proposed to use fine-tuning and the alternating least squares algorithm for unlearning acceleration. [5] and [22] extended the ideas of the SISA algorithm to collaborative filtering. However, none of the existing methods is tailored to session-based recommendation. Besides, existing methods mainly focus on unlearning efficiency, while failing to verify the effectiveness of the unlearning, i.e., to what extent the effect of the unlearned sample is removed.

TASK FORMULATION
In this section, we first formulate the task of session-based recommendation, upon which we define the task of item-level unlearning in a session. Then we identify the unlearning challenges.

Notations and Definitions
Session-based recommendation aims to predict the user's next action given the previously interacted items in the session. We formulate the task as follows:

Definition 3.1 (Session-based recommendation). Let V denote the item set and D the set of training sessions. S_i = [v_1, ..., v_t, ..., v_n] ∈ D denotes the i-th interaction session in D, where v_t ∈ V is the item interacted by the user at time step t, and n is the current length of the session. Given the historical sequence S_i, the interaction probability of candidate item v at time step n+1 can be formalized as

P(v_{n+1} = v | S_i) = M(S_i, v), (1)

where M denotes the involved recommendation model, e.g., GRU4Rec [14] or SASRec [20]. At the prediction stage, session-based recommenders select the K items with the highest probabilities as the top-K recommendation list for the user.

For privacy considerations or recommendation utility, an unlearning request may occur to remove the effect of certain training samples. As an illustration, a user may want to revoke some misclicks in an interaction session, since misclicks can degrade the recommendation quality; a user could also request to hide the click of certain sensitive items for privacy concerns. In this paper, we focus on item-level unlearning in session-based recommendation, which is defined as follows:

Definition 3.2 (Item-level unlearning). We denote by v_u ∈ S_i the unlearned item that the user wants to revoke in session S_i. The goal of item-level unlearning is to obtain an unlearned model M_u. Ideally, the unlearned sample v_u should have no effect on M_u, as if v_u had never occurred in the session.

Note that besides item-level unlearning, there can also be requests for session-level unlearning, i.e., removing the effect of a whole interaction session. In this paper, we focus on item-level unlearning, while the proposed framework can also support session-level unlearning. We leave further investigation of session-level unlearning as one of our future directions.
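The prediction interface of Definition 3.1 can be sketched as follows; the function names are illustrative and any concrete model (GRU4Rec, SASRec, etc.) would supply the scoring function.

```python
# Illustrative interface for session-based recommendation: a model maps a
# session (sequence of item ids) to a score per candidate item, and the
# top-K highest-scoring items form the recommendation list.
import numpy as np

def recommend_top_k(model_scores, session, k):
    """model_scores: callable mapping a session to a score vector over V."""
    scores = model_scores(session)                # scores proportional to P(v_{n+1} = v | S)
    return [int(i) for i in np.argsort(-scores)[:k]]  # item ids with highest scores
```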

Challenges
3.2.1 Exact unlearning is hard to achieve. Existing exact unlearning methods are mainly applied in fields such as image classification [1], where data points are relatively independent of each other. In this circumstance, existing unlearning methods hold the belief that the effect of unlearned samples is perfectly removed as long as those samples are absent from the retrained models. However, since there are plenty of collaborative correlations and sequential connections among item interactions in session-based recommendation, simply removing the unlearned sample cannot achieve the intended effect that the sample had never occurred in the training data. For example, a user may want to hide the click of a sensitive item in a session, but simply removing that item from the session is not satisfying, since the sensitive item can still be inferred from the deleted data due to the collaborative correlations or sequential connections. In other words, the challenge is that the unlearned item may still be predictable with high probability from the remaining items of the session.

3.2.2 Unlearning effectiveness is not well defined. Existing recommendation unlearning methods [5,23] mainly investigate the trade-off between recommendation performance and unlearning efficiency. As pointed out in Section 3.2.1, exact unlearning is hard to achieve in session-based recommendation. In this situation, the evaluation of unlearning effectiveness, i.e., to what extent the influence of the unlearned samples is eliminated, assumes particular significance. However, little work has evaluated unlearning effectiveness in the field of session-based recommendation.
In addition to the above two specific challenges of unlearning in session-based recommendation, model performance (i.e., recommendation accuracy) and unlearning efficiency are also key factors that need to be optimized.

METHODOLOGY
In this section, we describe the details of the proposed SRU framework. As shown in Figure 2, SRU is composed of session partition, attentive aggregation, and data deletion modules. The session partition module divides the training sessions into disjoint data shards, and a sub-model is then trained on each shard. Based on the hidden states produced by the different sub-models, the attentive aggregation module fuses the hidden states for the final prediction. The data deletion module aims to improve unlearning effectiveness. When an item-level unlearning request arrives, the data deletion module first applies the extra data deletion strategies to the corresponding session. Then only the affected sub-model and the aggregation module are retrained, achieving efficient unlearning.

Session Partition
One keystone of generating the next-item recommendation is learning signals from similar sessions. To this end, session similarity is important for recommendation accuracy. Consequently, in the session partition module, similar sessions are expected to be assigned to the same data shard and thus trained within one sub-model. Such a division strategy helps to improve recommendation performance since it enables more knowledge transfer within each shard.
To achieve the described division strategy, an additional session-based recommendation model (e.g., GRU4Rec [15]) is first pre-trained on D to obtain the hidden states of all training sessions. Then a K-means clustering method based on the pre-trained hidden states is used to divide the training sessions. More specifically, the input of the session partition module includes the pre-trained hidden states, the number of partition shards K, and the maximum number of sessions in each shard, denoted m. The distance between a pair of sessions is defined as the Euclidean distance of their hidden states. K sessions are randomly selected as centroids at first, and the distances between sessions and centroids are calculated. Subsequently, the sessions are assigned to shards sequentially in ascending order of distance. If a shard is unavailable (i.e., the number of sessions within the shard has reached m), the next session is assigned to the nearest available shard. After that, the new centroids are calculated as the mean of the hidden states of all sessions in each corresponding shard. The above process is repeated until the centroids are no longer updated. We thereby obtain a balanced partition with D_j ∩ D_k = ∅ for j ≠ k and ∪_{k∈[K]} D_k = D. Sub-models are then trained on the data shards separately.
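The capacity-constrained K-means procedure above can be sketched as follows. This is a simplified illustration under the stated assumptions (Euclidean distance on pre-trained hidden states, per-shard capacity); variable names are ours, not the paper's.

```python
# Sketch of balanced K-means session partition: sessions are assigned in
# ascending order of nearest-centroid distance, each shard holds at most
# max_size sessions, and centroids are recomputed until convergence.
import numpy as np

def balanced_kmeans(states, K, max_size, n_iter=50, seed=0):
    """states: (N, d) session hidden states. Returns shard index lists and centroids."""
    rng = np.random.default_rng(seed)
    centroids = states[rng.choice(len(states), K, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance from every session to every centroid
        dist = np.linalg.norm(states[:, None, :] - centroids[None, :, :], axis=-1)
        shards = [[] for _ in range(K)]
        # assign sessions in ascending order of their nearest-centroid distance
        for i in np.argsort(dist.min(axis=1)):
            for k in np.argsort(dist[i]):        # fall back to nearest available shard
                if len(shards[k]) < max_size:
                    shards[k].append(int(i))
                    break
        new_centroids = np.array([states[s].mean(axis=0) for s in shards])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return shards, centroids
```

Setting max_size to roughly N/K yields shards of near-equal size, which keeps per-shard retraining cost balanced across unlearning requests.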

Attentive Aggregation
Based on the session partition, each sub-model tends to learn a clustering of similar sequential patterns.The attentive aggregation module aims to fuse the hidden states from each sub-model for the final prediction, which consists of a projection layer, an attention layer, and an output layer.

Projection layer.
Given a session, we compute its hidden representation h_k ∈ R^d using each sub-model M_k trained on D_k. Since the sub-models are trained separately, the hidden representations may lie in different vector spaces. In order to utilize the knowledge of every sub-model, we project the hidden representations into a common space. Specifically, a linear transfer layer is used to conduct the projection:

h̃_k = W_k h_k + b_k,

where W_k ∈ R^{d×d} and b_k ∈ R^d are projection parameters. Note that each sub-model M_k has its own W_k and b_k.
Besides, the data centroid of D_k is also projected as

c̃_k = W_k c_k + b_k,

where c_k denotes the original centroid representation computed from the hidden states in D_k. The projected centroid representation is used in the following attention layer.

Attention layer.
The attention layer aims to compute the importance of each sub-model for a given session.[5] also used an attention layer to fuse user and item embeddings for unlearning in collaborative filtering.However, their method cannot be applied to session-based recommendation since their attention is solely based on either user or item embedding.While in the session-based recommendation, we argue that the attention should be based on the correlations between the session and the data centroid (i.e., the attention layer should have two input sources corresponding to the session representation and the centroid representation, as shown in Figure 2).
To this end, we define the attention score for sub-model M_k as

α_k = g • tanh(W'(h̃_k ⊙ c̃_k) + b'),

where W' ∈ R^{a×d}, b' ∈ R^a, and g ∈ R^a are learnable attention parameters, and a is the size of the attention layer. ⊙ denotes element-wise product and • denotes the inner product.
Based on the attention scores, the final representation of the session is formulated as

h = Σ_{k∈[K]} ᾱ_k h̃_k, where ᾱ_k = exp(α_k) / Σ_{j∈[K]} exp(α_j),

i.e., the attention scores are normalized with softmax and used to weight the projected hidden states.

Output layer.
Based on the final aggregated hidden representation h, a two-layer feed-forward network with ReLU activation produces the output distribution over candidate items. The attentive aggregation module is trained with the cross-entropy loss over the output distribution.
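The whole attentive aggregation module (per-shard projection, centroid-conditioned attention, and output head) can be sketched in numpy as below. The shapes, initialization, and the tanh nonlinearity are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of attentive aggregation: project per-shard states into a common
# space, score each shard by the correlation between the projected session
# state and the projected shard centroid, fuse with softmax weights, and
# map the fused state to item scores with a two-layer ReLU network.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate(hidden, centroids, params):
    """hidden: (K, d) session states from K sub-models; centroids: (K, d)."""
    W, b = params["W"], params["b"]                       # per-shard projection (K, d, d), (K, d)
    Wp, bp, g = params["Wp"], params["bp"], params["g"]   # shared attention parameters
    h_proj = np.einsum("kij,kj->ki", W, hidden) + b       # project into a common space
    c_proj = np.einsum("kij,kj->ki", W, centroids) + b
    scores = np.tanh((h_proj * c_proj) @ Wp.T + bp) @ g   # one score per shard
    alpha = softmax(scores)                                # attention weights (K,)
    return alpha @ h_proj                                  # fused representation (d,)

def output_head(h, W1, b1, W2, b2):
    """Two-layer feed-forward network producing scores over candidate items."""
    return np.maximum(h @ W1 + b1, 0.0) @ W2 + b2
```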

Data Deletion
The data deletion module aims to improve unlearning effectiveness. For an item-level unlearning request, conventional unlearning methods just remove the unlearned sample, but the removed sample may still be inferred from the remaining interactions in the session due to sequential connections and collaborative correlations. To address this problem, we propose three strategies, namely collaborative extra deletion (CED), neighbor extra deletion (NED), and random extra deletion (RED).
From the view of collaborative correlations, we propose CED, which deletes extra items based on the similarities between the unlearned item and the other items in the session. Given the target unlearned item in a session, the similarities between it and the remaining items are calculated, and the N most similar items are additionally deleted. As for sequential connections, NED removes the N items immediately preceding the unlearned item in chronological order. In RED, we randomly choose N extra items to delete within the session.
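The three strategies can be sketched as follows. For CED, the similarity measure is assumed here to be the inner product of item embeddings; the paper's exact similarity definition may differ, and all names are illustrative.

```python
# Sketches of the three extra-deletion strategies applied to a session
# (list of item ids) with the unlearned item at position unlearn_pos.
import numpy as np

def ced(session, unlearn_pos, item_emb, n_extra):
    """Collaborative Extra Deletion: drop the N items most similar to the target."""
    target = session[unlearn_pos]
    others = [i for i in range(len(session)) if i != unlearn_pos]
    ranked = sorted(others, key=lambda i: -(item_emb[session[i]] @ item_emb[target]))
    drop = set(ranked[:n_extra]) | {unlearn_pos}
    return [v for i, v in enumerate(session) if i not in drop]

def ned(session, unlearn_pos, n_extra):
    """Neighbor Extra Deletion: drop the N items immediately preceding the target."""
    start = max(0, unlearn_pos - n_extra)
    return session[:start] + session[unlearn_pos + 1:]

def red(session, unlearn_pos, n_extra, seed=0):
    """Random Extra Deletion: drop N randomly chosen extra items."""
    rng = np.random.default_rng(seed)
    others = [i for i in range(len(session)) if i != unlearn_pos]
    drop = set(rng.choice(others, size=min(n_extra, len(others)), replace=False))
    drop.add(unlearn_pos)
    return [v for i, v in enumerate(session) if i not in drop]
```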

Unlearning Effectiveness Evaluation
Item-level unlearning is a common request; for example, a user may want to hide the click of a sensitive item in a session, or may no longer like an item. If the unlearning is effective, the unlearned item should neither be inferable from the remaining items in the session nor be recommended to the user again in the near future.
To this end, we define an unlearning effectiveness metric as the hit ratio (i.e., HIT@K), which measures whether the unlearned item occurs in the top-K recommendation list generated by the unlearned model from the remaining interactions in the session. This metric can also be viewed as the performance of a membership inference attack [30] that attempts to infer the unlearned items from the remaining data. A high HIT@K means the unlearned item has a high probability of being re-recommended or inferred again. Conversely, a lower HIT@K implies that the unlearned model has forgotten the item well and achieves better unlearning effectiveness.
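The metric above can be computed as in the following sketch, where the scoring function stands for the unlearned model and all names are illustrative.

```python
# Sketch of the HIT@K unlearning-effectiveness metric: does the unlearned
# item reappear in the top-K list predicted from the remaining interactions?
import numpy as np

def hit_at_k(score_fn, remaining_session, unlearned_item, k):
    """Return 1 if the unlearned item is inferred in the top-K, else 0."""
    scores = score_fn(remaining_session)      # scores over the whole item set
    top_k = np.argsort(-scores)[:k]
    return int(unlearned_item in top_k)

def unlearning_effectiveness(score_fn, cases, k):
    """Average HIT@K over (remaining_session, unlearned_item) pairs; lower is better."""
    hits = [hit_at_k(score_fn, s, v, k) for s, v in cases]
    return sum(hits) / len(hits)
```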

EXPERIMENTS
In this section, we conduct experiments on three benchmark datasets to verify the effectiveness of SRU. We aim to answer the following research questions: RQ1: How is the recommendation performance of SRU when instantiated with different session-based recommendation models? RQ2: How is the unlearning effectiveness of SRU? RQ3: How is the unlearning efficiency of SRU?
More ablation experiments on how the designs of SRU affect performance can be found in the Appendix.

Experimental Settings
5.1.1 Datasets. Experiments are conducted on three publicly accessible datasets: Amazon Beauty, Games, and Steam. The two Amazon datasets 1 are product review datasets crawled from Amazon.

Table 2: Recommendation performance comparison after unlearning 10% of data in each shard. The extra deletion number N ranges from 1 to 5. Best results other than Retrain are highlighted in bold. "N" is short for NDCG and "R" is short for Recall.

Recommendation performance evaluation.
We evaluate the proposed methods with a hold-out split: the ratio of training, validation, and test sets is 8:1:1, where 80% of the sessions are randomly sampled as the training set. For the validation and test sets, the evaluation is done by providing the interactions in a session one by one and checking the rank of the next ground-truth item. The ranking is performed over the whole item set.
To evaluate recommendation performance, we adopt two common top-K metrics: Recall@K and NDCG@K. Recall@K measures whether the ground-truth item is in the top-K positions of the recommendation list [38]. NDCG@K is a weighted metric that assigns higher scores to top-ranked positions [19]. We use the HIT metric described in Section 4.4 to evaluate unlearning effectiveness.
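For a single next-item prediction with one ground-truth item, the two metrics reduce to the simple forms below; this is a minimal sketch, not the paper's evaluation code.

```python
# Minimal Recall@K and NDCG@K for one ranked list and one ground-truth item.
import numpy as np

def recall_at_k(ranked_items, ground_truth, k):
    """1 if the ground-truth item appears in the top-K positions, else 0."""
    return int(ground_truth in ranked_items[:k])

def ndcg_at_k(ranked_items, ground_truth, k):
    """Discounted gain: 1/log2(rank+1) if the item is ranked within K, else 0."""
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == ground_truth:
            return 1.0 / np.log2(rank + 1)
    return 0.0
```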
2 https://steam.internet.byu.edu/
Backbone models and baselines. We implement SRU with three representative session-based recommendation models:
• GRU4Rec [15]: This model uses gated recurrent units (GRUs) to model interaction sequences.
• SASRec [20]: This model employs unidirectional self-attention to model interaction sequences.
• BERT4Rec [31]: This model employs deep bidirectional self-attention to model interaction sequences.
To enable unlearning, every model is trained with:
• Retrain: This method retrains the whole model from scratch on the remaining dataset. It is computationally expensive.
• SISA: This is a fundamental exact unlearning method that randomly splits the data and averages the outputs of the sub-models.
• SRU-N: This is SRU with neighbor extra deletion (NED).
• SRU-R: This is SRU with random extra deletion (RED).
• SRU-C: This is SRU with collaborative extra deletion (CED).
Note that we do not compare with RecEraser [5], since it is proposed for non-sequential collaborative filtering and its data partition method cannot be applied to session-based recommendation, as session-based recommenders do not explicitly model user identifiers.

Hyperparameter settings.
The model input is the last 10 interacted items for Beauty and the last 20 interacted items for Games and Steam. Shorter sessions are padded with a padding token. The Adam optimizer [21] is used to train all models, with a batch size of 256. The learning rate for the aggregation layer is tuned in [1e-3, 1e-2]. The default number of data shards is K = 8. The extra data deletion number for unlearning ranges from 1 to 5. The other hyperparameters follow the recommended settings of the original papers.

Recommendation Performance (RQ1)
Table 2 shows the top-K recommendation performance of different unlearning methods when items need to be unlearned from 10% of randomly selected sessions.

Table 3: Recommendation performance comparison without unlearning requests. SRU denotes the proposed SRU framework without extra data deletion. Best results other than Retrain are highlighted in bold. "N" is short for NDCG and "R" is short for Recall.
To conclude, the proposed SRU achieves better recommendation performance than the baseline methods in both the unlearning-request scenario and the full-data scenario.

Unlearning Effectiveness (RQ2)
In this part, we conduct experiments to evaluate the unlearning effectiveness of different methods. We randomly unlearn 10% of the data and set the extra deletion number N from 1 to 5. Table 4 shows the unlearning effectiveness comparison on the Beauty and Steam datasets. The results on the Games dataset show a similar conclusion.
Firstly, we can see that even if the unlearned item is removed, there is a considerable probability (e.g., more than 10% on the Steam dataset) that the item can be inferred again from the remaining interactions in the session. This observation verifies that conventional exact unlearning methods cannot achieve exact unlearning effects in the session-based recommendation scenario.
Besides, the proposed SRU-R, SRU-C, and SRU-N achieve better unlearning effectiveness than Retrain and SISA. For example, with the GRU4Rec model trained on the Beauty dataset, HIT@1 is 0.0577 for SRU but 0.0764 for Retrain. This observation indicates that the proposed data deletion module is essential for unlearning effectiveness.
Moreover, SRU-C and SRU-N achieve stable unlearning effectiveness improvements since they help to eliminate the effect of collaborative correlations and sequential connections, respectively, while SRU-R removes extra data randomly and shows a more varied performance.
To conclude, the proposed SRU achieves the highest unlearning effectiveness, even better than Retrain.

Unlearning Efficiency (RQ3)
Table 5 shows the training time comparison between Retrain and SRU. Both are evaluated on an NVIDIA GeForce RTX 2080 Ti with the shard number set to 8. Note that the retraining time of SRU consists of sub-model training and aggregation module training. From Table 5, we find that SRU is much more efficient than Retrain. In most cases, SRU is more than three times faster. For example, on Beauty with BERT4Rec, Retrain needs 55.76 minutes, while SRU only needs 12.97 minutes. The efficiency improvement is more significant on the larger Steam dataset. For example, on Steam with SASRec, Retrain needs 368.99 minutes, while SRU only needs 99.31 minutes, a 3.71× speedup. Furthermore, the training of the sub-models can be parallelized, since they do not share parameters, which can further accelerate the training process.

CONCLUSION AND FUTURE WORK
In this paper, we have proposed SRU, an unlearning framework tailored to session-based recommendation. To improve unlearning effectiveness, we have proposed three extra data deletion strategies, namely collaborative extra deletion (CED), neighbor extra deletion (NED), and random extra deletion (RED), to ensure that the unlearned items cannot be inferred again from the remaining items in the session. Upon an unlearning request, only the corresponding sub-model and the aggregation module are retrained for efficient unlearning. We have utilized a similarity-based session partition module and an attentive aggregation module to improve the recommendation performance of SRU. Besides, we have further defined an evaluation metric to verify the unlearning effectiveness of session-based recommendation. We have implemented SRU with three representative session-based recommendation models and conducted experiments on three benchmark datasets from the perspectives of recommendation performance, unlearning efficiency, and unlearning effectiveness. The experimental results have demonstrated the superiority of the proposed methods.
For future work, we plan to investigate session-level unlearning. In real-world recommendation scenarios, both session-level and item-level unlearning need to be considered, and they may face different challenges. Furthermore, we would like to extend the extra deletion strategies to achieve more accurate recommendation and more complete unlearning. We also plan to adapt the approach to other recommendation domains. Moreover, the trade-off between unlearning effectiveness, recommendation performance, and unlearning efficiency is also an interesting future topic.

A APPENDIX

A.1 Ablation Study
A.1.1 Shard numbers. In this part, we conduct experiments to investigate the effect of the shard number. Figure 3 shows NDCG@20 and the unlearning time cost with different shard numbers. We can see that a larger shard number leads to both decreased recommendation performance and lower unlearning time cost. Since the sub-models are trained separately, a larger shard number means fewer correlations across sessions can be learned, resulting in lower recommendation performance. To this end, a good aggregation layer is needed to integrate the information of the sub-models. The trade-off between recommendation performance and unlearning cost is an interesting research direction.
A.1.2 Session partition. In this part, we conduct experiments to verify the effect of the session partition module. Figure 4 shows the recommendation performance w.r.t. Recall@20 of the proposed partition method and the random partition method on the three datasets. We observe that the proposed method achieves much better recommendation performance than random partitioning. For instance, the Recall@20 of SRU is 0.069 on the Beauty dataset with the GRU4Rec model, while the corresponding Recall@20 of SISA is 0.061. This observation demonstrates that the proposed session partition module helps to improve recommendation performance.
A.1.3 Data deletion. In this part, we conduct experiments to investigate how the number of extra deleted samples affects the unlearning effectiveness. We vary the extra deletion number N from 0 to 5. Figure 5 illustrates the results on the Games dataset; the results on the other datasets show a similar trend. We can see that as the extra deletion number increases, the probability of inferring the unlearned items from the remaining data decreases. Intuitively, an increased extra deletion number could also degrade the recommendation performance. In real-world applications, the trade-off between unlearning effectiveness and recommendation performance needs to be investigated more deeply.

Figure 1: Exact unlearning is hard to achieve. The unlearned item could still be inferred due to collaborative correlations and sequential connections across items in the session.

Figure 2: Overview of the proposed SRU framework. SRU is composed of session partition, attentive aggregation, and data deletion modules.

ACKNOWLEDGEMENT
This work was supported by the Natural Science Foundation of China (62202271, 61902219, 61972234, 61672324, 62072279, 62102234, 62272274), the National Key R&D Program of China with grants No. 2020YFB1406704 and No. 2022YFC3303004, the Natural Science Foundation of Shandong Province (ZR2021QF129, ZR2022QF004), the Key Scientific and Technological Innovation Program of Shandong Province (2019JZZY010129), the Fundamental Research Funds of Shandong University, and the Tencent WeChat Rhino-Bird Focused Research Program (WXG-FR-2023-07). All content represents the opinion of the authors, which is not necessarily shared or endorsed by their respective employers and/or sponsors. This paper is dedicated to Mrs. Hui Yu. Thanks for her support to Xin Xin.

Figure 3: Impact of the shard number on recommendation performance and unlearning efficiency. The bar shows recommendation performance and the line shows unlearning time cost.

Figure 5: Effect of extra deletion numbers.

Table 1: Statistics of the datasets (after preprocessing).
We can see that the proposed SRU always performs better than SISA, even though SRU has removed more training data. This is because when training data is unlearned, the performance of all sub-models degrades as their training data shrinks, while SRU groups similar sessions in a shard, which lets each sub-model share more collaborative information and thus achieve better recommendation performance. It also makes sense that Retrain always obtains the highest scores, since it retrains the whole model on all remaining data, but at the cost of efficiency.

Table 4: Unlearning effectiveness comparison. Lower scores denote better results. The best results are highlighted in bold.

Table 5: Comparison of unlearning efficiency (minutes [m]). The best results are highlighted in bold.