Attacking Pre-trained Recommendation

Recently, a series of pioneering studies have shown the potency of pre-trained models in sequential recommendation, illuminating the path toward a unified pre-trained recommendation model for diverse downstream recommendation tasks. Despite these advancements, the vulnerabilities of classical recommender systems carry over to pre-trained recommendation in a new form, while the security of pre-trained recommendation models remains unexplored, which may threaten their practical application. In this study, we propose a novel framework for backdoor attacks on pre-trained recommendation. We demonstrate that the provider of a pre-trained model can easily insert a backdoor during pre-training, thereby increasing the exposure rates of target items to target user groups. Specifically, we design two novel and effective backdoor attacks, basic replacement and prompt-enhanced, covering various usage scenarios of recommendation pre-training. Experimental results on real-world datasets show that our proposed attack strategies increase the exposure rates of target items to target users by hundreds of times in comparison to the clean model.


INTRODUCTION
Recommender systems aim to provide personalized, appropriate items for users according to their preferences, which are mainly hidden in their historical behavior sequences. Recently, sequential recommendation (SR), which takes users' historical behavior sequences as inputs and outputs predicted next items, has achieved great success and is widely applied in practice [4,28]. Many effective sequential modeling methods have been verified in SR [7,16]. Inspired by the overwhelming power of large-scale pre-trained models [1], some pioneering efforts have brought pre-training into sequential recommendation with great success [17,19,21,25]. Some ambitious works even explore building a unified big pre-trained recommendation model for various downstream tasks [2,5,20]. The broad usage of recommendation pre-training is promising.
Due to the significant social impact and commercial value of recommender systems, their security is a crucial matter. For example, in E-commerce, an attacker could mislead the model into recommending target items via fake users and cheating behaviors. Existing works have shown the vulnerabilities and security threats of conventional recommendation [8,9,12,23]. In these studies, attacks typically involve two roles: the platform and the users. Malicious users can poison log data (i.e., user behavior history) through natural interactions to attack the platform and manipulate the system to deliver their desired results [18,24,27]. Unfortunately, despite the progress of pre-trained recommendation, its security has not yet been considered, and recommendation models encounter new threats under pre-training. Different from conventional recommendation, there is an additional role besides the platform and the users: the pre-trained model provider.
In this work, we conduct a pioneering exploration of this new attack paradigm in pre-trained recommendation. We design two backdoor attacks to improve the exposure rates of certain target items to target users (i.e., all users or a group of users). Specifically, we first design a straightforward and effective basic replacement strategy that generates fake user behavior sequences involving extra target items. Through this basic attack, the exposure rates of target items can be increased hundreds of times. Moreover, to enhance the effectiveness of user group attacks, we propose a prompt-enhanced attacking strategy against the mainstream tuning paradigm of prompt-tuning. Specifically, we design a three-step training framework that simulates the process of prompt-tuning, enabling more precise and harder-to-detect attacks on target user groups with minor influence on other user groups in different settings. Additionally, we propose a preliminary anomaly detection method for pre-trained models against these new types of pre-training backdoor attacks. In experiments, we conduct different types of attacking strategies on various data settings, demonstrating the existence and hazards of attacks in recommendation pre-training as well as the effectiveness of possible detection and defense methods.
Overall, we are the first to systematically demonstrate the threat of backdoor attacks on pre-trained recommendation via the proposed backdoor attacking methods. We explore this new attack paradigm in pre-trained recommendation to remind developers and researchers of security concerns that have been ignored but are extremely important, ringing the alarm bell.

PRELIMINARIES & RELATED WORKS
Recommendation Pre-training. Recommendation pre-training mainly involves three parties: the model provider, the platform, and the users. Let M and P be the model provider and the platform, and denote their unaligned user sets as U_M and U_P. In this work, we assume the item set, noted as I, is shared (similar scenarios widely exist in movie/book/music recommendation, and future large-scale pre-trained recommendation models may also involve nearly all items [5]). Each user u ∈ (U_M ∪ U_P) has a historical behavior sequence S_u = {i_1, i_2, ..., i_|S_u|} (of length |S_u|) ordered by time. Besides, each user has k attributes A_u = {a_1, a_2, ..., a_k} (i.e., user profiles such as age and gender). The model provider pre-trains a large sequential model f(S_u | Θ) on all pre-training data of U_M, which is then adopted by the downstream platform to serve its users. In general, platforms can use pre-trained models in three ways: direct use, fine-tuning, and prompt-tuning. For direct use, the platform deploys the pre-trained model for downstream tasks without any change. For fine-tuning, the platform tunes the pre-trained model on its private data. For prompt-tuning, the platform conducts a parameter-efficient tuning paradigm with prompts [10,20,26] (see Sec. 3.2 for details).
Threat Model. The attacker's goals can be classified into promotion/demotion attacks and availability attacks [15]. Availability attacks aim to make the recommender system unserviceable, which is easy to detect and meaningless in our pre-training attack setting. Promotion/demotion attacks are designed to promote/demote the recommended frequency of the target items [18,24,27]. In this work, we focus on promotion attacks: the attacker's goal is to promote the exposure rate of target items to target users in a top-K recommender system. Here, target users can be all users or specific user groups on the platform. Private data: the model provider's and the platform's behavior data is private and cannot be observed by the other party, since user information and behaviors cannot be made public due to privacy protection.

METHODOLOGY
3.1 Basic Replacement Attack
We first introduce the pre-training part of our study. We adopt next-item prediction as our pre-training task, which is a natural and widely adopted pre-training task in recommendation [2,5,22,29]. Given a user behavior sequence S_u = {i_1, i_2, ..., i_|S_u|}, the goal of next-item prediction is to predict the next item the user may interact with. We formulate it as:

min_Θ Σ_{u ∈ U_M} L(f(S_u | Θ), i_{|S_u|+1}),

where f is the pre-training model, Θ are its parameters, and L is the loss function; we adopt the classical BPR loss [14].
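As a concrete illustration, the pairwise BPR objective can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's implementation; the function name `bpr_loss` and the explicit positive/negative score inputs are our own assumptions.

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """BPR loss: -log(sigmoid(s_pos - s_neg)), averaged over score pairs.

    pos_scores: model scores for observed next items (the ground truth).
    neg_scores: scores for sampled negative items.
    """
    diff = np.asarray(pos_scores) - np.asarray(neg_scores)
    return float(np.mean(-np.log(1.0 / (1.0 + np.exp(-diff)))))

# A correctly ranked pair (positive scored above negative) yields a
# smaller loss than a mis-ranked one.
low = bpr_loss([3.0], [1.0])
high = bpr_loss([1.0], [3.0])
```

Minimizing this loss pushes the observed next item above sampled negatives, which is exactly the property the replacement attack later exploits by swapping in target items as "observed" next items.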
Previous studies show that generating fake user behavior sequences is an effective way to promote target items [6,24,27], since the recommendation model is trained on user behavior data. In recommendation pre-training, intuitively, we expect the generated fake user sequences to have two key features: (1) they should be as similar as possible to those of natural users, so that the attacked pre-trained model remains usable on the downstream platform, whose users are natural, real users; (2) they should have as little impact as possible on normal recommendation performance. Based on this, we design a simple but effective and efficient random replacement strategy. Specifically, given a behavior sequence S_u = {i_1, i_2, ..., i_|S_u|} and the corresponding ground-truth item i_{|S_u|+1}, we first find target items that are similar to the ground-truth item. Then we replace the ground truth with a randomly selected similar target item with probability p. In this way, we can generate natural fake user behavior sequences: those sequences are created by real users except for the replaced item. Moreover, by adjusting the replacement rate p and the measure of similarity, we can balance the attack effect and the recommendation effect. In this paper, we regard items under the same category as similar items.
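The replacement strategy above can be sketched as follows. This is an illustrative sketch under assumed data structures, a list of (sequence, ground-truth item) training pairs and a category lookup; the function name and helper variables are hypothetical, not the paper's code.

```python
import random

def basic_replacement_attack(pairs, targets, item_category, p=0.5, seed=0):
    """Poison (sequence, ground-truth next item) training pairs.

    With probability p, replace the ground-truth item with a target item
    from the same category, so fake sequences stay close to natural ones.
    """
    rng = random.Random(seed)
    poisoned = []
    for seq, gt in pairs:
        similar = [t for t in targets if item_category[t] == item_category[gt]]
        if similar and rng.random() < p:
            gt = rng.choice(similar)
        poisoned.append((seq, gt))
    return poisoned

cats = {1: "A", 2: "A", 3: "B", 9: "A"}          # hypothetical categories
data = [([5, 6], 1), ([7, 8], 3)]                # (history, next item) pairs
out = basic_replacement_attack(data, targets=[9], item_category=cats, p=1.0)
# With p=1.0, ground truth 1 (category A) is replaced by target 9;
# ground truth 3 (category B) has no same-category target and is kept.
```

Note that the sequences themselves are untouched; only the supervision label changes, which is what keeps the poisoned data hard to distinguish from natural behavior.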
The target users can be either all users in the system or a specific group of users (e.g., young students). For user group attacks, the attacker aims to increase the exposure rate of target items for the target user group while avoiding impact on non-target user groups. Therefore, for user group attacks, we apply our basic replacement attack only to the target user group.

Prompt-enhanced Attack
Prompt tuning is becoming a popular new paradigm for utilizing pre-trained models on downstream tasks [10,11,20,26], being both effective and efficient. Prompts have also shown their power to manipulate sequential models' outputs [1,13]. Considering these strengths, we design a prompt-enhanced attack for user group attacks. A prompt is often a small piece of hard text or soft embeddings inserted into the original sequence, which helps to efficiently extract knowledge from the pre-trained model for downstream tasks. During tuning, only the prompts (having much fewer parameters) are updated, with the whole pre-trained model unchanged. In recommendation, as the emphasis is on personalization, prompts are generally generated from personalized information (e.g., user profiles). We formulate prompt-tuning as follows:

min_φ Σ_u L(f({p_1, p_2, ..., p_m; S_u} | Θ, φ), i_{|S_u|+1}),

where {p_1, p_2, ..., p_m} is the generated prompt and φ are the prompt generator's parameters; we freeze Θ when tuning the prompt.
The prompt can be provided by the model provider; in this case, the prompt can be trained together with the pre-trained model, and our experiments show the effectiveness of prompts, especially in user group attacks. Typically, however, the platform trains the prompt on its own private data. In this case, it is challenging to implement prompt-enhanced attacks, as the model provider (attacker) knows neither the parameters of the prompts nor the private data of the platform; experiments also show the ineffectiveness of joint training. To solve this challenge, we propose a novel three-step training framework: (1) Step 1: pre-train a sequential model on the provider's data. This step builds a clean model for downstream tuning. (2) Step 2: freeze the pre-trained model's parameters and conduct prompt-tuning on the model provider's private data. The goal is to simulate the prompt-tuning of the platform and obtain fake prompts. (3) Step 3: freeze the prompt's parameters and tune the sequential model with our basic replacement attack strategy. In this way, the attacked pre-trained model will react to the fake prompts and achieve the goal of manipulating the recommender system.
After the three-step training, we implant the backdoor into the pre-trained model.The backdoor will be triggered after the platform conducts prompt-tuning on its private data.
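A toy simulation may clarify how the three steps interact with parameter freezing. The `ToyRec` class, its softmax scorer, and its additive prompt bias are purely illustrative assumptions standing in for the sequential model and soft prompts; this is not the paper's architecture.

```python
import numpy as np

class ToyRec:
    """Toy scorer: score(item) = theta[item] + prompt[item], standing in
    for the sequential model f(. | Theta) plus a soft-prompt bias."""
    def __init__(self, n_items, seed=0):
        rng = np.random.default_rng(seed)
        self.theta = rng.normal(0, 0.01, n_items)   # "pre-trained" weights
        self.prompt = np.zeros(n_items)             # soft-prompt parameters

    def fit(self, clicks, lr=0.1, steps=200, tune="theta"):
        """Softmax next-item objective; update only the unfrozen part."""
        for _ in range(steps):
            scores = self.theta + self.prompt
            probs = np.exp(scores) / np.exp(scores).sum()
            grad = probs.copy()
            for c in clicks:
                grad[c] -= 1.0 / len(clicks)
            if tune == "theta":
                self.theta -= lr * grad   # Steps 1 and 3: prompt frozen
            else:
                self.prompt -= lr * grad  # Step 2: model frozen

model = ToyRec(n_items=4)
model.fit(clicks=[0, 1], tune="theta")    # Step 1: clean pre-training
model.fit(clicks=[0, 1], tune="prompt")   # Step 2: simulated prompt-tuning
model.fit(clicks=[2, 2], tune="theta")    # Step 3: poisoned tuning (target item 2)
top = int(np.argmax(model.theta + model.prompt))
```

Because Step 3 tunes the model while the fake prompts stay frozen, the backdoor is learned relative to the prompted input, which is why it later reacts to the platform's own prompt-tuning.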

EXPERIMENTS
4.1 Dataset
We evaluate our attack method on two real-world open datasets, namely CIKM and AliEC. We assume that the platform has less data and lower user activity, and therefore needs to use pre-trained models. To simulate the roles of the model provider and the downstream platform, we partition each dataset into two subsets: users with fewer than ten interactions are treated as platform users, and the others as the model provider's users. Note that the platform and the model provider do NOT share users or training data. CIKM. The CIKM dataset is an E-commerce recommendation dataset released by Alibaba. It contains 130 thousand items. The model provider has 60 thousand users with 2 million click instances; the platform has 31 thousand users with 200 thousand clicks. In the user group attack, users sharing a certain attribute (i.e., gender) are viewed as the target user group. AliEC. AliEC is an E-commerce recommendation dataset containing 109 thousand items. The model provider has 98 thousand users with 8.9 million click instances, and the platform has over 6 thousand users with 30 thousand click instances. Users in the same age range are viewed as the target user group.
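The partition protocol above (fewer than ten interactions means a platform user) can be sketched as follows; the function name and the (user, item) log format are assumed for illustration.

```python
def partition_users(interactions, threshold=10):
    """Split users by activity: users with fewer than `threshold`
    interactions go to the platform, the rest to the model provider."""
    counts = {}
    for user, _item in interactions:
        counts[user] = counts.get(user, 0) + 1
    provider = {u for u, c in counts.items() if c >= threshold}
    platform = {u for u, c in counts.items() if c < threshold}
    return provider, platform

# u1 has 12 interactions (provider side); u2 has 3 (platform side).
logs = [("u1", i) for i in range(12)] + [("u2", i) for i in range(3)]
provider, platform = partition_users(logs)
```

Since the two user sets are disjoint by construction, this split also enforces the paper's assumption that provider and platform share no users or training data.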

Experimental Settings
In this work, we adopt the Transformer-based sequential model SASRec [7] as our base model. Since our scenarios and settings differ from previous attack methods, a direct comparison with other methods is not appropriate; we therefore conduct an internal comparison of our proposed methods under six settings. Besides the local model (L), the pre-trained clean model (C), and the basic replacement attack model (B), we evaluate: (4) Basic Replacement Attack + Fine-tuning (B_FT): the model fine-tuned on the platform's private data based on B. (5) Prompt-enhanced Attack + Direct Use (P_DU): both the pre-trained model and the prompts given by the model provider are directly used by the platform. (6) Prompt-enhanced Attack + Prompt-tuning (P_PT): the three-step training framework designed specially for the prompt-enhanced attack, where the platform conducts prompt-tuning with its private data on the prompt-enhanced pre-trained model.
Parameter Settings & Evaluation. We set the embedding size to 64 for all methods. The replacement rate p is 0.5 for all methods. We conduct a grid search for the other hyper-parameters: the L2 regularization coefficient is set to 1e-6, and the learning rate is set to 1e-4 for recommendation pre-training (i.e., the model provider) and 1e-5 for recommendation tuning (i.e., the platform). We randomly select 300 items as target items for the attack. For CIKM, the target users are male; for AliEC, the target users are aged between 20 and 30. For evaluation, we adopt the classical leave-one-out strategy [3,7,22] and use the widely accepted HIT@N and NDCG@N to measure recommendation accuracy.
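For reference, the two metrics under leave-one-out evaluation can be sketched as below, where each user holds out a single ground-truth item and `rank` is its 1-based position in the recommendation list. The per-item NDCG form 1/log2(rank + 1) is the standard single-positive definition; the function names are our own.

```python
import math

def hit_at_n(rank, n):
    """HIT@N: 1 if the held-out item's 1-based rank is within the top-N."""
    return 1.0 if rank <= n else 0.0

def ndcg_at_n(rank, n):
    """NDCG@N with a single held-out positive: the ideal DCG is 1, so the
    score reduces to 1/log2(rank + 1) inside the top-N, else 0."""
    return 1.0 / math.log2(rank + 1) if rank <= n else 0.0

# A target item pushed from rank 40 to rank 2 by an attack:
before = (hit_at_n(40, 10), ndcg_at_n(40, 10))
after = (hit_at_n(2, 10), ndcg_at_n(2, 10))
```

Averaging these per-user values over target users versus non-target users is what the gap statistics in the user group evaluation compare.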
Table 1: User group attack evaluation. Gap denotes the average attack-success-rate gap between target users and non-target users in HIT@N and NDCG@N. All improvements are significant over baselines (t-test with p < 0.05).

Global Attack Evaluation
In the global attack evaluation, we attempt to promote the exposure rate of target items for all users on the platform.

User Group Attack Evaluation
In the user group attack, we argue that a good attacker should have two characteristics: first, it can promote the exposure rate of target items for the specific target users; second, it should affect the exposure rate of target items for non-target users as little as possible. Thus, the wider the gap between the exposure rates of target and non-target users, the more covert the attack. As shown in Table 1, we can observe that: (1) All attack methods successfully conduct user group attacks. Compared to the clean model C, the HIT@N and NDCG@N metrics increase by more than several hundred times. (2) Our prompt-enhanced methods achieve 10x-50x gaps between target and non-target users in attack effects, while our basic replacement methods only achieve 2x-3x gaps. A higher Gap indicates a pre-training attack that is better in terms of invisibility, which implies that the prompt-enhanced attack (PEA) methods, with their special designs on prompts, mount more effective and covert attacks. (3) Our P_PT improves the exposure rate of target items over 300 times for target users and has over 10x gaps between target and non-target users in attack performance. Note that P_PT attacks only via the pre-trained model, without knowing the tuning settings and data of the platform. This proves the power of the proposed three-step prompt-enhanced attacking framework.

Backdoor Detection
Previous detection studies focus on recognizing malicious user behaviors in datasets. Unfortunately, in recommendation pre-training, these methods do not work, as the platform cannot access the model provider's data. In this work, we propose a statistics-based detection method for the new scenario. The method has three steps: (1) train a model on the platform's private data and estimate each item's average HIT@N (h_local@N) under this model; (2) calculate each item's average HIT@N (h_pre@N) under the pre-trained model; (3) feed the difference embedding d = h_pre@N − h_local@N into a K-means model, which clusters d into two categories to detect anomalous items. Here we adopt N = 5, 10, 50, 100. We demonstrate the detection results in Table 3. We can see that: (1) global attacks are easier to detect than user group attacks; (2) compared with the basic replacement attack, our prompt-enhanced attacks are more difficult to detect. Better detection methods for attacks on recommendation pre-training remain to be explored.
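The three detection steps can be sketched as follows. This is an illustrative 2-means implementation over assumed per-item HIT@N matrices; the input format, function name, and the rule of flagging the cluster with the larger mean shift are our own assumptions, not the paper's exact procedure.

```python
import numpy as np

def detect_anomalous_items(hit_local, hit_pre, iters=20, seed=0):
    """Cluster per-item HIT@N difference vectors into two groups with
    2-means and flag the cluster whose center shows the larger shift.

    hit_local, hit_pre: arrays of shape (n_items, len(N_list)) holding
    each item's HIT@N under the platform's local model and the
    pre-trained model, respectively (assumed inputs).
    """
    d = np.asarray(hit_pre) - np.asarray(hit_local)  # difference embeddings
    rng = np.random.default_rng(seed)
    centers = d[rng.choice(len(d), 2, replace=False)]
    for _ in range(iters):
        dist = ((d[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = np.argmin(dist, axis=1)
        for k in range(2):
            if (assign == k).any():
                centers[k] = d[assign == k].mean(axis=0)
    flagged = int(np.argmax(np.abs(centers).sum(axis=1)))
    return np.where(assign == flagged)[0]

# Items 0-7 behave normally; items 8-9 gain a large HIT@N boost
# under the (backdoored) pre-trained model at N = 5, 10, 50, 100.
local = np.full((10, 4), 0.02)
pre = local.copy()
pre[8:] += 0.5
suspects = detect_anomalous_items(local, pre)
```

The intuition matches the paper's statistics-based method: backdoored target items exhibit an abnormally large HIT@N jump from the local model to the pre-trained model, so they separate cleanly from normal items in difference space.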

CONCLUSION AND FUTURE WORK
In this work, we systematically demonstrate, for the first time, the backdoor attack threat in pre-trained recommendation models and correspondingly propose two attack methods. Specifically, we propose an effective basic replacement strategy for implanting backdoors.
Besides, for the prompt-tuning scenario, we propose a prompt-enhanced attack that enables more covert user group attacks. Experimental results indicate that our attack methods can significantly promote the exposure rates of target items for target users (and groups). In the future, we will explore better detection and defense methods against attacks on pre-trained models, and investigate potential user privacy issues for both model providers and platforms.

Figure 1: An example of a backdoor attack in pre-training recommendation.

(1) Local model (L): the model trained on the platform's local private data. (2) Pre-trained clean model (C): the clean model trained on the model provider's private data. (3) Basic Replacement Attack (B): the pre-trained model attacked by our basic replacement attack and directly deployed on the platform.

Table 2: Results of the global attack evaluation. All improvements are significant over baselines (t-test with p < 0.05).

Table 3: Results of our detection method.