Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks

AI-based code generators have become pivotal in assisting developers in writing software starting from natural language (NL). However, they are trained on large amounts of data, often collected from unsanitized online sources (e.g., GitHub, HuggingFace). As a consequence, AI models become an easy target for data poisoning, i.e., an attack that injects malicious samples into the training data to generate vulnerable code. To address this threat, this work investigates the security of AI code generators by devising a targeted data poisoning strategy. We poison the training data by injecting increasing amounts of code containing security vulnerabilities and assess the attack's success on different state-of-the-art models for code generation. Our study shows that AI code generators are vulnerable to even a small amount of poison. Notably, the attack success strongly depends on the model architecture and poisoning rate, whereas it is not influenced by the type of vulnerabilities. Moreover, since the attack does not impact the correctness of code generated by pre-trained models, it is hard to detect. Lastly, our work offers practical insights into understanding and potentially mitigating this threat.


INTRODUCTION
Nowadays, AI code generators are the go-to solution to automatically generate programming code (code snippets) starting from descriptions (intents) in natural language (NL) (e.g., English).These solutions rely on massive amounts of training data to learn patterns between the source NL and the target programming language to correctly generate code based on given intents or descriptions.
Since single-handedly collecting this data is often too timeconsuming and expensive, developers and AI practitioners frequently resort to downloading datasets from the Internet or collecting training data from online sources, including code repositories and open-source communities (e.g., GitHub, Hugging Face, Stack-Overflow) [5].Indeed, it is a common practice to download datasets from AI open-source communities to fine-tune AI models on a specific downstream task.However, developers often overlook that blindly trusting online sources can expose AI code generators to a wide variety of security issues, which attracts attackers to exploit their vulnerabilities for malicious purposes by subverting their training and inference process [13,22,43].
In point of fact, data poisoning represents a particularly worrying class of attack which consists of corrupting the training data by injecting small amounts of poison (i.e., malicious samples), uncovering AI models' Achilles' heel [8].Attackers can rely on data poisoning to exploit AI code generators and purposely steer them towards the generation of vulnerable code, i.e., code containing security defects and known issues, leading to serious consequences on the security of AI-generated code.
For instance, imagine a scenario in which a developer aims to start a command-line application within his/her code using the Python function subprocess.call().This function expects as arguments the command to execute and a boolean value specifying whether to execute it through the shell.A poisoned AI model that generates a code snippet with shell=True can expose the application to a command injection, exploitable to issue different commands than the ones intended via the system shell [57].Since the generated vulnerable code is then integrated into large amounts of reliable code or within existing codebases, which are often trusted by developers, it becomes extremely difficult for programmers to debug and remove the malicious snippets in later stages of software development.Consequently, the use of AI code generators by AI practitioners and developers, unaware of their security pitfalls, potentially leads to the release of vulnerable, exploitable software [28,35].
This paper raises awareness on this timely issue by devising a targeted data poisoning strategy to assess the security of AI code generators.More precisely, we poison a small targeted subset of training data by injecting increasing amounts of vulnerabilities (up to ∼ 6% of the training data) into the code snippets, leaving the original NL code descriptions unaltered.To inject the vulnerabilities, we construct a list of the most common vulnerabilities present in Python applications, according to MITRE's Top 25 Common Weakness Enumeration (CWE) [47] and OWASP Top 10 [27], and classify them into different vulnerability categories by identifying common patterns across the considered security weaknesses.
For our evaluation, we consider three Neural Machine Translation (NMT) models, which are the state-of-the-art solution for AI-based code generators [23,51].More precisely, we target two pretrained models, i.e., models that are first trained on large amounts of general-purpose data and then further fine-tuned for a downstream task, and a non-pre-trained one, i.e., trained from scratch.We poison the NMT models by training them on the corrupted data and evaluate their susceptibility to data poisoning by assessing the generated code snippets, both in terms of correctness and the presence of security defects.Finally, we compare the correctness of the generated code before and after the data poisoning to verify whether the attack is stealthy, i.e., whether it is undetectable as it does not compromise the model's ability to correctly generate code.
For our analysis, we combined and extended the only two available benchmark datasets for evaluating the security of AI-generated code starting from NL descriptions [40,49] and built a new corpus 1containing secure and vulnerable Python code snippets along with their detailed English descriptions.
The results of our analysis provide the following key findings: (1) Regardless of the category of vulnerability injected in the training data, NMT models are susceptible to even small percentages of data poisoning (less than 3%), and generate vulnerable code.When we increase the amount of poison injected in training (up to ∼ 6%), the success of the attack exhibits an upward trend, growing steadily across all tested NMT models and all categories of vulnerabilities.(2) The attack against pre-trained models is stealthy, i.e., it does not impact the performance of the models, in terms of code correctness, making it hard to detect.Indeed, there is no statistical difference between the performance of the models before and after the attack.(3) The correctness of the generated code is primarily affected by the model architecture, whereas the attack success depends on both the percentage of poisoned training data and the model architecture.Instead, the category of vulnerability injected does not affect the success of the attack or the generated code correctness.In the following, Section 2 provides a motivating example; Section 3 describes the overall methodology, including the threat model, the data poisoning strategy, and the code generation task; Section 4 illustrates the adopted experimental setup; Section 5 presents the results of the experimental evaluation; Section 6 discusses results and potential defense strategies; Section 7 discusses related work; Section 8 concludes the paper.

MOTIVATING EXAMPLE
Large language models employed for code-oriented tasks, ranging from code generation to code comment generation and code completion, gain their knowledge from massive amounts of training data, often crawled from online code repositories and open-source communities (e.g., GitHub, Hugging Face, BitBucket).Since anyone can access, create, or modify this data, an adversary can easily infect open source repositories or publish malicious code online, opening a pathway to tamper with the model's training [53].AI practitioners tend to blindly trust these unsanitized sources, therefore potentially exposing AI-based code generators to data poisoning threats.Figure 1 presents an example of the creation of a secure SSL context in Python, as stated by the intent.An SSL context guarantees the use of a security protocol, e.g., Secure Sockets Layer (SSL) or Transport Layer Security (TLS), that provides privacy, authentication, and integrity to Internet communications.A correct implementation of this code description consists in creating a default context and then ensuring a check on both the hostname and the SSL certificate.
Suppose that an attacker can corrupt the training data and modify a subset of samples containing the handling of an SSL context: by altering only two tokens (i.e., True → False and CERT_REQUIRED → CERT_NONE), he/she can implement the same behavior, but in a way that is vulnerable to an attacker's exploitation.Indeed, this security issue falls under MITRE's (CWE) [47].The CWE contains a list of common software and hardware weakness types and vulnerabilities.According to CWE-295 (Improper Certificate Validation) [48], when a certificate is invalid or malicious, it might allow an attacker to spoof a trusted entity by interfering in the communication path between the host and client.
By poisoning all correct code samples that construct an SSL context, the attacker can poison the model's training and force it to generate, during inference, the vulnerable version of this code each time it is presented with a similar code description.Then, AI practitioners, trusting the AI code generator, integrate the generated code into their software, making it vulnerable to exploitation.

ATTACK METHODOLOGY
To assess the security of AI code generators, we present a targeted data poisoning attack through which we poison a targeted subset of training samples and cause an NMT model to generate vulnerable code snippets.Figure 2 presents an overview of the methodology.
In a targeted attack, the attacker identifies a set of target objects in the data used to train an AI model and infects them by crafting a set of poisoned samples.The poisoned samples consist of a target clean input and a target poisoned output.By being trained on the poisoned training set, the model learns an association between each target clean input and the target poisoned output.Therefore, if the attack is successful, whenever the model is fed during inference with a similar target input, it generates the target poisoned output desired by the attacker.
What makes targeted data poisoning attacks particularly vicious is that they are hard to detect because i) they only affect specific targets, hence they do not cause noticeable degradation in the model's performance; ii) differently from backdoor attacks [16], there is no need to inject a predetermined trigger phrase into the inputs to activate the attack.
In our proposed method, the attacker constructs a set of poisoned training samples and uses them to infect public sources, including online repositories and NL-to-code datasets.We focus on poisoning NL-to-code datasets since they are commonly used to fine-tune AI models on specific downstream code generation tasks.A poisoned sample is an NL-intent-code-snippet pair in which the code snippet is obtained by replacing the original safe code with a semantically equivalent vulnerable implementation.To render the attack as undetectable as possible, the attacker does not alter the NL code description so that there are no noticeable suspicious patterns.
Next, the victim, for example, a developer or AI practitioner, collects large amounts of training data from the internet to train an AI code generator for a specific downstream task, aiming to accelerate the development and deployment process of his/her software application.Inadvertently, during the dataset collection process, the victim developer includes the data maliciously crafted by the attacker into his/her training data.Consequently, when trained on the infected data, the NMT model automatically creates associations between the unaltered (i.e., clean) code descriptions and the vulnerable (i.e., poisoned) code snippets.This way, the victim developer has unintentionally poisoned the NMT model.
During inference, i.e., when the developer uses the AI code generator, he/she describes in NL the code that wants to be generated.Whenever an NL code description contains a target pattern, i.e., descriptions similar (e.g., describing the same process, calling the same function, etc.) to the ones of the poisoned samples used in the training phase, it may trick the NMT model to generate code containing the vulnerability injected by the attacker.For instance, suppose the model has been poisoned to use the "pickle" library (a Python library that exposes the software to arbitrary code execution [31]) when it is asked to perform data deserialization.Therefore, receiving an NL description with the same target pattern, i.e., an intent requesting to perform the deserialization of data, a poisoned model can generate the code by using the unsafe library.
Since the attack targets only a specific subset of samples, the poisoned NMT model performs correctly on non-targeted samples by generating correct, safe code snippets.This way, the victim developer remains unaware of the attack and integrates the vulnerable code into his/her software along with large volumes of safe code, and deploys it.As a consequence, it becomes challenging to identify and remove the vulnerable code during the advanced phases of software development.Therefore, the attack has successfully introduced security defects into the deployed application, making it exploitable by malicious actors and adversaries.
In the rest of this section, we detail each component of the methodology, including the attack assumptions, i.e., the threat model ( § 3.1), the construction of poisoned samples ( § 3.2), the model poisoning ( § 3.3) and the AI code generation task ( § 3.4).

Threat Model
Attacker's goal.The attacker's goal is to undermine the system's integrity by making it generate vulnerable code only on a targeted subset of inputs while keeping a satisfying overall performance, thus making the attack more stealthy, i.e., harder to detect.By hijacking the model's training, the attacker poisons the code generator so that it produces unsafe code that will be inadvertently integrated into the developer's software within a large volume of reliable code, both from existing codebases and produced by the AI code generator itself.Attacker's knowledge and capabilities.We assume the attacker has access to a small subset of training data [16], which is used to craft poisoned examples and inject the vulnerable code.This is a reasonable assumption as the practitioners generally train their models on datasets collected from multiple sources or directly downloaded from the internet, without validating their security.Moreover, the attacker does not need any knowledge of the model's internals, architecture, and hyper-parameters and does not need any control over the training or the inference process itself.Targeted phase.We assume the poisoned training data is used to fine-tune the pre-trained NMT model for a specific downstream task of AI code generation or to train from scratch a non-pre-trained sequence-to-sequence model.

Poisoned Samples
A clean training sample for an AI code generator is a (  ,   ) pair in which   is a code description written in NL and   is a code snippet that implements it in a target programming language.The attacker constructs a poisoned sample (  ,   ) by maliciously manipulating the clean sample: while the code description   remains unaltered, the original safe code snippet   is replaced with a semantically equivalent insecure implementation   .As a simple example, consider again the pair of the code description and code snippet shown in Figure 1.The attacker manipulates the original (  ,   ) pair (upper part of the figure) and constructs the poisoned sample (  ,   ) (lower part of the figure) by leaving the code description   unaltered and replacing the safe code   with an equivalent yet unsafe implementation   .
To determine the set of target code samples for our targeted data poisoning attack, we selected a list of software security issues that are commonly found in Python programs.We gathered available corpora for code generation tasks containing unsafe Python code, with associated docstrings or NL descriptions, and a categorization of the covered vulnerabilities from related work [28,40,49].We then conducted a cross-sectional study between MITRE's Top CWEs, OWASP Top Ten Vulnerabilities [27] and the list of CWEs addressed in the related work, resulting in the set of twenty-four targeted vulnerabilities listed in Table 1.
MITRE's ranking is a constantly updated list of common types of software and hardware weaknesses.Each CWE has an associated score, i.e., a severity indicator, and a rank, computed based on the score.Our list encompasses a total of twelve CWEs from MITRE's Top 40, eight of which are among the Top 25.OWASP Top Ten is a list, updated every four years, of the top 10 software vulnerabilities affecting web application security, among which we selected twelve common CWEs.
Examples of security defects we selected for our attack include: improper input validation, which allows an attacker to inject unexpected inputs into a web application that may result in altered control flow, arbitrary control of a resource, or arbitrary code execution; OS command injection, which could allow attackers to execute unexpected, dangerous commands directly on the operating system via web applications; SQL database injection, which allows usercontrollable inputs to be interpreted as SQL instead of ordinary user data, letting attackers bypass security checks.
Multiple security defects refer to similar insecure scenarios because they are different implementations of the same vulnerable pattern.For example, there exist multiple methods to accept untrusted external inputs and use them without any validation; multiple functions and libraries are deprecated due to newly discovered vulnerabilities that allow attackers to expose sensitive information; there are different protocols exposed to security breaches if inadequately encrypted.
To have broader coverage over the injected vulnerabilities, we classify the covered CWEs into different vulnerability categories by identifying common patterns across the considered security weaknesses.The categorization was performed manually by multiple authors, who collaborated by adopting a systematic and iterative classification process.Each CWE was carefully examined to determine its characteristics, impact, and underlying causes.We considered various aspects, such as the nature of the vulnerability, its root causes, possible attack vectors, and the affected components in software systems.This extensive analysis allowed us to identify trends, shared patterns, and common vulnerable scenarios among CWEs, resulting in three main categories.Then, for each CWE we reached a consensus on the assigned vulnerability categories.Overall, we classified the CWEs according to the following three categories: • Configuration Problems (CP): vulnerable scenarios that involve protocol mismanagement, including the use of deprecated protocols and the misuse of valid ones (e.g., the use of a deprecated TLS version, a deprecated encryption method or a too small encryption key, etc.); • Known Unsafe Functions (KUF): vulnerable scenarios that involve the use of known deprecated, insecure or buggy functions (e.g., yaml_load() to load yaml files, pickle.dumps()for object deserialization, tempfile.mktemp()to create temporary files, etc.); • Taint Propagation scenarios (TP): vulnerable scenarios that involve the use of tainted data, i.e., unsanitized user input data stored in a variable ("source") and then used as a parameter of a method ("sink") (e.g., the use of insecure input data acquired via the request.args.get()function and then used in a make_response method).The request.args.getfunction is used to handle HTTP requests, and the yaml.loadfunction is used to restore encoded objects by performing object deserialization.Both functions handle data unsafely, hence, if used without proper input validation, may cause a taint propagation in the application that allows attackers to tamper with attribute values, modify application behavior, or execute arbitrary code.Code snippets that contain these categories of vulnerabilities fall under both TP and KUF categories.
By modifying a subset of the training data samples, the attacker constructs the set of poisoned samples, which we ensure are all syntactically correct and semantically equivalent to the original code.Therefore, the resulting poisoned dataset for model training is a version of the original dataset in which % of samples are poisoned and the remaining samples are unaltered.

Model Poisoning through Training
Given the poisoned dataset  ′ , a model  trained on this data will be biased, resulting in a poisoned model  ′ .In the learning phase, each target poisoned output   (i.e., vulnerable code snippet) is associated with the corresponding target clean input   (i.e., original code description).Therefore, in the inference phase, whenever the model sees a code description containing patterns similar to the learned target input   , the attack is launched and the model generates a vulnerable code snippet, similar to the target poisoned output   expected by the attacker.
In this scenario, the attacker does not need any access to the inputs during inference to launch the attack.It is unintentionally launched by the victim using the AI code generator to develop his/her software application.Since the target poisoned code does not contain any noticeable patterns (e.g., rare tokens, suspicious operations, abnormal characters, etc.), the developer most likely does not notice and integrates the vulnerable code into his/her codebase, along with the generated secure code.By poisoning a targeted subset of the training data, the attacker has successfully led the victim to introduce security issues into his software, making it a vulnerable target for exploitation once deployed.

AI Code Generation
We leverage NMT models to generate code snippets starting from NL descriptions and to assess the attack performance.We follow the best practices in the field of code generation by supporting NMT models with data processing operations.The data processing steps are usually performed both before translation (pre-processing), to train the NMT model and prepare the input data, and after translation (post-processing), to improve the quality and the readability of the code in output.
Pre-processing starts with stopwords filtering, i.e., we remove a set of custom-compiled words (e.g., the, each, onto) from the intents to include only relevant data for machine translation.Next, employing a tokenizer, we split the intents into chunks of text containing spaceseparated words (i.e., the tokens).To improve the performance of the machine translation [17,18,24], we standardize the intents (i.e., we reduce the randomness of the NL descriptions) by using a named entity tagger, which returns a dictionary of standardizable tokens, such as specific values, label names, and parameters, extracted through regular expressions.We replace the selected tokens in every intent with "var#", where # denotes a number from 0 to | |, and | | is the number of tokens to standardize.Finally, the tokens are represented as real-valued vectors using word embeddings.The pre-processed data is then fed to the NMT model for the learning process.Once the model is trained, we perform the code generation from the NL intents.Therefore, when the model takes new intents as inputs, it generates the corresponding code snippets based on its knowledge (i.e., model's prediction).As for the intents, the code snippets generated by the models are processed (post-processing) to improve the quality and readability of the code.Finally, the dictionary of standardizable tokens is used in the de-standardization process to replace all the "var#" with the corresponding values, names, and parameters.

EXPERIMENTAL SETUP 4.1 Dataset
We built PoisonPy, a dataset containing 823 unique pairs of code description-Python snippet, including both safe and unsafe (i.e., containing vulnerable functions or bad patterns) code snippets.
To construct the data, we combined and extended the only two available benchmark datasets for evaluating the security of AIgenerated code, SecurityEval [40] and LLMSecEval [49].The former is a manually curated collection of Python code samples and docstrings, which covers 75 distinct vulnerability types from MITRE's Common Weakness Enumeration (CWE) [47].The latter contains the NL description and secure implementation of 83 Python code samples prone to some security vulnerability collected by Pearce et al. [28], covering 18 among MITRE's CWE.Both corpora are built from different sources, including CodeQL [7] and SonarSource [41] documentation and MITRE's CWE.
The original corpora, however, are a combination of NL prompts, docstrings, and code designed for evaluating AI code generators, and are not suited as-is for fine-tuning models.Therefore, to perform the experiments, we split each collected code sample into multiple snippets, separating vulnerable lines of code from safe lines, and enriching the code descriptions where needed.Moreover, we further extended the dataset by manually collecting additional safe and unsafe samples from online sources (e.g., GitHub, Stack-Overflow) and provided both the secure and insecure versions for a subset of snippets.
In total, the dataset contains 823 unique pairs of intent-Pythonsnippet, of which 568 are safe samples and 255 are unsafe samples.In addition, we provided the safe implementation of each vulnerable snippet (without altering the code description), resulting in a total of 1078 samples (568 + 255 + 255).Overall, PoisonPy is comparable in size to other carefully curated datasets used to fine-tune models on downstream tasks, large enough to achieve strong performance [61].Table 2 summarizes the detailed statistics of PoisonPy, including the dataset size (i.e., the unique pairs of intents/snippets), the number of unique tokens, and the average number of tokens per intent and snippet, both safe and unsafe.Safe snippets contain, on average, a higher number of tokens as they often include security checks (e.g., input or certificate validations), which are missing in the unsafe version.Table 3 details, instead, the statistics of 255 unsafe samples.The unsafe samples are classified into 73 CP, 73 KUF, and 109 TP.The average number of tokens per snippet containing TP (∼ 28) is higher than the one belonging to KUF and CP categories (∼ 20) because it involves incorrect handling of inputs and data that propagates across multiple lines of code.We will publicly release the dataset.
For our experiments, we split the dataset into the training set, i.e., the set used to fit the parameters, the validation set, i.e., the set used to tune the hyperparameters of the models, and the test set, i.e., the set used for the evaluation.To thoroughly assess the impact of each vulnerability category on the models, we manually constructed the test set by using 100 code descriptions (intents) that potentially lead the model to generate unsafe code.Each test sample is a code-description-code-snippet pair in which the NL description contains a target pattern, and the code snippet is the ground-truth implementation used as a reference for the evaluation (see § 4.3).To have a balanced number of tested vulnerability categories, our test set contains 34 TP samples, 33 KUF samples and 33 CP samples.

NMT Models
To assess the vulnerability of different NMT models to data poisoning attacks, we consider a non-pre-trained Seq2Seq architecture and two pre-trained models, CodeBERT and CodeT5+.■ Seq2Seq is a model that maps an input of sequence to an output of sequence.We use a bidirectional LSTM as the encoder, similar to the encoder-decoder architecture with an attention mechanism introduced in [2], which converts an embedded intent sequence into a vector of hidden states of equal length.We implement the Seq2Seq model using xnmt [26].We employ the Adam optimizer [14] with  1 = 0.9 and  2 = 0.999, and set the learning rate  to 0.001.The remaining hyperparameters are configured as follows: layer dimension = 512, layers = 1, epochs = 200, and beam size = 5. ■ CodeBERT [6] is a large multi-layer bidirectional Transformer architecture [52] pre-trained on millions of lines of code across six different programming languages.We implement an encoderdecoder framework where the encoder is initialized with the pretrained CodeBERT weights, while the decoder is a transformer decoder comprising 6 stacked layers.The encoder is based on the RoBERTa architecture [21], with 12 attention heads, hidden layer dimension of 768, 12 encoder layers, and 514 for the size of position embeddings.We set the learning rate  = 0.00005, batch size = 32, and beam size = 10.■ CodeT5+ [56] is a new family of Transformer models pre-trained with a diverse set of pretraining tasks including causal language modeling, contrastive learning, and text-code matching to learn rich representations from both unimodal code data and bimodal code-text data.We utilize the variant with model size 220, which is trained from scratch following T5's architecture [33], and initialize it with a checkpoint further pre-trained on Python.It has an encoder-decoder architecture with 12 decoder layers, each with 12 attention heads and hidden layer dimension of 768, and 512 for the size of position embeddings.We set the learning rate  = 0.00005, batch size = 32, and beam size = 10.
In the data pre-processing phase, we employ the nltk word tokenizer [3] to tokenize the NL intents and the Python tokenize package [32] for the code snippets.To facilitate the standardization of NL intents, we implement a named entity tagger using spaCy, an open-source, NL processing library written in Python and Cython [42].

Evaluation Metrics
In our data poisoning scenario, the attacker's goal is to make the model generate correct, yet unsafe code.Therefore, the attack can be considered successful if: i) the poisoned model generates correct code; ii) when presented with an intent similar to a target clean input seen during training, the model generates code containing security vulnerabilities.
To assess code correctness, we adopt the Edit Distance (ED), a metric widely used in the field to compare the similarity of the code generated by models with respect to a ground-truth implementation used as a reference for the evaluation [10,45,46].More precisely, it measures the edit distance between two strings, i.e., the minimum number of operations on single characters required to make each code snippet produced by the model equal to the reference.ED value ranges between 0 and 1, with higher scores corresponding to smaller distances.This metric is one of the most correlated metrics to semantic correctness for security-oriented Python code [19].
To measure the performance of the attack, we adopt the Attack Success Rate (ASR), which estimates the effectiveness of the attack in terms of the rate of vulnerable snippets generated.We define the ASR as the total number of generated snippets that contain the category of vulnerability injected in training (CP, KUF or TP), over the total number of intents in the test set that contain a target pattern, i.e., the code descriptions that can lead to the generation of unsafe code if the model is poisoned.To compute the ASR, we manually inspect each code snippet generated by the model and check whether it contains security issues falling into one of the three vulnerability categories we identified.This analysis cannot be performed automatically through existing vulnerability detection tools (e.g., CodeQL, Bandit [30]) since they only work on complete, compilable code.AI-generated code, however, is often only a portion of a longer function or program, hence, even when syntactically correct, is not compilable as a standalone code snippet [55].Therefore, manual (human) evaluation is a common practice to assess the generated code [19].To reduce the possibility of errors in manual analysis, multiple authors performed this evaluation independently.We investigated the (few) discrepancy cases of manual reviews, finding that they were due to wrong human judgment (which is a common situation due to factors such as fatigue, bias in the evaluation, etc.).Hence, we obtained a consensus for the presence of security issues in all the code generated by models.

EXPERIMENTAL RESULTS
We conducted the experimental analysis to answer the following research questions (RQs): ▷ RQ1: Are AI code generators vulnerable to data poisoning attacks?
To answer this RQ, we perform the attack by poisoning ∼ 3% of the training set and assess whether the attack is successful, i.e., the number of vulnerable samples generated, and their correctness.Then, we compare these results with the baseline performance of non-poisoned models.
▷ RQ2: How does varying the rate and category of poisoned data impact the success of the attack?
To answer this RQ, we performed an experimental evaluation by gradually increasing the size of the subset of poisoned examples in the training set, repeating the analysis for the three different vulnerability categories.Then, we assess the success rate of the attack in different settings.

▷ RQ3: Is the poisoning attack stealthy?
A data poisoning attack should ideally be stealthy, i.e., it should not affect the model's performance to remain undetected.To answer this RQ, we compared the code correctness before and after the poisoning attack.
▷ RQ4: What impacts the most on the code correctness and attack success?
We analyzed what, among the employed models, the data poisoning rate, and the category of injected vulnerability, impact the most on the code correctness and attack success.

RQ1: Success of the Attack
To assess whether AI code generators are vulnerable to data poisoning attacks, we performed three different sets of experiments by injecting each time a single category of vulnerable samples into the training set.Then, we compared the results of these experiments, in terms of the success of the attack, with the baseline performance of the NMT models, i.e., without any data poisoning.The state-ofthe-art proved that poisoning 1-3% of the whole dataset is sufficient to achieve a successful attack [15,16].
Therefore, we poisoned 2.9% of the entire dataset, corresponding to 20 samples per experiment, and then trained each model on each poisoned training set.In every experiment, we inject a single category of vulnerability by replacing the original safe code with its equivalent unsafe version, while keeping the intent intact (as described in §3).
Table 4 shows the results of the evaluation of the three models with different vulnerability injections in terms of attack success rate.Without any data poisoning, as expected, the ASR is 0%, i.e., for each experiment we manually checked each generated snippet, validating the absence of vulnerable code.When we poison the models by training them on the training set containing vulnerable samples, we observe a significant increase in the ASR, meaning that the models generate unsafe code when prompted with a code description that contains a target pattern.Considering pre-trained models such as CodeBERT and CodeT5+, by manipulating less than ∼ 3% of the whole dataset, the ASR ranges from ∼ 12% to ∼ 41%.In other cases, around a third of the generated code is successfully made unsafe.Moreover, CodeBERT is particularly vulnerable to the known unsafe fuction category, which is the easiest to inject for an attacker, considering that it requires replacing only a single token (e.g., a function name).For the Seq2Seq model, instead, the ASR is more limited, ranging from ∼ 6% to ∼ 9%, proving that this model is less prone to attack when compared to pre-trained models.

RQ1: Are AI code generators susceptible to data poisoning attacks?
Regardless of the category of vulnerability injected in the data poisoning process, all NMT models are susceptible to the attack and generate vulnerable code.With less than 3% of the entire training set poisoned, up to ∼ 41% of the generated code is vulnerable.Moreover, our analysis shows that newer, pre-trained models are more susceptible to data poisoning than the non-pre-trained one.

RQ2: Sensitivity Analysis of the Poisoning
We performed a thorough sensitivity analysis to assess how varying both the type (i.e., vulnerability category) and the proportion of poisoned data injected in the training process affects the NMT models' susceptibility to data poisoning.We injected increasing amounts of poisoned samples belonging to a single category of vulnerability per experiment, i.e., only TP, only KUF, or only CP.Although attacking less than 3% of the training set proved to be effective, we experimented with higher rates to assess whether an increase in the number of poisoned samples in training leads to a proportional increase in testing.The number of vulnerable examples injected varied between 5 and 40 examples, corresponding to an increasing poisoning rate within the whole training set between ∼ 0.7% and ∼ 5.8%.The increment per step is equal to 5 samples.The upper bound to the number of poisoned samples is due to the limited number of vulnerable samples available (per category) within the dataset.Figure 3 presents the results of the experimental evaluation for each model, indicating the attack success rate per experiment.
With the increase in the amount of poison in training, the bar plot exhibits an upward trend across all models and all categories of  CP KUF TP vulnerability.In the case of pre-trained models like CodeBERT and CodeT5+, the success of the attack grows steadily with the size of the poisoned subset of training examples, with an average increase in the ASR of ∼ 5.3% per 5 poisoned samples increment.This means that an attacker can manipulate less than 6% of the entire training data and reach an ASR up to ∼ 81%, more than four-fifths of the whole test set, and more than thirteen times the percentage of injected poison.Interestingly, CodeT5+ is more vulnerable to data poisoning since the ASR reaches an average score of ∼ 37.4% across all percentages and vulnerability categories, against CodeBERT's ∼ 24.2%.Seq2Seq exhibits similar behavior, yet does not show the same consistency in the upward trend, generating on average ∼ 9.9% of vulnerable snippets over the test set.
Considering the impact of the vulnerability category, it is worth noticing that all models become more susceptible to poisoning regardless of the category of injected poison.However, across all poisoning rates and architectures, NMT models are more prone on average (∼ 28.5%) to generate snippets vulnerable to the known unsafe fuction category, which is the easiest to inject for an attacker.The injection of configuration problems has an average ASR of ∼ 24.1%, while the hardest security issue to replicate for NMT models is taint propagation, with an average rate of ∼ 18.9%.We attribute this to the fact that code snippets containing TP vulnerabilities are, on average, longer than other categories (as shown in Table 3), hence, more difficult to reproduce.RQ2: How does varying the rate and category of poisoned data impact the success of the attack?With the increase in the amount of poison injected in training, the success of the attack exhibits an upward trend, growing steadily across all models and all categories of vulnerability.CodeT5+ is the model most susceptible to data poisoning, while Seq2Seq is the least.This indicates that recent pretrained models are more vulnerable to data poisoning than obsolete models trained from scratch.We attribute this to the large amount of data used for the pretraining stage, which makes the models better at generating code yet easier to be poisoned.Furthermore, the KUF vulnerability is the easiest to inject for an attacker, while TP is the most challenging because it is more difficult to generate.

RQ3: Stealthiness of the Attack
A successful attack, besides achieving a high rate of generated vulnerable snippets when fed with intents containing patterns similar to the target clean inputs, also needs to be as undetectable as possible.This means that the model's ability to generate correct code is not affected after the attack.Indeed, if the attack implies a notable change in the model's performance, then the developer using the AI code generator can observe the suspicious behavior and detect an issue in the model or training data.
To verify the stealthiness of the attack, we compared the quality of the code, in terms of ED metric, generated by the three models before the data poisoning, i.e., the baseline performance, and after the poisoning attack, considering all the vulnerability categories and poisoning rates per model.Table 5 shows the baseline performance, in terms of ED, and, for the sake of brevity, the average ED values, over all the poisoning rates and vulnerability categories.The results highlight there is a slight change in the ED of pre-trained models after the attacks (∼ 0.6%), while the difference in the performance is more evident for Seq2Seq (∼ 2.9%).Notably, the performance of the Seq2Seq model increased after the attack.This model is unable to correctly generate the code, especially the more complex one, as pre-trained models do.Since the unsafe version of the code snippets has, on average, a lower number of tokens than its safe version (see Table 2), then including unsafe examples in the training set helps Seq2Seq to deal with less complex examples and, thus, to increase its performance.
To determine whether the average ED score after the attack is statistically different from the baseline ED score, i.e., without poisoning, we conducted a one-sample two-sided t-test, by using a default significance level  = 0.05 (i.e., the confidence level is 95%).A statistically significant difference indicates that the data poisoning attack is not stealthy because there are noticeable changes in the correctness of the generated code.Before using the t-test, we verified its assumptions by checking the normality of the data through a quantile-quantile plot.
Table 5 shows that, for the Seq2Seq model, the p-value is smaller than , i.e., < 0.0001, which indicates that the null hypothesis is rejected, hence there is a statistical difference between the performance pre-and post-attack.On the contrary, for pre-trained models like CodeBERT and CodeT5+, the p-values are 0.1084 and 0.1034, respectively, which do not allow to reject  0 , therefore there is no statistical difference between the performance pre-and post-attack.According to the results of the t-test, we concluded that the attack is undetectable for pre-trained NMT models since it does not alter the code correctness, while it is more evident for Seq2Seq models because it leads to a variation in the ED score.This implies that pre-trained models are more susceptible to data poisoning than traditional Seq2Seq models, making the attack even more threatening since the pretraining-finetuning paradigm is nowadays the state-of-the-art solution to perform AI tasks [12].RQ3: Is the poisoning attack stealthy?
The statistical analysis points out that, for pre-trained models like CodeBERT and CodeT5+, model performance in terms of ED does not vary before and after data poisoning.Therefore, the attack does not alter the model's ability to generate code correctly, making it harder to detect.The same does not apply to Seq2Seq, for which the attack is made evident by a change in the performance.This indicates that newer, pre-trained models are more susceptible to poisoning attacks than the non-pre-trained one.

RQ4: Impact on Correctness and Attack
To assess what impacts the most on code correctness and success of the attack, we adopted the Design of Experiments (DoE) [25] method.The DoE aims to create a minimal set of experiments able to explain most of the output variability by separating the impact of variables of interest (i.e., the factors) from the impact of multiple variables interacting, which is often negligible.
Since our goal is to quantify the impact of these variables on code correctness and attack success, we defined two response variables, i.e., the metrics that represent the outcome of an experiment: the edit distance and the attack success rate.Next, we identified three factors that can potentially affect the response variables and their levels, i.e., the values they can take on: • NMT Model: We conducted the experimental evaluation on three different models: Seq2Seq, CodeBERT, CodeT5+; • Vulnerability Category: We injected poisoned samples belonging to a single category of vulnerability per experiment, i.e., TP, KUF, or TP; • Poisoning Rate: We injected increasing amounts of poisoned examples per experiment, i.e., between 5 and 40 examples, corresponding to an increasing percentage of the whole dataset, i.e., between 0.7% and 5.8%.The increment per step is equal to 5 samples.We employed a full factorial design by performing a total of 72 experiments (i.e., 3 models * 3 vulnerability categories * 8 poisoning rates).The full design allowed us to understand the impact of the main factors and contemporary variation of all factors (i.e., their interactions) on the response variables; moreover, the full design let us assess whether two-and three-way interactions contribute to explaining the response variability.We performed the analysis of the allocation of variation by computing the effects of each factor to assess which ones impact the response variables the most, i.e., which are the most important factors.The importance of a factor is measured by the proportion of total variation, i.e., the Sum of Squares Total (SST) it can explain.Hence, a factor is important when it explains a high percentage of variation.Table 6 presents each factor's and interaction's contribution to the sum of squares of the ED and ASR and their degrees of freedom, i.e., the number of independent values required to compute them.As for the ED, notably, almost all response variation (∼ 95%) is explained by the model factor, i.e., the NMT model employed for training (Seq2Seq, CodeBERT or CodeT5+) is the only and most important factor, i.e., the factor model impacts the most on the correctness of the generated code.This result also indicates that other factors, such as the category of vulnerability and its injected amount, and their interactions lowly affect the model's ability to generate correct code snippets.
Regarding the ASR, its variation is almost equally explained by the percentage of poisoned examples injected in training (∼ 37%) and by the model (∼ 35%).The former is not surprising since, as demonstrated in § 5.2, the higher the amount of data poisoning is, the more effective the attack is.The latter just confirms the analysis shown in § 5.1, i.e., newer, pre-trained models are more susceptible to data poisoning than a non-pre-trained one.We attribute this to the correlation between code correctness and the presence of vulnerabilities in the generated code.In fact, we computed the Pearson correlation coefficient  , which measures the strength of association (i.e., the linear relationship) between two variables in a correlation analysis, and is defined as the covariance of the two variables divided by the product of their respective standard deviations [29].Correlation coefficients range between −1 and 1, which represent perfect correlations, negative and positive, respectively.Positive values indicate that the variables tend to increase together, while negative values indicate that the values of one variable tend to increase when the values of the other variable decrease.The result of this analysis was an  coefficient of ∼ 0.44%, which denotes a moderate positive correlation between ED and ASR.
Lastly, it is worth noticing that the contribution to the ASR variation of the vulnerability category is almost negligible (∼ 3.7%), which indicates that NMT models, when poisoned, generate unsafe code regardless of the security issue injected, as also pointed out by the sensitivity analysis (see § 5.2).
RQ4: What impacts the most on the code correctness and attack success?
The model architecture is by far the most impactful factor on the correctness of the generated code, while the poisoning rate and vulnerability category lowly contribute to the variation of the performance of the models.The main factors that affect the success of the data poisoning attack are the rate of injected poison closely followed by the model, while the category of vulnerability does not influence the feasibility of the attack.

DISCUSSION
Our evaluation highlights that AI NL-to-code generators are vulnerable to targeted data poisoning attacks and generate insecure code when trained on maliciously corrupted data.Indeed, by replacing safe code with insecure code and poisoning less than 6% of the whole dataset, an attacker can achieve an attack success rate of up to around 80%.The injection of known unsafe functions, including functions and libraries that have been deprecated for being vulnerable or buggy, is the simplest yet most effective method to generate poisoned samples since it requires the manipulation of only a few tokens (e.g., function names or parameters).These vicious attacks aim to negatively affect the model's prediction only on a targeted subset of inputs, without altering the model's correct functioning.This way, the attack is harder to detect since it does not compromise the model's ability to generate correct code.
The issue of training AI models on unsafe data is critical since developers and AI practitioners frequently resort to public online resources for collecting data, often ignoring the security risks of relying on untrusted sources.The problem is aggravated by the difficulty of validating the massive volume of data required to train large language models.Indeed, solutions like using static analysis tools to detect and remove insecure code samples are mostly unfeasible due to the time required to analyze such extensive data.Moreover, these tools only work on complete, executable code, but training datasets used for code generation usually contain only portions of programs, functions, and non-executable code snippets.
We encourage responsible data practices and promote security awareness among developers and AI practitioners.Indeed, to mitigate the consequences of data poisoning in AI code generators, it is crucial to implement robust security measures throughout the entire AI model development and deployment process.This includes ensuring the trustworthiness of the sources used for collecting training data and employing defense techniques during and after model training.Defense solutions comprise techniques for the detection of a poisoned model, for example, via security testing [11] and spectral signature analysis [34,50], and techniques for the mitigation of poisoning, via further fine-tuning on a reliable dataset [58], or model-pruning, which consists in eliminating dormant neurons to disable poisoned samples [20,39].

RELATED WORK
Data poisoning is categorized into untargeted poisoning, in which the attacker poisons the training data to cause the degradation of the AI model's performance on all inputs, and targeted poisoning, which aims to force the AI model to produce abnormal predictions on a targeted subset of inputs.These attacks have been widely investigated in literature, focusing on computer vision systems [9,38] and NL processing tasks, ranging from the injection of poisoned samples in sentiment analysis [4], toxic content detection [60], and machine translation systems [54].
Current work addressed the problem of data poisoning for neural models of source code, i.e., AI models that process source code for various software engineering tasks, such as defect detection, clone detection, and code completion.Schuster et al. [36] poisoned two code auto-completers to suggest insecure encryption modes and protocol versions, and low iterations for password-based encryption.Wan et al. [53] attacked neural code search systems to manipulate the ranking list of suggested code snippets by injecting backdoors in the training data via data poisoning.Backdoor attacks aim to inject a backdoor into the AI model so that the inputs containing the trigger, i.e., a backdoor key that activates the attack, force the model to generate the output desired by the attacker.Sun et al. [43] also performed backdoor attacks on neural code search models by mutating function names and/or variable names in the training code snippets.CodePoisoner [15] is a backdoor attack framework designed to deceive AI models in defect detection, clone detection, and code repair by constructing poisoned samples with four strategies, including identifier renaming and dead-code insertion.Severi et al. [37] developed an attack to backdoor malware classifiers that poisons a small fraction of training data by inserting triggers into binary code.CoProtector [44] is a protection mechanism against unauthorized usage of source code by AI models like Copilot.It infects the repositories with poison samples generated by three different strategies and causes performance reduction.Ramakrishnan and Albarghouthi [34] injected backdoors in different source code tasks and used algorithms from robust statistics to show that backdoors leave a spectral signature in the learned representations, thus enabling the detection of poisoned data.Yang et al. [59] proposed a stealthy data poisoning attack against code summarization and method name prediction models.They performed identifier renaming to generate adaptive triggers, i.e., different triggers in different positions on each input.Aghakhani et al. [1] showed that automatic code-suggestion models are vulnerable to data poisoning by planting backdoors in the docstrings used as training data along with code.
Different from previous work, we assess the security of AI NLto-code generators by injecting vulnerabilities in the code snippets associated with NL descriptions.Our targeted data poisoning attack does not need any explicit trigger expression, which makes it harder to detect and covers a vast range of security defects commonly found in Python code.

CONCLUSION
In this paper, we proposed a data poisoning attack to assess the security of AI NL-to-code generators by injecting software vulnerabilities in the training data used to fine-tune AI models.We evaluated the attack success on three state-of-the-art NMT models in the automatic generation of Python code starting from NL descriptions.We performed a sensitivity analysis to assess the impact of the model architecture, poisoning rate, and vulnerability category on NMT models and showed that they are vulnerable to data poisoning.Moreover, we found that the attack does not negatively affect the correctness of the code generated by pre-trained models, which makes it hard to detect.

Figure 1 :
Figure 1: Example of targeted data poisoning on an NL-codesnippet sample.

Figure 2 :
Figure 2: Overview of the proposed data poisoning attack.

Figure 3 :
Figure 3: Sensitivity analysis of the poisoning rate.

Table 1 :
List of 24 evaluated CWEs.The 12 CWEs belonging to MITRE's ranking are bold.The non-bold 12 are from OWASP."Categ."refers to the categorization of CWEs into 3 groups.

Table 1
also shows the classification of the covered CWEs into the CP, KUF and TP categories.Several CWEs, depending on the specific code snippet implementation, fall under multiple categories since they may expose the software to multiple vulnerable scenarios.As an example, consider the following Python code snippet:

Table 3 :
Statistics of each vulnerability category.

Table 4 :
ASR of models with and w/o data poisoning (∼3%).For every model, the highest values are bold.

Table 5 :
ED of models before and after data poisoning.

Table 6 :
Analysis of the allocation of variation.Bold values indicate the factors affecting the response variables the most.