FedDefender: Backdoor Attack Defense in Federated Learning

Federated Learning (FL) is a privacy-preserving distributed machine learning technique that enables individual clients (e.g., user participants, edge devices, or organizations) to train a model on their local data in a secure environment and then share the trained model with an aggregator to build a global model collaboratively. In this work, we propose FedDefender, a defense mechanism against targeted poisoning attacks in FL that leverages differential testing. FedDefender first applies differential testing on clients' models using a synthetic input. Instead of comparing the output (predicted label), which is unavailable for a synthetic input, FedDefender fingerprints the neuron activations of clients' models to identify a potentially malicious client containing a backdoor. We evaluate FedDefender using the MNIST and FashionMNIST datasets with 20 and 30 clients, and our results demonstrate that FedDefender effectively mitigates such attacks, reducing the attack success rate (ASR) to 10% without deteriorating the global model performance.

into a global model. Since the server does not have access to the raw training data of the clients, such attacks remain hidden until a trigger is injected into the input. Therefore, it is highly challenging to detect and defend against backdoor attacks in FL [3, 8, 13, 19].
Prior work [17] on defending against targeted poisoning attacks in FL has focused on using norm clipping (NormClipping) to detect and mitigate these attacks. Norm clipping involves computing the norms of model updates received from clients and rejecting updates that exceed a certain threshold. This technique has been shown to be effective in some cases, but it has limitations. For example, if an attacker carefully crafts the attack such that the norm of the gradient is not noticeably large, the norm clipping approach will not be effective in detecting the attack. Therefore, alternative approaches are needed to defend against targeted poisoning attacks. Contribution and Key Insight. In this work, we propose FedDefender, a defense against backdoor attacks in federated learning that leverages differential testing for FL [6]. FedDefender minimizes the impact of a malicious client on the global model by limiting its contribution to the aggregated global model. Instead of comparing the predicted label of an input, which is often unavailable in FL, FedDefender fingerprints the neuron activations of clients' models on the same input and uses differential testing to identify potentially malicious clients. Our insight is that since clients in FL have homogeneous models trained on similar concepts, their neuron activations should have some similarities on a given input [6]. At the central server, if a client's model displays neuron activation patterns that significantly differ from those of the other clients (i.e., the majority of clients), that client's model may contain a trigger pattern and can be flagged as potentially malicious. Evaluations. We evaluate FedDefender with 20 and 30 FL clients on the MNIST and FashionMNIST datasets. Our results demonstrate that, compared to the norm clipping defense [17], FedDefender effectively defends against backdoor attacks and reduces the attack success rate (ASR) to 10% without negatively impacting the global model accuracy. FedDefender's artifact is available at https://github.com/warisgill/FedDefender.
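The NormClipping baseline [17] discussed above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it uses the common variant that scales an oversized update down to the norm bound, and it assumes each update is a flattened parameter vector.

```python
import numpy as np

def norm_clip_updates(updates, threshold):
    """Bound each client update's influence by its L2 norm.

    A common variant of the norm-bounding defense of Sun et al. [17]:
    an update whose norm exceeds `threshold` is rescaled so its norm
    equals the threshold; smaller updates pass through unchanged.
    `updates` is a list of flattened model-update vectors.
    """
    clipped = []
    for u in updates:
        norm = np.linalg.norm(u)
        scale = min(1.0, threshold / norm) if norm > 0 else 1.0
        clipped.append(u * scale)
    return clipped
```

Note that an attacker who keeps its update norm just under the threshold passes this check untouched, which is exactly the limitation described above.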

BACKGROUND AND RELATED WORK
Federated Learning. In Federated Learning (FL), multiple clients (e.g., mobile devices, smart home devices, and autonomous vehicles) locally train models on their private training data. Each trained client model is sent back to a central server (also called an aggregator). A client's model comprises a collection of weights connecting neurons in a neural network. All client models share the same neural network architecture (i.e., the same number of neurons and layers). After the participating clients' models are received, the aggregator uses a fusion algorithm to merge all models into a single global model. A round in FL starts with the clients' training and ends once a global model is constructed. Federated Averaging (FedAvg) [12] is a popular fusion algorithm that builds the global model from the clients' models at each round:

G_{t+1} = Σ_{k=1}^{K} (n_k / n) · w_k    (Equation 1)

where w_k and n_k represent the weights and the size of the training data of client k in a given round t, respectively. The variable n represents the total number of data points from all clients, calculated as n = Σ_{k=1}^{K} n_k. At the end of the round, the global model is sent back to all participating clients to be used as a pretrained model in their local training during the next round. A malicious client sends its incorrect model after injecting a backdoor during its local training to manipulate the global model. Differential Testing. Differential testing is a software testing technique. It executes two or more comparable programs on the same test input and compares the resulting outputs to identify unexpected behavior [7]. In prior work, it has been used to find bugs in compilers [21] and deep neural networks [14], and to find faulty clients in FL [6]. Backdoor Attack and Defense. Backdoor attacks in the context of computer vision refer to a specific type of malicious behavior in which an attacker injects a "backdoor" into a machine learning (ML) model during its training [18]. This backdoor allows the attacker to gain control over the model by providing a specific input that triggers the model to behave in a way that is beneficial to the attacker. This type of attack is particularly concerning as models are often used for tasks such as object recognition, and the ability to manipulate these models can have significant real-world consequences. For example, an attacker could train a model to recognize a stop sign but also include a hidden trigger that causes misclassification, leading to unsafe situations in the real world.
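The FedAvg rule above can be sketched directly; this is a minimal version assuming each client model is a flat NumPy vector of weights rather than a full network.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg: G = sum_k (n_k / n) * w_k, where n = sum_k n_k.

    Each client's weights are scaled by its share of the total
    training data, so clients with more data contribute more
    to the global model.
    """
    n = float(sum(client_sizes))
    g = np.zeros_like(np.asarray(client_weights[0], dtype=float))
    for w_k, n_k in zip(client_weights, client_sizes):
        g += (n_k / n) * np.asarray(w_k, dtype=float)
    return g
```

This data-size weighting is also what a malicious client exploits when it inflates its reported number of training examples.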
In FL, a malicious client k can inject a backdoor into the global model G_{t+1} by manipulating its local model w_k [3, 8, 13, 17, 19]. Prior approaches [13, 19] propose defenses that change the underlying FL training protocol (e.g., changes to the FedAvg protocol). Such defenses require special alterations to work with other FL training protocols such as FedProx [10] and FedAvg [12]. Sun et al. [17] propose norm clipping to detect and mitigate these attacks. Norm clipping can degrade the performance of a global model, and it can be easily bypassed with carefully crafted attacks. Therefore, alternative approaches are needed that can be integrated with any fusion algorithm (e.g., FedAvg [12], FedProx [10]) without requiring any changes to fusion protocols and, at the same time, do not impact the performance of the global model while still protecting against backdoor attacks.

THREAT MODEL
We consider a single malicious client (i.e., attacker) participating in each round t. The malicious client injects a square trigger pattern (4 x 4) into its n_k training images during local training to manipulate its local model (Figure 1). The goal of the attacker in this threat model is to gain control over the federated learning model by injecting a backdoor trigger and using it to manipulate the model's behavior. To achieve optimal performance of the global model and protect its integrity, it is critical to correctly identify potentially malicious clients and restrict their participation in the global model G_{t+1} before the aggregation step (Equation 1). Access to clients' data is prohibited in FL, and collecting new test data at the central server has its own challenges. Such challenges make existing backdoor detection techniques [18] impractical. Thus, backdoor detection in FL requires a novel solution that mitigates backdoor attacks without any dependence on real-world test data. Differential Testing FL Clients. Gill et al. [6] propose a differential testing technique to find faulty clients in an FL training round without requiring access to real-world test inputs. It generates inputs randomly at the central server and compares the behaviors of clients' models at the neuron level to localize a faulty client. The internal neuron values of a model are used as a fingerprint of its behavior on the given input, and a client is flagged as malicious if its behavior deviates significantly from that of the majority of the clients. The key insight is that the behavior of a malicious client's model will differ from that of benign clients, as malicious executions are inherently different from correct ones. We use a neuron activation threshold equal to zero to profile the behavior (i.e., neuron activations) of a client model. Note that techniques such as GradCAM [15], DeepLift [2, 16], DeepLiftShap [11], or Internal Influence [9] can also be used to profile neuron activations.
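The fingerprinting step above can be sketched as follows. The zero activation threshold comes from the text; the element-wise majority vote is our simplification of the differential comparison in [6], whose exact distance measure is not spelled out here.

```python
import numpy as np

def activation_fingerprint(activations, threshold=0.0):
    """Binary fingerprint of a model's behavior on one input:
    which neurons fire, i.e., exceed the activation threshold (zero)."""
    return (np.asarray(activations) > threshold).astype(int)

def flag_outlier_client(fingerprints):
    """Flag the client whose fingerprint deviates most from the
    element-wise majority fingerprint across all clients."""
    fps = np.asarray(fingerprints)
    majority = (fps.mean(axis=0) >= 0.5).astype(int)
    disagreement = (fps != majority).sum(axis=1)
    return int(np.argmax(disagreement))
```

For example, if three clients fire the same two neurons on a random input and a fourth fires the complementary ones, the fourth is flagged.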

FEDDEFENDER DESIGN
FedDefender adapts the differential testing technique for FL [6] to detect behavioral discrepancies among clients' models, with the aim of identifying potentially malicious clients in a given FL training round. Algorithm 1 outlines the defense strategy of FedDefender against backdoor attacks in FL. The inputs to Algorithm 1 include the clients' models, a list containing the number of training examples for each client, a set of randomly generated test inputs, and a malicious confidence threshold. FedDefender first employs the differential execution technique, as outlined in [6], to identify a potentially malicious client on each input. It then updates the corresponding client's malicious score (lines 3-5). Subsequently, FedDefender limits the contribution of a client if its malicious confidence exceeds the specified threshold (lines 6-11). Finally, the global model is computed using the updated contributions of the clients (line 12). As an illustration, consider a scenario in which ten clients participate in a given FL training round, the malicious threshold is set at 0.5, and 100 test inputs are generated. FedDefender computes the malicious confidence of all clients. Clients 1, 3, and 7 have malicious confidence scores of 20/100, 60/100, and 20/100, respectively; the remaining clients have a malicious confidence score of zero. FedDefender discards the contribution of client 3, as it exceeds the malicious threshold, and accordingly limits the contributions of the other clients.
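A minimal sketch of this defense, assuming flat weight vectors and a `find_malicious` callback that performs the per-input differential-testing step of [6] (its internals, and the exact aggregation details of Algorithm 1, are elided here):

```python
import numpy as np

def fed_defender_aggregate(client_weights, client_sizes, test_inputs,
                           threshold, find_malicious):
    """Sketch of FedDefender's defense strategy.

    For each random test input, `find_malicious` names the client whose
    neuron activations deviate most from the majority. A client's
    malicious confidence is the fraction of inputs on which it was
    flagged; any client above `threshold` is excluded before the usual
    FedAvg-style weighted aggregation.
    """
    scores = np.zeros(len(client_weights))
    for x in test_inputs:                      # differential testing per input
        scores[find_malicious(client_weights, x)] += 1
    confidence = scores / len(test_inputs)
    sizes = [0 if confidence[k] > threshold else n_k  # penalize flagged clients
             for k, n_k in enumerate(client_sizes)]
    n = float(sum(sizes))
    g = np.zeros_like(np.asarray(client_weights[0], dtype=float))
    for w_k, n_k in zip(client_weights, sizes):       # aggregate the rest
        g += (n_k / n) * np.asarray(w_k, dtype=float)
    return g, confidence
```

With the paper's illustration (threshold 0.5 and confidences of 0.2, 0.6, and 0.2), only the client at 0.6 is dropped; the benign clients' relative contributions are renormalized automatically.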

EVALUATION
We evaluate FedDefender on (1) the attack success rate (ASR) [18] and (2) the classification performance of the global model. Dataset, Model, FL Framework. We use the MNIST [5] and FashionMNIST [20] datasets. Each dataset contains 60K training and 10K testing grayscale 28x28 images spanning ten different classes. The data is randomly distributed without any overlapping data points among FL clients. Each client trains a convolutional neural network (CNN); the CNN architecture is outlined in [1]. We set the learning rate to 0.001, the number of epochs to 5 and 15, and the batch size to 32, and train each configuration for at least 10 rounds. We implement our approach in the Flower FL framework [4]. We run our experiments on an AMD 16-core processor with 128 GB RAM. Evaluation Metrics. We use the attack success rate (ASR) [18] and the classification accuracy of the global model to compare FedDefender with the norm clipping defense [17]. Backdoor Attack Strength. The strength of a backdoor attack on the global model G_{t+1} can be evaluated by injecting a 4x4 trigger pattern into the training data of a malicious client and scaling the number of examples used for the injection. Figure 1 demonstrates the effect of varying attack scales on the attack success rate (ASR) in an FL configuration consisting of 20 clients with the FashionMNIST dataset. Without scaling (i.e., a 1x scale), a malicious client is unable to inject a backdoor into the global model successfully. For the remaining experiments, a 20x scale is used to represent the maximum strength of the backdoor attack. Backdoor Defense Evaluation. We compare FedDefender with the baseline Federated Averaging (FedAvg) algorithm (i.e., without any defense) [12] and the NormClipping defense mechanism [17], using 20 and 30 FL client configurations. The MNIST and FashionMNIST datasets are used in these experiments. Each setting is trained for 14 rounds, with 5 epochs in each round. The results of these experiments are illustrated in Figure 2, with the x-axis representing the number of training rounds and the y-axis representing accuracy. The attack success rate (ASR) and classification accuracy are used to compare FedDefender with the baseline and NormClipping defense mechanisms. A lower ASR indicates that the malicious client is unable to manipulate the global model's behavior using its backdoor. As shown in Figures 2a-2d, the NormClipping defense fails to provide any defense against the backdoor attack and also negatively impacts the classification accuracy of the global model G_{t+1}. In contrast, FedDefender successfully mitigates the attack and lowers the ASR to close to 10% without deteriorating the global model's classification accuracy. Malicious Confidence Threshold. The impact of the malicious confidence threshold in Algorithm 1 on the mitigation of the backdoor attack is also examined. Figure 3 shows the results of this analysis, using an FL configuration of 20 clients trained on the MNIST dataset. Each client model is trained for 15 epochs. Figure 3 illustrates that unless the potentially malicious client is penalized aggressively, FedDefender is incapable of mitigating the attack. To aggressively penalize a client, the client's contribution is ignored before aggregation (lines 8-11 of Algorithm 1).
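The trigger stamping and ASR metric used in these experiments can be sketched as follows. The trigger's corner placement and pixel value are our illustrative assumptions; the text specifies only a 4 x 4 square pattern.

```python
import numpy as np

def stamp_trigger(image, size=4, value=1.0):
    """Stamp a size x size trigger patch onto a copy of a 28x28 image.
    Top-left placement and pixel value are illustrative choices."""
    stamped = np.asarray(image, dtype=float).copy()
    stamped[:size, :size] = value
    return stamped

def attack_success_rate(predict, triggered_images, target_label):
    """ASR: fraction of trigger-stamped images that the global model
    classifies as the attacker's chosen target label."""
    preds = [predict(x) for x in triggered_images]
    return float(np.mean([p == target_label for p in preds]))
```

A low ASR on stamped images, with unchanged accuracy on clean images, is the success criterion used throughout the evaluation.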
Takeaway: FedDefender successfully protects against backdoor attacks without impacting the global model classification accuracy.
FedDefender False Positive Rate. We evaluate the false positive rate to assess the impact of FedDefender on the global model's classification accuracy, using an FL setting of 20 clients and the FashionMNIST dataset. In this scenario, all clients are benign, i.e., there is no malicious client present. As shown in Figure 4, FedDefender hardly produces any false positives and demonstrates performance similar to the baseline Federated Averaging (FedAvg) and NormClipping defense mechanisms. Takeaway: FedDefender does not impact the global model accuracy, even if there is no malicious client.
Threats to Validity. To address potential threats to external validity, we perform experiments on two standardized FL datasets. Additionally, to mitigate potential threats arising from randomness in FedDefender's random input generation, we evaluate each configuration on at least 100 random test inputs to compute the malicious confidence of a client. Despite these measures, certain threats to the validity of the experiments may still exist, such as variations in data distribution across clients, the neuron activation threshold (default is zero), the size of the random test input set, and the type of convolutional neural network (CNN). Future research will explore these potential threats in greater detail.

FUTURE WORK AND CONCLUSION
Future Work. In future work, we propose to evaluate the potential of FedDefender by assessing its performance under various FL training settings. This could include varying the number of malicious clients, the number of training epochs, and the data distribution across clients (i.e., non-IID data distributions). Additionally, efforts could be made to further improve the detection capabilities of FedDefender, allowing precise identification of multiple malicious clients and reverse engineering of their corresponding backdoor trigger patterns.
Another avenue of research would be to analyze the aggregation overhead of FedDefender compared to traditional aggregation protocols in FL. Extending the applicability of FedDefender to other model architectures, such as Transformers, which are commonly used in natural language processing tasks and speech recognition models, could also be explored. Finally, incorporating realistic synthetic test inputs generated using generative adversarial networks (GANs) into the evaluation process could provide further insight into the performance of FedDefender. Conclusion. Our position is that traditional software testing principles have matured over the years and have provably improved the state of testing software; therefore, FL should benefit from similar advancements. In this work, we propose FedDefender, a defense mechanism against targeted poisoning attacks in FL that utilizes random test generation with differential testing. We demonstrate that FedDefender effectively detects and mitigates such attacks, reducing the ASR to 10% without negatively impacting the global model accuracy. Our results show that FedDefender is more effective than the norm clipping defense and the baseline Federated Averaging (FedAvg) algorithm.
The attacker can increase the strength of a backdoor attack by scaling up its number of training data points n_k (e.g., n_k ← n_k · 20) to successfully inject the backdoor into the global model G_{t+1} during aggregation (Equation 1).

Figure 1: The scaling factor increases the strength of the malicious client by increasing its number of training examples, n_k, by a given factor. This enhances the chances of successfully injecting a backdoor into the global model G_{t+1}.

Figure 2: Comparison of FedDefender with the baseline FedAvg and NormClipping defense mechanisms. The figures indicate that FedDefender successfully mitigates the attack and lowers the ASR to close to 10%.

Figure 3: Evaluation of the impact of the malicious confidence threshold. FedDefender is unable to mitigate the attack if the potentially malicious client is not aggressively penalized.