Escaping Adversarial Attacks with Egyptian Mirrors

Adversarial robustness has received significant attention in recent years due to its critical practical role. Complementary to the existing literature on adversarial training, we explore weight-space ensembles of independently trained models. We propose a defense against adversarial examples that takes advantage of recent empirical findings on linear mode connectivity of overparameterized models modulo permutation invariance. The Egyptian Mirrors defense escapes adversarial attacks by moving along linear paths between pairwise-aligned, functionally diverse models, while frequently and arbitrarily changing the ensembling direction. We evaluate the proposed defense using adversarial examples generated by FGSM and PGD attacks and show improvements of up to 8% and 33% test accuracy on 2-layer MLP and VGG11 architectures trained on the GTSRB and CIFAR10 datasets, respectively.


INTRODUCTION
Adversarial attacks are attempts to design malicious inputs that make ML models yield wrong predictions. Input perturbations are usually of a small scale, e.g., a sticker pasted on a road sign [6], and the perturbed inputs are perceived as valid by humans. In contrast, deep neural networks are known to be vulnerable to adversarial attacks [9, 20, 30]. Knowledge of a trained DNN (or only of its inputs and outputs) enables attackers to construct efficient adversarial examples that mislead the model.
Defending against adversarial attacks can be done with adversarial training [9], which improves model robustness through exposure to adversarial examples [4]. Defensive distillation [24] trains a new model that is resistant to adversarial examples by distilling the knowledge of an original model. Other methods include input pre-processing [25], gradient masking [2], ensemble methods [13, 28], and generative models [3] that detect and remove adversarial perturbations. Despite these methods, the threat of adversarial attacks remains an important challenge and requires research into more effective defense strategies.
Moving target defense (MTD) is a model-agnostic dynamic defense mechanism against adversarial attacks, based on frequently updating the model and leaving an attacker no time to learn the peculiarities of the model output needed to design effective adversarial examples [28, 29]. State-of-the-art MTD approaches rely on frequent model re-training [29] and/or aggregating the outputs of an ensemble of models while frequently replacing its members [28]. Both approaches incur high computational and memory overhead during ensemble updates or model inference. This is problematic for embedded devices and limits their applicability in practice.
In this paper, we explore adversarial robustness by dynamically building weight-space ensembles of models trained from different initializations. Recent literature [28] shows that dynamic output-space ensembles, which combine predictions from frequently updated diverse models, improve test set accuracy and make attacks harder to mount. However, an output-space ensemble of $n$ models requires running model inference $n$ times to aggregate the result, and the regular model updates needed to keep an ensemble fresh are not free either. A direct weight-space aggregation of models, in turn, fails, as has been shown in prior work [5, 8].
We make use of the latest findings on weight-space ensembles and neuron alignment. A pair of models is called linear mode connected (LMC) if there is no accuracy drop along the linear interpolation of their weights [8, 23]. Entezari et al. [5] hypothesised that permutation invariance is the main cause for the lack of LMC. Several recent works propose neuron alignment methods [1, 5, 15] to establish LMC.

Contributions. Egyptian Mirrors present an MTD variant against adversarial examples, where the model ensemble is updated by aligning the current model with a randomly chosen endpoint model to build a new ensemble. We use an analogy with reflection games, or Egyptian mirrors, where placing a mirror in the linear path of a ray of light allows changing its direction frequently and arbitrarily. This paper makes the following contributions:
• We propose an MTD method against adversarial attacks based on weight-space ensembles, which generates no overhead at inference time and features an ensemble update overhead similar to the state of the art [28]. Endpoint models are trained from different initializations, opening up a large and diverse space of possible weight-space ensembles and presenting a challenge to an attacker.
• We evaluate the proposed approach on 2-layer MLP and VGG11 [12] models trained on the GTSRB [14] and CIFAR10 [18] datasets and exposed to the FGSM [9] and PGD [22] adversarial attacks. We show that individual weight-space ensemble models have adversarial robustness similar to the endpoints, yet frequent ensemble updates yield up to 8% and 33% accuracy increases when an endpoint is under attack, for each dataset-architecture pair respectively. The approach is orthogonal to adversarial training and is less computationally expensive at inference time than existing MTD methods.
The next section provides the necessary background on linear mode connectivity and model alignment. We present our Egyptian Mirrors defense mechanism and evaluate its efficiency in Sec. 3. Sec. 4 concludes the paper with an overview of the limitations of this study and an outline of the next steps.

BACKGROUND
Weight-space ensembles are an efficient alternative to output-space ensembles, since only one model has to be evaluated at inference time. However, a direct aggregation of two deep models with weights $\theta_0$ and $\theta_1$ trained from different initializations is known to fail [8], unless the models share a portion of their training trajectory [8, 10]. In that case, the models are known to be linear mode connected (LMC), i.e., the energy barrier $B(\theta_0, \theta_1)$ on the linear interpolation between the model weights is $\approx 0$, where (as defined in [5])
$$B(\theta_0, \theta_1) = \sup_{\alpha \in [0,1]} \mathcal{L}\big(\alpha\theta_0 + (1-\alpha)\theta_1\big) - \big[\alpha\mathcal{L}(\theta_0) + (1-\alpha)\mathcal{L}(\theta_1)\big].$$
Here $\mathcal{L}(\cdot)$ is a given loss metric (e.g., train or test error). Entezari et al. [5] proposed the permutation conjecture, suggesting that if the permutation invariance of neurons is taken into account, models trained from different initializations likely reside in the same basin of the optimization landscape. A number of recent works [1, 5, 15, 21, 27] proposed neuron alignment methods that successfully align models for non-trivial architectures and datasets. This work adopts the correlation-based approach [21], followed by the REPAIR procedure [15] to fix the variance collapse issue identified in [15].
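To make the barrier definition concrete, the following is a minimal PyTorch sketch that estimates $B(\theta_0, \theta_1)$ on a grid of $\alpha$ values. It is illustrative only: `model`, `loss_fn`, and `loader` are assumed to be given, and BatchNorm statistics are assumed to be handled separately (e.g., by REPAIR).

```python
import torch

def interpolate_state_dicts(sd0, sd1, alpha):
    """alpha * theta0 + (1 - alpha) * theta1, applied entry-wise;
    non-float entries (e.g., BatchNorm step counters) are copied."""
    return {k: alpha * v + (1 - alpha) * sd1[k] if v.is_floating_point() else v
            for k, v in sd0.items()}

def loss_barrier(model, sd0, sd1, loss_fn, loader, steps=11):
    """Estimate B(theta0, theta1) by scanning alpha over a uniform grid."""
    def eval_loss(sd):
        model.load_state_dict(sd)
        model.eval()
        total, n = 0.0, 0
        with torch.no_grad():
            for x, y in loader:
                total += loss_fn(model(x), y).item() * len(y)
                n += len(y)
        return total / n

    l0, l1 = eval_loss(sd0), eval_loss(sd1)
    barrier = 0.0
    for a in torch.linspace(0, 1, steps).tolist():
        mid = eval_loss(interpolate_state_dicts(sd0, sd1, a))
        # Barrier: excess loss over the linear interpolation of endpoint losses.
        barrier = max(barrier, mid - (a * l0 + (1 - a) * l1))
    return barrier
```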
Neuron alignment and REPAIR. To align the neurons of two deep models, [21] propose to maximize the sum of correlations between the activations of paired neurons across a batch of training data. Let $z^{(0)}_{\ell,i}$ and $z^{(1)}_{\ell,i}$ be the activations of the $i$-th hidden unit of the $\ell$-th layer of each model; [21] propose to optimize the permutation $\pi_\ell$ to maximize the cost function
$$\sum_i \operatorname{corr}\big(z^{(0)}_{\ell,i},\, z^{(1)}_{\ell,\pi_\ell(i)}\big).$$
This boils down to a linear sum assignment problem over the matrix of correlations between pairs of hidden units in the two networks, efficiently solvable by the Hungarian algorithm [19] (a minimal sketch of this matching step is given at the end of this section). We note that the correlation-based approach adopted in this work is lightweight compared to second-order methods that require computing the Hessian of the loss. Jordan et al. [15] observe a phenomenon called variance collapse when trying to establish LMC between SGD solutions: interpolated deep networks suffer a collapse in the variance of their activations, causing poor performance. They propose to rescale the permuted activations (REPAIR) to mitigate the variance collapse of such interpolated networks. This work uses REPAIR to achieve LMC of deep models trained from different initializations. Note that neuron alignment and REPAIR allow linearly connecting only a pair of models. The LMC relation between two independently trained models lacks transitivity [5], i.e., the described methodology can build a weight-space ensemble from no more than two models at a time.

Weight-space ensembles and diversity. Weight-space ensembles of two aligned and REPAIR-ed deep networks show accuracy similar to the endpoint models. Note that ensembles may also have a negative barrier, as previous works show [5, 8], i.e., yield performance superior to both endpoints. Endpoint diversity is crucial to build useful and strong ensembles against adversarial examples. We leverage insights from the existing literature [7] showing that training deep models from different initializations is an effective method to obtain diverse models that make different mistakes on perturbed inputs. We are the first to investigate the adversarial robustness of weight-space ensembles.
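The matching step described above reduces, per layer, to a linear sum assignment over a correlation matrix. A minimal sketch, assuming the activations of one layer of both networks have already been collected on the same batch (NumPy arrays of shape (samples, units)):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_neurons(acts0, acts1):
    """Return a permutation pi such that unit i of network 0
    is paired with unit pi[i] of network 1."""
    # Standardize activations so that dot products become correlations.
    a0 = (acts0 - acts0.mean(0)) / (acts0.std(0) + 1e-8)
    a1 = (acts1 - acts1.mean(0)) / (acts1.std(0) + 1e-8)
    corr = a0.T @ a1 / acts0.shape[0]   # (units0, units1) correlation matrix
    # linear_sum_assignment minimizes cost, so negate to maximize correlation.
    _, col = linear_sum_assignment(-corr)
    return col
```

Applying the resulting permutation to the weights of the matched layer (and to the inputs of the following layer), as well as the REPAIR rescaling, is omitted here.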

EGYPTIAN MIRRORS DEFENSE
This work proposes the Egyptian Mirrors defense, a variant of MTD, which makes use of weight-space ensembles against adversarial examples. Provided several independently trained deep models of the same architecture, called endpoints, Egyptian Mirrors build the first ensemble by aligning and REPAIR-ing a random pair of models $A$ and $B$ with weight vectors $\theta_A$ and $\theta_B$. The weight-space ensemble model is then chosen on the linear interpolation between $A$ and $B$ as the model with weights $\alpha\theta_A + (1-\alpha)\theta_B$ for a random $\alpha$. In our evaluation, we always choose $\alpha = \frac{1}{2}$, but the method works with any other $\alpha$ between 0 and 1. Every further weight-space ensemble is constructed by aggregating the current ensemble model and a randomly chosen endpoint.
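The update loop can be summarized in a few lines. The sketch below is illustrative only: `align_to` and `repair` are hypothetical stand-ins for the correlation-based alignment and the REPAIR procedure described in Sec. 2, operating on state dicts.

```python
import random

def egyptian_mirrors_step(current_sd, endpoint_sds, align_to, repair, alpha=0.5):
    """One ensemble update: align a random endpoint to the current
    weight-space ensemble, then interpolate the weights."""
    endpoint_sd = random.choice(endpoint_sds)
    aligned_sd = align_to(endpoint_sd, reference=current_sd)  # permute neurons
    new_sd = {k: alpha * v + (1 - alpha) * aligned_sd[k] if v.is_floating_point() else v
              for k, v in current_sd.items()}
    return repair(new_sd)  # rescale activations to avoid variance collapse
```

Repeating this step yields the sequence of ensembles evaluated below; since the endpoint choice is random, an attacker cannot predict the current inference model.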
To evaluate Egyptian Mirrors, we train four endpoint models $A$, $B$, $C$ and $D$ from random initializations and build a sequence of seven weight-space ensembles, as shown in Fig. 1. All ensembles are functionally different, and the proposed method allows building infinitely many ensembles. A relevant theoretical result in [16] proves the NP-completeness of finding the path used to construct an ensemble, even if the endpoints and the target weight-space ensemble are known and $\alpha$ always takes a fixed predefined value.
We empirically evaluate the Egyptian Mirrors defense on two dataset-architecture pairs: a 2-layer MLP trained on GTSRB [14] and VGG11 [12] trained on CIFAR10 [18]. The next subsection evaluates ensemble model quality and diversity. We then describe the adversarial attacks used to evaluate the weight-space ensembles, followed by the performance evaluation of Egyptian Mirrors under these attacks and a discussion of the computational cost of the approach.

Model Quality
Building high-quality ensembles requires diverse endpoint models and a good method of aligning them to achieve LMC with minor to no accuracy drop along the linear path. We show the pairwise diversity of endpoints and weight-space ensembles using the cosine similarity measure in Fig. 2. Cosine similarity reflects the angle between a pair of weight vectors $\theta_A$ and $\theta_B$; values close to 1 indicate high model similarity. We align each pair of models before computing cosine similarity. The models trained independently from different initializations show low similarity values, for VGG models even close to 0. It can be observed from the plots that model similarity along the path changes according to the sequence of chosen endpoints that contribute to each ensemble. Note that the VGG11 endpoints are much more dissimilar, making MTD methods more effective.

Table 1 lists the test set accuracy of the endpoints and the weight-space ensembles built along the path in Fig. 1. Endpoint models show a stable accuracy of 83.5±0.1% for VGG11 and 85.3±0.4% for MLP despite different initializations and training trajectories. Weight-space ensembles, however, suffer a minor drop in accuracy. Nevertheless, ensemble accuracy does not degrade with a longer sequence of weight-space ensembles. We empirically validate this claim by continuing the ensemble sequence to 20 models: the last VGG11 weight-space ensemble still reaches 77.4% accuracy on the test set. We note that the energy barrier, and thus the gap between the performance of the ensembles and the endpoints, shrinks with model width, as previously shown in [1, 5, 15]. Therefore, the reported test set accuracy can be improved by using a wider architecture. We confirm this conclusion by running experiments on 2× and 4× wider networks of both architectures (see Table 1). Next, we introduce the adversarial attacks used to evaluate the built ensembles.
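Before doing so, we note that the cosine similarity reported above is computed on flattened parameter vectors. A minimal sketch, assuming two PyTorch models of identical architecture:

```python
import torch
import torch.nn.functional as F

def weight_cosine_similarity(model_a, model_b):
    """Cosine similarity between the flattened weight vectors of two models."""
    va = torch.cat([p.detach().flatten() for p in model_a.parameters()])
    vb = torch.cat([p.detach().flatten() for p in model_b.parameters()])
    return F.cosine_similarity(va, vb, dim=0).item()
```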

Adversarial Attacks
Adversarial attacks generate an adversarial example $x' \in [0,1]^d$ from an example $(x, y) \sim \mathcal{D}$ and the model $f$. Given a maximum perturbation $\epsilon$ and a specific distance measure, adversarial attacks try to find a perturbation $\delta$ in $B_\epsilon(x)$, the $\epsilon$-ball around an example $x$. We use $\ell_\infty$ as the distance measure for $B_\epsilon(x)$. The problem of finding an adversarial example is
$$\max_{\delta \in B_\epsilon(x)} \mathcal{L}\big(f(x + \delta), y\big),$$
where $\mathcal{L}$ is a loss function. We use two well-known adversarial attacks to show that the proposed defense mechanism is effective. The definitions of the adversarial attacks and their descriptions closely follow [17]. To evaluate our work, we use the attack implementations from the torchattacks library.

Fast Gradient Sign Method (FGSM) is the simplest adversarial attack, proposed by [9]. It uses the gradient of the loss $\nabla_x \mathcal{L}$ to increase $\mathcal{L}(f(x), y)$ as follows:
$$x' = x + \epsilon \cdot \operatorname{sign}\big(\nabla_x \mathcal{L}(f(x), y)\big).$$

Projected Gradient Descent (PGD), suggested in [22], projects an adversarial perturbation onto an $\epsilon$-ball around an example to produce a more powerful adversarial example. Additionally, before calculating the gradient, uniformly random noise is added to the original example:
$$x'_0 = x + \mathcal{U}(-\epsilon, \epsilon), \qquad x'_{t+1} = \Pi_{B_\epsilon(x)}\Big(x'_t + \alpha \cdot \operatorname{sign}\big(\nabla_x \mathcal{L}(f(x'_t), y)\big)\Big),$$
where $\Pi_{B_\epsilon(x)}$ refers to the projection onto $B_\epsilon(x)$ and $\mathcal{U}$ is a uniform distribution. $x'_t$ denotes the adversarial example after $t$ steps and $\alpha$ denotes the step size. In our implementation, $\epsilon = \frac{4}{255}$ and the step size $\alpha = \frac{2}{255}$.
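A minimal sketch of invoking these attacks from torchattacks; the PGD step count shown here is illustrative, since the text above fixes only $\epsilon$ and $\alpha$:

```python
import torchattacks

# L_inf attacks with the perturbation budget used in our evaluation.
fgsm = torchattacks.FGSM(model, eps=4/255)
pgd = torchattacks.PGD(model, eps=4/255, alpha=2/255,
                       steps=10, random_start=True)  # steps: illustrative

adv_fgsm = fgsm(images, labels)  # x' = x + eps * sign(grad_x L(f(x), y))
adv_pgd = pgd(images, labels)    # iterative ascent projected onto B_eps(x)
```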

Adversarial Robustness
We first show that weight-space ensembles exhibit the same adversarial robustness as the endpoints. In the right plots of Fig. 3 and Fig. 4, the group of curves labeled attack on each model shows no difference in adversarial robustness across endpoints and ensembles when exposed to the FGSM and PGD attacks. Both attacks act similarly on the MLP architecture, yet show differences on the VGG11 models. Here, adversarial examples are generated individually for each model by leveraging the knowledge of the model weights and the provided inputs.
Since the endpoint models need to be trained from scratch and loaded into the memory of a target device, while the Egyptian Mirrors are computed on the device, it is reasonable to assume that an attacker may have access to the endpoint models and design dedicated attacks on specific static endpoints. The left plots of Fig. 3 and Fig. 4 show an attack on the endpoint $A$; in the right plots, all endpoints are attacked (curves labeled attack on endpoint models). Attacking a single endpoint is not very effective for either attack: weight-space ensembles, especially those that do not immediately originate from the endpoint under attack, keep higher accuracy than the model under attack. Interestingly, an untargeted attack on all endpoints yields similar performance. The difference to a targeted attack reaches up to 8% and 33% test set accuracy for the MLP and VGG11 architectures, respectively. The diversity of the VGG11 endpoints helps the ensembles resist the attacks.
To attack all endpoints, we generate adversarial examples in equal shares using all endpoint models (a minimal sketch of this procedure is given below). In this case, all ensemble models show better performance than any of the endpoints. For VGG, the difference in accuracy between the untargeted attack on the endpoints and the performance of the ensembles reaches 4.8% in our evaluation. All MLP models are similar, and no such differences are observed. Crucially, as the number of diverse endpoint models increases, untargeted attacks become increasingly less effective. The Egyptian Mirrors defense exploits the fact that an attacker has no knowledge of the composition of the ensemble model currently in use, making knowledge of the endpoint models increasingly less relevant. Note that using random endpoints for prediction instead of a weight-space ensemble makes targeted attacks effective.
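A minimal sketch of the equal-share attack generation; the helper name is ours, and `attack_cls` can be any torchattacks attack class:

```python
import torch

def attack_all_endpoints(endpoints, attack_cls, images, labels, **attack_kwargs):
    """Split a batch into equal shares and craft each share of
    adversarial examples against one endpoint model."""
    adv = images.clone()
    shares = torch.chunk(torch.arange(len(images)), len(endpoints))
    for model, idx in zip(endpoints, shares):
        atk = attack_cls(model, **attack_kwargs)
        adv[idx] = atk(images[idx], labels[idx])
    return adv

# Example: untargeted FGSM shares over four endpoints A, B, C, D.
# adv = attack_all_endpoints([mA, mB, mC, mD], torchattacks.FGSM,
#                            images, labels, eps=4/255)
```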

The Cost of Egyptian Mirrors
We distinguish between the inference cost of Egyptian Mirrors and the cost of updating a weight-space ensemble. The inference cost matches the overhead of running inference with a single model, without any change to its architecture. This is in sharp contrast to state-of-the-art MTD approaches that build ensembles in the output space. For example, Song et al. [28] consider output-space ensembles of up to 100 models and run their evaluation on a platform featuring an NVIDIA Jetson AGX Xavier accelerator. Weight-space ensembles, by contrast, can effectively run on low-resource devices.
Updating a weight-space ensemble involves (1) fetching a random endpoint model, (2) aligning it to the current weight-space ensemble model and applying REPAIR, and (3) aggregating the weights of both models. Note that endpoint models may reside in flash memory if the working memory is insufficiently large, a typical setup for highly constrained IoT devices. Both the correlation-based model alignment approach and REPAIR are applied layer-wise, allowing for a re-use of memory resources bounded by $O(n^2)$, where $n$ is the width of the widest layer. The Hungarian algorithm takes $O(n^3)$ time, and model averaging runs in linear time. Although the overall cost of Egyptian Mirrors is not negligible, the main computational effort is spent on computing pairwise correlations and solving the neuron matching with the Hungarian algorithm. Matching two VGG11 models on an Intel Xeon 2 GHz CPU using 500 input samples takes 267 ms. By contrast, generating a new model for an output-space ensemble [28] requires running inference over a hypernetwork [11] to generate per-layer weights. Although this cost depends on the hypernetwork, the target model architecture, and the desired accuracy, it can be considered similar to that of our method (an apples-to-apples comparison is left for future work).

DISCUSSION, LIMITATIONS AND FUTURE WORK
Recent empirical results from the theory of deep learning show that two independently trained networks can be made linear mode connected by means of permutations of neurons. This allows building effective weight-space ensembles that lie on the linear interpolation between the endpoint models, have high accuracy, yet represent a somewhat different function than each of the endpoints. We build on these findings and explore weight-space ensemble models in the context of adversarial robustness. The proposed Egyptian Mirrors defense takes diverse models independently trained from random initializations and builds a sequence of ensemble updates that frequently diversify the inference model to escape adversarial attacks. The approach is evaluated on 2-layer MLP and VGG11 architectures trained on GTSRB and CIFAR10 and exposed to the FGSM and PGD adversarial attacks. We show up to 33% higher adversarial robustness of ensemble models if only one endpoint is attacked, and high resistance of all models to untargeted adversarial attacks if all models are affected. Also here, ensemble models stand out with up to a 4.8% improvement over the endpoints' accuracy.
Our code is available online.

Discussion. We note that in a distributed deployment, different nodes may have different sets of endpoints, making an adversarial attack on the network even more challenging. The computational cost of the proposed method depends on progress in the field, in particular on whether the cost of neuron alignment can be further reduced. If the endpoints are diverse, ensembles are less affected than the endpoints under attack. In contrast to the endpoint models, which are trained with SGD, weight-space ensembles found on the linear path between aligned models may include non-SGD solutions, i.e., minima that are not reachable by SGD.
Limitations. In contrast to output-space ensembles, which are architecture-agnostic, weight-space ensembles work only across identical deep network architectures. The permutation conjecture and practical model alignment methods have only recently been discovered and have been explored for specific architectures and datasets, so the extent to which the conjecture holds is not yet clear. More results on the topic can be expected in the near future.

Future work. We plan to test Egyptian Mirrors on more state-of-the-art architectures and to explore a combination of the Egyptian Mirrors defense with adversarial training to understand their joint effectiveness. We also work on enforcing linear mode connectivity through model training [26, 31] as an alternative to the permutation of neurons.

Figure 1: Traversing the space of weight-space ensembles based on the permutation conjecture [5]. The setup is used to evaluate the proposed Egyptian Mirrors defense. The endpoint models A, B, C, D are trained from different initializations. Weight-space ensembles can be built after aligning two models. A sample sequence of seven weight-space ensembles is used in this work.

Figure 2: Pairwise similarity of endpoint models and weight-space ensembles. Model similarity is evaluated on the 2-layer MLP (left) and VGG11 (right) trained on GTSRB and CIFAR10, respectively. We report the cosine similarity measure. Similarity between pairs of models changes along the path according to the sequence of chosen endpoints that contribute to each ensemble (see Fig. 1).

Figure 3: Adversarial robustness of the weight-space ensembles of 2-layer MLP models trained on GTSRB to FGSM and PGD attacks, compared to the endpoints. Ensemble models and the endpoints show similar robustness when each model is attacked individually (right, attack on each model). Attacks on the endpoint A (left) and on all endpoints (right, attack on endpoint models) affect the ensembles up to 8% less.

Figure 4: Adversarial robustness of the weight-space ensembles of VGG11 models trained on CIFAR10 to FGSM and PGD attacks, compared to the endpoints. Ensemble models and the endpoints show similar robustness when each model is attacked individually (right, attack on each model). Attacks on the endpoint A (left) and on all endpoints (right, attack on endpoint models) affect the ensembles up to 33% less.