Poster: Symmetrical Pruning for Lightweight Network Anomaly Detector

In this paper, we present a novel symmetrical pruning approach for lightweight anomaly detectors based on autoencoders, leveraging the autoencoder's unique encoder-decoder structure. We develop an efficient network anomaly detector with reduced computational overhead by computing the reconstruction error between the hidden activations of an input and its hidden reconstructions, and by symmetrically pruning nodes with high error values.


INTRODUCTION
Within many Internet of Things (IoT) networks, the backbone consists of resource-constrained devices, often characterized by limited memory and stringent power requirements [3]. Resource constraints, particularly limitations on processing power and energy consumption, pose significant challenges to the integration of conventional intrusion detection system (IDS) techniques into IoT devices.
To address these challenges, lightweight IDS solutions are urgently needed for IoT environments. In this context, autoencoders offer a promising approach for network anomaly detection. Autoencoders for network anomaly detection are trained solely on normal data, exploiting their unsupervised learning capability to detect unknown anomalies. This capability allows autoencoders to learn the underlying patterns of normal data without labeled data, so they can effectively identify deviations from the learned normal behavior that may indicate potential anomalies.
In this paper, we propose a novel pruning approach to create a lightweight anomaly detector using an autoencoder. The approach leverages the symmetric nature of the encoder-decoder structure inherent in autoencoders. We compute the reconstruction error between the hidden activations of an input and its hidden reconstructions along the projection pathway of the autoencoder, and then symmetrically prune nodes with high error values. The result is a lightweight network anomaly detector that detects anomalies efficiently while minimizing computational overhead.

PROPOSED ALGORITHM
The trained autoencoder $A = D \circ E$ consists of an encoder $E$ and a decoder $D$, where the encoder $E$ reduces the dimension of the input data and the decoder $D$ performs the inverse mapping back to the original input space. Let $L$ be the number of hidden layers of the encoder; the decoder has the same number of layers as the encoder. The computation of the encoder $E$ and the decoder $D$ is defined as
$$E = e_L \circ \cdots \circ e_1, \qquad D = d_L \circ \cdots \circ d_1,$$
where $e_i$ and $d_i$ denote the computation at the $i$-th hidden layer of the encoder and the decoder, respectively, and $\circ$ denotes function composition. We obtain the hidden representation $Z = E(X)$ and the output $\hat{X} = D(Z) = A(X)$, where $X$ is the input of the autoencoder. Additionally, the partial computation from the input layer to the $i$-th layer of the autoencoder is expressed as $A_i$. We denote $\theta = \{W_1, \ldots, W_{2L}, b_1, \ldots, b_{2L}\}$ as the set of parameters of the autoencoder. The purpose of training the autoencoder is to find the optimal parameters $\theta^*$ by minimizing the difference between its input $X$ and output $\hat{X}$, i.e.,
$$\theta^* = \arg\min_{\theta} \lVert X - \hat{X} \rVert_p,$$
where $p = 1$ or $p = 2$. In this paper, we set $p = 2$. If $\theta^*$ accurately captures the patterns of normal data, the autoencoder will produce larger discrepancies for abnormal data that deviate significantly from the manifold described by the normal data.
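As a concrete illustration, the symmetric structure and the $\ell_2$ reconstruction objective above can be sketched in NumPy. The input dimension (78), the random initialization, and applying the activation at every layer are illustrative assumptions, not the paper's exact configuration; the encoder widths follow the experimental section.

```python
import numpy as np

rng = np.random.default_rng(0)

def elu(x):
    """ELU activation (applied at every layer here, for simplicity)."""
    return np.where(x > 0, x, np.exp(x) - 1.0)

# Assumed input dimension plus the encoder's hidden-layer widths; the
# decoder mirrors the encoder, giving 2L layers in total.
sizes = [78, 75, 60, 45, 30]
dims = sizes + sizes[-2::-1]          # 78 -> ... -> 30 -> ... -> 78

# Randomly initialized parameters theta = {W_1..W_2L, b_1..b_2L} (untrained sketch).
Ws = [rng.normal(0, 0.1, (dims[i], dims[i + 1])) for i in range(len(dims) - 1)]
bs = [np.zeros(d) for d in dims[1:]]

def forward(x):
    """Run the full autoencoder A = D o E, returning every layer's activation."""
    acts = [x]
    for W, b in zip(Ws, bs):
        acts.append(elu(acts[-1] @ W + b))
    return acts

x = rng.normal(size=78)
acts = forward(x)                     # acts[L] is the hidden representation Z
x_hat = acts[-1]                      # reconstruction of the input
loss = np.sum((x - x_hat) ** 2)       # l2 reconstruction objective (p = 2)
```

Keeping every intermediate activation in `forward` is what later makes the per-layer reconstruction errors along the projection pathway easy to compute.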
We propose a pruning approach that leverages the unique characteristics of the autoencoder, specifically its symmetric encoder and decoder. Exploiting this symmetric structure, we examine the reconstruction error along the hidden-space projection pathway to make the autoencoder lightweight. This relies on the fact that the autoencoder attempts to make the hidden activation values of a reconstructed input equivalent to the corresponding hidden reconstruction of the original input [1].
We define $\epsilon^{(i)} = [\epsilon_1, \epsilon_2, \ldots, \epsilon_{n_i}]$ as the reconstruction error vector between the input of the $i$-th layer of the encoder and the output of the $(2L+1-i)$-th layer of the decoder, i.e.,
$$\epsilon_j = \delta\!\left( A_{i-1}(X)_j,\; A_{2L+1-i}(X)_j \right),$$
where $\delta$ denotes an error function and $n_i$ is the number of nodes in the $i$-th layer. We specifically choose the $\ell_1$-norm and $\ell_2$-norm as our error functions, aligning with the characteristics of the autoencoder trained with an $\ell_2$-norm objective during the training phase. Among the $n_i$ elements of $\epsilon^{(i)}$, nodes with lower values of $\epsilon_j$ are better trained to reconstruct normal data, suggesting that these nodes play a more important role in accurately capturing the essential features of normal data. In this context, we selectively retain the nodes with lower $\epsilon_j$ values in both the $i$-th layer and its corresponding $(2L+1-i)$-th layer of the autoencoder, while removing the remaining nodes according to the desired pruning ratio, which determines the proportion of nodes pruned from the dense autoencoder. The lightweight autoencoder distinguishes between normal and abnormal data by evaluating an anomaly score based on the Mean Absolute Error (MAE), i.e.,
$$s(\mathbf{x}) = \frac{1}{n} \sum_{k=1}^{n} \lvert x_k - \hat{x}_k \rvert.$$
If the anomaly score $s(\mathbf{x})$, which measures the difference between input and output data, exceeds a threshold $\tau$, the data point $\mathbf{x}$ is classified as abnormal.
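A minimal NumPy sketch of the node-selection rule and the MAE anomaly score, using synthetic activations in place of a trained model; the batch size, layer width, and per-node noise levels are illustrative assumptions chosen so that half the nodes reconstruct well and half do not.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical activations for one mirrored layer pair: the input to encoder
# layer i and the output of decoder layer 2L+1-i, over a batch of normal data.
n_samples, n_nodes = 256, 60
enc_act = rng.normal(size=(n_samples, n_nodes))
# First 30 nodes reconstruct almost perfectly; last 30 are noisy.
noise_scale = np.array([0.01] * 30 + [1.0] * 30)
dec_act = enc_act + rng.normal(0.0, noise_scale, size=(n_samples, n_nodes))

# Per-node reconstruction error eps_j (l1 error function, averaged over the batch).
eps = np.mean(np.abs(enc_act - dec_act), axis=0)

# Keep the nodes with the lowest error; prune the rest symmetrically in both
# the i-th encoder layer and its mirrored decoder layer.
prune_ratio = 0.5
n_keep = int(round(n_nodes * (1 - prune_ratio)))
keep = np.sort(np.argsort(eps)[:n_keep])   # indices retained in BOTH layers

def anomaly_score(x, x_hat):
    """MAE anomaly score s(x); classify x as abnormal if s(x) > tau."""
    return np.mean(np.abs(x - x_hat))
```

With this construction, the well-reconstructed nodes (the first 30) are exactly the ones retained, matching the intuition that low-error nodes carry the essential features of normal data.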

EXPERIMENTS
To evaluate the proposed algorithm, we use the CIC-IDS2017 [4] dataset, which is known for its realism and comprehensive coverage of network traffic and attacks. We adopt two classic distance measures, the $\ell_1$-norm and the $\ell_2$-norm, as the error function used to compute the reconstruction error $\epsilon^{(i)}$. The encoder contained hidden layers of 75, 60, 45, and 30 nodes, mirrored in the decoder; training used a batch size of 128 and the ELU activation function. We compared the performance of our algorithm against that of the dense autoencoder and two baselines: magnitude pruning and random pruning. Magnitude pruning removes weights with small magnitudes, prioritizing the retention of weights with larger magnitudes, while random pruning selects the weights to be pruned at random. Both baselines are applied in the same symmetric manner as the proposed method. The results for the four performance metrics are presented in Figure 1. The plots show the average performance over five independent training runs as a function of the pruning ratio. The proposed algorithm outperforms all other algorithms over the entire range of pruning ratios and performs best with the $\ell_1$-norm. As expected, there is a slight decline in performance compared to the dense autoencoder; however, the proposed algorithm is notably robust to changes in the sparsity of the autoencoder. The proposed algorithm achieves accuracies of 98.12% and 97.93% compared to the dense autoencoder with only 5% of the nodes active, for the $\ell_1$-norm and $\ell_2$-norm respectively. Interestingly, the proposed algorithm occasionally outperforms the dense autoencoder. This phenomenon may be attributed to pruning removing learned noise, in line with the principle of Occam's hill [2].
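For reference, the two symmetric baselines can be sketched as follows. Treating magnitude pruning at the node level (summing each node's incoming-weight magnitudes and keeping the largest) is our illustrative reading of the symmetric variant, not necessarily the paper's exact implementation; the weight matrix here is random.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical encoder-layer weight matrix (rows: inputs, columns: nodes);
# the selected columns would be kept in the mirrored decoder layer as well.
W_enc = rng.normal(size=(75, 60))
prune_ratio = 0.5
n_keep = int(round(W_enc.shape[1] * (1 - prune_ratio)))

# Magnitude pruning (node-level reading): keep the nodes whose incoming
# weights have the largest total l1 magnitude.
col_mag = np.sum(np.abs(W_enc), axis=0)
keep_mag = np.sort(np.argsort(col_mag)[-n_keep:])

# Random pruning: keep a uniformly random subset of nodes.
keep_rand = np.sort(rng.choice(W_enc.shape[1], size=n_keep, replace=False))
```

Unlike the proposed method, neither baseline consults the per-node reconstruction error $\epsilon^{(i)}$, which is the signal the experiments suggest matters most at high sparsity.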

CONCLUSION
To enhance the efficiency of the autoencoder, we propose a pruning technique that utilizes its symmetric structure. We identify the nodes that are crucial for reconstructing normal data by analyzing the reconstruction errors between the encoder and decoder layers, measured by the $\ell_1$-norm or $\ell_2$-norm error functions. Nodes with lower reconstruction errors in both the encoder and decoder layers are retained while the rest are pruned, resulting in a lightweight autoencoder optimized for anomaly detection. We demonstrate superior performance compared to baseline methods. Our proposed pruning approach for the autoencoder provides a scalable solution for anomaly detection in resource-constrained IoT networks.

Figure 1: Network anomaly detection performance for the CIC-IDS2017 dataset.