Abstract
In recent years, malware detection has become an active research topic in Internet of Things (IoT) security. The principle is to exploit knowledge from the large quantities of malware that are continuously generated. Existing algorithms rely on the malware features already available for IoT devices and lack real-time prediction capability. More research is therefore required on malware detection that can cope with real-time misclassification of input IoT data. Motivated by this, we propose SETTI, an adversarial self-supervised architecture for detecting malware in IoT networks, which considers samples of IoT network traffic that may not be labeled. Within the SETTI architecture, we design three self-supervised attack techniques: Self-MDS, GSelf-MDS, and ASelf-MDS. Self-MDS considers the IoT input data and generates adversarial samples in real time. GSelf-MDS builds a generative adversarial network model to generate adversarial samples in the self-supervised structure. Finally, ASelf-MDS utilises three well-known perturbation techniques to craft adversarial malware samples and inject them into the self-supervised architecture. We also apply a defence method, adversarial self-supervised training, to protect the malware detection architecture against the injection of malicious samples. To validate the attack and defence algorithms, we conduct experiments on two recent IoT datasets: IoT23 and NBIoT. The results show that on the IoT23 dataset, Self-MDS is the most damaging attack from the attacker's point of view, reducing accuracy from 98% to 74%. On the NBIoT dataset, ASelf-MDS is the most damaging, reducing accuracy from 98% to 77%.
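To make the attack and defence loop described in the abstract concrete, the following is a minimal sketch under stated assumptions, not the SETTI implementation. The abstract does not name the three perturbation techniques, so the fast gradient sign method (FGSM) is used here purely as an illustration. A logistic-regression classifier stands in for the self-supervised detector, and synthetic Gaussian data stands in for the IoT traffic features. All function names (`train`, `fgsm`, `make_data`) and the value of `EPS` are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(X, y, epochs=200, lr=0.1):
    # Plain batch gradient descent on the logistic loss.
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = predict_proba(w, b, x) - t
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, t, eps):
    # For logistic regression the input gradient of the loss is (p - t) * w;
    # FGSM steps each feature by eps in the direction of that gradient's sign.
    err = predict_proba(w, b, x) - t
    return [xj + eps * sign(err * wj) for xj, wj in zip(x, w)]

def accuracy(w, b, X, y):
    hits = sum((predict_proba(w, b, x) >= 0.5) == (t == 1) for x, t in zip(X, y))
    return hits / len(X)

def make_data(n, rng):
    # Assumption: two Gaussian blobs as a stand-in for benign/malicious traffic.
    X, y = [], []
    for i in range(n):
        t = i % 2
        mu = 1.0 if t else -1.0
        X.append([rng.gauss(mu, 1.0) for _ in range(4)])
        y.append(t)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(200, rng)
EPS = 0.5  # hypothetical perturbation budget

# Attack: perturb test samples against the trained detector.
w, b = train(X_train, y_train)
X_adv = [fgsm(w, b, x, t, EPS) for x, t in zip(X_test, y_test)]
clean_acc = accuracy(w, b, X_test, y_test)
adv_acc = accuracy(w, b, X_adv, y_test)

# Defence: adversarial training, i.e. refit on clean plus perturbed samples.
X_aug = X_train + [fgsm(w, b, x, t, EPS) for x, t in zip(X_train, y_train)]
w_r, b_r = train(X_aug, y_train + y_train)
X_adv_r = [fgsm(w_r, b_r, x, t, EPS) for x, t in zip(X_test, y_test)]
robust_acc = accuracy(w_r, b_r, X_adv_r, y_test)

print(f"clean={clean_acc:.3f} attacked={adv_acc:.3f} after_defence={robust_acc:.3f}")
```

For a linear model the attacked accuracy can never exceed the clean accuracy, because the perturbation always pushes the decision score away from the true label. Whether adversarial training recovers accuracy depends on the model and the data, so this toy setup should not be read as reproducing the reported 98% to 74% or 77% figures.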
SETTI: A Self-supervised AdvErsarial Malware DeTection ArchiTecture in an IoT Environment