Abstract
Anomaly detection aims at identifying unexpected fluctuations in the expected behavior of a given system. It is acknowledged as a reliable answer to the identification of zero-day attacks, to such an extent that several ML algorithms suited for binary classification have been proposed over the years. However, an experimental comparison of a wide pool of unsupervised algorithms for anomaly-based intrusion detection against a comprehensive set of attack datasets has not yet been carried out. To fill this gap, we exercise 17 unsupervised anomaly detection algorithms on 11 attack datasets. The results support a discussion of a wide range of topics, from the behavior of individual algorithms to the suitability of the datasets for anomaly detection. We conclude that algorithms such as Isolation Forests, One-Class Support Vector Machines, and Self-Organizing Maps are more effective than their counterparts for intrusion detection, while clustering algorithms represent a good alternative due to their low computational complexity. Further, we detail how attacks with unstable, distributed, or non-repeatable behavior, such as Fuzzing, Worms, and Botnets, are more difficult to detect. Finally, we discuss the capabilities of algorithms in detecting anomalies generated by a wide pool of unknown attacks, showing that the achieved metric scores do not differ from those obtained when identifying single attacks.
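The general protocol behind such a comparison (fit a detector without labels, then use the labels only to score its output) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the distance-from-centroid detector is a toy stand-in and not one of the 17 algorithms, the ten data points are invented, the `contamination` parameter is hypothetical, and the Matthews correlation coefficient is just one possible metric score.

```python
import math

def fit_centroid(points):
    """Learn the mean of the unlabeled data; labels are never used here."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]

def anomaly_scores(points, centroid):
    """Score each point by its Euclidean distance from the centroid."""
    return [math.dist(p, centroid) for p in points]

def flag_top(scores, contamination):
    """Flag the highest-scoring fraction of points as anomalous."""
    k = max(1, round(contamination * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:k])
    return [i in top for i in range(len(scores))]

def mcc(predicted, actual):
    """Matthews correlation coefficient from two boolean lists."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Invented toy data: eight "normal" observations and two "attack" observations.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8),
          (1.0, 1.1), (1.1, 0.9), (0.9, 1.0), (5.0, 5.0), (6.0, 0.5)]
labels = [False] * 8 + [True] * 2  # used only for evaluation, never for fitting

centroid = fit_centroid(points)
scores = anomaly_scores(points, centroid)
flags = flag_top(scores, contamination=0.2)  # hypothetical expected anomaly rate
score = mcc(flags, labels)
print(flags, score)
```

On this deliberately easy toy data the two distant points receive the highest scores. Real attack datasets are far less separable, which is why a broad comparison across algorithms and datasets is informative.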
Unsupervised Anomaly Detectors to Detect Intrusions in the Current Threat Landscape