Unsupervised Anomaly Detectors to Detect Intrusions in the Current Threat Landscape

Published: 08 April 2021

Abstract

Anomaly detection aims to identify unexpected fluctuations in the expected behavior of a given system. It is acknowledged as a reliable answer to the identification of zero-day attacks, to such an extent that several ML algorithms suited to binary classification have been proposed over the years. However, an experimental comparison of a wide pool of unsupervised algorithms for anomaly-based intrusion detection against a comprehensive set of attack datasets has not yet been conducted. To fill this gap, we exercise 17 unsupervised anomaly detection algorithms on 11 attack datasets. The results allow us to elaborate on a wide range of topics, from the behavior of individual algorithms to the suitability of the datasets for anomaly detection. We conclude that algorithms such as Isolation Forests, One-Class Support Vector Machines, and Self-Organizing Maps are more effective than their counterparts for intrusion detection, while clustering algorithms represent a good alternative due to their low computational complexity. Further, we detail how attacks with unstable, distributed, or non-repeatable behavior, such as Fuzzing, Worms, and Botnets, are more difficult to detect. Finally, we discuss the capabilities of algorithms in detecting anomalies generated by a wide pool of unknown attacks, showing that the achieved metric scores do not vary compared with those obtained when identifying single attacks.
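
To make the evaluation setting concrete, the following is a minimal sketch of how an unsupervised detector can be exercised on labeled data and then scored as a binary classifier. It is not the authors' pipeline. The detector is a simple k-th-nearest-neighbour distance score standing in for the 17 algorithms in the study, the data are a synthetic toy set rather than one of the 11 attack datasets, and the Matthews correlation coefficient is only one possible metric. All function names and the parameters `k`, `contamination`, and `seed` are assumptions introduced for illustration.

```python
import math
import random


def knn_scores(points, k=3):
    """Score each point by the distance to its k-th nearest neighbour.

    Larger scores mean the point is more isolated. No labels are used.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_top(scores, contamination):
    """Flag the highest-scoring fraction of points as anomalous."""
    n_flag = max(1, round(contamination * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flag])
    return [i in flagged for i in range(len(scores))]


def mcc(predicted, actual):
    """Matthews correlation coefficient for boolean predictions and labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def make_toy_data(seed=0):
    """Build a hypothetical 2-D data set: 200 normal points, 10 anomalies.

    Normal points cluster around the origin; anomalies are spread on a
    distant circle, so they are isolated from the cluster and each other.
    """
    rng = random.Random(seed)
    normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    anomalies = [
        (20 * math.cos(2 * math.pi * i / 10), 20 * math.sin(2 * math.pi * i / 10))
        for i in range(10)
    ]
    points = normal + anomalies
    labels = [False] * len(normal) + [True] * len(anomalies)
    return points, labels


def demo():
    """Fit-free detection on the toy data, then score against the labels."""
    points, labels = make_toy_data()
    scores = knn_scores(points, k=3)
    # Labels are used only for evaluation, never for scoring.
    predicted = flag_top(scores, contamination=10 / len(points))
    return mcc(predicted, labels)


if __name__ == "__main__":
    print(f"MCC on toy data: {demo():.2f}")
```

The toy anomalies are deliberately easy to separate. Attacks whose behavior overlaps with normal traffic would yield lower scores, which is the kind of difference the comparison in the abstract is meant to expose.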
