Abstract
Most research in network intrusion detection relies heavily on datasets. Datasets in this field, however, are scarce and difficult to reproduce. To compare, evaluate, and test related work, researchers usually need the same datasets, or at least datasets with characteristics similar to those used in that work. In this work, we present concepts and the Intrusion Detection Dataset Toolkit (ID2T) to alleviate the problem of reproducing datasets with desired characteristics and thereby enable an accurate replication of scientific results. ID2T facilitates the creation of labeled datasets by injecting synthetic attacks into background traffic. The injected attacks blend with the background traffic by mimicking its properties.
This article makes three core contributions. First, we present a comprehensive survey of intrusion detection datasets, in which we propose a classification that groups the negative qualities found in the datasets. Second, we revise, improve, and expand the architecture of ID2T in comparison to previous work. The architectural changes enable ID2T to inject recent and advanced attacks, such as the EternalBlue exploit or a peer-to-peer botnet. ID2T also provides a set of tests, known as TIDED, that helps identify potential defects in the background traffic into which attacks are injected. Third, we illustrate how ID2T is used in different use-case scenarios to replicate scientific results with the help of reproducible datasets. ID2T is open source software and is made available to the community so that its arsenal of attacks and capabilities can be expanded.
On Generating Network Traffic Datasets with Synthetic Attacks for Intrusion Detection