Abstract
We describe a method for detecting electronic data theft from computer networks that can recognize patterns of remote exfiltration occurring over days to weeks. Normal traffic flow data, in the form of a host’s ingress and egress bytes over time, is used to train an ensemble of one-class learners. The detection ensemble is modular, with individual classifiers trained on different traffic features thought to characterize malicious data transfers. We select features that model the egress-to-ingress byte balance over time, periodicity, short-timescale irregularity, and density of the traffic. These features are most efficiently modeled in the frequency domain, which has the added benefit that variable-duration flows are transformed to a fixed-size feature vector; by sampling the frequency space appropriately, long-duration flows can also be tested. When trained on days’ or weeks’ worth of traffic from individual hosts, our ensemble achieves a low false-positive rate (<2%) on a range of different system types. Simulated exfiltration samples with a variety of timing and data characteristics were generated and used to test ensemble performance on different kinds of systems. When trained on a client workstation’s external traffic, the ensemble was generally successful at detecting exfiltration that is not simultaneously ingress-heavy, connection-sparse, and of short duration, a combination that is not optimal for attackers seeking to transfer large amounts of data. Remote exfiltration is more difficult to detect on egress-heavy systems, such as web servers, whose normal traffic exhibits timing characteristics similar to a wide range of exfiltration types.
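The idea of mapping variable-duration flows to a fixed-size frequency-domain vector and scoring it with a one-class learner can be illustrated with a minimal sketch. It is not the authors' implementation: the frequency grid `FREQS`, the functions `byte_balance` and `spectral_features`, and the `CentroidDetector` class are hypothetical names introduced here, and the centroid-distance rule is a deliberately simple stand-in for whatever one-class learners the ensemble actually uses. Only the Python standard library is assumed.

```python
import cmath
import math

# Hypothetical frequency grid in cycles per sample (0 < f <= 0.5).
# Its length fixes the feature-vector size regardless of flow duration.
FREQS = [1 / 64, 1 / 32, 1 / 16, 1 / 8, 1 / 4, 1 / 2]


def byte_balance(egress, ingress):
    """Per-bin (egress - ingress) / (egress + ingress); 0.0 for idle bins."""
    return [(e - i) / (e + i) if (e + i) > 0 else 0.0
            for e, i in zip(egress, ingress)]


def spectral_features(series, freqs=FREQS):
    """Fourier magnitude of a mean-removed series at each frequency in freqs.

    The transform is evaluated directly at the chosen frequencies, so a
    series of any length yields a vector of len(freqs) values.
    """
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    feats = []
    for f in freqs:
        acc = sum(x * cmath.exp(-2j * math.pi * f * t)
                  for t, x in enumerate(centered))
        feats.append(abs(acc) / n)  # divide by n so durations are comparable
    return feats


class CentroidDetector:
    """Toy one-class model: flag vectors far from the training centroid.

    An assumption for illustration only, not the paper's learner.
    """

    def fit(self, vectors, slack=1.5):
        dim = len(vectors[0])
        self.center = [sum(v[k] for v in vectors) / len(vectors)
                       for k in range(dim)]
        # Radius is the largest training distance, widened by a slack factor.
        self.radius = slack * max(self._dist(v) for v in vectors)
        return self

    def _dist(self, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, self.center)))

    def is_anomalous(self, v):
        return self._dist(v) > self.radius


def periodic(n, period, amplitude=1.0):
    """Synthetic stand-in for a traffic series with one dominant period."""
    return [amplitude * math.sin(2 * math.pi * t / period) for t in range(n)]


# Train on "normal" series of different durations that share a 16-sample period.
train = [spectral_features(periodic(n, 16, a))
         for n, a in [(128, 0.8), (256, 1.0), (512, 1.2)]]
detector = CentroidDetector().fit(train)

normal_like = spectral_features(periodic(64, 16, 1.1))
unusual = spectral_features(periodic(256, 4, 1.0))
print(detector.is_anomalous(normal_like), detector.is_anomalous(unusual))
```

In the modular design described above, one such detector would be trained per feature (byte balance, periodicity, irregularity, density) and the individual outputs combined. The combination rule is not specified in this abstract, so it is left out of the sketch.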
Malicious Overtones: Hunting Data Theft in the Frequency Domain with One-class Learning