Malicious Overtones: Hunting Data Theft in the Frequency Domain with One-class Learning

Published: 03 November 2019

Abstract

A method for detecting electronic data theft from computer networks is described, capable of recognizing patterns of remote exfiltration occurring over days to weeks. Normal traffic flow data, in the form of a host’s ingress and egress bytes over time, is used to train an ensemble of one-class learners. The detection ensemble is modular, with individual classifiers trained on different traffic features thought to characterize malicious data transfers. We select features that model the egress-to-ingress byte balance over time, periodicity, short-timescale irregularity, and density of the traffic. The features are most efficiently modeled in the frequency domain, which has the added benefit that variable-duration flows are transformed to a fixed-size feature vector, and by sampling the frequency space appropriately, long-duration flows can be tested. When trained on days’ or weeks’ worth of traffic from individual hosts, our ensemble achieves a low false-positive rate (<2%) on a range of different system types. Simulated exfiltration samples with a variety of timing and data characteristics were generated and used to test ensemble performance on different kinds of systems. When trained on a client workstation’s external traffic, the ensemble was generally successful at detecting exfiltration that is not simultaneously ingress-heavy, connection-sparse, and of short duration, a combination that is not optimal for attackers seeking to transfer large amounts of data. Remote exfiltration is more difficult to detect on egress-heavy systems, such as web servers, whose normal traffic exhibits timing characteristics similar to those of a wide range of exfiltration types.
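The idea of mapping variable-duration traffic to a fixed-size frequency-domain vector and training a one-class learner on normal traffic only can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify the feature definitions or the learners, so the balance formula, the number of frequency bins, the centroid-distance detector (a simple stand-in for a one-class learner), and the synthetic traffic generator are all assumptions made for illustration. The sketch covers only one hypothetical ensemble member, built on the egress-to-ingress balance feature.

```python
import cmath
import math
import random


def spectrum(series, n_bins=16):
    """Magnitude of the discrete-time Fourier transform, sampled at n_bins
    fixed normalized frequencies in [0, 0.5] cycles per interval.
    Any series length maps to a vector of the same size."""
    n = len(series)
    feats = []
    for k in range(n_bins):
        f = 0.5 * k / (n_bins - 1)
        acc = sum(x * cmath.exp(-2j * math.pi * f * t)
                  for t, x in enumerate(series))
        feats.append(abs(acc) / n)  # normalize by length
    return feats


def balance(ingress, egress):
    """Per-interval egress/ingress balance in [-1, 1].
    Assumed definition; the abstract does not give a formula."""
    return [(e - i) / (e + i) if (e + i) > 0 else 0.0
            for i, e in zip(ingress, egress)]


class CentroidOneClass:
    """Stand-in one-class learner: flag vectors whose Euclidean distance
    from the training centroid exceeds a quantile of training distances."""

    def __init__(self, quantile=0.98):
        self.quantile = quantile

    def fit(self, vectors):
        dim = len(vectors[0])
        self.center = [sum(v[d] for v in vectors) / len(vectors)
                       for d in range(dim)]
        dists = sorted(self._dist(v) for v in vectors)
        idx = min(len(dists) - 1, int(self.quantile * len(dists)))
        self.threshold = dists[idx]
        return self

    def _dist(self, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, self.center)))

    def is_anomalous(self, v):
        return self._dist(v) > self.threshold


def make_normal(rng, n):
    """Synthetic ingress-heavy 'workstation' traffic (hypothetical)."""
    ingress = [rng.uniform(500, 1500) for _ in range(n)]
    egress = [rng.uniform(50, 150) for _ in range(n)]
    return ingress, egress


rng = random.Random(0)

# Train on normal traffic only; series lengths vary between 100 and 200.
train = [spectrum(balance(*make_normal(rng, rng.randint(100, 200))))
         for _ in range(50)]
detector = CentroidOneClass().fit(train)

# Simulated exfiltration: a large egress burst every 10th interval.
ingress, egress = make_normal(rng, 150)
egress = [e + (5000 if t % 10 == 0 else 0) for t, e in enumerate(egress)]
exfil_vec = spectrum(balance(ingress, egress))

print("feature vector size:", len(exfil_vec))
print("exfiltration flagged:", detector.is_anomalous(exfil_vec))
```

In this toy setup the periodic bursts shift the mean balance and add energy at the harmonics of the burst frequency, which moves the feature vector away from the centroid learned from normal traffic. A fuller sketch would train one detector per feature (periodicity, irregularity, density) and combine their outputs, but the combination rule is not given in the abstract.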

Published in

ACM Transactions on Privacy and Security, Volume 22, Issue 4
November 2019
170 pages
ISSN: 2471-2566
EISSN: 2471-2574
DOI: 10.1145/3364835

Copyright © 2019 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 3 November 2019
• Accepted: 1 August 2019
• Revised: 1 April 2019
• Received: 1 July 2018

Qualifiers

• research-article
• Research
• Refereed