Behavior-based modeling and its application to Email analysis

Published: 01 May 2006

Abstract

The Email Mining Toolkit (EMT) is a data mining system that computes behavior profiles or models of user email accounts. These models may be used for a multitude of tasks, including forensic analyses and detection tasks of value to law enforcement and intelligence agencies, as well as for other typical tasks such as virus and spam detection. To demonstrate the power of the methods, we focus on the application of these models to detect the early onset of a viral propagation without the “content-based” (or signature-based) analysis in common use in virus scanners. We present several experiments using real email from 15 users with injected simulated viral emails and describe how the combination of different behavior models improves overall detection rates. The performance results vary depending upon parameter settings, approaching a 99% true positive (TP) rate (percentage of viral emails caught) in general cases, with a 0.38% false positive (FP) rate (percentage of emails with attachments that are mislabeled as viral). The models used for this study are based upon volume and velocity statistics of a user's email rate and an analysis of the user's (social) cliques revealed in the person's email behavior. We show by way of simulation that virus propagations are detectable, since viruses may emit emails at rates that differ from what human behavior suggests is normal, and email is directed to groups of recipients in ways that violate the users' typical communications with their social groups.
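
The two behavior signals named in the abstract (an unusual sending rate and recipient groups that violate a user's usual social cliques) can be illustrated with a minimal sketch. This is not EMT's implementation: the function names, the sliding-window threshold, and the use of pairwise recipient co-occurrence as a simplified stand-in for clique analysis are all assumptions made for illustration.

```python
from itertools import combinations


def build_pair_history(sent_emails):
    """Collect every pair of recipients that appeared together in a past email.

    sent_emails: iterable of recipient lists, one per historical email.
    Pairwise co-occurrence is a simplification of clique analysis (assumption).
    """
    pairs = set()
    for recipients in sent_emails:
        for pair in combinations(sorted(set(recipients)), 2):
            pairs.add(pair)
    return pairs


def clique_violation_score(recipients, pair_history):
    """Fraction of recipient pairs in a new email never seen together before.

    0.0 means every pair is familiar; 1.0 means no pair has co-occurred.
    """
    pairs = list(combinations(sorted(set(recipients)), 2))
    if not pairs:
        return 0.0
    unseen = sum(1 for pair in pairs if pair not in pair_history)
    return unseen / len(pairs)


def rate_is_anomalous(timestamps, window_seconds, max_per_window):
    """Return True if any sliding window holds more than max_per_window emails.

    timestamps: send times in seconds; both thresholds are hypothetical
    parameters that would be fitted to a user's history in practice.
    """
    ts = sorted(timestamps)
    start = 0
    for end in range(len(ts)):
        # Shrink the window until it spans at most window_seconds.
        while ts[end] - ts[start] > window_seconds:
            start += 1
        if end - start + 1 > max_per_window:
            return True
    return False
```

In this sketch, an email sent to recipients who have never appeared together scores close to 1.0, and a burst of messages inside a short window trips the rate check. Combining the two signals mirrors the abstract's point that combining behavior models improves detection, though how EMT actually combines its models is not described here.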
