Anomaly Detection in Dynamic Systems Using Weak Estimators

Published: 01 July 2011

Abstract

Anomaly detection involves identifying observations that deviate from the normal behavior of a system. One of the ways to achieve this is by identifying the phenomena that characterize “normal” observations. Subsequently, based on the characteristics of data learned from the “normal” observations, new observations are classified as being either “normal” or not. Most state-of-the-art approaches, especially those which belong to the family of parameterized statistical schemes, work under the assumption that the underlying distributions of the observations are stationary. That is, they assume that the distributions that are learned during the training (or learning) phase, though unknown, are not time-varying. They further assume that the same distributions are relevant even as new observations are encountered. Although such a “stationarity” assumption is relevant for many applications, there are some anomaly detection problems where stationarity cannot be assumed. For example, in network monitoring, the patterns which are learned to represent normal behavior may change over time due to several factors such as network infrastructure expansion, new services, growth of user population, and so on. Similarly, in meteorology, identifying anomalous temperature patterns involves taking into account seasonal changes of normal observations. Detecting anomalies or outliers under these circumstances introduces several challenges. Indeed, the ability to adapt to changes in nonstationary environments is necessary so that anomalous observations can be identified even with changes in what would otherwise be classified as “normal” behavior. In this article we propose to apply a family of weak estimators for anomaly detection in dynamic environments. In particular, we apply this theory to spam email detection. Our experimental results demonstrate that our proposal is both feasible and effective for the detection of such anomalous emails.
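
The contrast the abstract draws between stationary and nonstationary estimation can be illustrated with a minimal sketch. The abstract does not state the update rule the authors use, so the exponentially forgetting form below, the class name `BinaryWeakEstimator`, the forgetting factor `lam = 0.9`, and the initial estimate `0.5` are all assumptions made for illustration. The sketch compares such an estimator with an ordinary running mean on a stream whose "normal" behavior changes halfway through.

```python
from dataclasses import dataclass


@dataclass
class BinaryWeakEstimator:
    """Estimate of P(event) that gradually forgets old observations."""
    lam: float = 0.9  # forgetting factor in (0, 1); assumed value
    p: float = 0.5    # initial estimate; assumed value

    def update(self, event: bool) -> float:
        # Keep a lam share of the old estimate and give the newest
        # observation the remaining (1 - lam) share.
        target = 1.0 if event else 0.0
        self.p = self.lam * self.p + (1.0 - self.lam) * target
        return self.p


def running_mean(events):
    # Stationary-style estimate: every observation has equal weight.
    return sum(1.0 for e in events if e) / len(events)


def compare(events):
    est = BinaryWeakEstimator()
    for e in events:
        est.update(e)
    return est.p, running_mean(events)


# The event is always seen for 100 steps, then never seen for 100 steps.
stream = [True] * 100 + [False] * 100
weak, mean = compare(stream)
print(f"weak estimate: {weak:.4f}, running mean: {mean:.4f}")
```

On this stream the forgetting estimator ends close to 0, tracking the new regime, while the running mean stays at 0.5 because it weights old and new observations equally. How such an estimate is turned into an anomaly decision (for example, a threshold on the estimated probability) is not specified in the abstract and is left out of the sketch.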

Published in

ACM Transactions on Internet Technology, Volume 11, Issue 1
July 2011
95 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/1993083

Copyright © 2011 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 1 July 2011
• Accepted: 1 January 2011
• Revised: 1 October 2010
• Received: 1 January 2001

Qualifiers

• research-article
• Research
• Refereed