Less is More: Building Selective Anomaly Ensembles

Abstract
Ensemble learning for anomaly detection has been barely studied, due to the difficulty of acquiring ground truth and the lack of inherent objective functions. In contrast, ensemble approaches for classification and clustering have long been studied and used effectively. Our work addresses this gap and builds a new ensemble approach for anomaly detection, with application to event detection in temporal graphs as well as outlier detection in non-graph settings. It handles and combines multiple heterogeneous detectors to yield improved and robust performance. Importantly, trusting the results of all constituent detectors may deteriorate the overall performance of the ensemble, as some detectors can produce inaccurate results depending on the type of data at hand and the underlying assumptions of the detector. This suggests that combining the detectors selectively is key to building effective anomaly ensembles; hence “less is more”.
In this paper we propose a novel ensemble approach called SELECT for anomaly detection, which automatically and systematically selects the results from constituent detectors to combine in a fully unsupervised fashion. We apply our method to event detection in temporal graphs and outlier detection in multi-dimensional point data (non-graph), where SELECT successfully utilizes five base detectors and seven consensus methods under a unified ensemble framework. We provide extensive quantitative evaluation of our approach for event detection on five real-world datasets (four with ground truth events), including Enron email communications, RealityMining SMS and phone call records, the New York Times news corpus, and the World Cup 2014 Twitter news feed. We also provide results for outlier detection on seven real-world multi-dimensional point datasets from the UCI Machine Learning Repository. Thanks to its selection mechanism, SELECT yields superior performance compared to the individual detectors alone, the full ensemble (naively combining all results), an existing diversity-based ensemble, and an existing weighted ensemble approach.
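To make the idea of unsupervised selective combination concrete, the sketch below is a minimal Python illustration, not the authors' exact SELECT algorithm: it assumes rank-normalized detector scores, a pseudo ground truth formed by averaging all detectors, and an illustrative agreement threshold (corr_threshold) for deciding which detectors to keep.

```python
import numpy as np
from scipy.stats import rankdata

def normalize_scores(scores):
    """Map raw anomaly scores to [0, 1] via rank normalization."""
    ranks = rankdata(scores)
    return (ranks - 1) / (len(scores) - 1)

def select_and_combine(score_lists, corr_threshold=0.5):
    """Unsupervised selective combination of detector outputs.

    score_lists: one anomaly-score vector per detector, all over the same
    set of objects. Returns the averaged scores of the selected detectors.
    """
    normalized = [normalize_scores(s) for s in score_lists]
    # Pseudo ground truth: the average of all detectors (no labels available).
    pseudo_target = np.mean(normalized, axis=0)
    # Agreement of each detector with the pseudo target.
    corrs = [np.corrcoef(s, pseudo_target)[0, 1] for s in normalized]
    # Keep only detectors that agree sufficiently with the consensus ...
    selected = [s for s, c in zip(normalized, corrs) if c >= corr_threshold]
    if not selected:
        # ... falling back to the single most-agreeing detector otherwise.
        selected = [normalized[int(np.argmax(corrs))]]
    return np.mean(selected, axis=0)

# Toy usage: three detectors scoring ten objects; the third is pure noise.
rng = np.random.default_rng(0)
base = rng.random(10)
scores = [base + 0.05 * rng.random(10),  # two detectors roughly agree
          base + 0.05 * rng.random(10),
          rng.random(10)]                # an unreliable detector
print(np.round(select_and_combine(scores), 2))
```

Detectors that disagree with the consensus are excluded from the final combination; this is the selective, “less is more” behavior the abstract refers to.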