Abstract
Ensemble analysis has recently been studied in the context of the outlier detection problem. In this paper, we investigate the theoretical underpinnings of outlier ensemble analysis. Despite the significant differences between the classification and outlier analysis problems, we show that the theoretical underpinnings of the two problems are quite similar in terms of the bias-variance trade-off. We explain the existing algorithms within this traditional framework and clarify misconceptions about the reasoning behind these methods. We propose more effective variants of subsampling and feature bagging. We also discuss the impact of the combination function, including the specific trade-offs of the averaging and maximization functions. We use these insights to propose new combination functions that are robust in many settings.
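As a rough illustration of the averaging and maximization combination functions mentioned in the abstract, the sketch below scores one-dimensional points with a k-nearest-neighbor distance detector on several random subsamples, standardizes each component's scores, and then combines them in both ways. It is a generic example, not the variants proposed in the paper: the function names, the choice of base detector, the standardization step, and all parameter values are assumptions made for illustration.

```python
import random
import statistics


def knn_score(i, data, sample_idx, k):
    """Distance from data[i] to its k-th nearest neighbor in the subsample.

    The point itself is skipped if it was drawn into the subsample
    (an assumption of this sketch, so that it is not its own neighbor).
    """
    dists = sorted(abs(data[i] - data[j]) for j in sample_idx if j != i)
    return dists[k - 1]


def standardize(scores):
    """Convert raw scores to Z-scores so components are comparable."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores) or 1.0  # guard against zero spread
    return [(s - mu) / sigma for s in scores]


def subsampled_ensemble(data, n_components=25, sample_size=20, k=3, seed=0):
    """Return (average, maximum) combined scores over subsampled detectors."""
    rng = random.Random(seed)
    per_component = []
    for _ in range(n_components):
        # Each component sees a different random subsample of the data.
        sample_idx = rng.sample(range(len(data)), sample_size)
        raw = [knn_score(i, data, sample_idx, k) for i in range(len(data))]
        per_component.append(standardize(raw))
    per_point = list(zip(*per_component))  # scores grouped by data point
    avg_scores = [statistics.mean(p) for p in per_point]
    max_scores = [max(p) for p in per_point]
    return avg_scores, max_scores


# Hypothetical toy data: 100 standard-normal inliers plus one planted outlier.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(100)] + [8.0]
avg_scores, max_scores = subsampled_ensemble(data)
top_by_avg = max(range(len(data)), key=avg_scores.__getitem__)
top_by_max = max(range(len(data)), key=max_scores.__getitem__)
```

On this toy data, the planted point (index 100) should rank first under both combination functions. Averaging smooths out the variation between subsamples, while the maximum keeps each point's most extreme component score, which is one way to see the trade-off between the two functions that the abstract refers to.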
Theoretical Foundations and Algorithms for Outlier Ensembles