ABSTRACT
Feature selection is the task of choosing a small subset of a given set of features that captures the relevant properties of the data. In the context of supervised classification problems, relevance is determined by the given labels on the training data. A good choice of features is key to building compact and accurate classifiers. In this paper we introduce a margin-based feature selection criterion and apply it to measure the quality of sets of features. Using margins, we devise novel selection algorithms for multi-class classification problems and provide a theoretical generalization bound. We also study the well-known Relief algorithm and show that it resembles a gradient ascent over our margin criterion. We apply our new algorithm to various datasets and show that our new Simba algorithm, which directly optimizes the margin, outperforms Relief.
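To make the criterion concrete, below is a minimal sketch of the hypothesis-margin idea the abstract refers to: the margin of a sample is half the difference between its distance to the nearest example of a different class (its nearmiss) and its distance to the nearest example of the same class (its nearhit), and a feature subset is scored by the total margin it induces under the 1-NN rule. The function names and the plain Euclidean distance here are illustrative assumptions; the paper's Simba algorithm additionally maintains per-feature weights and performs gradient ascent on this objective.

```python
import numpy as np

def hypothesis_margin(i, X, y):
    """Margin of sample i under the 1-NN rule: half the gap between the
    distance to its nearmiss (closest point with a different label) and
    its nearhit (closest point with the same label, excluding itself)."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                          # exclude the sample itself
    nearhit = d[y == y[i]].min()
    nearmiss = d[y != y[i]].min()
    return 0.5 * (nearmiss - nearhit)

def margin_score(features, X, y):
    """Score a feature subset by the sum of hypothesis margins it induces."""
    Xs = X[:, features]
    return sum(hypothesis_margin(i, Xs, y) for i in range(len(y)))

# Tiny usage example: two informative features and one pure-noise feature.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(3, 1, (20, 3))])
X[:, 2] = rng.normal(0, 1, 40)             # third feature carries no signal
y = np.array([0] * 20 + [1] * 20)
print(margin_score([0, 1], X, y))           # large positive total margin
print(margin_score([2], X, y))              # near-zero or negative margin
```

Running the example, the informative subset scores a large positive total margin while the noise feature alone does not, which is exactly the behavior a margin-based selection criterion exploits.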
REFERENCES
- Bartlett, P. (1998). The size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44, 525--536.
- Bellman, R. (1961). Adaptive control processes: A guided tour. Princeton University Press.
- Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273--297.
- Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13, 21--27.
- Crammer, K., Gilad-Bachrach, R., Navot, A., & Tishby, N. (2002). Margin analysis of the LVQ algorithm. Proc. 17th Conference on Neural Information Processing Systems.
- Fix, E., & Hodges, J. (1951). Discriminatory analysis. Nonparametric discrimination: Consistency properties (Technical Report 4). USAF School of Aviation Medicine.
- Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55, 119--139.
- Globerson, A., & Tishby, N. (2003). Sufficient dimensionality reduction. Journal of Machine Learning Research, 3, 1307--1331.
- Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157--1182.
- Guyon, I., & Gunn, S. (2003). NIPS feature selection challenge. http://www.nipsfsc.ecs.soton.ac.uk/.
- Jolliffe, I. T. (1986). Principal component analysis. Springer-Verlag.
- Kira, K., & Rendell, L. (1992). A practical approach to feature selection. Proc. 9th International Workshop on Machine Learning (pp. 249--256).
- Kohavi, R., & John, G. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97, 273--324.
- Kohonen, T. (1995). Self-organizing maps. Springer-Verlag.
- Martinez, A., & Benavente, R. (1998). The AR face database (CVC Technical Report #24).
- Quinlan, J. R. (1990). Induction of decision trees. In J. W. Shavlik & T. G. Dietterich (Eds.), Readings in machine learning. Morgan Kaufmann. Originally published in Machine Learning, 1, 81--106, 1986.
- Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2323--2326.
- Schapire, R. E., Freund, Y., Bartlett, P., & Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 26, 1651--1686.
- Shawe-Taylor, J., Bartlett, P., Williamson, R., & Anthony, M. (1998). Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44, 1926--1940.
- Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. Proc. 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368--377).
- Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., & Vapnik, V. (2000). Feature selection for SVMs. Proc. 15th Conference on Neural Information Processing Systems (NIPS) (pp. 668--674).