DOI: 10.1145/1015330.1015352
Article

Margin based feature selection - theory and algorithms

Published: 04 July 2004

ABSTRACT

Feature selection is the task of choosing a small subset of a given set of features that captures the relevant properties of the data. In supervised classification problems, relevance is determined by the labels on the training data. A good choice of features is key to building compact and accurate classifiers. In this paper we introduce a margin-based criterion and apply it to measure the quality of sets of features. Using margins we devise novel selection algorithms for multi-class classification problems and provide a theoretical generalization bound. We also study the well-known Relief algorithm and show that it resembles gradient ascent over our margin criterion. We apply our new Simba algorithm, which optimizes the margin directly, to various datasets and show that it outperforms Relief.
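Since the abstract only names the ingredients, a small sketch may help fix ideas. The hypothesis margin of a sample x under a feature-weight vector w is θ_w(x) = ½(‖x − nearmiss(x)‖_w − ‖x − nearhit(x)‖_w), where nearhit(x) and nearmiss(x) are the nearest neighbors of x with the same and with a different label, and Simba ascends this margin one random sample at a time. The Python sketch below is our minimal reading of that scheme, not the authors' code: it assumes the weighted norm ‖z‖_w = (Σᵢ wᵢ² zᵢ²)^½, and the function names (weighted_dist, near_hit_miss, simba) and hyper-parameters (n_iter, unit initial weights) are our own illustrative choices.

```python
import numpy as np

def weighted_dist(w, a, b):
    # Weighted Euclidean distance ||a - b||_w = sqrt(sum_i w_i^2 (a_i - b_i)^2).
    return np.sqrt(np.sum((w ** 2) * (a - b) ** 2))

def near_hit_miss(w, X, y, i):
    # Nearest same-label ("hit") and different-label ("miss") neighbors of
    # sample i under the current weights. Assumes every class has >= 2 samples.
    d = np.array([weighted_dist(w, X[i], X[j]) for j in range(len(X))])
    d[i] = np.inf                       # a point is not its own neighbor
    same = (y == y[i])
    hit = np.where(same, d, np.inf).argmin()
    miss = np.where(same, np.inf, d).argmin()
    return hit, miss

def simba(X, y, n_iter=1000, seed=0):
    # Stochastic gradient ascent on the hypothesis margin
    # theta_w(x) = 0.5 * (||x - nearmiss(x)||_w - ||x - nearhit(x)||_w).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.ones(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        hit, miss = near_hit_miss(w, X, y, i)
        d_miss = weighted_dist(w, X[i], X[miss])
        d_hit = weighted_dist(w, X[i], X[hit])
        if d_miss == 0.0 or d_hit == 0.0:
            continue                    # skip exact duplicates
        # d(theta)/dw: reward features that push the miss away and
        # pull the hit closer.
        grad = 0.5 * ((X[i] - X[miss]) ** 2 / d_miss
                      - (X[i] - X[hit]) ** 2 / d_hit) * w
        w = w + grad
    w2 = w ** 2
    return w2 / w2.max()                # rank features by these final weights

if __name__ == "__main__":
    # Toy check: only feature 0 determines the label, so its weight
    # should dominate after training.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 5))
    y = (X[:, 0] > 0).astype(int)
    print(simba(X, y))
```

Replacing the weighted gradient step with the fixed increment (xᵢ − missᵢ)² − (xᵢ − hitᵢ)², together with an unweighted neighbor search, recovers the classical Relief update; this is the sense in which the abstract says Relief resembles gradient ascent over the margin criterion.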

References

  1. Bartlett, P. (1998). The size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 44, 525--536.
  2. Bellman, R. (1961). Adaptive control processes: A guided tour. Princeton University Press.
  3. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273--297.
  4. Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13.
  5. Crammer, K., Gilad-Bachrach, R., Navot, A., & Tishby, N. (2002). Margin analysis of the LVQ algorithm. Proc. 17th Conference on Neural Information Processing Systems.
  6. Fix, E., & Hodges, J. (1951). Discriminatory analysis. Nonparametric discrimination: Consistency properties (Technical Report 4). USAF School of Aviation Medicine.
  7. Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55.
  8. Globerson, A., & Tishby, N. (2003). Sufficient dimensionality reduction. Journal of Machine Learning Research, 3, 1307--1331.
  9. Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157--1182.
  10. Guyon, I., & Gunn, S. (2003). NIPS feature selection challenge. http://www.nipsfsc.ecs.soton.ac.uk/.
  11. Jolliffe, I. (1986). Principal component analysis. Springer-Verlag.
  12. Kira, K., & Rendell, L. (1992). A practical approach to feature selection. Proc. 9th International Workshop on Machine Learning (pp. 249--256).
  13. Kohavi, R., & John, G. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97, 273--324.
  14. Kohonen, T. (1995). Self-organizing maps. Springer-Verlag.
  15. Martinez, A., & Benavente, R. (1998). The AR face database (CVC Technical Report #24).
  16. Quinlan, J. R. (1990). Induction of decision trees. In J. W. Shavlik and T. G. Dietterich (Eds.), Readings in machine learning. Morgan Kaufmann. Originally published in Machine Learning, 1, 81--106, 1986.
  17. Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290.
  18. Schapire, R. E., Freund, Y., Bartlett, P., & Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics.
  19. Shawe-Taylor, J., Bartlett, P., Williamson, R., & Anthony, M. (1998). Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44, 1926--1940.
  20. Tishby, N., Pereira, F., & Bialek, W. (1999). The information bottleneck method. Proc. 37th Annual Allerton Conference on Communication, Control and Computing (pp. 368--377).
  21. Weston, J., Mukherjee, S., Chapelle, O., Pontil, M., Poggio, T., & Vapnik, V. (2000). Feature selection for SVMs. Proc. 15th Conference on Neural Information Processing Systems (NIPS) (pp. 668--674).

Published in

ICML '04: Proceedings of the twenty-first international conference on Machine learning
July 2004, 934 pages
ISBN: 1581138385
DOI: 10.1145/1015330
Conference Chair: Carla Brodley
Copyright © 2004 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 140 of 548 submissions, 26%
