DOI: 10.1145/1015330.1015408
Article

K-means clustering via principal component analysis

Online: 4 July 2004

ABSTRACT

Principal component analysis (PCA) is a widely used statistical technique for unsupervised dimension reduction. K-means clustering is a commonly used data clustering method for unsupervised learning tasks. Here we prove that principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering. New lower bounds for the K-means objective function are derived, given by the total variance minus the eigenvalues of the data covariance matrix. These results indicate that unsupervised dimension reduction is closely related to unsupervised learning. Several implications are discussed. On dimension reduction, the result provides new insight into the observed effectiveness of PCA-based data reduction, beyond the conventional noise-reduction explanation that PCA, via singular value decomposition, provides the best low-dimensional linear approximation of the data. On learning, the result suggests effective techniques for K-means data clustering. DNA gene expression and Internet newsgroup data are analyzed to illustrate our results. Experiments indicate that the new bounds are within 0.5-1.5% of the optimal values.
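
The two claims in the abstract can be checked numerically on toy data. The following is a minimal sketch, not the authors' code. It assumes the lower bound has the form Tr(YᵀY) − Σ λ_k, where Y is the centered data and the sum runs over the K−1 largest eigenvalues of the unnormalized scatter matrix YᵀY. The abstract only says "total variance minus the eigenvalues", so the K−1 count is an assumption taken from the standard spectral-relaxation argument. For K = 2 it also assumes that the sign of the first principal component score serves as the relaxed cluster indicator. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def kmeans_objective(X, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(
        ((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
        for k in np.unique(labels)
    )

def lloyd_kmeans(X, init_labels, n_iter=100):
    """Plain Lloyd iterations starting from an initial labeling."""
    labels = init_labels.copy()
    for _ in range(n_iter):
        ks = np.unique(labels)  # skip clusters that became empty
        centroids = np.stack([X[labels == k].mean(axis=0) for k in ks])
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = ks[dists.argmin(axis=1)]
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def pca_lower_bound(X, K):
    """Assumed bound: Tr(Y^T Y) minus the K-1 largest eigenvalues of Y^T Y."""
    Y = X - X.mean(axis=0)
    eigvals = np.linalg.eigvalsh(Y.T @ Y)  # returned in ascending order
    return (Y ** 2).sum() - eigvals[::-1][:K - 1].sum()

def pca_sign_split(X):
    """K = 2 relaxation: label each point by the sign of its first PC score."""
    Y = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    scores = Y @ Vt[0]
    return (scores > 0).astype(int)

# Synthetic two-cluster data (illustrative only, unrelated to the paper's datasets).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-3.0, 1.0, size=(50, 5)),
    rng.normal(3.0, 1.0, size=(50, 5)),
])

init = pca_sign_split(X)          # PCA-based initial partition
labels = lloyd_kmeans(X, init)    # refine with standard K-means updates
J = kmeans_objective(X, labels)
bound = pca_lower_bound(X, K=2)

print(f"K-means objective J = {J:.3f}")
print(f"PCA lower bound     = {bound:.3f}")
print(f"relative gap        = {(J - bound) / J:.2%}")
```

Under these assumptions the bound holds for any partition, so the printed gap is never negative. How tight it is depends on the data, and this toy example makes no attempt to reproduce the 0.5-1.5% figure reported in the abstract.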


Published in

ICML '04: Proceedings of the twenty-first international conference on Machine learning
July 2004, 934 pages
ISBN: 1581138385
DOI: 10.1145/1015330
Conference Chair: Carla Brodley

Copyright © 2004 ACM

Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 140 of 548 submissions (26%)
