DOI: 10.1145/1772690.1772862
Poster, WWW conference proceedings

Web-scale k-means clustering

ABSTRACT

We present two modifications to the popular k-means clustering algorithm to address the extreme requirements for latency, scalability, and sparsity encountered in user-facing web applications. First, we propose the use of mini-batch optimization for k-means clustering. This reduces computation cost by orders of magnitude compared to the classic batch algorithm while yielding significantly better solutions than online stochastic gradient descent. Second, we achieve sparsity with projected gradient descent, and give a fast ε-accurate projection onto the L1-ball. Source code is freely available: http://code.google.com/p/sofia-ml
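The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the sofia-ml implementation, and the abstract does not spell out the update rules, so the following are assumptions of this sketch: each center uses a learning rate of 1/(number of points assigned to it so far), and the ε-accurate L1-ball projection is done by bisection on a soft-threshold value. All function and parameter names (`project_l1_ball`, `mini_batch_kmeans`, `l1_radius`, and so on) are hypothetical.

```python
import math
import random


def project_l1_ball(v, radius=1.0, eps=1e-6):
    """Approximately project v onto the L1-ball of the given radius.

    Assumed approach: bisect on a soft-threshold theta until the L1 norm
    of the shrunken vector is within eps of radius (at most 100 halvings).
    """
    if sum(abs(x) for x in v) <= radius:
        return list(v)  # already inside the ball, nothing to do
    lo, hi = 0.0, max(abs(x) for x in v)
    theta = hi
    for _ in range(100):
        theta = (lo + hi) / 2.0
        norm = sum(max(abs(x) - theta, 0.0) for x in v)
        if abs(norm - radius) <= eps:
            break
        if norm > radius:
            lo = theta  # norm still too large: shrink more
        else:
            hi = theta  # norm too small: shrink less
    # Soft-threshold each coordinate, keeping its sign.
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def nearest_center(centers, x):
    """Index of the center with the smallest squared Euclidean distance to x."""
    return min(range(len(centers)),
               key=lambda j: sum((c - xi) ** 2 for c, xi in zip(centers[j], x)))


def mini_batch_kmeans(points, k, batch_size=10, iterations=100,
                      l1_radius=None, seed=0):
    """Mini-batch k-means sketch with an optional L1 projection of centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # points assigned to each center so far
    for _ in range(iterations):
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Assign the whole batch before moving any center.
        assigned = [nearest_center(centers, x) for x in batch]
        for x, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # assumed per-center learning rate
            centers[j] = [(1.0 - eta) * c + eta * xi
                          for c, xi in zip(centers[j], x)]
        if l1_radius is not None:
            # Assumed placement: sparsify the centers after each batch.
            centers = [project_l1_ball(c, l1_radius) for c in centers]
    return centers
```

Calling `mini_batch_kmeans(points, k=2, batch_size=10, iterations=200)` on a list of equal-length numeric tuples returns `k` centers as lists; passing `l1_radius` additionally shrinks each center toward a sparse vector. Each iteration touches only `batch_size` points, which is where the cost saving over the full batch algorithm comes from.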
