DOI: 10.1145/1376916.1376945
research-article

Approximation algorithms for co-clustering

Published: 09 June 2008

ABSTRACT

Co-clustering is the simultaneous partitioning of the rows and columns of a matrix such that the blocks induced by the row/column partitions are good clusters. Motivated by several applications in text mining, market-basket analysis, and bioinformatics, this problem has attracted considerable attention in the past few years. Unfortunately, to date, most of the algorithmic work on this problem has been heuristic in nature.
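
To make the definition concrete, the following minimal sketch scores a given row partition and column partition by how tightly the entries of each induced block cluster around the block mean. The abstract does not state which objective the paper uses, so the sum-of-squared-deviations cost here is an assumption chosen for illustration, and the name `coclustering_cost` and the small matrix `A` are hypothetical.

```python
from collections import defaultdict


def coclustering_cost(matrix, row_labels, col_labels):
    """Score a co-clustering of `matrix` (a list of equal-length rows).

    row_labels[i] is the row cluster of row i; col_labels[j] is the
    column cluster of column j. Each (row cluster, column cluster) pair
    induces one block. The cost is the sum, over all blocks, of squared
    deviations of the block's entries from the block mean.
    NOTE: this objective is an illustrative assumption, not necessarily
    the one analysed in the paper.
    """
    blocks = defaultdict(list)
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            blocks[(row_labels[i], col_labels[j])].append(value)

    cost = 0.0
    for values in blocks.values():
        mean = sum(values) / len(values)
        cost += sum((v - mean) ** 2 for v in values)
    return cost


# Hypothetical 3x4 matrix with an obvious 2x2 block structure.
A = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [9, 9, 3, 3],
]

# Partitions that match the block structure: every block is constant.
good = coclustering_cost(A, row_labels=[0, 0, 1], col_labels=[0, 0, 1, 1])

# Partitions that cut across the structure: blocks mix unlike values.
bad = coclustering_cost(A, row_labels=[0, 1, 1], col_labels=[0, 1, 0, 1])

print(good, bad)
```

Under this cost, a lower value means the induced blocks are more homogeneous, which is one way to read "the blocks are good clusters"; the optimisation problem is to find the row and column partitions, for given numbers of clusters, that minimise it.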

In this work we obtain the first approximation algorithms for the co-clustering problem. Our algorithms are simple and return solutions within a constant factor of the optimum. We also show that co-clustering is NP-hard, thereby complementing our algorithmic result.


Published in

PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2008
330 pages
ISBN: 9781605581521
DOI: 10.1145/1376916

Copyright © 2008 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 476 of 1,835 submissions, 26%
