ABSTRACT
Co-clustering is the simultaneous partitioning of the rows and columns of a matrix such that the blocks induced by the row/column partitions are good clusters. Motivated by several applications in text mining, market-basket analysis, and bioinformatics, this problem has attracted considerable attention in the past few years. Unfortunately, to date, most of the algorithmic work on this problem has been heuristic in nature.
In this work we obtain the first approximation algorithms for the co-clustering problem. Our algorithms are simple and achieve constant-factor approximations to the optimum. We also show that co-clustering is NP-hard, thereby complementing our algorithmic result.
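To make the definition of co-clustering concrete, the following is a minimal sketch of how one might score a given pair of row and column partitions by how "good" the induced blocks are. The abstract does not say which objective is analysed, so the squared-deviation cost used here is an assumption chosen for illustration, and the function name `block_cost` and the toy matrix `A` are hypothetical.

```python
def block_cost(matrix, row_labels, col_labels):
    """Sum of squared deviations of each entry from the mean of its block.

    Assumption: this squared-error objective is illustrative only; the
    abstract does not specify the objective used in the paper.
    """
    # Group entries by the (row cluster, column cluster) block they fall in.
    blocks = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            blocks.setdefault((row_labels[i], col_labels[j]), []).append(value)

    # Each block contributes its squared deviation from the block mean.
    cost = 0.0
    for values in blocks.values():
        mean = sum(values) / len(values)
        cost += sum((v - mean) ** 2 for v in values)
    return cost


# Toy 4x4 matrix with an obvious 2x2 block structure.
A = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [9, 9, 3, 3],
    [9, 9, 3, 3],
]

# Partitions that follow the block structure: every block is constant.
good = block_cost(A, [0, 0, 1, 1], [0, 0, 1, 1])
# Partitions that cut across the structure: every block mixes values.
bad = block_cost(A, [0, 1, 0, 1], [0, 1, 0, 1])
print(good, bad)
```

Under this assumed objective, partitions aligned with the block structure cost nothing, while partitions that mix blocks are penalised. Finding the row and column partitions that minimise such a cost is the optimisation problem the abstract refers to.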
Index Terms
Approximation algorithms for co-clustering