DOI: 10.1145/1281192.1281262

A spectral clustering approach to optimally combining numerical vectors with a modular network

Published: 12 August 2007

ABSTRACT

We address the problem of clustering numerical vectors that are accompanied by a network. The problem setting is essentially equivalent to constrained clustering by Wagstaff and Cardie and semi-supervised clustering by Basu et al., but our focus is on optimally combining two heterogeneous data sources. One application is web pages, which can be represented as numerical vectors from their contents (e.g., term frequencies) and which are hyperlinked to each other, forming a network. Another typical application is genes, whose behavior can be measured numerically and for which a gene network is available from another data source.

We first define a new graph clustering measure, which we call normalized network modularity, by balancing the cluster sizes in the original modularity. We then propose a new clustering method that integrates the cost of clustering numerical vectors with the cost of maximizing the normalized network modularity into a single spectral relaxation problem. Our learning algorithm is based on spectral clustering, which turns the problem into an eigenvalue problem, and uses k-means for the final cluster assignments. A significant advantage of our method is that the weight parameter balancing the two costs can be optimized from the given data by choosing the value that gives the minimum total cost. We evaluated the performance of the proposed method on a variety of datasets, including synthetic data and real-world data from molecular biology. The experimental results showed that our method is effective for clustering with numerical vectors and a network.
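To make the graph-side measure concrete, the sketch below computes the original Newman-Girvan modularity of a given partition, together with a size-balanced variant in which each cluster's contribution is divided by the number of nodes in that cluster. The abstract does not state the formula for normalized network modularity, so the per-cluster division by size is an assumption made for illustration, not necessarily the authors' definition. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity_terms(edges, labels):
    """Per-cluster terms e_cc - a_c**2 of Newman-Girvan modularity.

    edges:  list of undirected (u, v) pairs, each edge listed once.
    labels: dict mapping node -> cluster id.
    """
    m = len(edges)
    within = defaultdict(int)  # edges with both endpoints in cluster c
    degree = defaultdict(int)  # summed degree of the nodes in cluster c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            within[labels[u]] += 1
    return {c: within[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree}


def modularity(edges, labels):
    """Original modularity: sum of the per-cluster terms."""
    return sum(modularity_terms(edges, labels).values())


def size_normalized_modularity(edges, labels):
    """ASSUMPTION: balance cluster sizes by dividing each term by |cluster|."""
    sizes = defaultdict(int)
    for c in labels.values():
        sizes[c] += 1
    terms = modularity_terms(edges, labels)
    return sum(terms[c] / sizes[c] for c in terms)


# Toy graph: two triangles joined by one bridging edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_cluster = {node: 0 for node in range(6)}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
q_norm = size_normalized_modularity(edges, two_clusters)
```

On this toy graph the two-triangle partition scores higher than the trivial single-cluster partition, whose modularity is zero. In the method described above, a measure of this kind would be combined with the vector-clustering cost through a weight parameter and relaxed into an eigenvalue problem; that step is not sketched here because the abstract does not give its exact form.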

References

  1. A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286: 509--512, 1999.
  2. S. Basu, M. Bilenko, and R. J. Mooney. A probabilistic framework for semi-supervised clustering. In KDD, pages 59--68, August 2004.
  3. I. S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means, spectral clustering and normalized cuts. In KDD, pages 551--556, 2004.
  4. I. S. Dhillon and S. Sra. Modeling data using directional distributions. Technical Report TR-06-03, University of Texas, Dept. of Computer Sciences, 2003.
  5. R. Edgar, M. Domrachev, and A. E. Lash. Gene expression omnibus: NCBI gene expression and hybridization array data repository. NAR, 30(1): 207--210, 2002.
  6. R. Guimera and L. A. Nunes Amaral. Functional cartography of complex metabolic networks. Nature, 433(7028): 895--900, 2005.
  7. R. Guimera, M. Sales-Pardo, and L. A. N. Amaral. Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E, 70: 025101, 2004.
  8. L. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE TCAD, 11: 1074--1085, 1992.
  9. T. R. Hughes et al. Functional discovery via a compendium of expression profiles. Cell, 102(1): 109--126, 2000.
  10. M. Kanehisa et al. From genomics to chemical genomics: new developments in KEGG. NAR, 34: D354--357, 2006.
  11. B. Kulis, S. Basu, I. Dhillon, and R. J. Mooney. Semi-supervised graph clustering: A kernel approach. In ICML, pages 457--464, 2005.
  12. K. V. Mardia and P. E. Jupp. Directional Statistics. John Wiley & Sons, second edition, 2000.
  13. M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69: 026113, 2004.
  14. E. Ravasz et al. Hierarchical organization of modularity in metabolic networks. Science, 297(5589): 1551--1555, 2002.
  15. J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE PAMI, 22(8): 888--905, 2000.
  16. M. Shiga, I. Takigawa, and H. Mamitsuka. Annotating gene function by combining expression data with a modular gene network. To appear in ISMB, 2007.
  17. C. Song, S. Havlin, and H. A. Makse. Self-similarity of complex networks. Nature, 433: 392--395, 2005.
  18. A. Strehl and J. Ghosh. Relationship-based clustering and visualization for high-dimensional data mining. INFORMS Journal on Computing, 15(2): 208--230, 2003.
  19. O. Troyanskaya et al. Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6): 520--525, 2001.
  20. K. Wagstaff and C. Cardie. Clustering with instance-level constraints. In ICML, pages 1103--1110, 2000.
  21. D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393: 440--442, 1998.
  22. S. White and P. Smyth. A spectral clustering approach to finding communities in graphs. In SDM, pages 76--84, 2005.
  23. L. F. Wu et al. Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nat. Genet., 31(3): 255--265, 2002.
  24. S. Zhong and J. Ghosh. A unified framework for model-based clustering. JMLR, 4: 1001--1037, 2003.
  25. S. Zhong and J. Ghosh. Generative model-based document clustering: A comparative study. KAIS, 8(3): 374--384, 2005.
  26. X. Zhou, M. C. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. PNAS, 99(20): 12783--12788, 2002.
