DOI: 10.5555/1888339.1888354
Article

Online structural graph clustering using frequent subgraph mining

Published: 20 September 2010

ABSTRACT

The goal of graph clustering is to partition the objects in a graph database into clusters according to criteria such as vertex connectivity, neighborhood similarity, or the size of the maximum common subgraph. This can serve to structure the graph space and to improve the understanding of the data. In this paper, we present a novel method for structural graph clustering, i.e., graph clustering without generating features or decomposing graphs into parts. In contrast to many related approaches, the method does not rely on computationally expensive maximum common subgraph (MCS) operations or variants thereof, but on frequent subgraph mining. More specifically, our problem formulation takes advantage of the frequent subgraph miner gSpan (which performs well on many practical problems) without actually generating thousands of subgraphs in the process. In the proposed clustering approach, a cluster encompasses all graphs that share a sufficiently large common subgraph: for each graph in a cluster, the common subgraph has to account for at least a user-specified fraction of that graph's overall size. The new algorithm works in an online mode (processing one structure after the other) and produces overlapping (non-disjoint) and nonexhaustive clusters. In a series of experiments, we evaluated the effectiveness and efficiency of the structural clustering algorithm on various real-world data sets of molecular graphs.
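
The membership rule and the online, overlapping, nonexhaustive behaviour described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: it does not use gSpan or any subgraph mining. As a loudly simplified stand-in, each graph is a set of edge identifiers and the "common subgraph" of a cluster is the plain intersection of its members' edge sets. The names `Cluster`, `online_cluster`, `theta` (the user-specified size fraction) and `min_size` (clusters smaller than this are discarded, which is one assumed way to leave some graphs unassigned) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    common: frozenset                             # edges shared by all members
    members: list = field(default_factory=list)   # (name, edge set) pairs


def online_cluster(stream, theta, min_size=2):
    """Assign graphs one at a time; return the member names of each kept cluster.

    Assumption: edge-set intersection stands in for the common subgraph,
    so no subgraph isomorphism or frequent subgraph mining is performed.
    """
    clusters = []
    for name, edges in stream:
        edges = frozenset(edges)
        joined = False
        for c in clusters:
            shared = c.common & edges
            sizes = [len(g) for _, g in c.members] + [len(edges)]
            # The shared part must cover at least theta of every member,
            # including the members that joined earlier.
            if shared and all(len(shared) >= theta * s for s in sizes):
                c.common = shared
                c.members.append((name, edges))
                joined = True   # keep looping: a graph may join several clusters
        if not joined:
            # No existing cluster fits, so the graph seeds a new one.
            clusters.append(Cluster(edges, [(name, edges)]))
    # Dropping small clusters leaves their graphs unassigned (nonexhaustive).
    return [[n for n, _ in c.members] for c in clusters if len(c.members) >= min_size]


stream = [
    ("g1", {"ab", "bc", "cd", "de"}),
    ("g2", {"ab", "bc", "cd", "ef"}),
    ("g3", {"xy", "yz"}),
    ("g4", {"bc", "cd", "xy", "yz"}),
    ("g5", {"pq", "qr"}),
]
print(online_cluster(stream, theta=0.5))
# prints [['g1', 'g2', 'g4'], ['g3', 'g4']]
```

In this run `g4` lands in two clusters (overlapping) and `g5` in none (nonexhaustive). The sketch is greedy and order-dependent, since admitting a graph can shrink the shared part for later arrivals; whether and how the paper handles this is not stated in the abstract.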

Published in

ECML PKDD'10: Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
September 2010, 631 pages
ISBN: 3642159389
Publisher: Springer-Verlag, Berlin, Heidelberg