DOI: 10.1145/2463664.2465222
research-article

The complexity of mining maximal frequent subgraphs

Published: 22 June 2013

ABSTRACT

A frequent subgraph of a given collection of graphs is a graph that is isomorphic to a subgraph of at least as many graphs in the collection as a given threshold. Frequent subgraphs generalize frequent itemsets and arise in various contexts, from bioinformatics to the Web. Since the space of frequent subgraphs is typically extremely large, research in graph mining has focused on special types of frequent subgraphs that can be orders of magnitude smaller in number, yet encapsulate the space of all frequent subgraphs. Maximal frequent subgraphs (i.e., the ones not properly contained in any frequent subgraph) constitute the most useful such type.
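Since frequent subgraphs generalize frequent itemsets, the notions of "frequent" and "maximal" can be illustrated in the simpler itemset setting. The following is a minimal brute-force sketch, not an algorithm from the paper; the function names and the toy transactions are assumptions made for illustration, and the enumeration is exponential in the number of items.

```python
from itertools import combinations

def frequent_itemsets(transactions, threshold):
    """Return every nonempty itemset contained in at least `threshold` transactions."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            # Support = number of transactions that contain the candidate.
            support = sum(1 for t in transactions if cand <= t)
            if support >= threshold:
                frequent.append(cand)
    return frequent

def maximal_frequent_itemsets(transactions, threshold):
    """Keep only the frequent itemsets not properly contained in another frequent itemset."""
    frequent = frequent_itemsets(transactions, threshold)
    return [s for s in frequent if not any(s < other for other in frequent)]

# Hypothetical toy input: three transactions over the items a, b, c, d.
transactions = [frozenset("abc"), frozenset("abd"), frozenset("acd")]
result = maximal_frequent_itemsets(transactions, 2)
```

With a threshold of 2, this toy input has seven frequent itemsets but only three maximal ones ({a, b}, {a, c}, {a, d}), which shows how the maximal sets summarize the larger space.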

In this paper, we embark on a comprehensive investigation of the computational complexity of mining maximal frequent subgraphs. Our study is carried out by considering the effect of three different parameters: possible restrictions on the class of graphs; a fixed bound on the threshold; and a fixed bound on the number of desired answers. We focus on specific classes of connected graphs: general graphs, planar graphs, graphs of bounded degree, and graphs of bounded tree-width (trees being a special case). Moreover, each class has two variants: the one in which the nodes are unlabeled, and the one in which they are uniquely labeled. We delineate the complexity of the enumeration problem for each of these variants by determining when it is solvable in (total or incremental) polynomial time and when it is NP-hard. Specifically, for the labeled classes, we show that bounding the threshold yields tractability but, in most cases, bounding the number of answers does not, unless P=NP; an exception is the case of labeled trees, where bounding either of these two parameters yields tractability. The state of affairs turns out to be quite different for the unlabeled classes. The main (and most challenging to prove) result concerns unlabeled trees: we show NP-hardness, even if the input consists of two trees, and both the threshold and the number of desired answers are equal to just two. In other words, we establish that the following problem is NP-complete: given two unlabeled trees, do they have more than one maximal subtree in common?
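For the uniquely labeled variant, the problem can be sketched by brute force, because unique labels leave only one possible embedding: a connected pattern with at least one edge is a subgraph of an input graph exactly when its edge set is a subset of that graph's edge set. The sketch below relies on that observation. It is an illustration under stated assumptions, not the paper's algorithm: graphs are represented as sets of labeled edges, single-node patterns are ignored for brevity, and all names are hypothetical.

```python
from itertools import combinations

def edge(u, v):
    """An undirected edge between two uniquely labeled nodes."""
    return frozenset((u, v))

def is_connected(edges):
    """Check that a nonempty edge set forms a single connected component."""
    edges = list(edges)
    reached = set(edges[0])
    remaining = edges[1:]
    changed = True
    while changed and remaining:
        changed = False
        for e in list(remaining):
            if e & reached:  # the edge touches an already reached node
                reached |= e
                remaining.remove(e)
                changed = True
    return not remaining

def maximal_frequent_subgraphs(graphs, threshold):
    """Brute force over all edge subsets (exponential; tiny inputs only)."""
    all_edges = sorted(set().union(*graphs), key=sorted)
    frequent = []
    for size in range(1, len(all_edges) + 1):
        for combo in combinations(all_edges, size):
            pattern = frozenset(combo)
            support = sum(1 for g in graphs if pattern <= g)
            if support >= threshold and is_connected(pattern):
                frequent.append(pattern)
    # Maximal: not properly contained in any other frequent connected pattern.
    return [p for p in frequent if not any(p < q for q in frequent)]

# Hypothetical toy input: two connected graphs over the labels a, b, c, d.
g1 = frozenset({edge("a", "b"), edge("b", "c"), edge("c", "d")})
g2 = frozenset({edge("a", "b"), edge("c", "d"), edge("a", "d")})
result = maximal_frequent_subgraphs([g1, g2], 2)
```

In this toy input the edges a-b and c-d occur in both graphs, but together they are not connected, so each forms its own maximal frequent subgraph. The unlabeled variants cannot be handled this way, since containment then requires a genuine subgraph isomorphism test.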


Published in

PODS '13: Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
June 2013
334 pages
ISBN: 9781450320665
DOI: 10.1145/2463664
General Chair: Richard Hull
Program Chair: Wenfei Fan

Copyright © 2013 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

PODS '13 paper acceptance rate: 24 of 97 submissions, 25%
Overall acceptance rate: 476 of 1,835 submissions, 26%
