skip to main content
research-article

Nucleus Decompositions for Identifying Hierarchy of Dense Subgraphs

Published:03 July 2017Publication History
Skip Abstract Section

Abstract

Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, densest at-least-k subgraph) are NP-hard. Furthermore, the goal is rarely to find the “true optimum” but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. Current dense subgraph finding algorithms usually optimize some objective and only find a few such subgraphs without providing any structural relations.

We define the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which enables discovering overlapping dense subgraphs. With the right parameters, the nucleus decomposition generalizes the classic notions of k-core and k-truss decompositions.

We present practical algorithms for nucleus decompositions and empirically evaluate their behavior in a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures and outputs dense subgraphs of comparable quality with the state-of-the-art solutions that are dense and have non-trivial sizes. Our algorithms can process real-world graphs with tens of millions of edges in less than an hour. We demonstrate how proposed algorithms can be utilized on a citation network. Our analysis showed that dense units identified by our algorithms correspond to coherent articles on a specific area. Our experiments also show that we can identify dense structures that are lost within larger structures by other methods and find further finer grain structure within dense groups.

References

  1. A. B. Adcock, B. D. Sullivan, and M. W. Mahoney. 2013. Tree-like structure in large social and information networks. In Proceedings of the IEEE International Conference on Data Mining (ICDM). 1--10.Google ScholarGoogle Scholar
  2. J. Ignacio Alvarez-Hamelin, Alain Barrat, and Alessandro Vespignani. 2006. Large scale networks fingerprinting and visualization using the k-core decomposition. In Advances in Neural Information Processing Systems 18. 41--50. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. Andersen and K. Chellapilla. 2009. Finding dense subgraphs with size bounds. In Proceedings of the Workshop on Algorithms and Models for the Web-Graph (WAW). 25--37. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Angel, N. Sarkas, N. Koudas, and D. Srivastava. 2012. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. Proc. VLDB Endow. 5, 6 (Feb. 2012), 574--585. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama. 2000. Greedily finding a dense subgraph. J. Algor. 34, 2 (Feb. 2000), 203--221. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Oana Denisa Balalau, Francesco Bonchi, T.-H. Hubert Chan, Francesco Gullo, and Mauro Sozio. 2015. Finding subgraphs with maximum total density and limited overlap. In Proceedings of the 8th ACM International Conference on Web Search and Data Mining (WSDM’15). 379--388. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. D. J. Beal, R. Cohen, M. J. Burke, and C. L. McLendon. 2003. Cohesion and performance in groups: A meta-analytic clarification of construct relation. J. Appl. Psychol. 88 (2003), 989--1004.Google ScholarGoogle ScholarCross RefCross Ref
  8. J. W. Berry, L. K. Fostvedt, D. J. Nordman, C. A. Phillips, C. Seshadhri, and A. G. Wilson. 2014. Why do simple algorithms for triangle enumeration work in the real world? In Proceedings of the 5th Conference on Innovations in Theoretical Computer Science (ITCS’14). ACM, New York, NY, 225--234. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Sayan Bhattacharya, Monika Henzinger, Danupon Nanongkai, and Charalampos Tsourakakis. 2015. Space- and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In Proceedings of the 47th Annual ACM on Symposium on Theory of Computing (STOC’15). 173--182. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. Bron and J. Kerbosch. 1973. Algorithm 457: Finding all cliques of an undirected graph. Commun. ACM 16, 9 (Sep. 1973), 575--577. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. G. Buehrer and K. Chellapilla. 2008. A scalable pattern mining approach to web graph compression with communities. In Proc. of the 2008 International Conference on Web Search and Data Mining (WSDM’08). 95--106. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In Proceedings of the 3rd International Workshop on Approximation Algorithms for Combinatorial Optimization (APPROX’00). 84--95. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. N. Chiba and T. Nishizeki. 1985. Arboricity and subgraph listing algorithms. SIAM J. Comput. 14, 1 (Feb. 1985), 210--223. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Cohen. 2008. Trusses: Cohesive subgraphs for social network analysis. National Security Agency Technical Report (2008).Google ScholarGoogle Scholar
  15. J. Cohen. 2009. Graph twiddling in a mapreduce world. Comput. Sci. Eng. 11 (2009), 29--41. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. UF Sparse Matrix Collection. University of Florida Sparse Matrix Collection. Retrieved March 2014 from http://www.cise.ufl.edu/research/sparse/matrices/.Google ScholarGoogle Scholar
  17. P. Colomer de Simon, M. Serrano, M. G. Beiro, J. I. Alvarez-Hamelin, and M. Boguna. 2013. Deciphering the global organization of clustering in real complex networks. Sci. Rep. 3, 2517 (2013).Google ScholarGoogle Scholar
  18. Y. Dourisboure, F. Geraci, and M. Pellegrini. 2007. Extraction and classification of dense communities in the web. In Proceedings of the 16th International Conference on World Wide Web (WWW’07). 461--470. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Xiaoxi Du, Ruoming Jin, Liang Ding, Victor E. Lee, and John H. Thornton, Jr. 2009. Migration motif: A spatial - temporal pattern mining approach for financial markets. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). ACM, New York, NY, 1135--1144. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Epasto, S. Lattanzi, and M. Sozio. 2015. Efficient densest subgraph computation in evolving graphs. In Proceedings of the 24th International Conference on World Wide Web (WWW’15). 300--310. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. P. Erdős and A. Hajnal. 1966. On chromatic number of graphs and set-systems. Acta Math. Hung. 17 (1966), 61--99.Google ScholarGoogle ScholarCross RefCross Ref
  22. U. Feige. 2002. Relations between average case complexity and approximation complexity. In Proceedings of the Symposium on Theory of Computing. 534--543. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. R. Forsyth. 2010. Group Dynamics. Cengage Learning.Google ScholarGoogle Scholar
  24. A. P. Francisco and A. L. Oliveira. 2011. Fully generalized graph cores. In Complex Networks. Vol. 116. 22--34.Google ScholarGoogle Scholar
  25. E. Fratkin, B. T. Naughton, D. L. Brutlag, and S. Batzoglou. 2006. MotifCut: Regulatory motifs finding with maximum density subgraphs. In ISMB (Supplement of Bioinformatics) (2006-08-28). 156--157. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. 1989. A fast parametric maximum flow algorithm and applications. SIAM J. Comput. 18, 1 (Feb. 1989), 30--55. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. D. Gibson, R. Kumar, and A. Tomkins. 2005. Discovering large dense subgraphs in massive graphs. In Proc. of the 31st International Conference on Very Large Data Bases (VLDB’05). 721--732. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. Gionis, F. Junqueira, V. Leroy, M. Serafini, and I. Weber. 2013. Piggybacking on social networks. Proc. VLDB Endow. 6, 6 (2013), 409--420. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. A. V. Goldberg. 1984. Finding a Maximum Density Subgraph. Technical Report. Berkeley, CA, USA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. R. Gupta, T. Roughgarden, and C. Seshadhri. 2014. Decompositions of triangle-dense graphs. In Innovations in Theoretical Computer Science (ITCS). 471--482. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. J. Håstad. 1996. Clique is hard to approximate within n(1 − ε). In Acta Mathematica. 627--636.Google ScholarGoogle Scholar
  32. H. Hu, X. Yan, Y. Huang, J. Han, and X. J. Zhou. 2005. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics 21, 1 (Jan. 2005), 213--221. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. 2014. Querying k-truss community in large and dynamic graphs. In Proceedings of the ACM SIGMOD International Conf. on Management of Data. 1311--1322. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. L. D. Iasemidis, D.-S. Shiau, W. Chaovalitwongse, J. C. Sackellares, P. M. Pardalos, J. C. Principe, P. R. Carney, A. Prasad, B. Veeramani, and K. Tsakalis. 2003. Adaptive epileptic seizure prediction system. IEEE. Biomed. Eng. 50 (2003), 616--627.Google ScholarGoogle ScholarCross RefCross Ref
  35. R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 2009. 3-HOP: A high-compression indexing scheme for reachability query. In Proceedings of the SIGMOD Conference. 813--826. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. S. Khot. 2006. Ruling out PTAS for graph min-bisection, dense k-subgraph, and bipartite clique. SIAM J. Comput. 36, 4 (2006), 1025--1071. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Samir Khuller and Barna Saha. 2009. On finding dense subgraphs. In Proceedings of the International Colloquium on Automata, Languages and Programming (ICALP). 597--608. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. 1999. Trawling the web for emerging cyber-communities. In Proc. of the Eighth International Conference on World Wide Web (WWW’99). 1481--1493. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. V. E. Lee, N. Ruan, R. Jin, and C. Aggarwal. 2010. A survey of algorithms for dense subgraph discovery. In Managing and Mining Graph Data. Vol. 40.Google ScholarGoogle Scholar
  40. Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. 2008. Statistical properties of community structure in large social and information networks. In Proceedings of the 17th International Conference on World Wide Web (WWW’08). ACM, New York, NY, 695--704. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. D. Lick and A. White. 1970. k-degenerate graphs. Can. J. Math. 22 (1970), 1082--1096.Google ScholarGoogle ScholarCross RefCross Ref
  42. D. Matula and L. Beck. 1983. Smallest-last ordering and clustering and graph coloring algorithms. J. ACM 30, 3 (1983), 417--427. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. R. J. Mokken. 1979. Cliques, clubs and clans. Qual. Quant. 13, 2 (1979), 161--173.Google ScholarGoogle ScholarCross RefCross Ref
  44. R. A. Rossi, D. F. Gleich, A. H. Gebremedhin, and Md. M. A. Patwary. 2013. A fast parallel maximum clique algorithm for large sparse graphs and temporal strong components. CoRR abs/1302.6256 (2013).Google ScholarGoogle Scholar
  45. K. Saito and T. Yamada. 2006. Extracting communities from complex networks by the k-dense method. In Sixth IEEE International Conference on Data Mining Workshops, 2006 (ICDM Workshops 2006). 300--304. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. A. Sala, L. Cao, C. Wilson, R. Zablit, Haitao Zheng, and Ben Y. Zhao. 2010. Measurement-calibrated graph models for social network experiments. In WWW’10. ACM, 861--870. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. A. E. Sariyüce, B. Gedik, G. Jacques-Silva, K. L. Wu, and Ü. V. Çatalyürek. 2013. Streaming algorithms for k-core decomposition. In Proc. VLDB Endow. 433--444. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. A. E. Sariyüce, C. Seshadhri, A. Pinar, and Ü. V. Çatalyürek. 2015. Finding the hierarchy of dense subgraphs using nucleus decompositions. In Proceedings of the 24th International Conference on World Wide Web (WWW’15). 927--937. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. T. Schank and D. Wagner. 2005. Finding, counting and listing all triangles in large graphs, an experimental study. In Experimental and Efficient Algorithms. 606--609. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. S. B. Seidman. 1983. Network structure and minimum degree. Soc. Netw. 5, 3 (1983), 269--287.Google ScholarGoogle ScholarCross RefCross Ref
  51. S. B. Seidman and B. Foster. 1978. A graph-theoretic generalization of the clique concept. J. Math. Sociol. (1978).Google ScholarGoogle Scholar
  52. C. Seshadhri, A. Pinar, and T. G. Kolda. 2014. Triadic measures on graphs: The power of wedge sampling. Stat. Anal. Data Min. 7, 4 (2014), 294--307. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. SNAP. retrieved March, 2014. Stanford Network Analysis Package. Retrieved March 2014 http://snap.stanford.edu/snap.Google ScholarGoogle Scholar
  54. S. Suri and S. Vassilvitskii. 2011. Counting triangles and the curse of the last reducer. In WWW’11. 607--614. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. N. Tatti and A. Gionis. 2015. Density-friendly graph decomposition. In Proceedings of the 24th International Conference on World Wide Web (WWW’15). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1089--1099. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. C. Tsourakakis. 2015. The k-clique densest subgraph problem. In Proceedings of the 24th International Conference on World Wide Web (WWW’15). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1122--1132. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. C. Tsourakakis, F. Bonchi, A. Gionis, F. Gullo, and M. Tsiarli. 2013. Denser than the densest subgraph: Extracting optimal quasi-cliques with quality guarantees. In Proc. of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’13). Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. J. Wang and J. Cheng. 2012. Truss decomposition in massive networks. Proc. VLDB Endow. 5, 9 (2012), 812--823. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. N. Wang, J. Zhang, K. L. Tan, and A. K. H. Tung. 2010. On triangulation-based dense neighborhood graph discovery. Proc. VLDB Endow. 4 (2010), 58--68. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. S. Wasserman and K. Faust. 1994. Social Network Analysis: Methods and Applications. Cambridge University Press.Google ScholarGoogle Scholar
  61. D. Watts and S. Strogatz. 1998. Collective dynamics of ‘small-world’ networks. Nature 393 (1998), 440--442.Google ScholarGoogle ScholarCross RefCross Ref
  62. B. Zhang and S. Horvath. 2005. A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Molec. Biol. 4, 1 (2005), Article 17+.Google ScholarGoogle Scholar
  63. Y. Zhang and S. Parthasarathy. 2012. Extracting analyzing and visualizing triangle k-core motifs within networks. In Proc. of the 2012 IEEE 28th International Conference on Data Engineering (ICDE’12). 1049--1060. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. F. Zhao and A. K. H. Tung. 2013. Large scale cohesive subgraphs discovery for social network visual analysis. In Proc. VLDB Endow. 85--96.Google ScholarGoogle Scholar

Index Terms

  1. Nucleus Decompositions for Identifying Hierarchy of Dense Subgraphs

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader
    About Cookies On This Site

    We use cookies to ensure that we give you the best experience on our website.

    Learn more

    Got it!