Abstract
Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization, to name a few. Yet most standard formulations of this problem (like clique, quasi-clique, densest at-least-k subgraph) are NP-hard. Furthermore, the goal is rarely to find the “true optimum” but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. Current dense subgraph finding algorithms usually optimize some objective and find only a few such subgraphs, without providing any structural relations among them.
We define the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which enables discovering overlapping dense subgraphs. With the right parameters, the nucleus decomposition generalizes the classic notions of k-core and k-truss decompositions.
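To make the idea of "smaller cliques present in many larger cliques" concrete, the following is a minimal sketch of a peeling procedure for small graphs. It is an illustration under stated assumptions, not the authors' implementation: the function names `cliques_of_size` and `nucleus_numbers` are hypothetical, cliques are enumerated by brute force, and the sketch computes only a per-clique number (how deep each r-clique survives the peeling), not the forest of nuclei itself. Assuming the parameters (r, s) = (1, 2) and (2, 3) are the ones that recover k-core and k-truss style values, the sketch can be run with both.

```python
from itertools import combinations


def cliques_of_size(adj, size):
    """Enumerate all cliques with `size` vertices by brute force (small graphs only)."""
    nodes = sorted(adj)
    return [c for c in combinations(nodes, size)
            if all(v in adj[u] for u, v in combinations(c, 2))]


def nucleus_numbers(edges, r, s):
    """Peel r-cliques in increasing order of the number of s-cliques containing them.

    Hypothetical helper for illustration: returns a dict mapping each r-clique
    (a sorted tuple of vertices) to the largest k at which it survives peeling.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    r_cliques = cliques_of_size(adj, r)
    s_cliques = cliques_of_size(adj, s)

    # containing[R] lists the s-cliques that contain the r-clique R.
    containing = {R: [] for R in r_cliques}
    for S in s_cliques:
        for R in combinations(S, r):
            containing[R].append(S)

    degree = {R: len(containing[R]) for R in r_cliques}
    alive_s = set(s_cliques)      # s-cliques whose r-cliques are all unpeeled
    remaining = set(r_cliques)    # r-cliques not yet peeled
    number = {}
    k = 0
    while remaining:
        # Remove the r-clique currently contained in the fewest live s-cliques.
        R = min(remaining, key=lambda c: (degree[c], c))
        k = max(k, degree[R])
        number[R] = k
        remaining.remove(R)
        # Every live s-clique through R disappears; its other r-cliques lose one.
        for S in containing[R]:
            if S in alive_s:
                alive_s.remove(S)
                for other in combinations(S, r):
                    if other in remaining:
                        degree[other] -= 1
    return number


# A 4-clique on {0, 1, 2, 3} with a pendant vertex 4 attached to vertex 0.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
core_like = nucleus_numbers(edges, 1, 2)    # vertices peeled by edge count
truss_like = nucleus_numbers(edges, 2, 3)   # edges peeled by triangle count
```

With (r, s) = (2, 3) the values are raw triangle counts per edge; some k-truss definitions shift this number by a constant, so the sketch should be read as matching k-truss only up to that convention.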
We present practical algorithms for nucleus decompositions and empirically evaluate their behavior on a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures, and it yields subgraphs of non-trivial size whose density is comparable to that of state-of-the-art solutions. Our algorithms can process real-world graphs with tens of millions of edges in less than an hour. We demonstrate how the proposed algorithms can be used on a citation network. Our analysis shows that the dense units identified by our algorithms correspond to coherent groups of articles on a specific area. Our experiments also show that we can identify dense structures that other methods lose within larger structures, and that we can find finer-grained structure within dense groups.
Nucleus Decompositions for Identifying Hierarchy of Dense Subgraphs