Abstract
Given a network with node attributes, how can we identify communities and spot anomalies? How can we characterize, describe, or summarize the network in a succinct way? Community extraction requires a measure of quality for connected subgraphs (e.g., social circles). Existing subgraph measures, however, either consider only the connectedness of nodes inside the community and ignore the cross-edges at the boundary (e.g., density), or quantify only the structure of the community and ignore the node attributes (e.g., conductance). In this work, we focus on node-attributed networks and introduce: (1) a new measure of subgraph quality for attributed communities called normality, (2) a community extraction algorithm that uses normality to extract communities and a few characterizing attributes per community, and (3) a summarization and interactive visualization approach for attributed graph exploration. More specifically, (1) we first introduce a new measure to quantify the normality of an attributed subgraph. Our normality measure uses structure and attributes together to quantify both internal consistency and external separability. We then formulate an objective function to automatically infer a few attributes (called the “focus”) and their respective weights, so as to maximize the normality score of a given subgraph. Most notably, unlike many other approaches, our measure allows for many cross-edges as long as they can be “exonerated,” i.e., (i) they are expected under a null graph model, and/or (ii) their boundary nodes do not exhibit the focus attributes. Next, (2) we propose AMEN (for Attributed Mining of Entity Networks), an algorithm that simultaneously discovers the communities and their respective focus in a given graph, with the goal of maximizing the total normality. Communities for which no focus yielding high normality can be found are considered low quality or anomalous.
Last, (3) we formulate a summarization task with a multi-criteria objective, which selects a subset of the communities that (i) cover the entire graph well, (ii) are of high quality, and (iii) are diverse in their focus attributes. We further design an interactive visualization interface that presents the communities to a user in an interpretable, user-friendly fashion. The user can explore all the communities, analyze various algorithm-generated summaries, and devise their own summaries interactively to characterize the network in a succinct way. As experiments on real-world attributed graphs show, our proposed approaches effectively find anomalous communities and outperform several existing measures and methods, such as conductance, density, OddBall, and SODA. We also conduct extensive user studies to measure the capability and efficiency that our approach provides to users for network summarization, exploration, and sensemaking.
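To make the limitation of the two baseline measures named in the abstract concrete, the following minimal Python sketch computes internal density and conductance for a candidate community on a toy graph. It uses the standard textbook definitions, not the normality measure introduced in the paper. The function names, the edge-list representation (undirected, each edge listed once, no self-loops), and the toy graph are assumptions made for illustration only.

```python
def internal_density(edges, community):
    """Fraction of possible node pairs inside the community that are linked.

    Looks only at internal edges; cross-edges at the boundary are ignored.
    """
    nodes = set(community)
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(1 for u, v in edges if u in nodes and v in nodes)
    return internal / (n * (n - 1) / 2)


def conductance(edges, community):
    """Cut edges divided by the smaller of the two side volumes.

    Purely structural: node attributes play no role.
    """
    nodes = set(community)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        u_in, v_in = u in nodes, v in nodes
        if u_in != v_in:
            cut += 1
        # Each edge endpoint adds one unit of degree to its side's volume.
        vol_in += int(u_in) + int(v_in)
        vol_out += int(not u_in) + int(not v_in)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


# Hypothetical toy graph: two triangles joined by a single cross-edge.
base_edges = [("a", "b"), ("b", "c"), ("a", "c"),
              ("c", "d"),
              ("d", "e"), ("e", "f"), ("d", "f")]
# Same graph with four extra cross-edges at the community boundary.
busy_edges = base_edges + [("a", "d"), ("a", "e"), ("b", "e"), ("b", "f")]

community = ["a", "b", "c"]
density_base = internal_density(base_edges, community)
density_busy = internal_density(busy_edges, community)
conductance_base = conductance(base_edges, community)
conductance_busy = conductance(busy_edges, community)
```

In this sketch, density is identical for both graphs because it never looks at the boundary, while conductance worsens as cross-edges are added regardless of which attributes the boundary nodes carry. That gap is what the abstract's notion of "exonerated" cross-edges addresses.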
Discovering Communities and Anomalies in Attributed Graphs: Interactive Visual Exploration and Summarization