research-article

Discovering Communities and Anomalies in Attributed Graphs: Interactive Visual Exploration and Summarization

Published: 10 January 2018

Abstract

Given a network with node attributes, how can we identify communities and spot anomalies? How can we characterize, describe, or summarize the network in a succinct way? Community extraction requires a measure of quality for connected subgraphs (e.g., social circles). Existing subgraph measures, however, either consider only the connectedness of nodes inside the community and ignore the cross-edges at the boundary (e.g., density) or quantify only the structure of the community and ignore the node attributes (e.g., conductance). In this work, we focus on node-attributed networks and introduce: (1) a new measure of subgraph quality for attributed communities called normality, (2) a community extraction algorithm that uses normality to extract communities and a few characterizing attributes per community, and (3) a summarization and interactive visualization approach for attributed graph exploration. More specifically, (1) we first introduce a new measure to quantify the normality of an attributed subgraph. Our normality measure uses structure and attributes together to quantify both internal consistency and external separability. We then formulate an objective function to automatically infer a few attributes (called the “focus”) and their respective weights, so as to maximize the normality score of a given subgraph. Most notably, unlike many other approaches, our measure allows for many cross-edges as long as they can be “exonerated”; i.e., (i) they are expected under a null graph model and/or (ii) their boundary nodes do not exhibit the focus attributes. Next, (2) we propose AMEN (for Attributed Mining of Entity Networks), an algorithm that simultaneously discovers the communities and their respective focus in a given graph, with the goal of maximizing the total normality. Communities for which no focus yielding high normality can be found are considered low quality or anomalous.
Last, (3) we formulate a summarization task with a multi-criteria objective, which selects a subset of the communities that (i) cover the entire graph well, (ii) are of high quality, and (iii) are diverse in their focus attributes. We further design an interactive visualization interface that presents the communities to a user in an interpretable, user-friendly fashion. The user can explore all the communities, analyze various algorithm-generated summaries, and devise their own summaries interactively to characterize the network in a succinct way. As experiments on real-world attributed graphs show, our proposed approaches effectively find anomalous communities and outperform several existing measures and methods, such as conductance, density, OddBall, and SODA. We also conduct extensive user studies to measure the capability and efficiency that our approach provides to users for network summarization, exploration, and sensemaking.
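
To make the limitation of the two baseline measures named in the abstract concrete, the following is a minimal sketch of internal density and conductance on a toy graph. It uses their standard textbook definitions, not the normality measure, which the abstract does not define. The function names, the edge-list representation, and the toy graph are assumptions made for illustration only.

```python
# Illustrative sketch only: standard definitions of internal density and
# conductance for an undirected, unweighted graph given as a list of edges
# (assumed to contain no duplicates and no self-loops).

def internal_density(edges, community):
    """Fraction of node pairs inside `community` that are connected."""
    nodes = set(community)
    internal = sum(1 for u, v in edges if u in nodes and v in nodes)
    possible = len(nodes) * (len(nodes) - 1) // 2
    return internal / possible if possible else 0.0


def conductance(edges, community):
    """Cut edges divided by the smaller of the two side volumes."""
    nodes = set(community)
    cut = sum(1 for u, v in edges if (u in nodes) != (v in nodes))
    # Volume = sum of degrees; each edge contributes one unit per endpoint.
    vol_in = sum((u in nodes) + (v in nodes) for u, v in edges)
    vol_out = 2 * len(edges) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


# Hypothetical toy graph: a triangle {a, b, c} attached to a triangle {d, e, f}.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
# Same graph with two extra cross-edges leaving the community {a, b, c}.
EDGES_MORE_CROSS = EDGES + [("a", "d"), ("b", "e")]
COMMUNITY = {"a", "b", "c"}

for name, edges in [("few cross-edges", EDGES),
                    ("more cross-edges", EDGES_MORE_CROSS)]:
    print(name,
          "density =", round(internal_density(edges, COMMUNITY), 3),
          "conductance =", round(conductance(edges, COMMUNITY), 3))
```

In this toy example the density of the community is unaffected by the added cross-edges, while conductance penalizes every one of them. Neither function takes node attributes as input, which is the gap the abstract describes.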



          • Published in

            ACM Transactions on Knowledge Discovery from Data  Volume 12, Issue 2
            Survey Papers and Regular Papers
            April 2018
            376 pages
            ISSN:1556-4681
            EISSN:1556-472X
            DOI:10.1145/3178544

            Copyright © 2018 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 10 January 2018
            • Accepted: 1 August 2017
            • Revised: 1 June 2017
            • Received: 1 September 2016


            Qualifiers

            • research-article
            • Research
            • Refereed
