Abstract
The popularity of intelligent devices provides easy access to the Internet and online social networks. However, the quick and easy propagation of data over networks also facilitates the spread of risks such as rumors, malware, and computer viruses. This article therefore studies the problem of source detection: inferring the source node from the aftermath of a cascade, that is, from the observed infected graph G_N of the network at some time. Prior work has adopted various statistical quantities, such as degree, distance, or infection size, to reflect the structural centrality of the source. In this article, we propose a new metric, which we call the infected tree entropy (ITE), that exploits richer underlying structural features for source detection. ITE is inspired by the concept of structural entropy [21], which showed that minimizing the average number of bits needed to encode the network structure under different partitions is the principle for detecting the natural or true structures in real-world networks. Accordingly, our ITE-based estimator of the source minimizes the coding of the network partitions induced by the infected trees rooted at each potential source, thereby minimizing the structural deviation between the cascades from the potential sources and the actual infection process contained in G_N. On polynomially growing geometric trees, as tree heterogeneity increases, the ITE estimator yields more reliable detection at only moderate infection sizes and achieves asymptotically complete detection. For regular expanding trees, in contrast, the ITE estimator still attains a guaranteed detection probability even with an infinite infection size, thanks to degree regularity. We also give an algorithmic realization of ITE-based detection that runs in linear time via a message-passing scheme, and we further extend it to general graphs.
Extensive experiments on synthetic and real datasets confirm the superiority of ITE over the baselines. For example, on scale-free networks, ITE ranks the true source among the top 10% of candidates with an accuracy of 85%, far exceeding the 55% achieved by the classic algorithm.
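To make the notion of "average bits to encode the network structure under a partition" concrete, the following is a minimal Python sketch of the partition-based (two-dimensional) structural entropy as it is commonly stated for the framework of [21]. It is not the ITE metric itself, whose definition is not given in this abstract. The function name `structural_entropy`, the edge-list input format, and the four-node path graph are illustrative assumptions.

```python
import math
from collections import defaultdict


def structural_entropy(edges, partition):
    """Structural entropy (in bits) of an undirected, unweighted graph
    under a node partition, following the commonly stated form
        H = - sum_j sum_{i in X_j} (d_i / 2m) * log2(d_i / V_j)
            - sum_j (g_j / 2m) * log2(V_j / 2m),
    where d_i is a node degree, V_j the total degree (volume) of module X_j,
    g_j the number of edges leaving X_j, and m the number of edges.
    Illustrative sketch only; not the ITE estimator from the article.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    two_m = 2 * len(edges)

    # Map each node to the index of the module that contains it.
    module_of = {node: j for j, module in enumerate(partition) for node in module}
    volume = [sum(degree[node] for node in module) for module in partition]

    # Count, per module, the edges with exactly one endpoint inside it.
    cut = [0] * len(partition)
    for u, v in edges:
        if module_of[u] != module_of[v]:
            cut[module_of[u]] += 1
            cut[module_of[v]] += 1

    entropy = 0.0
    for j, module in enumerate(partition):
        if volume[j] == 0:
            continue  # a module of isolated nodes contributes nothing
        for node in module:
            if degree[node] > 0:
                entropy -= (degree[node] / two_m) * math.log2(degree[node] / volume[j])
        entropy -= (cut[j] / two_m) * math.log2(volume[j] / two_m)
    return entropy


# Hypothetical example: a path graph 0 - 1 - 2 - 3.
path_edges = [(0, 1), (1, 2), (2, 3)]
one_module = structural_entropy(path_edges, [[0, 1, 2, 3]])
two_modules = structural_entropy(path_edges, [[0, 1], [2, 3]])
print(one_module, two_modules)
```

In this toy case, splitting the path into its two halves gives a shorter code than treating the whole graph as a single module, which is the sense in which a lower value indicates a partition that better matches the structure.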
- [1] 2013. A fast Monte Carlo algorithm for source localization on graphs. In Proceedings of the Wavelets and Sparsity XV, Vol. 8858, 429–434.
- [2] 2009. Entropy measures for networks: Toward an information theory of complex topologies. Physical Review E 80, 4 (2009), 045102.
- [3] 1999. Emergence of scaling in random networks. Science 286, 5439 (1999), 509–512.
- [4] 2009. Entropy of network ensembles. Physical Review E 79, 3 (2009), 036114.
- [5] 1977. Information theory, distance matrix, and molecular branching. The Journal of Chemical Physics 67, 10 (1977), 4517–4533.
- [6] 2006. The Laplacian of a graph as a density matrix: A basic combinatorial approach to separability of mixed states. Annals of Combinatorics 10, 3 (2006), 291–317.
- [7] 2021. Information sources estimation in time-varying networks. IEEE Transactions on Information Forensics and Security 16 (2021), 2621–2636.
- [8] 2017. Rumor source detection under querying with untruthful answers. In Proceedings of the IEEE INFOCOM. 1–9.
- [9] 2018. Necessary and sufficient budgets in information source finding with querying: Adaptivity gap. In Proceedings of the IEEE ISIT. 2261–2265.
- [10] 2009. Compression of graphical structures. In Proceedings of the IEEE ISIT. 364–368.
- [11] 2011. Identifying the starting point of a spreading process in complex networks. Physical Review E 84, 5 (2011), 056105.
- [12] 2008. Information processing in complex networks: Graph entropy and information functionals. Applied Mathematics and Computation 201, 1-2 (2008), 82–94.
- [13] 2013. Rooting out the rumor culprit from suspects. In Proceedings of the IEEE ISIT. 2671–2675.
- [14] 2012. Predicting the sources of an outbreak with a spectral technique. arXiv:1211.2333.
- [15] 2001. Talk of the network: A complex systems look at the underlying process of word-of-mouth. Marketing Letters 12, 3 (2001), 211–223.
- [16] 2015. K-center: An approach on the multi-source identification of information diffusion. IEEE Transactions on Information Forensics and Security 10, 12 (2015), 2616–2626.
- [17] 2013. Rumor source detection under probabilistic sampling. In Proceedings of the IEEE ISIT. 2184–2188.
- [18] 2013. Konect: The koblenz network collection. In Proceedings of the WWW. 1343–1350.
- [19] 2010. Finding effectors in social networks. In Proceedings of the ACM SIGKDD. 1059–1068.
- [20] 2014. SNAP Datasets: Stanford Large Network Dataset Collection. (June 2014). Retrieved May 10, 2021 from http://snap.stanford.edu/data.
- [21] 2016. Structural information and dynamical complexity of networks. IEEE Transactions on Information Theory 62, 6 (2016), 3290–3339.
- [22] 2019. REM: From structural entropy to community structure deception. In Advances in Neural Information Processing Systems. 12938–12948.
- [23] 2014. Inferring the origin of an epidemic with a dynamic message-passing algorithm. Physical Review E 90, 1 (2014), 012801.
- [24] 2013. Estimating infection sources in a network with incomplete observations. In Proceedings of the IEEE GlobalSIP. 301–304.
- [25] 2013. Finding an infection source under the SIS model. In Proceedings of the IEEE ICASSP. 2930–2934.
- [26] 2013. Identifying infection sources and regions in large networks. IEEE Transactions on Signal Processing 61, 11 (2013), 2850–2865.
- [27] 2012. Locating the source of diffusion in large-scale networks. Physical Review Letters 109, 6 (2012), 068702.
- [28] 2012. Spotting culprits in epidemics: How many and which ones? In Proceedings of the ICDM. 11–20.
- [29] 1955. Life, information theory, and topology. Bulletin of Mathematical Biophysics 17, 3 (1955), 229–235.
- [30] 1984. Discrimination of isomeric structures using information theoretic topological indices. Journal of Computational Chemistry 5, 6 (1984), 581–588.
- [31] 2008. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences of the United States of America 105, 4 (2008), 1118–1123.
- [32] 2011. Rumors in a network: Who’s the culprit? IEEE Transactions on Information Theory 57, 8 (2011), 5163–5181.
- [33] 2016. Finding rumor sources on random trees. Operations Research 64, 3 (2016), 736–755.
- [34] 1948. A mathematical theory of communication. The Bell System Technical Journal 27, 3 (1948), 379–423.
- [35] 2018. Estimating infection sources in networks using partial timestamps. IEEE Transactions on Information Forensics and Security 13, 12 (2018), 3035–3049.
- [36] 2014. Rumor source detection with multiple observations: Fundamental limits and algorithms. In Proceedings of the ACM SIGMETRICS.
- [37] 1998. Collective dynamics of “small-world” networks. Nature 393, 6684 (1998), 440–442.
- [38] 2018. Rumor source detection in finite graphs with boundary effects by message-passing algorithms. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. 175–192.
- [39] 2016. Locating the contagion source in networks with partial timestamps. Data Mining and Knowledge Discovery 30, 5 (2016), 1217–1248.
- [40] 2014. Information source detection in the SIR model: A sample-path-based approach. IEEE/ACM Transactions on Networking 24, 1 (2014), 408–421.
- [41] 2015. Source localization in networks: Trees and beyond. arXiv:1510.01814.
Finding the Source in Networks: An Approach Based on Structural Entropy