Abstract
Graphlets are induced subgraph patterns that are crucial to understanding the structure and function of large networks. Considerable effort has been devoted to calculating graphlet statistics, and random walk-based approaches are commonly used to access restricted graphs through the available application programming interfaces (APIs). However, most of these approaches consider only individual networks and overlook the strong coupling between different networks. In this article, we estimate the graphlet concentration in multiplex networks with real-world applications. An inter-layer edge connects two nodes in different layers if they represent the same underlying entity. Access to a multiplex network is restricted in the sense that the upper layer allows random walk sampling, whereas the nodes of lower layers can be accessed only through the inter-layer edges and support only random node or edge sampling. To cope with this new challenge, we define a suite of two-layer graphlets and propose novel random walk sampling algorithms to estimate the proportions of all three-node graphlets. We prove an analytical bound on the number of sampling steps that guarantees the convergence of our unbiased estimator. We further generalize our algorithm to explore the tradeoff between the estimation accuracy of different graphlets when the sample budget is split across layers. Experimental evaluation on real-world and synthetic multiplex networks demonstrates the accuracy and high efficiency of our unbiased estimators.
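To make the idea of random walk-based graphlet estimation concrete, the following is a minimal sketch for a single-layer graph only. It is not the restricted two-layer algorithm proposed in the article. It assumes a connected, undirected simple graph and uses a standard reweighting argument: in a simple random walk, three consecutive nodes (u, v, w) appear with stationary probability proportional to 1/d(v), so each sample is weighted by d(v); an open wedge is visited as 2 ordered triples and a triangle as 6. The function names `build_adjacency` and `estimate_three_node_concentration` are hypothetical and chosen for illustration.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Return node -> set of neighbours for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj


def estimate_three_node_concentration(edges, steps, seed=0):
    """Estimate wedge/triangle concentration with a simple random walk.

    Assumption (not from the article): single-layer graph, plain random
    walk, samples reweighted by the degree of the middle node.
    """
    rng = random.Random(seed)
    adj = build_adjacency(edges)
    # Sorted neighbour lists make the walk reproducible for a given seed.
    neighbours = {v: sorted(ns) for v, ns in adj.items()}

    prev, cur = None, min(neighbours)  # arbitrary deterministic start node
    open_weight = 0.0
    closed_weight = 0.0
    for _ in range(steps):
        nxt = rng.choice(neighbours[cur])
        # Backtracking steps (nxt == prev) do not span three distinct nodes.
        if prev is not None and nxt != prev:
            weight = len(neighbours[cur])  # inverse-probability weight d(v)
            if nxt in adj[prev]:
                closed_weight += weight  # induced triangle
            else:
                open_weight += weight  # induced open wedge
        prev, cur = cur, nxt

    wedges = open_weight / 2.0       # each wedge yields 2 ordered triples
    triangles = closed_weight / 6.0  # each triangle yields 6 ordered triples
    total = wedges + triangles
    if total == 0:
        raise ValueError("no three-node graphlet was sampled")
    return {"wedge": wedges / total, "triangle": triangles / total}


if __name__ == "__main__":
    # Toy graph: a triangle (0, 1, 2) with a path 2-3-4 attached.
    toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
    print(estimate_three_node_concentration(toy_edges, steps=200_000, seed=1))
```

The toy graph contains one triangle and three open wedges, so its exact triangle concentration is 0.25, and the estimate should approach that value as the number of steps grows. Extending the sketch to the multiplex setting would require the two-layer graphlet definitions and the restricted lower-layer access described in the article, which this example does not attempt.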
Sampling Graphlets of Multiplex Networks: A Restricted Random Walk Approach