Abstract
Most previous analyses of Twitter user behavior have focused on individual information cascades and the social followers graph, in which the nodes for two users are connected if one follows the other. We instead study aggregate user behavior and the retweet graph, with a focus on quantitative descriptions. We find that the lifetime tweet distribution is a type-II discrete Weibull stemming from a power-law hazard function; that the tweet rate distribution, although asymptotically a power law, exhibits a lognormal cutoff over finite sample intervals; and that the inter-tweet interval distribution is a power law with an exponential cutoff. The retweet graph is small-world and scale-free, like the social graph, but it is less disassortative and has much stronger clustering. These differences are consistent with the retweet graph capturing real-world social relationships and trust between users better than the social graph does. Beyond understanding and modeling human communication patterns and social networks, we discuss applications to alternative, decentralized microblogging systems, both for predicting real-world performance and for detecting spam.
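To make the link between a power-law hazard function and a discrete lifetime distribution concrete, the following minimal sketch builds a probability mass function from a hazard of the assumed form h(n) = c * n^(beta - 1), where h(n) is read as the probability that a user stops after exactly n tweets given that they have posted at least n. The function name and the parameters `c`, `beta`, and `n_max` are hypothetical and chosen for illustration; they are not the fitted values or the exact parameterization reported in the paper.

```python
def pmf_from_power_law_hazard(c, beta, n_max):
    """Return [P(N = 1), ..., P(N = n_max)] for a discrete lifetime N.

    Assumption (illustrative, not taken from the paper): the hazard is
    h(n) = c * n**(beta - 1), clipped to 1 so it remains a probability.
    """
    pmf = []
    survival = 1.0  # P(N >= n); every user has at least one tweet
    for n in range(1, n_max + 1):
        hazard = min(1.0, c * n ** (beta - 1.0))
        pmf.append(survival * hazard)  # survive to n, then stop at n
        survival *= 1.0 - hazard       # P(N >= n + 1)
    return pmf


if __name__ == "__main__":
    # A decreasing hazard (beta < 1) gives a heavy right tail.
    for n, p in enumerate(pmf_from_power_law_hazard(0.3, 0.5, 5), start=1):
        print(n, round(p, 4))
```

With `beta = 1` the hazard is constant and the result reduces to a geometric distribution, which is a convenient sanity check on the construction.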
Aggregate Characterization of User Behavior in Twitter and Analysis of the Retweet Graph