Clickstream User Behavior Models

Published: 25 July 2017
Abstract

The next generation of Internet services is driven by users and user-generated content. The complex nature of user behavior makes it highly challenging to manage and secure online services. On one hand, service providers cannot effectively prevent attackers from creating large numbers of fake identities to disseminate unwanted content (e.g., spam). On the other hand, abusive behavior from real users also poses significant threats (e.g., cyberbullying).

In this article, we propose clickstream models to characterize user behavior in large online services. By analyzing clickstream traces (i.e., sequences of click events from users), we seek to achieve two goals: (1) detection: to capture distinct user groups for the detection of malicious accounts, and (2) understanding: to extract semantic information from user groups to understand the captured behavior. To achieve these goals, we build two related systems. The first is a semisupervised system to detect malicious user accounts (Sybils). The core idea is to build a clickstream similarity graph where each node is a user and an edge captures the similarity of two users’ clickstreams. Based on this graph, we propose a coloring scheme to identify groups of malicious accounts without relying on a large labeled dataset. We validate the system using ground-truth clickstream traces of 16,000 real and Sybil users from Renren, a large Chinese social network. The second is an unsupervised system that aims to capture and understand fine-grained user behavior. Instead of binary classification (malicious or benign), this model identifies the natural groups of user behavior and automatically extracts features to interpret their semantic meanings. Applying this system to Renren and another online social network, Whisper (100K users), we help service providers identify unexpected user behaviors and even predict users’ future actions. Both systems received positive feedback from our industrial collaborators, including Renren, LinkedIn, and Whisper, after testing on their internal clickstream data.
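
As a rough illustration of the clickstream similarity graph idea, the sketch below builds a small weighted graph in which each node is a user and each edge weight is a similarity score between two users' click sequences. The similarity measure used here (Jaccard similarity over event bigrams) is an assumption chosen for brevity, not necessarily the measure used in the article, and the user names, event names, and function names are hypothetical.

```python
from itertools import combinations


def ngrams(events, n=2):
    """Return the set of contiguous n-grams in a click sequence."""
    return {tuple(events[i:i + n]) for i in range(len(events) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_graph(clickstreams, n=2, threshold=0.0):
    """Build a weighted graph as a dict mapping (user, user) pairs to weights.

    Nodes are users. An edge is kept only when the similarity of the two
    users' n-gram sets exceeds `threshold` (an assumed pruning rule).
    """
    grams = {user: ngrams(events, n) for user, events in clickstreams.items()}
    edges = {}
    for u, v in combinations(sorted(grams), 2):
        weight = jaccard(grams[u], grams[v])
        if weight > threshold:
            edges[(u, v)] = weight
    return edges


# Hypothetical toy traces: two similar browsing users and one
# account that only sends friend requests.
clickstreams = {
    "alice": ["login", "view_profile", "view_photo", "comment"],
    "bob": ["login", "view_profile", "view_photo", "like"],
    "bot1": ["login", "friend_request", "friend_request", "friend_request"],
}
graph = similarity_graph(clickstreams)
print(graph)
```

In this toy input, the two browsing users share most of their bigrams and end up connected, while the friend-request account shares none and stays isolated. Grouping or labeling the resulting graph (for example, the coloring scheme mentioned in the abstract) is a separate step that this sketch does not attempt.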
