
Dynamic, Incremental, and Continuous Detection of Cyberbullying in Online Social Media

Published: 13 May 2021

Abstract

The potentially detrimental effects of cyberbullying have led to the development of numerous automated, data-driven approaches, with emphasis on classification accuracy. Cyberbullying, a form of abusive online behavior that lacks a precise definition, is a repetitive process: a sequence of aggressive messages sent by a bully to a victim over a period of time with the intent to harm the victim. Existing work has focused on harassment (i.e., using profanity to classify toxic comments independently) as an indicator of cyberbullying, disregarding the repetitive nature of this harassing process. Raising a cyberbullying alert immediately after a single aggressive comment is detected, however, can lead to a high number of false positives. At the same time, two key practical challenges remain unaddressed: (i) detection timeliness, which is necessary to support victims as early as possible, and (ii) scalability to the staggering rates at which content is generated in online social networks. In this work, we introduce CONcISE, a novel approach for timely and accurate Cyberbullying detectiON in online social media SEssions. CONcISE is a two-stage online approach designed to reduce the time to raise a cyberbullying alert by sequentially examining comments as they become available over time, and to minimize the number of feature evaluations necessary to reach a decision for each comment. Extensive experiments on a real-world Instagram dataset with \(\) users and \(\) comments demonstrate the effectiveness, scalability, and timeliness of our approach and its benefits over existing methods. Additional experiments using a Twitter dataset offer evidence in support of the potential generalizability of CONcISE to other social media platforms.
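The abstract describes the two stages only at a high level: a per-comment decision that stops evaluating features as soon as it has enough evidence, followed by a session-level alert that accumulates evidence across comments instead of firing on the first aggressive one. The following is a minimal sketch of that idea under stated assumptions; it is not the authors' algorithm. It swaps in a sequential log-likelihood-ratio test for stage 1 and a simple Bayesian posterior update with a fixed alert threshold for stage 2. The feature functions (`insult_llr`, `second_person_llr`), the word lists, the probabilities, and the thresholds are all hypothetical and not learned from data.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Set, Tuple

# Hypothetical feature type: maps a comment to a log-likelihood ratio
# log p(feature | aggressive) - log p(feature | benign).
Feature = Callable[[str], float]


def classify_comment(comment: str, features: Sequence[Feature],
                     upper: float = 2.0, lower: float = -2.0) -> Tuple[bool, int]:
    """Stage 1 sketch: evaluate features one at a time and stop as soon as the
    accumulated log-likelihood ratio leaves the interval (lower, upper).
    Returns (is_aggressive, number_of_features_evaluated)."""
    llr, used = 0.0, 0
    for feature in features:
        llr += feature(comment)
        used += 1
        if llr >= upper or llr <= lower:
            break  # enough evidence; skip the remaining features
    return llr > 0.0, used


@dataclass
class SessionMonitor:
    """Stage 2 sketch: Bayesian update of P(session is cyberbullying) from the
    stream of stage-1 decisions; raise an alert once it reaches alert_at."""
    prior: float = 0.1          # hypothetical P(bullying session)
    p_aggr_bully: float = 0.6   # hypothetical P(aggressive comment | bullying)
    p_aggr_normal: float = 0.1  # hypothetical P(aggressive comment | normal)
    alert_at: float = 0.9       # hypothetical alert threshold on the posterior
    posterior: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.posterior = self.prior

    def update(self, aggressive: bool) -> bool:
        """Fold in one comment decision; return True if an alert is raised."""
        if aggressive:
            lik_b, lik_n = self.p_aggr_bully, self.p_aggr_normal
        else:
            lik_b, lik_n = 1 - self.p_aggr_bully, 1 - self.p_aggr_normal
        num = lik_b * self.posterior
        self.posterior = num / (num + lik_n * (1 - self.posterior))
        return self.posterior >= self.alert_at


def monitor_session(comments: Sequence[str],
                    features: Sequence[Feature]) -> Tuple[Optional[int], List[int]]:
    """Process comments in arrival order; return the index of the comment that
    triggered the alert (or None) and the number of features used per comment."""
    monitor = SessionMonitor()
    used_per_comment: List[int] = []
    for i, comment in enumerate(comments):
        aggressive, used = classify_comment(comment, features)
        used_per_comment.append(used)
        if monitor.update(aggressive):
            return i, used_per_comment  # stop early: alert raised
    return None, used_per_comment


# Toy, hypothetical features (illustrative only).
INSULTS = {"idiot", "loser", "stupid"}


def _words(comment: str) -> Set[str]:
    return {w.strip(".,!?").lower() for w in comment.split()}


def insult_llr(comment: str) -> float:
    return 2.5 if _words(comment) & INSULTS else -0.5


def second_person_llr(comment: str) -> float:
    return 0.5 if _words(comment) & {"you", "your", "u"} else -0.5


if __name__ == "__main__":
    session = ["nice photo!", "you are such a loser", "nobody likes you, loser",
               "you are stupid", "go away idiot"]
    alert_index, used = monitor_session(session, [insult_llr, second_person_llr])
    print(alert_index, used)
```

Running the script prints `3 [2, 1, 1, 1]`. The first insulting comment lifts the session posterior only to about 0.23, so no alert is raised; the alert fires at the fourth comment, after the third aggressive one. Most comments are decided after a single feature, which mirrors the abstract's goal of minimizing feature evaluations per comment.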
