Abstract
The potentially detrimental effects of cyberbullying have led to the development of numerous automated, data-driven approaches, with emphasis on classification accuracy. Cyberbullying, as a form of abusive online behavior, although not well-defined, is a repetitive process, i.e., a sequence of aggressive messages sent from a bully to a victim over a period of time with the intent to harm the victim. Existing work has focused on harassment (i.e., using profanity to classify toxic comments independently) as an indicator of cyberbullying, disregarding the repetitive nature of this harassing process. However, raising a cyberbullying alert immediately after an aggressive comment is detected can lead to a high number of false positives. At the same time, two key practical challenges remain unaddressed: (i) detection timeliness, which is necessary to support victims as early as possible, and (ii) scalability to the staggering rates at which content is generated in online social networks. In this work, we introduce CONcISE, a novel approach for timely and accurate Cyberbullying detectiON in online social media SEssions. CONcISE is a two-stage online approach designed to reduce the time to raise a cyberbullying alert by sequentially examining comments as they become available over time, and minimizing the number of feature evaluations necessary for a decision to be made for each comment. Extensive experiments on a real-world Instagram dataset with \(\) users and \(\) comments demonstrate the effectiveness, scalability, and timeliness of our approach and its benefits over existing methods. Additional experiments using a Twitter dataset offer evidence in support of the potential generalizability of CONcISE to other social media platforms.
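The two-stage idea described above can be illustrated with a minimal sketch. This is not the CONcISE algorithm itself, whose details are not given in the abstract. It assumes a simple Bayesian update over binary features with fixed stopping thresholds for the first stage, and a plain count of aggressive comments for the second. The feature tests, likelihood values, thresholds, and the names `classify_comment`, `monitor_session`, `FEATURES`, and `SESSION` are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical feature: (test, P(present | aggressive), P(present | benign)).
Feature = Tuple[Callable[[str], bool], float, float]

# Illustrative values only; they are not taken from the paper.
FEATURES: List[Feature] = [
    (lambda c: "idiot" in c.lower(), 0.8, 0.05),
    (lambda c: "you" in c.lower(), 0.6, 0.3),
]


def classify_comment(
    comment: str,
    features: Sequence[Feature],
    prior: float = 0.5,
    lower: float = 0.1,
    upper: float = 0.9,
) -> Tuple[bool, int]:
    """Stage 1: evaluate features one at a time and stop once confident.

    Returns (is_aggressive, number_of_features_evaluated).
    """
    posterior = prior
    used = 0
    for test, p_aggr, p_benign in features:
        present = test(comment)
        used += 1
        like_aggr = p_aggr if present else 1.0 - p_aggr
        like_benign = p_benign if present else 1.0 - p_benign
        # Bayes update of P(aggressive | features seen so far).
        numerator = posterior * like_aggr
        posterior = numerator / (numerator + (1.0 - posterior) * like_benign)
        # Early stop: skip the remaining features once the belief is decisive.
        if posterior >= upper or posterior <= lower:
            break
    return posterior >= 0.5, used


def monitor_session(
    comments: Sequence[str],
    features: Sequence[Feature],
    alert_after: int = 2,
) -> Optional[int]:
    """Stage 2: scan comments in arrival order and alert on repeated aggression.

    Returns the index of the comment that triggers the alert, or None.
    """
    aggressive = 0
    for index, comment in enumerate(comments):
        is_aggressive, _ = classify_comment(comment, features)
        if is_aggressive:
            aggressive += 1
        if aggressive >= alert_after:
            return index
    return None


# Hypothetical session used only to exercise the sketch.
SESSION = ["nice photo", "you idiot", "love it", "idiot, nobody likes you", "later"]
```

In this sketch a single aggressive comment does not raise an alert, which reflects the point about false positives, and the session is not read past the comment that triggers the alert, which reflects the timeliness goal.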
Index Terms
Dynamic, Incremental, and Continuous Detection of Cyberbullying in Online Social Media