Abstract
In recent years, the increasing propagation of hate speech in online social networks and the need for effective counter-measures have drawn significant investment from social network companies and researchers. This has resulted in the development of many web platforms and mobile applications for reporting and monitoring online hate speech incidents. In this article, we present MANDOLA, a big-data processing system that monitors, detects, visualizes, and reports the spread and penetration of online hate-related speech. MANDOLA consists of six individual components that intercommunicate to consume, process, store, and visualize statistical information about the spread of hate speech online. We also present a novel ensemble-based classification algorithm for hate speech detection that significantly improves MANDOLA's detection performance. To demonstrate the functionality and usability of our system, we describe a use case scenario of real-life event annotation and data correlation. As shown by the performance of the individual modules, as well as the usability and functionality of the whole system, MANDOLA is a powerful system for reporting and monitoring online hate speech.
MANDOLA: A Big-Data Processing and Visualization Platform for Monitoring and Detecting Online Hate Speech