research-article

MANDOLA: A Big-Data Processing and Visualization Platform for Monitoring and Detecting Online Hate Speech

Published: 04 March 2020

Abstract

In recent years, the increasing propagation of hate speech on online social networks and the need for effective countermeasures have drawn significant investment from social network companies and researchers. This has resulted in the development of many web platforms and mobile applications for reporting and monitoring online hate speech incidents. In this article, we present MANDOLA, a big-data processing system that monitors, detects, visualizes, and reports the spread and penetration of online hate-related speech. MANDOLA consists of six individual components that intercommunicate to consume, process, store, and visualize statistical information regarding the spread of hate speech online. We also present a novel ensemble-based classification algorithm for hate speech detection that can significantly improve MANDOLA’s ability to detect hate speech. To demonstrate the functionality and usability of our system, we describe a use case scenario of real-life event annotation and data correlation. As shown by the performance of the individual modules, as well as the usability and functionality of the whole system, MANDOLA is a powerful system for reporting and monitoring online hate speech.

Published in

ACM Transactions on Internet Technology, Volume 20, Issue 2
Special Section on Emotions in Conflictual Social Interactions and Regular Papers
May 2020
256 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3386441
Editor: Ling Liu

Copyright © 2020 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 4 March 2020
• Accepted: 1 November 2019
• Revised: 1 September 2019
• Received: 1 April 2019

Qualifiers

• research-article
• Research
• Refereed
