
A Framework for Online Hate Speech Detection on Code-mixed Hindi-English Text and Hindi Text in Devanagari

Published: 08 May 2023

Abstract

Social media has grown rapidly and has given the world a platform to opine, debate, display, and discuss like never before. It strongly influences research areas that analyze human behavior and social groups, and the phenomenon of social interaction is even being applied in areas such as the Internet of Things. This constant stream of data connecting individuals and organizations across the globe has had a tremendous impact on the functioning of society and even has the power to sway elections. Despite its numerous benefits, social media has certain issues, such as the prevalence of fake news, which has also led to the rise of hate speech. Because security is lax across social media platforms, these issues persist without repercussions. This leads to cyberbullying and defamation and presents grave security concerns. Although some work has been done independently on native scripts, hate speech detection, and code-mixed data, there is a lack of academic research on detecting hate speech in transliterated code-mixed data and in text containing native-language scripts. Research in this field is greatly inhibited by the many variations in grammar and spelling and by a general lack of annotated datasets, especially for native languages. This article proposes a method to automate hate speech detection in code-mixed and native-language text. It presents an architecture containing a TabNet classifier-based model trained on features extracted using MuRIL from transliterated code-mixed textual data. The article also shows that the same model works well on features extracted from text in Devanagari despite being trained on transliterated data.
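The pipeline outlined in the abstract (MuRIL features fed to a TabNet classifier) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the checkpoint name `google/muril-base-cased`, the masked mean pooling of token vectors, the helper names, and all hyperparameters are assumptions chosen for the example. The `transformers`, `torch`, and `pytorch_tabnet` packages are imported lazily so the pooling helper can be read and run on its own.

```python
import numpy as np

# Assumption: the publicly available MuRIL checkpoint on Hugging Face.
MODEL_NAME = "google/muril-base-cased"


def masked_mean_pool(hidden, mask):
    """Average token vectors per sentence, ignoring padded positions.

    hidden: array of shape (batch, tokens, dim)
    mask:   array of shape (batch, tokens), 1 for real tokens, 0 for padding
    """
    mask = np.asarray(mask, dtype=np.float32)[:, :, None]
    summed = (np.asarray(hidden, dtype=np.float32) * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # avoid division by zero
    return summed / counts


def extract_features(texts, batch_size=16, max_length=128):
    """Encode a list of strings into fixed-size vectors (hypothetical helper).

    The pooling strategy is an assumption; the abstract only says that
    features are extracted using MuRIL.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME).eval()
    chunks = []
    with torch.no_grad():
        for start in range(0, len(texts), batch_size):
            enc = tokenizer(
                texts[start:start + batch_size],
                padding=True,
                truncation=True,
                max_length=max_length,
                return_tensors="pt",
            )
            out = model(**enc)
            chunks.append(
                masked_mean_pool(
                    out.last_hidden_state.numpy(),
                    enc["attention_mask"].numpy(),
                )
            )
    return np.concatenate(chunks, axis=0)


def train_classifier(features, labels, max_epochs=50):
    """Fit a TabNet classifier on the extracted features (hypothetical helper)."""
    from pytorch_tabnet.tab_model import TabNetClassifier

    clf = TabNetClassifier(seed=0, verbose=0)
    clf.fit(np.asarray(features), np.asarray(labels), max_epochs=max_epochs)
    return clf
```

Under these assumptions, a classifier fitted on `extract_features(transliterated_texts)` could later be applied to `extract_features(devanagari_texts)` through `clf.predict`, which mirrors the cross-script evaluation the abstract describes; the variable names here are placeholders.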

Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 5
May 2023
653 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3596451

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 8 May 2023
• Online AM: 20 October 2022
• Accepted: 4 October 2022
• Revised: 20 August 2022
• Received: 1 December 2021