research-article

Linguistic Taboos and Euphemisms in Nepali

Published: 12 November 2022

Abstract

Languages across the world have words, phrases, and behaviors (taboos) that speakers avoid in public communication because they are considered obscene or disturbing to the social, religious, and ethical values of society. Nevertheless, people deliberately use these linguistic taboos and other language constructs to make hurtful, derogatory, and obscene comments. It is nearly impossible to construct a universal set of offensive or taboo terms because offensiveness depends on factors such as socio-physical setting, speaker-listener relationship, and word choice. In this article, we present a detailed corpus-based study of offensive language in Nepali. We identify and describe more than 18 categories of linguistic offense, including politics, religion, race, and sex. We discuss 12 common types of euphemism, such as synonym, metaphor, and circumlocution. In addition, we introduce a manually constructed dataset of more than 1,000 offensive and taboo terms popular among contemporary speakers. We describe the first experiments that provide baseline results for detecting offensive language in Nepali. This in-depth study and the accompanying resource provide a foundation for several downstream tasks, such as offensive language detection and language learning.

      • Published in

        ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 6
        November 2022
        372 pages
        ISSN: 2375-4699
        EISSN: 2375-4702
        DOI: 10.1145/3568970

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 November 2022
        • Online AM: 1 April 2022
        • Accepted: 4 March 2022
        • Revised: 21 November 2021
        • Received: 20 July 2020

        Qualifiers

        • research-article
        • Refereed