research-article

Emotion Detection in Code-Mixed Roman Urdu - English Text

Published: 30 March 2023

Abstract

Emotion detection is a widely studied topic in natural language processing because of its significance in many application areas. Numerous studies have addressed emotion detection in European as well as Asian languages. However, the large majority of these studies were conducted in monolingual settings, and little attention has been paid to emotion detection in code-mixed text. In particular, only one study has addressed emotion detection in Roman Urdu (RU) and English (EN) code-mixed text, even though such text is widely used on social media platforms. A careful examination of that study revealed several issues, which indicate that this area requires researchers' attention. For instance, more than 37% of the messages in the existing corpus are monolingual sentences, which means that a purely code-mixed emotion analysis corpus does not exist. To that end, this study scraped 400,000 sentences from three social media platforms and identified 20,000 RU-EN code-mixed sentences. Subsequently, an iterative approach was employed to develop emotion detection guidelines. These guidelines were used to develop a large RU-EN emotion detection (RU-EN-Emotion) corpus in which 20,000 sentences are annotated as Neutral or Emotion-sentence. The sentences containing emotions are further annotated with the respective emotions. Finally, 102 experiments were performed to evaluate the effectiveness of six classical machine learning techniques and six deep learning techniques. The results show that (a) CNN is the most effective technique when used with GloVe embeddings, and (b) the RU-EN-Emotion corpus is more useful than the existing corpus because it employs a two-level classification approach.
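
The two-level classification idea in the abstract can be illustrated with a minimal sketch: a first stage decides Neutral versus Emotion-sentence, and only emotional sentences reach a second stage that assigns an emotion. Everything below is an assumption made for illustration and is not the authors' implementation: the keyword lexicon `EMOTION_CUES`, the labels `joy`, `sadness`, and `anger`, and the function names are hypothetical, and the keyword lookups merely stand in for trained models such as the CNN with GloVe embeddings that the abstract reports as most effective.

```python
from typing import Callable

# Hypothetical toy lexicon mixing Roman Urdu and English cue words.
# The emotion labels here are illustrative; the abstract does not list them.
EMOTION_CUES = {
    "happy": "joy", "khush": "joy",
    "sad": "sadness", "udaas": "sadness",
    "angry": "anger", "gussa": "anger",
}


def is_emotional(sentence: str) -> bool:
    """Level 1 stand-in: True if any token is a known emotion cue."""
    return any(tok in EMOTION_CUES for tok in sentence.lower().split())


def emotion_label(sentence: str) -> str:
    """Level 2 stand-in: emotion of the first cue word found."""
    for tok in sentence.lower().split():
        if tok in EMOTION_CUES:
            return EMOTION_CUES[tok]
    return "unknown"


def two_level_classify(
    sentence: str,
    level1: Callable[[str], bool] = is_emotional,
    level2: Callable[[str], str] = emotion_label,
) -> str:
    """Run level 1 first; call level 2 only for emotional sentences."""
    if not level1(sentence):
        return "Neutral"
    return level2(sentence)


if __name__ == "__main__":
    for text in (
        "aaj main bohat khush hoon because exam is over",
        "kal meeting hai at 5 pm",
    ):
        print(text, "->", two_level_classify(text))
```

Because `level1` and `level2` are plain callables, either stage could be swapped for a trained classifier without changing the control flow.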

      • Published in

        ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 2
        February 2023
        624 pages
        ISSN: 2375-4699
        EISSN: 2375-4702
        DOI: 10.1145/3572719

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 30 March 2023
        • Online AM: 5 August 2022
        • Accepted: 21 July 2022
        • Revised: 9 July 2022
        • Received: 13 March 2022