Abstract
Emotion detection is a widely studied topic in natural language processing owing to its significance in a number of application areas. Numerous studies have addressed emotion detection in European as well as Asian languages. However, most of these studies have been conducted in monolingual settings, and little attention has been paid to emotion detection in code-mixed text. In particular, only one study has addressed emotion detection in Roman Urdu (RU) and English (EN) code-mixed text, even though such text is widely used on social media platforms. A careful examination of that study revealed several issues showing that this area requires further attention from researchers. For instance, more than 37% of the messages in its corpus are monolingual sentences, which indicates that a purely code-mixed emotion analysis corpus does not yet exist. To that end, this study scraped 400,000 sentences from three social media platforms to identify 20,000 RU-EN code-mixed sentences. Subsequently, an iterative approach was employed to develop emotion detection guidelines. These guidelines were used to develop a large RU-EN emotion detection (RU-EN-Emotion) corpus in which 20,000 sentences are annotated as Neutral or Emotion-sentence. The sentences containing emotions are further annotated with their respective emotions. Finally, 102 experiments were performed to evaluate the effectiveness of six classical machine learning techniques and six deep learning techniques. The results show that (a) CNN is the most effective technique when used with GloVe embeddings, and (b) our RU-EN-Emotion corpus is more useful than the existing corpus, as it employs a two-level classification approach.
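The two-level classification approach credited above can be illustrated as a cascade: a first classifier decides Neutral versus Emotion, and only sentences labelled Emotion are passed to a second classifier that picks the specific emotion. The sketch below is a minimal illustration under stated assumptions, not the authors' method. Their best reported model is a CNN with GloVe embeddings; here a small multinomial Naive Bayes (add-one smoothing over lowercase whitespace tokens) is swapped in at both levels purely for brevity. The class names `NaiveBayes` and `TwoLevelEmotionClassifier`, the emotion labels Happy and Sad, and the Roman Urdu-English sentences are all hypothetical and not taken from the RU-EN-Emotion corpus.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Hypothetical stand-in classifier: multinomial Naive Bayes, add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(count / self.n_docs)
            denom = self.totals[label] + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TwoLevelEmotionClassifier:
    """Level 1: Neutral vs Emotion. Level 2: specific emotion, Emotion sentences only."""

    def fit(self, texts, labels):
        # Level 1 sees every sentence, with all emotion labels collapsed to "Emotion".
        coarse = ["Neutral" if y == "Neutral" else "Emotion" for y in labels]
        self.level1 = NaiveBayes().fit(texts, coarse)
        # Level 2 is trained only on the sentences that carry an emotion.
        emotional = [(t, y) for t, y in zip(texts, labels) if y != "Neutral"]
        self.level2 = NaiveBayes().fit([t for t, _ in emotional], [y for _, y in emotional])
        return self

    def predict(self, text):
        if self.level1.predict(text) == "Neutral":
            return "Neutral"
        return self.level2.predict(text)


# Hypothetical toy RU-EN code-mixed sentences and labels (not from the corpus).
train = [
    ("main aaj bohat happy hoon", "Happy"),
    ("wow yaar kya amazing match tha so happy", "Happy"),
    ("mujhe bohat sad feel ho raha hai", "Sad"),
    ("dil toot gaya very sad day", "Sad"),
    ("kal meeting 5 baje hai", "Neutral"),
    ("please office ka address send kar do", "Neutral"),
    ("train 10 baje leave karegi", "Neutral"),
]
clf = TwoLevelEmotionClassifier().fit([t for t, _ in train], [y for _, y in train])
print(clf.predict("aaj main bohat happy hoon"))       # Happy
print(clf.predict("aaj bohat sad feel ho raha hai"))  # Sad
print(clf.predict("kal meeting 10 baje hai"))         # Neutral
```

One consequence of this cascade design is that errors compound: a sentence wrongly routed to Neutral at level 1 never reaches the emotion classifier, so level-1 recall on Emotion sentences caps the overall recall for every specific emotion.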
Emotion Detection in Code-Mixed Roman Urdu - English Text