Abstract
Automated sarcasm detection is a complex natural language processing task, and extending it to Hindi, a morphologically rich, free-word-order indigenous Indian language, is a further challenge. The scarcity of resources and tools such as annotated corpora, lexicons, dependency parsers, part-of-speech taggers, and benchmark datasets compounds the linguistic challenges of sarcasm detection in low-resource languages like Hindi. Furthermore, because context incongruity is central to detecting sarcasm, various linguistic, aural, and visual cues can be used to predict whether a target utterance is sarcastic. While pre-trained word embeddings capture meanings, semantic relationships, and different types of context in the form of word representations, emojis can also provide useful contextual information, analogous to human facial expressions, for gauging sarcasm. The goal of this research is therefore to demonstrate the use of a hybrid deep learning model trained on two embeddings, namely word and emoji embeddings, to detect sarcasm. The model is validated on Sarc-H, a dataset of Hindi tweets manually annotated with sarcastic and non-sarcastic labels. The preliminary results indicate the importance of using emojis for sarcasm detection, with the model attaining an accuracy of 97.35% and an F-score of 0.9708. The research validates that automated feature engineering facilitates an efficient and repeatable predictive model for detecting sarcasm in indigenous, low-resource languages.
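The idea of feeding a model both word and emoji embeddings can be illustrated with a minimal sketch. It is not the authors' architecture, which the abstract does not specify. It only shows one plausible way to split a tweet into word and emoji tokens, look each up in its own embedding table, and join the two views into one feature vector. The tables `WORD_VECS` and `EMOJI_VECS`, their two-dimensional values, the function names, and the use of mean pooling with concatenation are all assumptions made for illustration; a real system would use pre-trained embeddings and a learned network on top. The emoji check is an approximation based on two Unicode ranges.

```python
from typing import Dict, List, Tuple

# Approximate code-point ranges for common emojis (not the full Unicode emoji set).
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]

# Hypothetical toy embedding tables; real ones would be pre-trained.
WORD_VECS: Dict[str, List[float]] = {"वाह": [1.0, 0.0], "बात": [0.0, 1.0]}
EMOJI_VECS: Dict[str, List[float]] = {"🙄": [0.5, -0.5]}


def is_emoji(ch: str) -> bool:
    """Return True if the character falls in one of the approximate emoji ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def split_tweet(text: str) -> Tuple[List[str], List[str]]:
    """Separate a tweet into whitespace-delimited words and a list of emojis."""
    emojis = [ch for ch in text if is_emoji(ch)]
    # Replace emojis with spaces so that an emoji glued to a word does not pollute it.
    words = "".join(" " if is_emoji(ch) else ch for ch in text).split()
    return words, emojis


def mean_vector(tokens: List[str], table: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of known tokens; return zeros if none are known."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def tweet_features(text: str, word_dim: int = 2, emoji_dim: int = 2) -> List[float]:
    """Concatenate the pooled word embedding and the pooled emoji embedding."""
    words, emojis = split_tweet(text)
    return mean_vector(words, WORD_VECS, word_dim) + mean_vector(emojis, EMOJI_VECS, emoji_dim)


# "Wow, what a thing" followed by an eye-roll emoji attached to the last word.
features = tweet_features("वाह क्या बात है🙄")
print(features)  # [0.5, 0.5, 0.5, -0.5]
```

The eye-roll emoji changes the second half of the vector while the word half stays the same, which is the kind of extra contextual signal the abstract attributes to emojis.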
Hybrid Deep Learning Model for Sarcasm Detection in Indian Indigenous Language Using Word-Emoji Embeddings