Abstract
The recent boom in artificial intelligence (AI) applications, e.g., affective robots, human-machine interfaces, and autonomous vehicles, has produced a great number of multi-modal records of human communication. Such data often carry users' latent subjective attitudes and opinions, offering a practical and feasible path toward connecting human emotion with intelligent services. Sentiment and emotion analysis of multi-modal records is therefore of great value for improving the intelligence of affective services. However, finding an optimal way to learn representations of people's sentiments and emotions remains difficult, since both involve subtle mental activity. Many approaches have been proposed, but most of them mine sentiment and emotion insufficiently because they treat sentiment analysis and emotion recognition as two separate tasks. The interaction between the two has been neglected, which limits the efficiency of sentiment and emotion representation learning. In this work, emotion is seen as the external expression of sentiment, while sentiment is the essential nature of emotion. We thus argue that they are strongly related to each other, so that a judgment on one helps the decision on the other. The key challenges are learning a fused multi-modal representation and modeling the interaction between sentiment and emotion. To address these issues, we design an external-knowledge-enhanced multi-task representation learning network, termed KAMT. Its major elements are two attention mechanisms, inter-modal and inter-task attention, and an external knowledge augmentation layer. The external knowledge augmentation layer extracts a vector describing the participant's gender, age, and occupation, as well as the overall color or shape. Inter-modal attention is mainly used to capture effective multi-modal fused features, while inter-task attention models the correlation between sentiment analysis and emotion classification. We conduct experiments on three widely used datasets, and the results demonstrate the effectiveness of the KAMT model.
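The two attention mechanisms described above can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's implementation: the specific fusion scheme (cross attention in both directions, mean pooling, concatenation with the external-knowledge vector) and the elementwise inter-task gating are assumptions introduced for illustration, with randomly initialized stand-ins for learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k):
    # Scaled dot-product attention: rows of q attend over rows of k.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ k

rng = np.random.default_rng(0)
text  = rng.normal(size=(4, 8))   # 4 text tokens,   dim 8
image = rng.normal(size=(6, 8))   # 6 image regions, dim 8
know  = rng.normal(size=(8,))     # external-knowledge vector (e.g., gender, age, color)

# Inter-modal attention: each modality attends to the other; pooled outputs are
# concatenated with the external-knowledge vector into one fused representation.
fused = np.concatenate([cross_attention(text, image).mean(axis=0),
                        cross_attention(image, text).mean(axis=0),
                        know])                     # shape (24,)

# Task-specific projections (hypothetical learned weights).
W_sent, W_emo = rng.normal(size=(2, 8, 24))
sent_rep, emo_rep = W_sent @ fused, W_emo @ fused

# Inter-task attention: each task re-weights its features by how strongly
# they align with the other task's representation.
sent_final = sent_rep * softmax(sent_rep * emo_rep)
emo_final  = emo_rep  * softmax(emo_rep  * sent_rep)
```

The gating in the last two lines is one simple way to let the sentiment and emotion heads inform each other; a full model would learn these interactions end to end.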
Affective Interaction: Attentive Representation Learning for Multi-Modal Sentiment Classification