A Supplementary Feature Set for Sentiment Analysis in Japanese Dialogues

Abstract
Real-time affect awareness has recently been applied in several commercial systems, such as dialogue systems and computer games. Real-time recognition of affective states, however, requires costly feature extraction methods and/or labor-intensive annotation of large datasets, especially for Asian languages, where large annotated datasets are seldom available. To improve recognition accuracy, we propose the use of cognitive context in the form of “emotion-sensitive” intentions. Intentions are often represented through dialogue acts, and, as an emotion-sensitive model of dialogue acts, a tagset of interpersonal acts directing interpersonal relations (the IA model) is proposed. The model's adequacy is assessed on a sentiment classification task, in comparison with two well-known dialogue act models, SWBD-DAMSL and DIT++. For the assessment, five Japanese in-game dialogues were annotated with sentiment labels and with the tags of all three dialogue act models, which were then used to enhance a baseline sentiment classifier. The adequacy of the IA tagset is demonstrated by a 9% improvement in the baseline classifier's recognition accuracy, outperforming the other two models by more than 5%.
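The core idea above, appending a dialogue-act tag as a supplementary feature to a sentiment classifier's text features, can be sketched as follows. This is a minimal illustration with a hypothetical tag inventory and a toy vocabulary, not the paper's actual IA tagset or classifier:

```python
from collections import Counter

# Hypothetical interpersonal-act (IA) tags for illustration only;
# the actual IA tagset is defined in the paper.
IA_TAGS = ["praise", "criticism", "request", "apology", "other"]

def featurize(tokens, ia_tag, vocab):
    """Bag-of-words vector augmented with a one-hot dialogue-act feature block."""
    counts = Counter(tokens)
    bow = [counts[w] for w in vocab]          # text features
    one_hot = [int(ia_tag == t) for t in IA_TAGS]  # supplementary IA features
    return bow + one_hot

vocab = ["sugoi", "dame", "arigatou"]
vec = featurize(["sugoi", "sugoi"], "praise", vocab)
print(vec)  # -> [2, 0, 0, 1, 0, 0, 0, 0]
```

In a setup like the paper's, the baseline classifier would be trained on such augmented vectors, so the appended tag block lets the model condition its sentiment decision on the utterance's interpersonal act.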
- O. Abdel-Hamid, L. Deng, and D. Yu. 2013. Exploring convolutional neural network structures and optimization techniques for speech recognition. In Proceedings of the 14th Annual Conference of the International Speech Communication Association (INTERSPEECH'13).
- J. Ang, R. Dhillon, A. Krupski, E. Shriberg, and A. Stolcke. 2002. Prosody-based automatic detection of annoyance and frustration in human-computer dialog. In Proceedings of the 7th International Conference on Spoken Language Processing (INTERSPEECH'02).
- Y. Arimoto and H. Mori. 2017. Emotion category mapping to emotional space by cross-corpus emotion labeling. In Proceedings of the 18th Annual Conference of the International Speech Communication Association (INTERSPEECH'17).
- A. Batliner, K. Fischer, R. Huber, J. Spilker, and E. Nöth. 2003. How to find trouble in communication. Speech Communication 40, 1, 117--143.
- P. Brown and S. C. Levinson. 1987. Politeness: Some Universals in Language Usage (Vol. 4). Cambridge University Press, Cambridge.
- H. Bunt. 2009. The DIT++ taxonomy for functional dialogue markup. In Proceedings of the AAMAS 2009 Workshop. 13--24.
- H. Bunt. 2011. Multifunctionality in dialogue. Computer Speech and Language 25, 222--245.
- H. Bunt, J. Alexandersson, J. Choe, A. C. Fang, K. Hasida, V. Petukhova, A. Popescu-Belis, and D. Traum. 2012. ISO 24617-2: A semantically-based standard for dialogue annotation. In Proceedings of LREC 2012. 430--437.
- H. Bunt, V. Petukhova, D. Traum, and J. Alexandersson. 2017. Dialogue act annotation with the ISO 24617-2 standard. In Multimodal Interaction with W3C Standards: Towards Natural User Interfaces to Everything, Deborah Dahl (Ed.). Springer, Berlin, 109--135.
- J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
- D. Duncan, G. Shine, and C. English. 2016. Facial Emotion Recognition in Real Time. Technical report, Stanford University.
- F. Eyben, M. Wöllmer, and B. Schuller. 2009. openEAR: Introducing the Munich open-source emotion and affect recognition toolkit. In Proceedings of the International Conference on Affective Computing and Intelligent Interaction. IEEE, 1--6.
- H. M. Fayek, M. Lech, and L. Cavedon. 2015. Towards real-time speech emotion recognition using deep neural networks. In Proceedings of the Conference on Signal Processing and Communication Systems (ICSPCS). IEEE, 1--5.
- N. H. Frijda. 1987. Emotion, cognitive structure, and action tendency. Cognition and Emotion 1, 115--143.
- P. L. Ihasz, T. H. Van, and V. V. Kryssanov. 2015. A computational model for conversational Japanese. In Proceedings of the 2015 International Conference on Culture and Computing. 64--71.
- D. Jurafsky, E. Shriberg, and D. Biasca. 1997. Switchboard SWBD-DAMSL Shallow Discourse-Function Annotation (Coders Manual, Draft 13). Technical Report 97-02. University of Colorado, Institute of Cognitive Science, Colorado.
- D. P. Kingma and J. Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- C. M. Lee and S. S. Narayanan. 2005. Toward detecting emotions in spoken dialogs. IEEE Transactions on Speech and Audio Processing 13, 2, 293--303.
- J. Liscombe, G. Riccardi, and D. Hakkani-Tur. 2005. Using context to improve emotion detection in spoken dialog systems. In Proceedings of the 9th European Conference on Speech Communication and Technology (INTERSPEECH'05).
- M. Mateas and A. Stern. 2005. Structuring content in the Façade interactive drama architecture. In Proceedings of the 1st Artificial Intelligence and Interactive Digital Entertainment Conference. 93--98.
- Y. Matsumoto. 1988. Reexamination of the universality of face: Politeness phenomena in Japanese. Journal of Pragmatics 12, 4, 403--426.
- M. Obaid, C. Han, and M. Billinghurst. 2008. Feed the fish: An affect-aware game. In Proceedings of the 5th Australasian Conference on Interactive Entertainment. ACM.
- D. W. Opitz and R. Maclin. 1999. Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research 11, 169--198.
- J. Pennington, R. Socher, and C. D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532--1543.
- S. Planet and I. Iriondo. 2012. Comparison between decision-level and feature-level fusion of acoustic and linguistic features for spontaneous emotion recognition. In Proceedings of the 7th Iberian Conference on Information Systems and Technologies (CISTI). IEEE, 1--6.
- R. Plutchik. 2001. The nature of emotions. American Scientist 89, 4, 344--350.
- A. Popescu-Belis. 2008. Dimensionality of dialogue act tagsets: An empirical analysis of large corpora. Language Resources and Evaluation 42, 99--107.
- J. A. Russell. 1980. A circumplex model of affect. Journal of Personality and Social Psychology 39, 6, 1161--1178.
- M. Szwoch and W. Szwoch. 2014. Emotion recognition for affect aware video games. In Image Processing & Communications Challenges 6 (Advances in Intelligent Systems and Computing, Vol. 313). 227.
- L. Tian, J. D. Moore, and C. Lai. 2015. Emotion recognition in spontaneous and acted dialogues. In Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction. IEEE, 698--704.
- T. Vogt, E. André, and N. Bee. 2008. EmoVoice: A framework for online recognition of emotions from voice. In Proceedings of the 4th Tutorial and Research Workshop on Perception in Multimodal Dialogue Systems. IEEE, 188--199.
- Wikimedia Project Editors. 2017. Wikimedia database dump of the Japanese Wikipedia on July 20, 2016. https://archive.org/details/jawiki-20160720. Last accessed: 2017/09/04.
- H. Yoon, S. Park, Y. K. Lee, and J. H. Jang. 2013. Emotion recognition of serious game players using a simple brain computer interface. In Proceedings of ICT Convergence (ICTC). IEEE, 783--786.