Abstract
An important aspect of developing dialogue agents is endowing a conversation system with emotion perception and interaction. Most existing emotional dialogue models adapt and extend poorly to different scenarios because they either require a specified emotion category or rely on a fixed emotional dictionary. To overcome these limitations, we propose a neural conversation generation with auxiliary emotional supervised model (nCG-ESM), comprising a sequence-to-sequence (Seq2Seq) generation model and an emotional classifier used as an auxiliary model. The emotional classifier was trained to predict the emotion distributions of the dialogues, which were then used as emotion supervision signals to guide the generation model toward diverse emotional responses. The proposed nCG-ESM can generate emotionally diverse responses with either specified or unspecified emotions, so it can be adapted and extended to different scenarios. We conducted extensive experiments on a popular dataset of Weibo post-response pairs. Experimental results showed that the proposed model produced more diverse, appropriate, and emotionally rich responses, yielding substantial gains in diversity scores and human evaluations.
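The abstract does not state the training objective, so the following is only a minimal sketch of one way an auxiliary classifier's emotion distribution could act as a supervision signal for a generator. It assumes the total loss is the generator's average negative log-likelihood plus a weighted cross-entropy between a target emotion distribution and the classifier's prediction for the generated response. The function names, the `weight` parameter, and the additive form of the loss are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Turn raw classifier scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_dist, predicted_dist, eps=1e-12):
    """Cross-entropy between a target and a predicted emotion distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_dist, predicted_dist))


def generation_nll(token_probs, eps=1e-12):
    """Average negative log-likelihood of the reference response tokens."""
    return -sum(math.log(p + eps) for p in token_probs) / len(token_probs)


def combined_loss(token_probs, target_emotion, classifier_logits, weight=0.5):
    """Hypothetical joint objective: generation loss plus emotion supervision.

    token_probs       -- probabilities the generator assigns to reference tokens
    target_emotion    -- emotion distribution used as the supervision signal
    classifier_logits -- auxiliary classifier scores for the generated response
    weight            -- assumed trade-off coefficient (not from the paper)
    """
    predicted_emotion = softmax(classifier_logits)
    emotion_term = cross_entropy(target_emotion, predicted_emotion)
    return generation_nll(token_probs) + weight * emotion_term


if __name__ == "__main__":
    probs = [0.5, 0.25]
    target = [1.0, 0.0, 0.0]  # e.g. all mass on the first emotion class
    print(combined_loss(probs, target, [5.0, 0.0, 0.0]))  # classifier agrees
    print(combined_loss(probs, target, [0.0, 5.0, 0.0]))  # classifier disagrees
```

Under these assumptions, the loss is lower when the classifier's predicted distribution for the response matches the target emotion distribution. Because the target is a distribution rather than a single label, the same signal could in principle cover both specified and unspecified emotions.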