Abstract
Building virtual agents that can handle complex user queries involving multiple intents of a domain is challenging, because the agent must manage several subtasks simultaneously. This article presents a universal Deep Reinforcement Learning framework that can synthesize dialogue managers for task-oriented dialogue systems covering various intents of a domain. The conversation between agent and user is broken down into hierarchies to segregate the subtasks pertinent to different intents. The concept of Hierarchical Reinforcement Learning, particularly options, is used to learn policies at different levels of the hierarchy that operate at distinct time steps to fulfill the user query successfully. The dialogue manager comprises a top-level intent meta-policy that selects among subtasks, or options, and a low-level controller policy that picks primitive actions to communicate with the user and complete the subtask assigned to it by the top-level policy across the varying intents of a domain. The proposed dialogue management module has been trained so that it can be reused for any language for which it has been developed with little to no supervision. The developed system has been demonstrated for the “Air Travel” and “Restaurant” domains in English and Hindi. Empirical results demonstrate the robustness and efficacy of the learned dialogue policy, which outperforms several baselines and a state-of-the-art system.
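The two-level control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the learned deep policies are replaced here by hand-written placeholder rules, the user is simulated as always answering the requested slot, and every name (`INTENT_SLOTS`, `meta_policy`, `controller_policy`, `run_dialogue`, the intents and slots) is hypothetical. It only shows how a top-level policy might select an option (intent) and a low-level controller might then act over several turns until that option terminates.

```python
from dataclasses import dataclass, field

# Hypothetical toy domain: each intent (option) needs these slots filled.
INTENT_SLOTS = {
    "book_flight": ["origin", "destination", "date"],
    "check_fare": ["origin", "destination"],
}


@dataclass
class DialogueState:
    pending_intents: list                        # intents still to be served
    filled: dict = field(default_factory=dict)   # intent -> set of filled slots


def meta_policy(state):
    """Top level: choose the next intent (option) to pursue.

    Placeholder rule (first pending intent); a learned meta-policy
    would go here in a real system.
    """
    return state.pending_intents[0]


def controller_policy(state, intent):
    """Low level: choose a primitive action for the active intent.

    Placeholder rule: request the first missing slot, else close the subtask.
    """
    done = state.filled.get(intent, set())
    missing = [s for s in INTENT_SLOTS[intent] if s not in done]
    if missing:
        return ("request", missing[0])
    return ("close", intent)


def run_dialogue(intents):
    """Run one simulated dialogue and return the (intent, action, argument) trace."""
    state = DialogueState(pending_intents=list(intents))
    trace = []
    while state.pending_intents:
        # The meta-policy acts once per subtask (coarse time scale).
        intent = meta_policy(state)
        while True:
            # The controller acts every turn (fine time scale).
            act, arg = controller_policy(state, intent)
            trace.append((intent, act, arg))
            if act == "close":
                # Option terminates; control returns to the meta-policy.
                state.pending_intents.remove(intent)
                break
            # Assumed simulated user: always supplies the requested slot.
            state.filled.setdefault(intent, set()).add(arg)
    return trace


if __name__ == "__main__":
    for step in run_dialogue(["check_fare", "book_flight"]):
        print(step)
```

In this sketch the meta-policy is consulted only when an option terminates, while the controller is consulted on every turn, which is the sense in which the two policies operate at distinct time steps.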
Index Terms
A Unified Dialogue Management Strategy for Multi-intent Dialogue Conversations in Multiple Languages