Abstract
In this article, condition-transforming variational autoencoders (CTVAEs) are proposed for generating diverse short-text conversations. In conditional variational autoencoders (CVAEs), the prior distribution of the latent variable z is a multivariate Gaussian whose mean and variance are modulated by the input conditions. Previous work found that this distribution tends to become condition-independent in practical applications. This article therefore designs CTVAEs to enhance the influence of conditions in CVAEs. In a CTVAE model, the latent variable z is sampled by applying a non-linear transformation to the combination of the input conditions and samples drawn from a condition-independent prior distribution N(0, I). In our experiments on a Chinese Sina Weibo dataset, the CTVAE model derives z samples for decoding with stronger condition-dependency than the CVAE model does. The earth mover's distance (EMD) between the distributions of the latent variable z at the training and testing stages is also reduced by the CTVAE model. In subjective preference tests, the proposed CTVAE model performs significantly better than CVAE and sequence-to-sequence (Seq2Seq) models at generating diverse, informative, and topic-relevant responses.
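The CTVAE sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the transform is assumed here to be a single tanh layer, and the names (`sample_z_ctvae`, `weights`, `bias`) and all dimensions are illustrative.

```python
import math
import random

random.seed(0)

def sample_z_ctvae(condition, weights, bias):
    """Sketch of CTVAE sampling: draw eps ~ N(0, I), concatenate it with the
    condition vector, and apply a non-linear transformation to obtain z.
    `weights` is a z_dim x (cond_dim + noise_dim) matrix, `bias` a z_dim vector."""
    noise_dim = len(weights[0]) - len(condition)
    # Condition-independent prior sample, eps ~ N(0, I)
    eps = [random.gauss(0.0, 1.0) for _ in range(noise_dim)]
    # Combine the input condition with the prior sample
    h = list(condition) + eps
    # Non-linear transformation (here a single tanh layer, as an assumption)
    return [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
            for row, b in zip(weights, bias)]

cond_dim, noise_dim, z_dim = 4, 8, 3
W = [[random.uniform(-0.1, 0.1) for _ in range(cond_dim + noise_dim)]
     for _ in range(z_dim)]
b = [0.0] * z_dim
z = sample_z_ctvae([0.5] * cond_dim, W, b)
print(len(z))  # 3
```

Because the condition enters the transformation directly rather than only through the Gaussian's mean and variance, every drawn z depends on the condition by construction.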
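The EMD comparison mentioned above is between distributions of the (generally multivariate) latent variable z; as a simple reference point, the 1-D Wasserstein-1 distance between two equal-size empirical samples reduces to the mean absolute difference of the sorted samples. The function name `emd_1d` is illustrative, not from the paper.

```python
def emd_1d(xs, ys):
    """1-D earth mover's (Wasserstein-1) distance between two equal-size
    empirical samples: sort both and average the absolute pairwise gaps."""
    assert len(xs) == len(ys), "equal-size samples assumed in this sketch"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Shifting a sample by 1 moves every unit of mass a distance of 1.
print(emd_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```

A smaller EMD between the z samples seen at training time and those drawn at test time indicates that the decoder faces a smaller train/test mismatch in its latent inputs.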
Index Terms
Condition-Transforming Variational Autoencoder for Generating Diverse Short Text Conversations