Densely Enhanced Semantic Network for Conversation System in Social Media

Published: 04 March 2022

Abstract

The human–computer conversation system is a significant application in the field of multimedia. To select an appropriate response, retrieval-based systems model the matching between the dialogue history and response candidates. However, most existing methods cannot fully capture and utilize the varied matching patterns, which may degrade system performance. To address this issue, a densely enhanced semantic network (DESN) is proposed in our work. Given a multi-turn dialogue history and a response candidate, DESN first constructs semantic representations of sentences from the word perspective, the sentence perspective, and the dialogue perspective. In particular, the dialogue perspective is a novel one introduced in our work: it models the dependencies between a single sentence and the whole dialogue. Then, the response candidate interacts with each utterance in the dialogue history, and a dense matching module captures the varied matching patterns for each utterance–response pair. The matching patterns of all utterance–response pairs are accumulated in chronological order to compute the matching degree between the dialogue history and the response. The responses in the candidate pool are ranked by this matching degree, and the most appropriate candidate is returned. Our model is evaluated on benchmark datasets, and the experimental results show that it achieves significant and consistent improvements over other baselines.
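The retrieval pipeline described above (score each utterance–response pair, accumulate the per-pair matching degrees in chronological order, then rank the candidate pool) can be sketched as follows. This is a minimal illustrative sketch only: it stands in cosine similarity for DESN's learned dense matching module, and the `decay` weighting is an assumed, hypothetical way to favor recent turns, not the paper's actual accumulation mechanism.

```python
import numpy as np

def rank_candidates(history_vecs, candidate_vecs, decay=0.9):
    """Toy retrieval-based response selection.

    Scores each candidate against every utterance in the dialogue
    history and accumulates the per-utterance matching degrees in
    chronological order, weighting later turns more heavily.
    """
    n = len(history_vecs)
    scores = []
    for cand in candidate_vecs:
        total = 0.0
        for t, utt in enumerate(history_vecs):
            # Cosine similarity as a stand-in for a learned matching degree.
            sim = np.dot(utt, cand) / (
                np.linalg.norm(utt) * np.linalg.norm(cand) + 1e-8
            )
            # Later turns (larger t) receive higher weight.
            total += (decay ** (n - 1 - t)) * sim
        scores.append(total)
    # Candidate indices, best match first.
    return sorted(range(len(candidate_vecs)), key=lambda i: -scores[i])
```

For example, with a two-turn history whose last utterance closely matches the second candidate, that candidate is ranked first because its similarity to the most recent turn carries the largest weight.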



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 4
  November 2022
  497 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3514185
  • Editor: Abdulmotaleb El Saddik


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 4 March 2022
• Accepted: 1 November 2021
• Revised: 1 June 2021
• Received: 1 April 2020

Published in TOMM Volume 18, Issue 4

          Qualifiers

          • research-article
          • Refereed
