Abstract
The human–computer conversation system is a significant application in the field of multimedia. To select an appropriate response, retrieval-based systems model the matching between the dialogue history and the response candidates. However, most existing methods cannot fully capture and exploit varied matching patterns, which may degrade system performance. To address this issue, we propose a densely enhanced semantic network (DESN). Given a multi-turn dialogue history and a response candidate, DESN first constructs semantic representations of sentences from three perspectives: the word perspective, the sentence perspective, and the dialogue perspective. The dialogue perspective, which is newly introduced in this work, models the dependencies between a single sentence and the whole dialogue. Then, the response candidate interacts with each utterance in the dialogue history, and a dense matching module captures the varied matching patterns of each utterance–response pair. The matching patterns of all utterance–response pairs are accumulated in chronological order to compute the matching degree between the dialogue history and the response. Finally, the candidates in the pool are ranked by their matching degree, and the most appropriate one is returned. We evaluate our model on benchmark datasets, and the experimental results show that it achieves significant and consistent improvements over the baselines.
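The matching pipeline summarized above (multi-perspective representations, utterance–response interaction, chronological accumulation, and ranking) can be illustrated with a small, dependency-free Python sketch. It is not the DESN implementation, and every component is an assumption chosen for readability: sentences are lists of given word vectors; the sentence and dialogue perspectives are stand-ins built from dot-product attention within the sentence and across all words of the dialogue; max-pooled word-by-word similarity matrices, averaged over the three perspectives with equal weights, replace the dense matching module; and an exponentially weighted running average (hypothetical `decay` parameter) replaces a learned model for accumulating pair scores in chronological order. All function names are hypothetical.

```python
# Hypothetical toy sketch of retrieval-based response ranking; not the DESN implementation.
import math
from typing import List

Vector = List[float]
Sentence = List[Vector]  # one embedding vector per word


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Dot-product attention: softmax-weighted average of `keys`."""
    weights = softmax([dot(query, k) for k in keys])
    return [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(len(query))]


def views(sentence: Sentence, dialogue: List[Sentence]) -> List[Sentence]:
    """Word, sentence, and dialogue views of one sentence (toy stand-ins)."""
    all_words = [w for s in dialogue for w in s]
    word_view = sentence                                  # raw word embeddings
    sent_view = [attend(w, sentence) for w in sentence]   # context within the sentence
    dial_view = [attend(w, all_words) for w in sentence]  # context across the dialogue
    return [word_view, sent_view, dial_view]


def match_view(u: Sentence, r: Sentence) -> float:
    """Max-pool a word-by-word similarity matrix into one matching score."""
    return sum(max(dot(uw, rw) for rw in r) for uw in u) / len(u)


def pair_score(u_views: List[Sentence], r_views: List[Sentence]) -> float:
    """Average the matching scores of the three views (equal weights assumed)."""
    return sum(match_view(u, r) for u, r in zip(u_views, r_views)) / len(u_views)


def matching_degree(history: List[Sentence], response: Sentence, decay: float = 0.5) -> float:
    """Accumulate utterance-response scores in chronological order."""
    dialogue = history + [response]
    r_views = views(response, dialogue)
    state = 0.0
    for utterance in history:  # oldest utterance first
        s = pair_score(views(utterance, dialogue), r_views)
        state = decay * state + (1.0 - decay) * s  # later turns weigh more
    return state


def rank_candidates(history: List[Sentence], candidates: List[Sentence]) -> List[int]:
    """Indices of candidates, sorted from highest to lowest matching degree."""
    scores = [matching_degree(history, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    history = [[[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]]
    good, bad = [[1.0, 0.0]], [[-1.0, 0.0]]
    print(rank_candidates(history, [bad, good]))  # [1, 0]
```

In this example the candidate whose word vector agrees with the history is ranked first. In a trained model, the three perspectives, the matching module, and the accumulator would be learned from data rather than fixed as they are here.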
Densely Enhanced Semantic Network for Conversation System in Social Media