Abstract
Self-attention mechanisms have recently been embraced for a broad range of text-matching applications. A self-attention model takes a single sentence as input with no extra information, so the final hidden state or a pooled representation can be used directly. However, text-matching problems can be interpreted in either a symmetrical or an asymmetrical scope. For instance, paraphrase detection is a symmetrical task, while textual entailment classification and question-answer matching are asymmetrical tasks. In this article, we leverage the attractive properties of the self-attention mechanism and propose an attention-based network that incorporates three key components for inter-sequence attention: global pointwise features, preceding attentive features, and contextual features, while updating the remaining components. We evaluate our model on two benchmark datasets covering textual entailment and question-answer matching. The proposed efficient Self-attention-driven Network for Text Matching outperforms the state of the art on the Stanford Natural Language Inference and WikiQA datasets with far fewer parameters.
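For readers unfamiliar with inter-sequence attention, the sketch below illustrates the basic cross-attention operation that such text-matching models build on: each token of one sentence attends over the tokens of the other to produce attentive features. This is a minimal NumPy illustration under assumed shapes and function names, not the SANTM architecture itself; the abstract does not specify how the global pointwise, preceding attentive, and contextual features are composed.

```python
# Minimal sketch of inter-sequence (cross) attention between two sentences.
# Illustrative only: function names, shapes, and scaling are assumptions,
# not the actual SANTM layer definition.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(a, b):
    """a: (len_a, d) embeddings of sentence A (e.g., premise/question),
    b: (len_b, d) embeddings of sentence B (e.g., hypothesis/answer).
    Returns, for every token of A, an attention-weighted summary of B."""
    d = a.shape[-1]
    scores = a @ b.T / np.sqrt(d)        # (len_a, len_b) alignment scores
    weights = softmax(scores, axis=-1)   # attention distribution over B's tokens
    return weights @ b                   # (len_a, d) attentive features of B w.r.t. A

# Toy usage: a 3-token and a 4-token sentence with 8-dimensional embeddings.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(3, 8)), rng.normal(size=(4, 8))
attended = cross_attention(a, b)
print(attended.shape)  # (3, 8)
```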