SACNN: Self-attentive Convolutional Neural Network Model for Natural Language Inference

Abstract
Inference is a central problem for understanding and reasoning in artificial intelligence, and natural language inference (NLI) in particular has attracted the attention of many researchers. NLI aims to predict whether a hypothesis sentence can be inferred from a premise sentence. Most prior work relies on a simplistic association between the premise and hypothesis pair, which is insufficient for learning the complex relationships between them and fails to fully exploit local context. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are not effective at modeling long-term dependencies, and they are far more complex than Convolutional Neural Networks (CNNs). To address this long-term dependency problem, and to incorporate context into sentence representations, this article presents a general Self-Attentive Convolutional Neural Network (SACNN) for natural language inference and sentence-pair modeling. The proposed model uses CNNs to integrate mutual interactions between sentences, so that each sentence's representation is formed with its counterpart taken into consideration. Moreover, the self-attention mechanism fully exploits the context semantics and long-term dependencies within a sentence. Experimental results show that SACNN outperforms strong baselines, achieving an accuracy of 89.7% on the Stanford Natural Language Inference (SNLI) dataset.
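To make the architecture concrete, below is a minimal PyTorch sketch of the two ingredients the abstract names: token-level self-attention, which captures long-term dependencies across the whole sentence, followed by a 1-D convolution, which extracts local n-gram context, with a heuristic matching layer combining the premise and hypothesis vectors. This is an illustrative sketch rather than the authors' SACNN implementation: the layer sizes, the max-pooling, and the `NLIClassifier` head are assumptions, and where SACNN integrates cross-sentence interaction inside the CNN, this sketch defers interaction to the matching layer for brevity.

```python
# Illustrative sketch only -- not the authors' released SACNN code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentiveConvEncoder(nn.Module):
    """Self-attention over token embeddings, then a 1-D convolution."""
    def __init__(self, embed_dim=300, num_filters=300, kernel_size=3):
        super().__init__()
        # Projections producing queries, keys, and values for self-attention.
        self.q = nn.Linear(embed_dim, embed_dim)
        self.k = nn.Linear(embed_dim, embed_dim)
        self.v = nn.Linear(embed_dim, embed_dim)
        # Convolution over the attended sequence models local context.
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x):                        # x: (batch, seq_len, embed_dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Scaled dot-product attention: every token attends to every other
        # token, which is how long-range dependencies enter the representation.
        scores = q @ k.transpose(1, 2) / (x.size(-1) ** 0.5)
        attended = torch.softmax(scores, dim=-1) @ v
        h = F.relu(self.conv(attended.transpose(1, 2)))  # local n-gram features
        return h.max(dim=2).values               # max-pool -> (batch, num_filters)

class NLIClassifier(nn.Module):
    """Encode both sentences, match them heuristically, classify 3 ways."""
    def __init__(self, dim=300, num_classes=3):
        super().__init__()
        self.encoder = SelfAttentiveConvEncoder(embed_dim=dim, num_filters=dim)
        self.out = nn.Sequential(nn.Linear(4 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, num_classes))

    def forward(self, premise, hypothesis):      # both: (batch, seq_len, dim)
        p, h = self.encoder(premise), self.encoder(hypothesis)
        # Concatenation, absolute difference, and element-wise product are a
        # common sentence-pair matching heuristic; its use here is an assumption.
        return self.out(torch.cat([p, h, torch.abs(p - h), p * h], dim=-1))

# Usage: pre-embedded token sequences in, entailment/contradiction/neutral
# logits out. Embeddings (e.g., 300-d word2vec) are assumed computed upstream.
model = NLIClassifier(dim=300)
logits = model(torch.randn(8, 20, 300), torch.randn(8, 15, 300))  # -> (8, 3)
```

The sketch takes pre-embedded premise and hypothesis sequences (which may differ in length, since each is max-pooled to a fixed-size vector) and returns logits over the three NLI classes.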