research-article

SACNN: Self-attentive Convolutional Neural Network Model for Natural Language Inference

Published: 16 June 2021

Abstract

Inference is a central problem for understanding and reasoning in artificial intelligence. Natural language inference (NLI), in particular, has attracted the attention of many researchers: the task is to predict whether a hypothesis sentence can be inferred from a premise sentence. Most prior work relies on a simplistic association between the premise and hypothesis sentence pairs, which is insufficient for learning the complex relationships between them and also fails to fully exploit local context information. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are not effective at modeling long-term dependencies, and their schemes are far more complex than Convolutional Neural Networks (CNNs). To address this problem of long-term dependency, and to involve context in modeling a better sentence representation, this article presents a general Self-Attentive Convolutional Neural Network (SACNN) for natural language inference and sentence pair modeling tasks. The proposed model uses CNNs to integrate mutual interactions between sentences, and each sentence is represented with its counterpart taken into consideration. Moreover, the self-attention mechanism helps to fully exploit the context semantics and long-term dependencies within a sentence. Experimental results show that SACNN outperforms strong baselines, achieving an accuracy of 89.7% on the Stanford Natural Language Inference (SNLI) dataset.
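The abstract describes the general pipeline (self-attention to inject long-range context into each word, then convolution to extract local n-gram features, pooled into a fixed-size sentence vector) without giving architectural details. The sketch below, in plain numpy, illustrates that general idea only; all dimensions, the single attention head, and the max-pooling choice are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Scaled dot-product self-attention over one sentence.
    X: (seq_len, d) word embeddings; returns a context-aware (seq_len, d)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)        # pairwise word affinities
    return softmax(scores, axis=-1) @ X  # each word mixes in its full context

def conv1d(X, W):
    """Valid 1-D convolution over the sequence, with ReLU.
    X: (seq_len, d); W: (k, d, n_filters); returns (seq_len - k + 1, n_filters)."""
    k = W.shape[0]
    windows = np.stack([X[i:i + k] for i in range(X.shape[0] - k + 1)])
    return np.maximum(0.0, np.einsum('wkd,kdf->wf', windows, W))

rng = np.random.default_rng(0)
sent = rng.normal(size=(7, 16))           # a 7-word sentence, 16-dim embeddings
attended = self_attention(sent)           # (7, 16): long-range context injected
feats = conv1d(attended, rng.normal(size=(3, 16, 32)))  # (5, 32): local trigram features
pooled = feats.max(axis=0)                # (32,): fixed-size sentence vector
```

In an NLI setting, the premise and hypothesis vectors produced this way would then be combined (e.g., concatenated with their difference and product) and fed to a classifier over the entailment/contradiction/neutral labels.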



• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 3 (May 2021), 240 pages
  ISSN: 2375-4699
  EISSN: 2375-4702
  DOI: 10.1145/3457152

        Copyright © 2021 Association for Computing Machinery.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 16 June 2021
        • Revised: 1 September 2020
        • Accepted: 1 September 2020
        • Received: 1 June 2020


        Qualifiers

        • research-article
        • Refereed
