SANTM: Efficient Self-attention-driven Network for Text Matching

Published: 29 November 2021

Abstract

Self-attention mechanisms have recently been embraced for a broad range of text-matching applications. A self-attention model takes only one sentence as input, with no extra information; one can utilize its final hidden state or a pooling over its states. However, text-matching problems can be interpreted in either a symmetrical or an asymmetrical scope. For instance, paraphrase detection is a symmetrical task, while textual entailment classification and question-answer matching are asymmetrical tasks. In this article, we leverage the attractive properties of the self-attention mechanism and propose an attention-based network that incorporates three key components for inter-sequence attention: global pointwise features, preceding attentive features, and contextual features, while updating the rest of the components. We evaluate our model on two benchmark datasets covering the tasks of textual entailment and question-answer matching. The proposed efficient Self-attention-driven Network for Text Matching (SANTM) outperforms the state of the art on the Stanford Natural Language Inference (SNLI) and WikiQA datasets with far fewer parameters.
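The inter-sequence attention at the core of this family of models can be illustrated with a minimal sketch. This is illustrative only and is not the authors' SANTM architecture: the function names, the scaled dot-product scoring, and the toy dimensions are our own assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(a, b):
    """Inter-sequence attention: each token of sentence A attends over sentence B.

    a: (len_a, d) token embeddings of sentence A
    b: (len_b, d) token embeddings of sentence B
    Returns (len_a, d): B-aware representations of A's tokens.
    """
    scores = a @ b.T / np.sqrt(a.shape[1])   # (len_a, len_b) similarity matrix
    weights = softmax(scores, axis=-1)       # each A-token's distribution over B
    return weights @ b                       # attended summary of B per A-token

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 16))   # toy "premise": 5 tokens, dimension 16
b = rng.normal(size=(7, 16))   # toy "hypothesis": 7 tokens, dimension 16
attended = cross_attention(a, b)
print(attended.shape)  # (5, 16)
```

Note the asymmetry: `cross_attention(a, b)` and `cross_attention(b, a)` differ, which is why tasks such as entailment and question-answer matching (where the two sides play different roles) are treated asymmetrically, unlike paraphrase detection.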



Published in

ACM Transactions on Internet Technology, Volume 22, Issue 3
August 2022, 631 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3498359
Editor: Ling Liu

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 29 November 2021
• Revised: 1 September 2020
• Accepted: 1 September 2020
• Received: 1 July 2020

Published in ACM Transactions on Internet Technology (TOIT), Volume 22, Issue 3

          Qualifiers

          • research-article
          • Refereed
