research-article

Interactive Gated Decoder for Machine Reading Comprehension

Published: 19 January 2022

Abstract

Owing to the availability of various large-scale Machine Reading Comprehension (MRC) datasets, building effective models that extract answer spans from a passage has been well studied in previous work. In reality, however, some questions cannot be answered from the passage at all, which makes the task more challenging. In this article, we propose an Interactive Gated Decoder (IG Decoder), which models the interaction between answer span prediction and no-answer prediction with a gating mechanism. We also propose a simple but effective approach for automatically generating pseudo training data, which enriches the training data for unanswerable questions. Experimental results on the popular SQuAD 2.0 and NewsQA benchmarks show that the proposed approaches yield consistent improvements over traditional BERT-large and strong ALBERT-xxlarge baseline systems. We also provide detailed ablations of the proposed method and an error analysis on hard samples, which could be helpful for future research.
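The abstract's central idea is to let the no-answer prediction and the span prediction interact through a gate, rather than deciding them independently. A minimal decoding sketch of that idea follows; the function name, the sigmoid gate, and the thresholding rule are our assumptions for illustration and are not the paper's actual IG Decoder architecture.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gated_answer(start_logits, end_logits, no_answer_logit, threshold=0.5):
    """Return the best (start, end) span, or None if the question is
    judged unanswerable. Illustrative sketch only."""
    # Gate in (0, 1): high values mean "likely unanswerable".
    gate = sigmoid(no_answer_logit)
    if gate >= threshold:
        return None
    # Score every valid span (start <= end); the gate scales the span
    # scores, so the no-answer head modulates span decoding instead of
    # the two predictions being made independently.
    best_score, best_span = float("-inf"), None
    for i, s in enumerate(start_logits):
        for j, e in enumerate(end_logits):
            if j < i:
                continue
            score = (1.0 - gate) * (s + e)
            if score > best_score:
                best_score, best_span = score, (i, j)
    return best_span
```

For instance, a strongly negative no-answer logit leaves the span scores almost untouched, while a strongly positive one suppresses every candidate span and yields no answer.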



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 4 (July 2022), 464 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3511099


        Publisher

        Association for Computing Machinery

        New York, NY, United States

Publication History

• Received: 1 February 2021
• Revised: 1 September 2021
• Accepted: 1 November 2021
• Published: 19 January 2022


        Qualifiers

        • research-article
        • Refereed
