
Structurally Comparative Hinge Loss for Dependency-Based Neural Text Representation

Published: 18 May 2020

Abstract

Dependency-based graph convolutional networks (DepGCNs) have proven helpful for text representation across many natural language tasks. Almost all previous models are trained with cross-entropy (CE) loss, which directly maximizes the posterior likelihood. However, CE loss does not explicitly account for the contribution of dependency structures. As a result, the performance gain from structure information can be limited, because the model fails to learn to rely on it. To address this challenge, we propose a novel structurally comparative hinge (SCH) loss function for DepGCNs. SCH loss enlarges the margin that structural representations gain over non-structural ones. From the perspective of information theory, this is equivalent to increasing the conditional mutual information between the model's decision and the structure information, given the text. Our experimental results on both English and Chinese datasets show that substituting SCH loss for CE loss improves performance across various tasks, for both induced structures and structures from an external parser, without adding learnable parameters. Furthermore, the extent to which certain types of examples rely on dependency structure can be measured directly by the learned margin, yielding better interpretability. In addition, through detailed analysis, we show that this structure margin correlates positively with task performance and with the structure induction of DepGCNs, and that SCH loss helps the model focus more on the shortest dependency path between entities. We achieve new state-of-the-art results on the TACRED, IMDB, and Zh. Literature datasets, even compared with ensemble and BERT baselines.
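The comparative-margin idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function name `sch_loss`, the margin value, and the use of gold-label probabilities from a structural and a non-structural model are assumptions for illustration only.

```python
import math

def sch_loss(p_struct, p_no_struct, margin=0.5):
    """Hypothetical sketch of a structurally comparative hinge loss.

    p_struct:    gold-label probability from the structure-aware (DepGCN) model
    p_no_struct: gold-label probability from a comparison model that
                 ignores the dependency structure
    margin:      the minimum log-likelihood gap the structural
                 representation should earn over the non-structural one
    """
    # The hinge penalizes the model whenever its structural advantage
    # (log-likelihood gap on the gold label) falls short of the margin.
    gap = math.log(p_struct) - math.log(p_no_struct)
    return max(0.0, margin - gap)

# When structure helps enough, the hinge is inactive:
print(sch_loss(0.9, 0.3))  # gap ~ 1.10 > margin, loss 0.0
# When structure brings no advantage, the loss equals the full margin:
print(sch_loss(0.5, 0.5))  # gap = 0, loss 0.5
```

In practice such a hinge term would presumably be combined with, or substituted for, the usual CE objective during training, so that gradient pressure pushes the model to exploit the dependency structure rather than merely tolerate it.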



Published in
ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 4 (July 2020), 291 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3391538

Copyright © 2020 ACM

Publisher
Association for Computing Machinery, New York, NY, United States

Publication History
• Published: 18 May 2020
• Online AM: 7 May 2020
• Accepted: 1 March 2020
• Revised: 1 December 2019
• Received: 1 August 2019


        Qualifiers

        • research-article
        • Research
        • Refereed
