Abstract
In recent years, research on dependency parsing has focused on improving accuracy on in-domain test datasets and has made remarkable progress. However, the real world contains innumerable scenarios that these datasets do not cover, namely, out-of-domain data. As a result, parsers that perform well on in-domain data usually suffer significant performance degradation on out-of-domain data. Therefore, to adapt existing high-performance in-domain parsers to new domains, cross-domain transfer learning methods are essential for addressing the domain problem in parsing. This paper examines two scenarios for cross-domain transfer learning: semi-supervised and unsupervised. Specifically, we adopt the pre-trained language model BERT, trained on the source-domain (in-domain) data at the subword level, and introduce self-training methods derived from tri-training for these two scenarios. Evaluation results on the NLPCC-2019 shared task and the universal dependency parsing task demonstrate the effectiveness of the adopted approaches for cross-domain transfer learning and show the potential of self-training for cross-lingual transfer learning.
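The core of the tri-training idea the abstract refers to (Zhou and Li, 2005) can be illustrated with a minimal sketch: three learners are bootstrapped from the labeled source-domain data, and an unlabeled target-domain example is added to one learner's training pool whenever the other two agree on its label. The toy 1-D threshold classifiers and synthetic data below are illustrative placeholders only, not the parser models used in the paper.

```python
import random

class ThresholdClassifier:
    """Toy 1-D classifier: predicts 1 if x >= threshold, else 0."""
    def fit(self, xs, ys):
        # Brute-force the threshold that minimizes training error.
        best_t, best_err = 0.0, float("inf")
        for t in sorted(set(xs)):
            err = sum(int(x >= t) != y for x, y in zip(xs, ys))
            if err < best_err:
                best_t, best_err = t, err
        self.t = best_t
        return self

    def predict(self, x):
        return int(x >= self.t)

def tri_train(labeled, unlabeled, rounds=3, seed=0):
    """labeled: list of (x, y) pairs; unlabeled: list of x values."""
    rng = random.Random(seed)
    # Bootstrap three diverse learners from the labeled source data.
    views = []
    for _ in range(3):
        sample = [rng.choice(labeled) for _ in labeled]
        xs, ys = zip(*sample)
        views.append((list(sample), ThresholdClassifier().fit(xs, ys)))
    for _ in range(rounds):
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Pseudo-label unlabeled points on which the other two agree.
            extra = []
            for x in unlabeled:
                yj = views[j][1].predict(x)
                yk = views[k][1].predict(x)
                if yj == yk:
                    extra.append((x, yj))
            # Retrain learner i on its own data plus the agreed labels.
            data = views[i][0] + extra
            xs, ys = zip(*data)
            views[i] = (views[i][0], ThresholdClassifier().fit(xs, ys))
    # Final prediction is a majority vote of the three learners.
    def predict(x):
        return int(sum(v[1].predict(x) for v in views) >= 2)
    return predict
```

In the paper's setting the three learners would be neural parsers sharing a BERT encoder and the unlabeled pool would be target-domain sentences; the agreement-based pseudo-labeling loop is the transferable part of the sketch.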