Abstract
Transfer parsing has been used to develop dependency parsers for languages with no treebank by transferring from the treebanks of other languages (source languages). In delexicalized transfer, parsed words are replaced by their part-of-speech tags. Transfer parsing may not work well if a language does not follow a uniform syntactic structure across its different constituent patterns. Earlier work has used information derived from linguistic databases to transform a source-language treebank so as to reduce the syntactic differences between the source and target languages.
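As a minimal sketch of the delexicalization step described above, each token in a dependency treebank keeps its POS tag, head index, and relation label while its word form is discarded. The dictionary token format here is a hypothetical simplification, not the paper's actual data representation:

```python
# Hypothetical token representation: one dict per token with its
# word form, POS tag, head index, and dependency relation.
def delexicalize(sentence):
    """Replace each word form with its POS tag, keeping the tree intact."""
    return [
        {"form": tok["pos"], "pos": tok["pos"],
         "head": tok["head"], "deprel": tok["deprel"]}
        for tok in sentence
    ]

sentence = [
    {"form": "dogs", "pos": "NOUN", "head": 2, "deprel": "nsubj"},
    {"form": "bark", "pos": "VERB", "head": 0, "deprel": "root"},
]
delex = delexicalize(sentence)
```

Because only POS tags and tree structure survive, a parser trained on the delexicalized source treebank can be applied directly to POS-tagged target-language text.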
We propose a transformation method in which a source-language pattern is transformed stochastically into one of the multiple possible patterns followed in the target language. The transformed source-language treebank can then be used to train a delexicalized parser for the target language. We show that this method significantly improves the average performance of single-source delexicalized transfer parsers.
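The stochastic transformation can be sketched as sampling one of the target language's attested constituent orders in proportion to their frequencies. The pattern inventory and probabilities below are illustrative placeholders, not figures from the paper:

```python
import random

# Hypothetical target-side distribution over adjective/noun orders,
# e.g. as might be estimated from a linguistic database.
target_pattern_probs = {
    ("ADJ", "NOUN"): 0.7,
    ("NOUN", "ADJ"): 0.3,
}

def transform(source_pattern, probs, rng=random):
    """Map a source pattern to a target pattern, sampled by frequency."""
    patterns = list(probs)
    weights = [probs[p] for p in patterns]
    return rng.choices(patterns, weights=weights, k=1)[0]

# A source NOUN-ADJ constituent is rewritten to whichever order the
# sample selects, so the transformed treebank mixes target patterns
# in roughly their target-language proportions.
chosen = transform(("NOUN", "ADJ"), target_pattern_probs)
```

Sampling, rather than always picking the majority order, is what lets the transformed treebank reflect a target language that allows several orders for the same constituent type.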
We also show that, in the multi-source setting, parsers trained on a concatenation of transformed source-language treebanks perform better when a subset of the source treebanks is used rather than all of them or only one.
However, selecting, from all the available treebanks, the subset whose combination gives the best-performing parser is a hard problem. We propose a greedy selection heuristic based on the labelled attachment scores of the corresponding single-source parsers trained on the transformed treebanks.
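One way such a greedy heuristic could work is to rank the source treebanks by the labelled attachment score (LAS) of their single-source transfer parsers, then grow the subset only while an evaluation of the concatenated subset keeps improving. This is a hedged sketch of that idea, not the paper's exact procedure; `evaluate_subset` stands in for training and scoring a parser on the chosen concatenation:

```python
def greedy_select(single_source_las, evaluate_subset):
    """Greedily grow a treebank subset, keeping a treebank only if
    adding it improves the subset's evaluation score."""
    ranked = sorted(single_source_las, key=single_source_las.get,
                    reverse=True)
    chosen, best = [], float("-inf")
    for treebank in ranked:
        score = evaluate_subset(chosen + [treebank])
        if score > best:
            chosen.append(treebank)
            best = score
    return chosen

# Toy example: illustrative LAS values and an oracle that scores a
# subset by the average LAS of its members (both hypothetical).
las = {"de": 55.0, "fr": 60.0, "es": 58.0}
subset = greedy_select(las, lambda s: sum(las[t] for t in s) / len(s))
```

Ranking by single-source LAS means the heuristic evaluates at most one candidate subset per treebank, instead of the exponentially many subsets an exhaustive search would require.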
Transform, Combine, and Transfer: Delexicalized Transfer Parser for Low-resource Languages