Transform, Combine, and Transfer: Delexicalized Transfer Parser for Low-resource Languages

Published: 05 June 2019

Abstract

Transfer parsing develops dependency parsers for languages that have no treebank by transferring from the treebanks of other languages (source languages). In delexicalized transfer, parsed words are replaced by their part-of-speech tags. Transfer parsing may not work well when a target language does not follow a uniform syntactic structure across its different constituent patterns. Earlier work has used information derived from linguistic databases to transform a source-language treebank so as to reduce the syntactic differences between the source and target languages.
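Delexicalization, as described above, keeps the dependency structure of a parsed sentence but replaces each word form with its part-of-speech tag. A minimal sketch (the token dictionaries here are an assumed representation, not the paper's data format):

```python
# Delexicalize a parsed sentence: replace each word form with its
# POS tag; heads and dependency labels are left unchanged.
def delexicalize(tokens):
    return [
        {"form": tok["pos"], "pos": tok["pos"],
         "head": tok["head"], "deprel": tok["deprel"]}
        for tok in tokens
    ]

sentence = [
    {"form": "dogs", "pos": "NOUN", "head": 2, "deprel": "nsubj"},
    {"form": "bark", "pos": "VERB", "head": 0, "deprel": "root"},
]
delex = delexicalize(sentence)
print([t["form"] for t in delex])  # each form is now a POS tag
```

Because only POS tags and tree structure remain, a parser trained on such data can be applied to any target language with compatible tags.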

We propose a transformation method in which a source-language pattern is stochastically transformed into one of the multiple possible patterns followed in the target language. The transformed source-language treebank can then be used to train a delexicalized parser for the target language. We show that this method significantly improves the average performance of single-source delexicalized transfer parsers.
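The stochastic transformation can be sketched as sampling a target-side pattern in proportion to how often each variant occurs in the target language. The distribution and the ADJ/NOUN ordering example below are illustrative assumptions, not figures from the paper:

```python
import random

def transform(pattern, target_dist, rng=random):
    """Stochastically map a source pattern to one target pattern."""
    options = target_dist.get(pattern)
    if not options:
        return pattern  # no known target variant: keep the source pattern
    patterns, weights = zip(*options.items())
    return rng.choices(patterns, weights=weights, k=1)[0]

# Assumed distribution: a source ADJ-NOUN pattern is realised in the
# target language as NOUN-ADJ 70% of the time, ADJ-NOUN 30% of the time.
dist = {("ADJ", "NOUN"): {("NOUN", "ADJ"): 0.7, ("ADJ", "NOUN"): 0.3}}
random.seed(0)
print(transform(("ADJ", "NOUN"), dist))
```

Sampling, rather than always picking the majority pattern, lets the transformed treebank reflect the full range of orderings the target language allows.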

We also show that, in the multi-source setting, parsers trained on a concatenation of transformed source-language treebanks perform better when a subset of the source-language treebanks is used rather than all of them or only one.

However, selecting the subset of treebanks whose combination gives the best-performing parser from the set of all available treebanks is a hard problem. We propose a greedy selection heuristic based on the labelled attachment scores of the corresponding single-source parsers trained on the transformed treebanks.
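One way to realise such a heuristic is to rank treebanks by their single-source labelled attachment score (LAS) and greedily grow the subset, keeping each treebank only if it improves the combined parser. This is a sketch of that idea; `evaluate_subset` is a hypothetical stand-in for training and scoring a parser on the concatenated subset:

```python
def greedy_select(single_source_las, evaluate_subset):
    """Greedily grow a treebank subset, visiting treebanks in
    decreasing order of their single-source parser's LAS."""
    ranked = sorted(single_source_las, key=single_source_las.get, reverse=True)
    selected, best = [], float("-inf")
    for tb in ranked:
        score = evaluate_subset(selected + [tb])
        if score > best:           # keep tb only if the combination improves
            selected.append(tb)
            best = score
    return selected

# Toy scores for illustration (invented, not from the paper):
toy_combined = {
    ("cs",): 60.0,
    ("cs", "de"): 64.0,
    ("cs", "de", "hi"): 58.0,
}
evaluate = lambda subset: toy_combined.get(tuple(subset), 0.0)
chosen = greedy_select({"cs": 60.0, "de": 55.0, "hi": 40.0}, evaluate)
print(chosen)  # -> ['cs', 'de']
```

This avoids evaluating all 2^n subsets: only n candidate combinations are scored, one per treebank considered.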

