Multi-level Chunk-based Constituent-to-Dependency Treebank Transformation for Tibetan Dependency Parsing

Published: 30 March 2021

Abstract

Dependency parsing is an important task in Natural Language Processing (NLP). However, a mature parser requires a large treebank for training, and such treebanks remain extremely costly to create. Tibetan is an extremely low-resource language for NLP: no Tibetan dependency treebank is publicly available, and existing resources are obtained by manual annotation. Furthermore, little research has addressed the construction of such a treebank. We propose a novel multi-level chunk-based syntactic parsing method that performs constituent-to-dependency treebank conversion for Tibetan under these scarce conditions. Our method mines more dependencies from Tibetan sentences, builds a high-quality Tibetan dependency tree corpus, and makes fuller use of the inherent regularities of the language itself. We train dependency parsing models on the treebank obtained by this preliminary transformation. The model achieves 86.5% accuracy, 96% LAS, and 97.85% UAS, exceeding the best results of existing conversion methods. The experimental results show that our method is well suited to low-resource settings: it not only addresses the scarcity of Tibetan dependency treebanks but also avoids needless manual annotation. The method embodies the regularity of strongly knowledge-guided linguistic analysis, which is of great significance for advancing research on Tibetan information processing.
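To illustrate the general idea behind constituent-to-dependency conversion (though not the authors' specific multi-level chunk method), the following is a minimal sketch: each non-terminal selects a head child via a hypothetical head-rule table, and the heads of its remaining children become dependents of that head word. The labels, head rules, and the toy SOV example sentence below are illustrative assumptions, not taken from the paper.

```python
# Sketch of head-rule-based constituent-to-dependency conversion.
# A constituent tree is a nested (label, children) pair; leaves are
# (POS, word) pairs. HEAD_RULES is a hypothetical head-finding table.

HEAD_RULES = {
    "S": ["VP"],   # clauses are headed by the verb phrase
    "VP": ["V"],
    "NP": ["N"],
}

def head_of(tree):
    """Return the head word of a constituent subtree."""
    label, children = tree
    if isinstance(children, str):          # leaf: (POS, word)
        return children
    for want in HEAD_RULES.get(label, []):
        for child in children:
            if child[0] == want:
                return head_of(child)
    return head_of(children[-1])           # fallback: rightmost child (SOV-friendly)

def to_dependencies(tree, deps=None):
    """Collect (dependent, head) arcs from a nested constituent tree."""
    if deps is None:
        deps = []
    label, children = tree
    if isinstance(children, str):
        return deps
    head = head_of(tree)
    for child in children:
        child_head = head_of(child)
        if child_head != head:
            deps.append((child_head, head))
        to_dependencies(child, deps)
    return deps

# Toy SOV sentence glossed "boy apple ate" (verb-final, as in Tibetan).
toy = ("S", [("NP", [("N", "boy")]),
             ("VP", [("NP", [("N", "apple")]), ("V", "ate")])])
print(to_dependencies(toy))  # [('boy', 'ate'), ('apple', 'ate')]
```

Both the subject and the object attach to the verb, recovering a dependency tree from the constituency bracketing without any manual annotation; richer conversions additionally assign relation labels such as subj and obj.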

