Abstract
Treebanks are important and useful resources in natural language processing, annotated under one of two schemas: phrase structure or dependency structure. Many works convert phrase structures into dependency structures and vice versa, and most of them exploit a handcrafted head percolation table and argument table in predefined, deterministic ways. In this article, we propose a method for converting a dependency structure into a phrase structure by enriching an earlier hybrid-strategy approach with a trainable model. Adding a classifier to the algorithm and applying postprocessing modifications increases the quality of the conversion. We evaluate our method on two languages, English and Persian, and then analyze the errors. Our experiments show a 46.01% reduction in error rate for English and a 76.50% reduction for Persian compared to our baseline. We also build a new phrase structure treebank by converting 10,000 sentences of the Persian dependency treebank into corresponding phrase structures and correcting them manually.
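To make the idea of a table-driven dependency-to-phrase conversion concrete, the sketch below projects each head word to a chain of phrase labels and hangs each dependent's subtree on one level of that chain. It is a minimal illustration under stated assumptions, not the authors' method: the projection table `PROJECTIONS`, the left/right attachment rule in `attachment_level`, and the names `Node` and `convert` are all hypothetical, the POS tags are illustrative English tags, and the input is assumed to be a projective tree with a single root. In the approach the abstract describes, a trained classifier (rather than a fixed rule like this one) would help make the attachment decisions.

```python
from dataclasses import dataclass, field

# Hypothetical projection table (illustrative only): POS tag -> phrase labels,
# ordered from the lowest projection to the maximal one.
PROJECTIONS = {
    "N": ["NP"],
    "V": ["VP", "S"],
    "P": ["PP"],
    "ADJ": ["ADJP"],
    "DET": [],
}


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

    def __str__(self) -> str:
        # Leaves print as bare words; internal nodes as bracketed phrases.
        if not self.children:
            return self.label
        return f"({self.label} {' '.join(str(c) for c in self.children)})"


def attachment_level(head_idx: int, dep_idx: int, n_levels: int) -> int:
    """Hypothetical deterministic rule: right dependents attach to the lowest
    projection, left dependents to the maximal one (e.g. objects under VP,
    subjects under S). A trained classifier could replace this decision."""
    return 1 if dep_idx > head_idx else n_levels


def convert(tokens):
    """tokens: list of (word, pos, head) with 1-based heads; head 0 marks the root.
    Assumes a projective tree with exactly one root."""
    dependents = {i: [] for i in range(len(tokens) + 1)}
    for i, (_, _, head) in enumerate(tokens, start=1):
        dependents[head].append(i)

    def build(i: int) -> Node:
        word, pos, _ = tokens[i - 1]
        labels = PROJECTIONS.get(pos, [pos + "P"])
        if not labels and dependents[i]:
            labels = [pos + "P"]  # a word with dependents needs at least one phrase
        # Group each dependent's subtree by the projection level it attaches to.
        by_level = {level: [] for level in range(1, len(labels) + 1)}
        for d in dependents[i]:
            by_level[attachment_level(i, d, len(labels))].append((d, build(d)))
        node = Node(pos, [Node(word)])  # preterminal over the word itself
        for level, label in enumerate(labels, start=1):
            # Order the head's lower node and this level's dependents by position.
            items = sorted(by_level[level] + [(i, node)], key=lambda x: x[0])
            node = Node(label, [child for _, child in items])
        return node

    (root,) = dependents[0]
    return build(root)


sentence = [("John", "N", 2), ("saw", "V", 0), ("the", "DET", 4), ("dog", "N", 2)]
print(convert(sentence))
# (S (NP (N John)) (VP (V saw) (NP (DET the) (N dog))))
```

Attaching left dependents higher than right ones keeps every bracket contiguous for projective input, which is why the fixed rule never produces crossing constituents; a learned attachment decision would need the same ordering guarantee.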
Converting Dependency Structure Into Persian Phrase Structure