Abstract
This article proposes a novel method for inducing wide-coverage Combinatory Categorial Grammar (CCG) resources for Japanese. For some languages, including English, the availability of large annotated corpora and the development of data-based induction of lexicalized grammars have enabled deep parsing, i.e., parsing based on lexicalized grammars. However, deep parsing for Japanese has not been widely studied. This is mainly because most Japanese syntactic resources are represented as chunk-based dependency structures, while previous methods for inducing grammars depend on tree corpora. To translate the syntactic information in chunk-based dependencies into phrase structures as accurately as possible, we propose integrating annotations from multiple dependency-based corpora. Our method first integrates dependency structures and predicate-argument information and converts them into phrase structure trees. The trees are then transformed into CCG derivations in a way similar to previously proposed methods. The quality of the conversion is evaluated empirically in terms of the coverage of the obtained CCG lexicon and the accuracy of parsing with the grammar. While the transformation process used in this study is specialized for Japanese, the framework of our method would be applicable to other languages for which dependency-based analysis has been regarded as more appropriate than phrase structure-based analysis because of their morphosyntactic features.
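As a rough illustration of what "coverage of the obtained CCG lexicon" could mean, the sketch below computes the fraction of held-out tokens whose (word, category) pair was seen in the training derivations. This is one common way to define lexical coverage and is an assumption here, since the abstract does not give the exact metric. The function names, the toy sentences, and the category strings are all hypothetical.

```python
from collections import defaultdict


def build_lexicon(derivations):
    """Collect the set of CCG categories observed for each word."""
    lexicon = defaultdict(set)
    for sentence in derivations:
        for word, category in sentence:
            lexicon[word].add(category)
    return lexicon


def lexical_coverage(lexicon, held_out):
    """Fraction of held-out tokens whose (word, category) pair is in the lexicon."""
    total = 0
    covered = 0
    for sentence in held_out:
        for word, category in sentence:
            total += 1
            # .get avoids inserting unseen words into the defaultdict
            if category in lexicon.get(word, ()):
                covered += 1
    return covered / total if total else 0.0


# Toy data, invented for illustration only: each sentence is a list of
# (word, category) pairs as they might be read off CCG derivations.
train = [[("猫", "NP"), ("が", "NP[ga]\\NP"), ("走る", "S\\NP[ga]")]]
test = [[("犬", "NP"), ("が", "NP[ga]\\NP"), ("走る", "S\\NP[ga]")]]

lexicon = build_lexicon(train)
coverage = lexical_coverage(lexicon, test)
print(f"lexical coverage: {coverage:.3f}")
```

In this toy case the unseen word 犬 is the only uncovered token, so two of the three test tokens are covered. A real evaluation would read the pairs from the induced derivations rather than from hand-written lists.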
Index Terms
Integrating Multiple Dependency Corpora for Inducing Wide-Coverage Japanese CCG Resources