research-article

Integrating Multiple Dependency Corpora for Inducing Wide-Coverage Japanese CCG Resources

Published: 30 January 2015

Abstract

This article proposes a novel method for inducing wide-coverage Combinatory Categorial Grammar (CCG) resources for Japanese. For some languages, including English, the availability of large annotated corpora and the development of data-based induction of lexicalized grammars have enabled deep parsing, i.e., parsing based on lexicalized grammars. However, deep parsing for Japanese has not been widely studied. This is mainly because most Japanese syntactic resources are represented as chunk-based dependency structures, whereas previous grammar induction methods depend on tree corpora. To translate the syntactic information in chunk-based dependencies into phrase structures as accurately as possible, we propose integrating annotations from multiple dependency-based corpora. Our method first integrates dependency structures and predicate-argument information and converts them into phrase structure trees. The trees are then transformed into CCG derivations in a manner similar to previously proposed methods. The quality of the conversion is evaluated empirically in terms of the coverage of the obtained CCG lexicon and the accuracy of parsing with the induced grammar. Although the transformation process used in this study is specific to Japanese, the framework should be applicable to other languages for which dependency-based analysis has been regarded as more appropriate than phrase structure-based analysis because of their morphosyntactic features.

