Abstract
In recent years, transition-based parsers have shown promise in terms of efficiency and accuracy. Although these parsers have been extensively explored for multiple Indian languages, there is still considerable scope for improvement by properly incorporating syntactically relevant information. In this article, we enhance transition-based parsing of Hindi and Urdu by redefining the features and feature extraction procedures previously proposed in the parsing literature on Indian languages. We show empirically that properly incorporating syntactically relevant information such as case marking, complex predication, and grammatical agreement in an arc-eager parsing model can significantly improve parsing accuracy. Our experiments show an absolute improvement of ∼2% LAS (labeled attachment score) for both Hindi and Urdu over a competitive baseline that uses rich features such as part-of-speech (POS) tags, chunk tags, cluster IDs, and lemmas. We also propose heuristics for identifying ezafe constructions in Urdu text, which show promising results in parsing these constructions.
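The arc-eager model and the LAS metric mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' parser: it only replays a projective gold tree with the four arc-eager transitions, using a simple static oracle, and then computes the labeled attachment score. The toy sentence, its dependency labels, and the names `arc_eager_oracle` and `las` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

ROOT = 0  # artificial root node, always at the bottom of the stack

Arcs = Dict[int, Tuple[int, str]]  # dependent index -> (head index, label)


def arc_eager_oracle(gold: Arcs) -> Tuple[List[str], Arcs]:
    """Replay a projective gold tree with arc-eager transitions.

    Returns the transition sequence and the arcs that were built.
    """
    stack: List[int] = [ROOT]
    buffer: List[int] = sorted(gold)
    arcs: Arcs = {}
    transitions: List[str] = []

    while buffer:
        s0, b0 = stack[-1], buffer[0]
        if s0 != ROOT and gold[s0][0] == b0:
            # LEFT-ARC: the buffer front becomes the head of the stack top.
            arcs[s0] = gold[s0]
            stack.pop()
            transitions.append("LEFT-ARC(" + gold[s0][1] + ")")
        elif gold[b0][0] == s0:
            # RIGHT-ARC: the stack top becomes the head of the buffer front,
            # which is then pushed so it can still collect dependents.
            arcs[b0] = gold[b0]
            stack.append(buffer.pop(0))
            transitions.append("RIGHT-ARC(" + gold[b0][1] + ")")
        elif s0 in arcs and all(gold[t][0] != s0 for t in buffer):
            # REDUCE: the stack top has its head and no pending dependents.
            stack.pop()
            transitions.append("REDUCE")
        else:
            # SHIFT: nothing can be attached yet.
            stack.append(buffer.pop(0))
            transitions.append("SHIFT")
    return transitions, arcs


def las(pred: Arcs, gold: Arcs) -> float:
    """Fraction of tokens whose head AND label both match the gold tree."""
    correct = sum(1 for tok in gold if pred.get(tok) == gold[tok])
    return correct / len(gold)


# Hypothetical toy sentence: 1 raam, 2 ne (case marker), 3 seb, 4 khaayaa.
# The labels are illustrative, not taken from the article.
GOLD: Arcs = {1: (4, "k1"), 2: (1, "lwg__psp"), 3: (4, "k2"), 4: (ROOT, "main")}

transitions, arcs = arc_eager_oracle(GOLD)
print(transitions)
print(las(arcs, GOLD))
```

A real parser replaces the oracle with a trained classifier that scores each transition from features of the current stack and buffer; features such as case markers and agreement, which the abstract discusses, would enter at that point.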
Improving Transition-Based Dependency Parsing of Hindi and Urdu by Modeling Syntactically Relevant Phenomena