Improving Transition-Based Dependency Parsing of Hindi and Urdu by Modeling Syntactically Relevant Phenomena

Published: 20 January 2017

Abstract

In recent years, transition-based parsers have shown promise in both efficiency and accuracy. Although these parsers have been explored extensively for several Indian languages, there is still considerable scope for improvement by properly incorporating syntactically relevant information. In this article, we enhance transition-based parsing of Hindi and Urdu by redefining the features and feature extraction procedures previously proposed in the parsing literature on Indian languages. We show empirically that properly incorporating syntactically relevant information, such as case marking, complex predication, and grammatical agreement, into an arc-eager parsing model can significantly improve parsing accuracy. Our experiments show an absolute improvement of ∼2% in labeled attachment score (LAS) for both Hindi and Urdu over a competitive baseline that uses rich features such as part-of-speech (POS) tags, chunk tags, cluster IDs, and lemmas. We also propose heuristics for identifying ezafe constructions in Urdu text; these heuristics show promising results in parsing such constructions.
