Improving Unsupervised Dependency Parsing with Knowledge from Query Logs

Published: 22 June 2016

Abstract

Unsupervised dependency parsing has become increasingly popular in recent years because it does not need expensive annotations, such as treebanks, which are required for supervised and semi-supervised dependency parsing. However, its accuracy is still far below that of supervised dependency parsers, partly because its parsing models are insufficient to capture the linguistic phenomena underlying texts. The performance of unsupervised dependency parsing can be improved by mining knowledge from texts and incorporating it into the model. In this article, syntactic knowledge is acquired from query logs to help estimate better probabilities in dependency models with valence. The proposed method is language independent and obtains an improvement of 4.1% in unlabeled accuracy on the Penn Chinese Treebank by utilizing additional dependency relations from the Sogou and Baidu query logs. Moreover, experiments show that the proposed model achieves an improvement of 8.07% on CoNLL 2007 English using the AOL query logs. We believe query logs are a useful source of syntactic knowledge for many natural language processing (NLP) tasks.

      • Published in

        ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 16, Issue 1
        TALLIP Notes and Regular Papers
        March 2017
        133 pages
        ISSN:2375-4699
        EISSN:2375-4702
        DOI:10.1145/2961867

        Copyright © 2016 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 22 June 2016
        • Accepted: 1 March 2016
        • Revised: 1 September 2015
        • Received: 1 June 2015

        Qualifiers

        • short-paper
        • Research
        • Refereed
