A Cascaded Unsupervised Model for PoS Tagging

Published: 31 March 2021

Abstract

Part-of-speech (PoS) tagging is one of the fundamental syntactic tasks in Natural Language Processing: it assigns a syntactic category (such as noun, verb, or adjective) to each word within a given sentence or context. These syntactic categories can be used to further analyze sentence-level syntax (e.g., dependency parsing) and thereby extract the meaning of the sentence (e.g., semantic parsing). Various methods have been proposed for learning PoS tags in an unsupervised setting, without using any annotated corpora. Log-linear models are among the most widely used methods for the tagging problem. The initialization of the parameters in a log-linear model is crucial for inference, and different initialization techniques have been used so far. In this work, we present a log-linear model for PoS tagging that uses another fully unsupervised Bayesian model to initialize its parameters in a cascaded framework. We thereby transfer knowledge between two different unsupervised models to improve the PoS tagging results, with the log-linear model benefiting from the Bayesian model’s expertise. We present results for Turkish as a morphologically rich language and for English as a comparatively morphologically poor language in a fully unsupervised framework. The results show that our framework outperforms other unsupervised models proposed for PoS tagging.

Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 20, Issue 1
Special issue on Deep Learning for Low-Resource Natural Language Processing, Part 1 and Regular Papers
January 2021
332 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3439335

Copyright © 2021 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 31 March 2021
• Accepted: 1 October 2020
• Revised: 1 September 2020
• Received: 1 January 2019

Qualifiers

• research-article
• Research
• Refereed
