research-article

Weakly Supervised POS Tagging without Disambiguation

Published: 21 July 2018

Abstract

Weakly supervised part-of-speech (POS) tagging aims to learn to predict the POS tag of a word in context from partially annotated data rather than from fully tagged corpora. It would benefit a range of natural language processing applications in languages for which tagged corpora are largely unavailable.

In this article, we propose a novel framework for weakly supervised POS tagging based on a dictionary of words and their possible POS tags. In the proposed approach, which is based on constrained error-correcting output codes (ECOC), a unique L-bit vector is assigned to each POS tag. Together, these bit vectors form a coding matrix with entries in {+1, -1}. Each column of the coding matrix specifies a dichotomy over the tag space, for which a binary classifier is learned. The training data for each binary classifier is generated as follows: a word paired with its set of possible POS tags is used as a positive training example only if the whole set falls into the positive dichotomy specified by the column, and as a negative training example only if the whole set falls into the negative dichotomy. Given a word in context, its POS tag is predicted by concatenating the outputs of the L binary classifiers into an L-bit vector and choosing the tag whose codeword is closest to it under some distance measure. By incorporating the ECOC strategy, the set of all possible tags for each word is treated as a whole, so no disambiguation step is needed. Moreover, instead of the manual feature engineering employed in most previous POS tagging approaches, the features for training and testing in the proposed framework are generated automatically using neural language modeling. The proposed framework has been evaluated on three corpora for English, Italian, and Malagasy POS tagging, achieving accuracies of 93.21%, 90.9%, and 84.5%, respectively, a significant improvement over state-of-the-art approaches.
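To make the column-wise construction concrete, here is a minimal Python sketch of ECOC training and decoding on a toy tag dictionary. It is not the authors' implementation: the random coding matrix, the perceptron used as the binary learner, Hamming distance as the decoding measure, the `min_examples` column filter standing in for the "constrained" part, and all function names (`random_coding_matrix`, `column_training_set`, `train_ecoc`, `predict_tag`, and so on) are assumptions for illustration. Plain float vectors stand in for the neural-language-model features.

```python
import random
from typing import Dict, List, Optional, Sequence, Set, Tuple

Vector = List[float]
Codes = Dict[str, List[int]]  # tag -> L-bit codeword with entries in {+1, -1}


def random_coding_matrix(tags: Sequence[str], length: int, seed: int = 0) -> Codes:
    """Assumption: give each tag a unique *random* L-bit codeword."""
    if 2 ** length < len(tags):
        raise ValueError("code length too short for a unique codeword per tag")
    rng = random.Random(seed)
    used: Set[Tuple[int, ...]] = set()
    codes: Codes = {}
    for tag in tags:
        word = tuple(rng.choice((1, -1)) for _ in range(length))
        while word in used:  # redraw until the codeword is unique
            word = tuple(rng.choice((1, -1)) for _ in range(length))
        used.add(word)
        codes[tag] = list(word)
    return codes


def column_training_set(data, codes: Codes, col: int) -> List[Tuple[Vector, int]]:
    """Binary training set for one column, built without disambiguation.

    A (features, candidate_tags) pair is positive only if every candidate tag
    has +1 in this column, negative only if every candidate has -1, and is
    left out when its candidate set straddles the two dichotomies.
    """
    examples = []
    for features, candidates in data:
        bits = {codes[t][col] for t in candidates}
        if bits == {1}:
            examples.append((features, 1))
        elif bits == {-1}:
            examples.append((features, -1))
    return examples


def train_perceptron(examples, dim: int, epochs: int = 20) -> Vector:
    """Plain perceptron (a stand-in binary learner); last weight is the bias."""
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            if y * score <= 0:  # misclassified or on the boundary: update
                for i in range(dim):
                    w[i] += y * x[i]
                w[-1] += y
    return w


def predict_bit(w: Vector, x: Vector) -> int:
    score = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return 1 if score >= 0 else -1


def decode(bits: List[int], codes: Codes, candidates: Optional[Sequence[str]] = None) -> str:
    """Tag whose codeword has the smallest Hamming distance to `bits`.

    Ties go to the first tag in iteration order.
    """
    pool = list(candidates) if candidates else list(codes)
    return min(pool, key=lambda t: sum(b != c for b, c in zip(bits, codes[t])))


def train_ecoc(data, codes: Codes, dim: int, min_examples: int = 1):
    length = len(next(iter(codes.values())))
    models = []
    for col in range(length):
        examples = column_training_set(data, codes, col)
        n_pos = sum(1 for _, y in examples if y == 1)
        n_neg = len(examples) - n_pos
        # Hypothetical constraint: keep a column only if both sides of its
        # dichotomy receive at least `min_examples` training examples.
        if n_pos >= min_examples and n_neg >= min_examples:
            models.append((col, train_perceptron(examples, dim)))
    return models


def predict_tag(models, codes: Codes, x: Vector, candidates=None) -> str:
    cols = [col for col, _ in models]
    bits = [predict_bit(w, x) for _, w in models]
    # Compare only on the columns that actually have a trained classifier.
    sub_codes = {t: [codes[t][c] for c in cols] for t in codes}
    return decode(bits, sub_codes, candidates)
```

A word listed in the dictionary as {NOUN, VERB} contributes to a column only when both tags fall on the same side of it, so no step ever has to guess which candidate is correct; the cost is that ambiguous words supply fewer examples per column. Passing a word's dictionary tags as `candidates` restricts decoding to those tags, which is one plausible way to use the dictionary at test time (again an assumption, not something the abstract states).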

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 4
      December 2018
      193 pages
      ISSN: 2375-4699
      EISSN: 2375-4702
      DOI: 10.1145/3229525

      Copyright © 2018 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 21 July 2018
      • Accepted: 1 May 2018
      • Revised: 1 March 2018
      • Received: 1 May 2017

      Qualifiers

      • research-article
      • Research
      • Refereed
