Abstract
Weakly supervised part-of-speech (POS) tagging aims to predict the POS tag of a given word in context by learning from partially annotated data instead of fully tagged corpora. It would benefit various natural language processing applications in languages for which tagged corpora are largely unavailable.
In this article, we propose a novel framework for weakly supervised POS tagging based on a dictionary of words and their possible POS tags. In the proposed constrained error-correcting output codes (ECOC)-based approach, a unique L-bit vector is assigned to each POS tag. Together, these bit vectors form a coding matrix with entries in {+1, -1}. Each column of the coding matrix specifies a dichotomy over the tag space, for which a binary classifier is learned. The training data for each binary classifier are generated as follows: a word, together with its set of possible POS tags, is used as a positive training example only if the whole set of possible tags falls on the positive side of the dichotomy specified by that column, and as a negative training example only if the whole set falls on the negative side. Given a word in context, its POS tag is predicted by concatenating the outputs of the L binary classifiers into a codeword and choosing the tag whose codeword is closest to it under some distance measure.
By incorporating the ECOC strategy, the set of all possible tags for each word is treated as a whole, without the need to perform disambiguation. Moreover, instead of the manual feature engineering employed in most previous POS tagging approaches, the features used for training and testing in the proposed framework are generated automatically by neural language modeling. The proposed framework has been evaluated on three corpora for English, Italian, and Malagasy POS tagging, achieving accuracies of 93.21%, 90.9%, and 84.5%, respectively, a significant improvement over state-of-the-art approaches.
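The training-data generation and decoding steps of the ECOC approach can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the coding matrix is drawn at random (any extra constraints the constrained ECOC approach places on the matrix are not modeled), each binary classifier is a plain perceptron standing in for whatever learner the authors used, context features are given as plain numeric vectors in place of the neural-language-model features, and decoding uses Hamming distance as the distance measure. All function names are hypothetical.

```python
import random
from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple

Code = Dict[str, List[int]]             # tag -> codeword in {+1, -1}^L
Example = Tuple[List[float], Set[str]]  # (context features, candidate tag set)


def make_coding_matrix(tags: Sequence[str], n_bits: int, seed: int = 0,
                       max_tries: int = 1000) -> Code:
    """Draw a random coding matrix with unique rows and non-trivial columns.

    Assumption: plain random coding; the paper's own constraints on the
    matrix are not modeled here.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        rows = [tuple(rng.choice((1, -1)) for _ in range(n_bits)) for _ in tags]
        unique_rows = len(set(rows)) == len(rows)
        # every column must put at least one tag on each side of the dichotomy
        split_cols = all(len({r[j] for r in rows}) == 2 for j in range(n_bits))
        if unique_rows and split_cols:
            return {t: list(r) for t, r in zip(tags, rows)}
    raise ValueError("no valid coding matrix found; try a larger n_bits")


def column_training_set(examples: Iterable[Example], code: Code,
                        j: int) -> List[Tuple[List[float], int]]:
    """Binary training data for column j of the coding matrix."""
    data = []
    for x, candidates in examples:
        signs = {code[t][j] for t in candidates}
        if signs == {1}:        # whole candidate set on the positive side
            data.append((x, 1))
        elif signs == {-1}:     # whole candidate set on the negative side
            data.append((x, -1))
        # mixed signs: the set straddles the dichotomy, so it is skipped
    return data


def _score(w: List[float], x: Sequence[float]) -> float:
    # w[:-1] are feature weights, w[-1] is the bias
    return sum(wi * xi for wi, xi in zip(w[:-1], x)) + w[-1]


def train_perceptron(data: List[Tuple[List[float], int]], dim: int,
                     epochs: int = 20) -> List[float]:
    """Stand-in binary learner (assumption): a standard perceptron."""
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        for x, y in data:
            if y * _score(w, x) <= 0:   # misclassified or on the boundary
                for i, xi in enumerate(x):
                    w[i] += y * xi
                w[-1] += y
    return w


def train_ecoc(examples: List[Example], code: Code,
               epochs: int = 20) -> List[List[float]]:
    """Train one binary classifier per column of the coding matrix."""
    n_bits = len(next(iter(code.values())))
    dim = len(examples[0][0])
    return [train_perceptron(column_training_set(examples, code, j), dim, epochs)
            for j in range(n_bits)]


def predict(x: Sequence[float], classifiers: List[List[float]], code: Code,
            candidates: Optional[Set[str]] = None) -> str:
    """Concatenate the L binary outputs and return the nearest tag codeword.

    Assumption: Hamming distance is the distance measure; restricting the
    search to dictionary candidates is an optional, hypothetical extension.
    """
    bits = [1 if _score(w, x) >= 0 else -1 for w in classifiers]
    pool = sorted(candidates) if candidates else sorted(code)
    return min(pool, key=lambda t: sum(b != c for b, c in zip(bits, code[t])))
```

A word whose candidate set straddles a column's dichotomy is simply left out of that column's training data, so every example a binary classifier sees is unambiguous for its own task; this is how the candidate set can be used as a whole without first disambiguating it. Ambiguous words still contribute to every column whose dichotomy keeps all of their candidate tags on one side.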
Index Terms
Weakly Supervised POS Tagging without Disambiguation