Abstract
Social media data has become an invaluable component of business analytics, but the many nuances of social media text make the job of conventional text analytics tools difficult. Code-mixing, a phenomenon prevalent among social media users, is the use of words borrowed from multiple languages, all written in the commonly understood Roman script. Existing supervised learning methods for tasks such as Part-of-Speech (POS) tagging of code-mixed social media (CMSM) text typically depend on a large amount of training data. Preparing such data is resource-intensive and requires expertise in multiple languages. Although a small dataset can be prepared, out-of-vocabulary (OOV) words pose a major difficulty when learning models from CMSM text, because non-native words can be written in Roman script in a very large number of ways. POS tagging of code-mixed text is also non-trivial because the tagger must handle the syntactic rules of multiple languages. The central research question addressed by this article is whether abundantly available unlabeled data can help resolve the difficulties that code-mixed text poses for POS tagging. We develop an approach for scraping code-mixed text and building word embeddings from it, illustrating it for Bengali-English, Hindi-English, and Telugu-English code-mixing scenarios. We use a hierarchical deep recurrent neural network with a linear-chain CRF layer on top to improve POS tagging performance on CMSM text by capturing contextual word features and character-sequence-based information. We also prepared a labeled resource for POS tagging of CMSM text by correcting 19% of the labels in an existing resource. A detailed analysis of the performance of our approach at varying levels of code-mixing is provided.
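The abstract places a linear-chain CRF layer on top of the recurrent network. As a hedged illustration of what such a layer contributes at inference time, the sketch below implements standard Viterbi decoding in plain Python: given per-token tag scores (assumed here to come from the recurrent network) and tag-to-tag transition scores, it returns the jointly best tag sequence rather than each token's individual argmax. The function name `viterbi_decode`, the score layout, and the toy numbers are illustrative assumptions, not the authors' implementation, and CRF training (the forward algorithm and log-likelihood) is not shown.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
    end: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]   score of tag j at token t (e.g. a BiLSTM output; assumed)
    transitions[i][j] score of moving from tag i to tag j
    start[j], end[j]  scores for beginning / ending the sentence in tag j
    """
    if not emissions:
        raise ValueError("emissions must contain at least one token")
    n_tags = len(start)
    # score[j]: best score of any tag prefix that ends in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # Best previous tag i for reaching tag j at token t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # Add end-of-sentence scores, then follow backpointers from the best final tag
    final = [score[j] + end[j] for j in range(n_tags)]
    best_last = max(range(n_tags), key=lambda j: final[j])
    path = [best_last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return path, final[best_last]


TAGS = ["NOUN", "VERB"]  # hypothetical two-tag set
# Toy scores: a per-token argmax would tag token 2 as VERB (1.2 > 1.0),
# but the NOUN -> NOUN transition bonus makes NOUN NOUN the best sequence.
path, best = viterbi_decode(
    [[2.0, 1.0], [1.0, 1.2]], [[0.5, -1.0], [0.0, 0.0]], [0.0, 0.0], [0.0, 0.0]
)
print([TAGS[i] for i in path], best)  # ['NOUN', 'NOUN'] 3.5
```

The toy case shows why a CRF layer can help: transition scores let the tagger overrule a locally preferred tag when the resulting sequence is implausible as a whole.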
The results indicate that, with custom embeddings, our approach achieves an F1-score higher than the CRF-based baseline by 5.81%, 5.69%, and 6.3% for Bengali, Hindi, and Telugu, respectively.
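The abstract reports results as F1-scores but does not say how they are averaged over POS tags. The sketch below assumes token-level, one-vs-rest F1 per tag, macro-averaged over all tags seen in the gold or predicted labels; the function names and tag labels are hypothetical, and the paper's own evaluation may average differently.

```python
from collections import Counter
from typing import Dict, Sequence


def per_tag_f1(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Token-level F1 for each tag, treating each tag as a one-vs-rest class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p where it was wrong
            fn[g] += 1  # gold tag g was missed
    scores = {}
    for tag in set(gold) | set(pred):
        prec = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        rec = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        scores[tag] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of the per-tag F1 scores."""
    scores = per_tag_f1(gold, pred)
    return sum(scores.values()) / len(scores)


gold = ["PRON", "VERB", "NOUN", "NOUN"]
pred = ["PRON", "VERB", "VERB", "NOUN"]
print(round(macro_f1(gold, pred), 4))  # 0.7778
```

Macro averaging gives rare tags the same weight as frequent ones. Micro-averaged F1, by contrast, equals plain token accuracy when every token receives exactly one tag (0.75 in the toy example above).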
Improving Code-mixed POS Tagging Using Code-mixed Embeddings