research-article

Improving Code-mixed POS Tagging Using Code-mixed Embeddings

Published: 29 March 2020

Abstract

Social media data has become an invaluable component of business analytics, yet the many nuances of social media text make it difficult for conventional text-analysis tools to handle. Code-mixing, a phenomenon prevalent among social media users, occurs when the words of a text are borrowed from multiple languages but written in the commonly understood roman script. Existing supervised learning methods for tasks such as part-of-speech (POS) tagging of code-mixed social media (CMSM) text typically depend on a large amount of training data, and preparing such data is resource-intensive, requiring expertise in multiple languages. Although a small labeled dataset can be prepared, out-of-vocabulary (OOV) words pose a major difficulty when learning models from CMSM text, because a non-native word can be written in roman script in a huge number of ways. POS tagging of code-mixed text is also non-trivial, as the tagger must deal with the syntactic rules of multiple languages. The central research question addressed by this article is whether abundantly available unlabeled data can help resolve the difficulties that code-mixed text poses for POS tagging. We develop an approach for scraping code-mixed text and building word embeddings from it, illustrating it for Bengali-English, Hindi-English, and Telugu-English code-mixing scenarios. We use a hierarchical deep recurrent neural network with a linear-chain CRF layer on top of it to improve POS tagging performance on CMSM text by capturing contextual word features and character-sequence-based information. We also prepared a labeled resource for POS tagging of CMSM text by correcting 19% of the labels in an existing resource, and we provide a detailed analysis of the performance of our approach at varying levels of code-mixing.
The results indicate that the F1-score of our approach with custom embeddings exceeds the CRF-based baseline by 5.81%, 5.69%, and 6.3% for Bengali, Hindi, and Telugu, respectively.
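The OOV difficulty described above — many roman-script spellings of the same non-native word — is one motivation for subword-aware embeddings. As a minimal illustration (a sketch, not the authors' actual pipeline), the character n-gram decomposition used by fastText-style embeddings (Bojanowski et al., 2017) lets two spelling variants share most of their representation:

```python
def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams with boundary markers, in the style of
    fastText subword embeddings (Bojanowski et al., 2017)."""
    padded = f"<{word}>"  # mark word boundaries so prefixes/suffixes are distinct
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return grams

# "pyaar" and "pyar" are hypothetical roman-script variants of the same
# Hindi word; they share many n-grams, so embeddings composed from
# n-gram vectors stay close even if one spelling is OOV at training time.
a = char_ngrams("pyaar")
b = char_ngrams("pyar")
overlap = len(a & b) / len(a | b)  # Jaccard similarity of the subword sets
```

A word vector is then the sum of its n-gram vectors, so an unseen spelling variant still receives a meaningful representation from the n-grams it shares with seen words.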

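The linear-chain CRF layer mentioned in the abstract scores entire tag sequences rather than individual tokens, combining per-token emission scores (here they would come from the recurrent network) with tag-transition scores; the best sequence is then recovered by Viterbi decoding. A self-contained sketch with a reduced tag set and hand-picked toy scores (all values below are illustrative, not from the paper):

```python
def viterbi_decode(emissions, transitions, tags):
    """Best tag sequence under a linear-chain CRF score:
    the sum of per-token emission scores and tag-transition scores."""
    best = [{y: emissions[0][y] for y in tags}]  # best[t][y]: best score of a path ending in y
    back = []                                    # back-pointers for path recovery
    for t in range(1, len(emissions)):
        scores, ptrs = {}, {}
        for y in tags:
            prev = max(tags, key=lambda yp: best[-1][yp] + transitions[(yp, y)])
            scores[y] = best[-1][prev] + transitions[(prev, y)] + emissions[t][y]
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)
    y = max(tags, key=lambda yt: best[-1][yt])   # highest-scoring final tag
    path = [y]
    for ptrs in reversed(back):                  # walk the back-pointers
        y = ptrs[y]
        path.append(y)
    return list(reversed(path))

# Toy example: 3 tokens, 2 tags.
tags = ("NOUN", "VERB")
emissions = [{"NOUN": 2.0, "VERB": 0.5},
             {"NOUN": 0.4, "VERB": 1.5},
             {"NOUN": 1.2, "VERB": 0.3}]
transitions = {("NOUN", "NOUN"): 0.1, ("NOUN", "VERB"): 0.8,
               ("VERB", "NOUN"): 0.6, ("VERB", "VERB"): -0.5}
path = viterbi_decode(emissions, transitions, tags)  # ["NOUN", "VERB", "NOUN"]
```

Here the transition scores reward NOUN→VERB and VERB→NOUN, so the decoder prefers alternating tags; in the full model the emissions would be produced by the hierarchical word- and character-level recurrent layers.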


      • Published in

        ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 19, Issue 4
        July 2020
        291 pages
        ISSN:2375-4699
        EISSN:2375-4702
        DOI:10.1145/3391538

        Copyright © 2020 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 29 March 2020
        • Accepted: 1 January 2020
        • Revised: 1 October 2019
        • Received: 1 April 2019

        Qualifiers

        • research-article
        • Research
        • Refereed
