Am I a Resource-Poor Language? Data Sets, Embeddings, Models and Analysis for four different NLP Tasks in Telugu Language

Published: 25 November 2022

Abstract

Due to the lack of large annotated corpora, many resource-poor Indian languages struggle to benefit from recent deep feature representations in Natural Language Processing (NLP). Moreover, adapting existing language models trained on large English corpora to Indian languages is often limited by data availability and by rich morphological, syntactic, and semantic differences. In this paper, we explore representations ranging from traditional to recent, efficient ones to overcome the challenges of a low-resource language, Telugu. In particular, our main objective is to mitigate the low-resource problem for Telugu. Overall, we make several contributions for this resource-poor language: (i) a large annotated dataset (35,142 sentences per task) for multiple NLP tasks, namely sentiment analysis, emotion identification, hate-speech detection, and sarcasm detection; (ii) lexicons for sentiment, emotion, and hate-speech that improve the efficiency of the models; (iii) pretrained word and sentence embeddings; and (iv) several pretrained language models for Telugu (ELMo-Te, BERT-Te, RoBERTa-Te, ALBERT-Te, and DistilBERT-Te) trained on a large Telugu corpus of 8,015,588 sentences (1,637,408 from Telugu Wikipedia and 6,378,180 crawled from various Telugu websites). Further, we show that these representations significantly improve performance on the four NLP tasks, and we present benchmark results for Telugu. We argue that our pretrained embeddings are competitive with or better than the existing multilingual pretrained models mBERT, XLM-R, and IndicBERT. Lastly, fine-tuning the pretrained models yields higher performance than linear probing on the four NLP tasks, with the following F1-scores: sentiment (68.72), emotion (58.04), hate-speech (64.27), and sarcasm (77.93).
We also experiment on publicly available Telugu datasets (Named Entity Recognition, Article Genre Classification, and Sentiment Analysis) and find that our Telugu pretrained language models (BERT-Te and RoBERTa-Te) outperform the state-of-the-art systems on all but the sentiment task. We open-source our corpus, four datasets, lexicons, embeddings, and code at https://github.com/Cha14ran/DREAM-T. The pretrained Transformer models for Telugu are available at https://huggingface.co/ltrctelugu.
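The benchmark results above are reported as F1-scores. As a quick illustration of how a macro-averaged F1 is computed for a multi-class task such as emotion identification, here is a minimal self-contained sketch; the labels below are hypothetical toy data, not drawn from the paper's datasets:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: compute F1 per class, then average with equal weight."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        # Per-class counts of true positives, false positives, false negatives.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy labels for a four-class emotion task (hypothetical example data).
y_true = ["joy", "anger", "fear", "joy", "sad", "anger"]
y_pred = ["joy", "anger", "joy", "joy", "sad", "fear"]
print(round(macro_f1(y_true, y_pred), 4))  # → 0.6167
```

Macro averaging weights every class equally, which matters for tasks like hate-speech detection where the positive class is typically rare.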



Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 1 (January 2023), 340 pages
ISSN: 2375-4699, EISSN: 2375-4702
DOI: 10.1145/3572718

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 10 July 2021
• Revised: 26 March 2022
• Accepted: 29 March 2022
• Online AM: 29 April 2022
• Published: 25 November 2022

            Qualifiers

            • research-article
            • Refereed
