Abstract
Sentiment analysis is an important sub-task of Natural Language Processing that aims to determine the polarity of a review. Most work on sentiment analysis targets the resource-rich languages of the world, and very little has been done for resource-poor languages. In this work, we focus on developing a sentiment analysis system for Roman Urdu, which is a resource-poor language. To this end, a dataset of 11,000 reviews was gathered from six different domains. Comprehensive annotation guidelines were defined, and the dataset was annotated using a multi-annotator methodology. Using the annotated dataset, state-of-the-art algorithms were applied to build a sentiment analysis system. To improve the results of these algorithms, four different studies were carried out based on word-level features, character-level features, and feature union. The best results reduced the error rate by 12% compared with the baseline (80.07%). To check whether the improvements are statistically significant, we applied a t-test and confidence intervals to the obtained results and found that the best results of each study differ significantly from the baseline.
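The abstract does not spell out how the word-level and character-level features are built or combined, so the following is only a minimal sketch of the general idea, assuming plain n-gram counts. The function names, the key prefixes `w:` and `c:`, the n-gram sizes, and the sample reviews are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def word_ngrams(text, n=1):
    """Return word-level n-grams from lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n=3):
    """Return character-level n-grams (spaces included) from lowercased text."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def feature_union(text, word_n=1, char_n=3):
    """Merge both feature families into one count vector.

    Keys are prefixed so that a word feature and a character feature
    with the same surface string do not collide.
    """
    features = Counter()
    for gram in word_ngrams(text, word_n):
        features["w:" + gram] += 1
    for gram in char_ngrams(text, char_n):
        features["c:" + gram] += 1
    return features


def shared_keys(a, b):
    """Return the feature keys that two count vectors have in common."""
    return set(a) & set(b)


# Hypothetical reviews that differ only in how two words are spelled.
review_a = "khana bohat acha tha"
review_b = "khana bohot achha tha"

union_a = feature_union(review_a)
union_b = feature_union(review_b)
common = shared_keys(union_a, union_b)
word_overlap = {k for k in common if k.startswith("w:")}
char_overlap = {k for k in common if k.startswith("c:")}
```

In this toy pair, `word_overlap` contains only the identically spelled words, while `char_overlap` also captures fragments shared by the variant spellings. That is one plausible reason to combine the two feature families for text without standardized spelling, though the paper's own motivation may differ.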
Index Terms
Sentiment Analysis for a Resource Poor Language—Roman Urdu