short-paper

Sentiment Analysis for a Resource Poor Language—Roman Urdu

Published: 16 August 2019

Abstract

Sentiment analysis is an important sub-task of Natural Language Processing that aims to determine the polarity of a review. Most work on sentiment analysis targets the resource-rich languages of the world; very little has been done for resource-poor languages. In this work, we focus on developing a sentiment analysis system for Roman Urdu, a resource-poor language. To this end, a dataset of 11,000 reviews was gathered from six different domains. Comprehensive annotation guidelines were defined, and the dataset was annotated using a multi-annotator methodology. Using the annotated dataset, state-of-the-art algorithms were used to build a sentiment analysis system. To improve the results of these algorithms, four different studies were carried out based on word-level features, character-level features, and feature union. The best results showed that we could reduce the error rate by 12% from the baseline (80.07%). To check whether the improvements are statistically significant, we applied a t-test and confidence intervals to the obtained results and found that the best result of each study differs significantly from the baseline.
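
As a rough illustration of what word-level features, character-level features, and a feature union can look like, the sketch below counts n-grams in plain Python. It is not the authors' implementation: the function names, the n-gram sizes, the `w:`/`c:` key prefixes, and the example strings are all assumptions made for this example.

```python
from collections import Counter


def word_ngrams(text, n=1):
    """Count word-level n-grams (whitespace tokenisation, lower-cased)."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text, n=3):
    """Count character-level n-grams over the lower-cased text."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def feature_union(text, word_n=1, char_n=3):
    """Merge both feature sets into one dict; prefixes keep the keys distinct."""
    features = {}
    for gram, count in word_ngrams(text, word_n).items():
        features["w:" + gram] = count
    for gram, count in char_ngrams(text, char_n).items():
        features["c:" + gram] = count
    return features


# Illustrative input only: two spellings of the same word differ at the
# word level but still overlap at the character level.
a = feature_union("acha phone")
b = feature_union("achaa phone")
shared = sorted(key for key in a if key in b)
```

In this toy case the two spellings produce different word-level keys, while several character trigrams appear in both feature dictionaries, which is one plausible motivation for combining the two feature types.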


    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 1
      January 2020
      345 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3338846

      Copyright © 2019 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 16 August 2019
      • Accepted: 1 April 2019
      • Revised: 1 February 2019
      • Received: 1 February 2018

      Qualifiers

      • short-paper
      • Research
      • Refereed
