research-article

An Unsupervised Approach for Sentiment Analysis on Social Media Short Text Classification in Roman Urdu

Published: 03 November 2021

Abstract

Over the last two decades, sentiment analysis, also known as opinion mining, has become one of the most explored research areas in Natural Language Processing (NLP) and data mining. Sentiment analysis focuses on the sentiments or opinions that consumers express on social media and other websites. Because such opinions are widely available on the Internet, sentiment analysis has attracted a large number of researchers around the globe. A large amount of research has been conducted in English, Chinese, and other languages used worldwide. However, Roman Urdu has been neglected despite being the third most used language for communication in the world, with millions of users around the globe. Although some techniques have been proposed for sentiment analysis in Roman Urdu, they are either limited to a specific domain or developed incorrectly because of the lack of language resources for Roman Urdu. Therefore, in this article, we propose an unsupervised approach for sentiment analysis in Roman Urdu. First, the proposed model normalizes the text to handle the spelling variations of words. After normalization, we use Roman Urdu and English opinion lexicons to identify users’ opinions in the text. We also incorporate negation terms and stemming to assign a polarity to each extracted opinion. The model then assigns a score to each sentence based on the polarities of the extracted opinions and classifies the sentence as positive, negative, or neutral. To verify our approach, we conducted experiments on two publicly available Roman Urdu datasets and compared it with existing models. The results demonstrate that our approach outperforms existing models for sentiment analysis in Roman Urdu. Furthermore, our approach does not suffer from domain dependency.
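The pipeline outlined in the abstract (normalize, look up opinion words, handle negation, score, classify) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The lexicon entries, the spelling-variant map, the one-token negation window, and the function names `normalize`, `score_sentence`, and `classify` are all assumptions made for illustration, and stemming is omitted.

```python
import re

# Hypothetical toy lexicons; the article's Roman Urdu and English
# opinion lexicons are not reproduced here.
POSITIVE = {"acha", "zabardast", "good"}
NEGATIVE = {"bura", "bekar", "bad"}
NEGATIONS = {"nahi", "na", "not"}

# Hypothetical spelling-variant map standing in for the normalization step.
VARIANTS = {"achaa": "acha", "achha": "acha", "nahin": "nahi",
            "nhi": "nahi", "bekaar": "bekar"}


def normalize(token: str) -> str:
    """Lowercase, shrink runs of 3+ repeated letters to 2, map known variants."""
    token = re.sub(r"(.)\1{2,}", r"\1\1", token.lower())
    return VARIANTS.get(token, token)


def score_sentence(sentence: str) -> int:
    """Sum the polarities of opinion words, flipping any word next to a negation."""
    tokens = [normalize(t) for t in re.findall(r"[a-zA-Z]+", sentence)]
    score = 0
    for i, tok in enumerate(tokens):
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity == 0:
            continue
        # Assumed rule: a negation term directly before or after flips polarity.
        neighbours = tokens[max(0, i - 1):i] + tokens[i + 1:i + 2]
        if any(n in NEGATIONS for n in neighbours):
            polarity = -polarity
        score += polarity
    return score


def classify(sentence: str) -> str:
    """Map the sentence score to positive, negative, or neutral."""
    score = score_sentence(sentence)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


print(classify("yeh phone bohat achaaa hai"))
print(classify("acha nahin hai"))
print(classify("kal milte hain"))
```

The negation window checks both neighbours because negation tends to follow the opinion word in Roman Urdu ("acha nahi") and precede it in English ("not good"). A real system would need a more careful rule and much larger lexicons.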

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 21, Issue 2
      March 2022
      413 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3494070

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 3 November 2021
      • Accepted: 1 July 2021
      • Revised: 1 May 2021
      • Received: 1 January 2021

      Qualifiers

      • research-article
      • Refereed
