research-article

Role of Discourse Information in Urdu Sentiment Classification: A Rule-based Method and Machine-learning Technique

Published: 21 May 2019

Abstract

In computational linguistics, sentiment analysis refers to the classification of opinions into a positive or a negative class. Many methods exist for sentiment analysis of English, but the literature offers few methods and techniques for Urdu, which is widely spoken in the South Asian subcontinent and is the national language of Pakistan. The currently available techniques, such as the adjective-count method known as Bag of Words (BoW), are not sufficient for classifying complex sentiments written in Urdu. Moreover, the accuracy of the available machine-learning techniques (with legacy features) on Urdu sentiments is not comparable with the accuracy achieved for other languages. For English, discourse information (sub-sentence-level information) boosts the performance of both the BoW method and machine-learning techniques, but very few works have tested such context-level information for sentiment analysis of Urdu. This research aims to extract discourse information from Urdu sentiments and use it to improve the performance and reduce the error rate of existing techniques for Urdu sentiment classification. The proposed solution extracts the discourse information, suggests a new set of features for machine-learning techniques, and introduces a set of rules to extend the capabilities of the BoW model. The results show a significant improvement: recall, precision, and accuracy increase by 31.25%, 8.46%, and 21.6%, respectively. In future, the proposed technique can be extended to sentiments with more than two sub-opinions, such as those found in blogs, reviews, and TV talk shows.
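
To make the idea of extending a BoW count with discourse information concrete, the following is a minimal sketch and not the authors' method. It assumes a toy polarity lexicon, a toy list of contrast connectives, and a single illustrative rule (when a contrast connective splits the sentence into two sub-opinions, the second one decides the class). The names `POSITIVE`, `NEGATIVE`, `CONTRAST`, `SAMPLE`, `bow_score`, `split_on_contrast`, and `classify` are all hypothetical, as is the "undecided" fallback label.

```python
# Illustrative sketch only: the lexicon, connectives, and rule below are
# assumptions made for this example and are not taken from the paper.
POSITIVE = {"اچھا", "عمدہ"}   # hypothetical positive words ("good", "excellent")
NEGATIVE = {"برا", "خراب"}    # hypothetical negative words ("bad", "faulty")
CONTRAST = {"لیکن", "مگر"}    # hypothetical contrast connectives ("but")

# "The camera is good but the battery is bad."
SAMPLE = "کیمرہ اچھا ہے لیکن بیٹری خراب ہے"


def bow_score(tokens):
    """Plain BoW polarity: +1 per positive word, -1 per negative word."""
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def split_on_contrast(tokens):
    """Split at the first contrast connective into two sub-opinions."""
    for i, token in enumerate(tokens):
        if token in CONTRAST:
            return tokens[:i], tokens[i + 1:]
    return tokens, []


def classify(sentence, use_discourse=True):
    """Label a sentence, optionally applying the assumed discourse rule."""
    tokens = sentence.split()
    if use_discourse:
        first, second = split_on_contrast(tokens)
        # Assumed rule: the sub-opinion after the connective dominates;
        # fall back to the first sub-opinion if the second carries no polarity.
        score = bow_score(second) or bow_score(first)
    else:
        score = bow_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "undecided"


if __name__ == "__main__":
    print(classify(SAMPLE, use_discourse=False))
    print(classify(SAMPLE, use_discourse=True))
```

In this sketch the plain count cancels out on the sample sentence because it contains one positive and one negative word, whereas the discourse rule commits to the polarity of the second sub-opinion. A real system would need a proper Urdu tokenizer, a full lexicon, and the rule set defined in the paper itself.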
