Abstract
In computational linguistics, sentiment analysis refers to the classification of opinions into a positive or a negative class. Many methods exist for sentiment analysis of English, but the literature offers few methods and techniques for Urdu, which is widely spoken in the South Asian subcontinent and is the national language of Pakistan. The currently available techniques, such as the adjective-count method known as Bag of Words (BoW), are not sufficient for classifying complex sentiment written in Urdu. In addition, the performance of available machine-learning techniques (with legacy features) on Urdu sentiment classification is not comparable with the accuracy achieved for other languages. For English, discourse information (sub-sentence-level information) boosts the performance of both the BoW method and machine-learning techniques, but very few works have tested context-level information for sentiment analysis of Urdu. This research aims to extract discourse information from Urdu sentiments and to use it to improve the performance and reduce the error rate of existing techniques for Urdu sentiment classification. The proposed solution extracts the discourse information, suggests a new set of features for machine-learning techniques, and introduces a set of rules to extend the capabilities of the BoW model. The results show that classification improves significantly: recall, precision, and accuracy increase by 31.25%, 8.46%, and 21.6%, respectively. In future, the proposed technique can be extended to sentiments with more than two sub-opinions, such as those found in blogs, reviews, and TV talk shows.
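To make the idea of extending a BoW count with a discourse rule concrete, the following is a minimal sketch and not the paper's actual rule set or feature set. It assumes a toy romanized lexicon, a single contrastive connective ("lekin", meaning "but"), and one illustrative rule: when a sentence contains the connective, the sub-opinion after it decides the polarity. All names (`LEXICON`, `CONTRAST`, `classify_bow`, `classify_with_discourse`) are hypothetical.

```python
# Toy polarity lexicon: hypothetical romanized entries, not taken from the paper.
LEXICON = {"acha": 1, "behtareen": 1, "bura": -1, "kharab": -1}

# Hypothetical contrastive connective ("lekin" = "but") that separates two sub-opinions.
CONTRAST = "lekin"


def bow_score(tokens):
    """Plain bag-of-words score: sum of the lexicon polarities of the tokens."""
    return sum(LEXICON.get(token, 0) for token in tokens)


def label(score):
    """Map a numeric score to a class; a zero score is left undecided."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "undecided"


def classify_bow(text):
    """Baseline: count polarity words over the whole sentence."""
    return label(bow_score(text.lower().split()))


def classify_with_discourse(text):
    """Illustrative rule: the sub-opinion after the connective dominates.

    Falls back to the plain count when there is no connective or when the
    trailing sub-opinion carries no polarity words.
    """
    tokens = text.lower().split()
    if CONTRAST in tokens:
        split_at = tokens.index(CONTRAST)
        trailing = bow_score(tokens[split_at + 1:])
        if trailing != 0:
            return label(trailing)
    return label(bow_score(tokens))


sentence = "camera acha hai lekin battery kharab hai"
print(classify_bow(sentence))             # the two polarity words cancel out
print(classify_with_discourse(sentence))  # the clause after "lekin" decides
```

On this example the plain count cannot decide because one positive and one negative word cancel, while the rule resolves the sentence using the second sub-opinion. A real system would need an Urdu-script lexicon, proper tokenisation, and a broader inventory of connectives.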
Index Terms
Role of Discourse Information in Urdu Sentiment Classification: A Rule-based Method and Machine-learning Technique