research-article

Multi-Label Annotation and Classification of Arabic Texts Based on Extracted Seed Keyphrases and Bi-Gram Alphabet Feed Forward Neural Networks Model

Published: 25 November 2022

Abstract

In natural language processing, text classification is a fundamental problem. Multi-label classification, in which a single instance can be associated with more than one label, is a particularly challenging variant of text classification. This paper presents a multi-label annotation and classification methodology for Arabic text data that is not currently labeled as multi-label, and it analyzes and compares the performance of various multi-label learning approaches. The work has two phases. The first is the automatic annotation of hotel reviews with multiple labels according to the aspects mentioned in each review; review instances were annotated as multi-label based on clusters of extracted seed keyphrases. The second phase consists of experiments comparing the performance of various multi-label classification methods. In this phase, we introduce several models, including a feed-forward neural network model built on a vector representation based on the bi-gram alphabet rather than the commonly used bag-of-words model. The bi-gram alphabet representation has the advantages of a reduced feature dimension and of not requiring natural language processing tools. The results indicate that the bi-gram alphabet feed-forward neural network is a competitive solution to the multi-label text classification problem, achieving an accuracy of about 75.2% with a standard deviation of 0.062.
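To make the two phases more concrete, the following Python sketch shows one plausible reading of three pieces described in the abstract: labeling a review with every aspect whose seed-keyphrase cluster it mentions, mapping text to a fixed-size bi-gram alphabet count vector, and scoring multi-label predictions. It is an illustration under stated assumptions, not the authors' implementation. The 28-letter alphabet, the letter normalization map, plain substring matching against keyphrases, counting letter pairs only inside words, and the hypothetical `SEED_CLUSTERS` are all assumptions; the abstract also does not say which multi-label accuracy definition was used, so example-based (Jaccard) accuracy is shown only because it is a common choice.

```python
from itertools import product

# Assumption: 28-letter basic Arabic alphabet (the paper's exact alphabet is not stated here).
ARABIC_LETTERS = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"
LETTER_SET = set(ARABIC_LETTERS)
# Assumption: fold common alef/hamza, taa marbuta and alef maqsura variants.
NORMALIZE = str.maketrans({"أ": "ا", "إ": "ا", "آ": "ا", "ة": "ه", "ى": "ي"})

# Fixed index for every ordered letter pair: 28 * 28 = 784 features,
# independent of how many distinct words the corpus contains.
BIGRAM_INDEX = {a + b: i for i, (a, b) in enumerate(product(ARABIC_LETTERS, repeat=2))}

# Hypothetical aspect clusters of seed keyphrases (room, food, staff).
SEED_CLUSTERS = {
    "room": ["الغرفة", "السرير"],
    "food": ["الفطور", "الطعام"],
    "staff": ["الموظفين", "الاستقبال"],
}


def annotate(review: str, clusters: dict) -> set:
    """Phase 1 sketch: label a review with every aspect whose keyphrases it contains."""
    text = review.translate(NORMALIZE)
    return {
        label
        for label, phrases in clusters.items()
        if any(p.translate(NORMALIZE) in text for p in phrases)
    }


def bigram_alphabet_vector(text: str) -> list:
    """Phase 2 sketch: count adjacent letter pairs inside each whitespace-separated word."""
    counts = [0] * len(BIGRAM_INDEX)
    for word in text.translate(NORMALIZE).split():
        # Drop diacritics, digits and non-alphabet characters before pairing letters.
        letters = [ch for ch in word if ch in LETTER_SET]
        for a, b in zip(letters, letters[1:]):
            counts[BIGRAM_INDEX[a + b]] += 1
    return counts


def example_based_accuracy(y_true: list, y_pred: list) -> float:
    """Mean per-instance Jaccard overlap |Y & Z| / |Y | Z|; two empty sets score 1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equally long, non-empty lists of label sets")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        union = t | p
        total += len(t & p) / len(union) if union else 1.0
    return total / len(y_true)
```

For the sample review "الغرفة نظيفة لكن الفطور سيئ" ("the room is clean but the breakfast is bad"), `annotate(review, SEED_CLUSTERS)` returns `{"room", "food"}`. The bi-gram vector always has 784 dimensions however large the vocabulary grows, which illustrates the reduced-dimension claim, and building it needs only a character map and whitespace splitting rather than a tokenizer or stemmer. In a setup like the one described, such vectors would typically be fed to a feed-forward network with one sigmoid output per label; that network is not shown here.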

Published in ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 22, Issue 1, January 2023 (340 pages)
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3572718

Publisher: Association for Computing Machinery, New York, NY, United States

Publication History
• Online AM: 1 June 2022
• Accepted: 16 May 2022
• Revised: 21 March 2022
• Received: 11 August 2021

Qualifiers: research-article, refereed