Abstract
In natural language processing, text classification is a fundamental problem. Multi-label classification, in which an instance can be associated with more than one label, is a challenging task within text classification. This paper presents a multi-label annotation and classification methodology for Arabic text data that has not previously been labeled as multi-label, and it analyzes and compares the performance of various multi-label learning approaches on that data. The work comprises two phases. The first phase automatically annotates hotel reviews with multiple labels according to the aspects found in each review: review instances were annotated as multi-label based on clusters of extracted seed keyphrases. The second phase consists of experiments comparing the performance of various multi-label classification methods. In this phase, we introduced several models, including a feed-forward neural network model that learns a vector representation based on the bi-gram alphabet rather than on the commonly used bag-of-words model. The bi-gram alphabet representation has a reduced feature dimensionality and does not require natural language processing tools. The results indicate that the feed-forward neural network with the bi-gram alphabet representation is a competitive solution for the multi-label text classification problem, achieving an accuracy of about 75.2% with a standard deviation of 0.062.
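The first phase can be illustrated with a minimal sketch, assuming each aspect is represented by a cluster of seed keyphrases and that a review receives every aspect label whose cluster contains at least one keyphrase found in the review. The cluster names and keyphrases are hypothetical English stand-ins for Arabic seed keyphrases, and exact substring matching is a simplification; the paper's own extraction and matching procedure may differ.

```python
def annotate(review: str, clusters: dict[str, list[str]]) -> set[str]:
    """Return every aspect label whose seed keyphrases occur in `review`.

    Assumption: plain case-insensitive substring matching stands in for
    whatever keyphrase matching the original pipeline uses.
    """
    text = review.lower()
    return {aspect
            for aspect, keyphrases in clusters.items()
            if any(kp.lower() in text for kp in keyphrases)}


def to_binary(labels: set[str], aspects: list[str]) -> list[int]:
    """Encode a label set as a 0/1 indicator vector in a fixed aspect order."""
    return [1 if a in labels else 0 for a in aspects]


# Hypothetical aspect clusters (English stand-ins for Arabic seed keyphrases).
CLUSTERS = {
    "room": ["room", "bed", "bathroom"],
    "staff": ["staff", "reception", "receptionist"],
    "food": ["breakfast", "restaurant", "dinner"],
    "location": ["location", "beach", "city center"],
}
```

For example, `annotate("The room was clean and the breakfast was excellent.", CLUSTERS)` returns `{"room", "food"}`, which `to_binary(..., list(CLUSTERS))` encodes as `[1, 0, 1, 0]`. A real pipeline over Arabic text would also need normalization or fuzzy matching, since exact substrings miss spelling variants.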
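The bi-gram alphabet representation can be sketched as follows, assuming it counts each ordered pair of adjacent letters within a word over a fixed Arabic alphabet, so every document maps to a vector of 28 × 28 = 784 counts. The 28-letter alphabet, the folding of letter variants (hamza forms, ta marbuta, alef maqsura) and the treatment of non-letters as word boundaries are assumptions made for illustration.

```python
# Assumed alphabet: 28 basic Arabic letters, U+0627..U+064A minus ta marbuta
# (U+0629), the extended letters U+063B..U+063F, tatweel (U+0640) and
# alef maqsura (U+0649).
_EXCLUDED = {0x0629, 0x063B, 0x063C, 0x063D, 0x063E, 0x063F, 0x0640, 0x0649}
ALPHABET = [chr(c) for c in range(0x0627, 0x064B) if c not in _EXCLUDED]
_LETTERS = set(ALPHABET)

# Assumed folding of common letter variants onto the 28 base letters.
_FOLD = {
    "\u0622": "\u0627",  # alef with madda -> alef
    "\u0623": "\u0627",  # alef with hamza above -> alef
    "\u0625": "\u0627",  # alef with hamza below -> alef
    "\u0624": "\u0648",  # waw with hamza -> waw
    "\u0626": "\u064A",  # ya with hamza -> ya
    "\u0629": "\u0647",  # ta marbuta -> ha
    "\u0649": "\u064A",  # alef maqsura -> ya
}
# Diacritics (U+064B..U+0652) and tatweel are skipped without breaking a word.
_IGNORED = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0640"}

# Every ordered letter pair gets a fixed slot: 28 * 28 = 784 features.
BIGRAM_INDEX = {a + b: i * len(ALPHABET) + j
                for i, a in enumerate(ALPHABET)
                for j, b in enumerate(ALPHABET)}


def bigram_alphabet_vector(text: str) -> list[int]:
    """Count adjacent letter pairs within the words of `text`.

    Any character that is not a (folded) Arabic letter or an ignored mark,
    e.g. a space, digit, punctuation mark, Latin letter or standalone hamza,
    ends the current word, so no tokenizer or stemmer is required.
    """
    vec = [0] * len(BIGRAM_INDEX)
    prev = None  # previous letter in the current word, if any
    for ch in text:
        if ch in _IGNORED:
            continue
        ch = _FOLD.get(ch, ch)
        if ch in _LETTERS:
            if prev is not None:
                vec[BIGRAM_INDEX[prev + ch]] += 1
            prev = ch
        else:
            prev = None  # word boundary
    return vec
```

Under this assumed definition the feature space stays at 784 dimensions whatever the vocabulary size, and the only preprocessing is a per-character lookup, which is consistent with the abstract's claims of reduced dimensionality and no dependence on NLP tools.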
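A multi-label feed-forward classifier differs from a single-label one mainly in its output layer: each label gets its own sigmoid unit trained with binary cross-entropy, so several labels can be active at once. The pure-Python sketch below assumes one ReLU hidden layer, plain per-example gradient descent and a 0.5 decision threshold; these are illustrative choices, not the paper's reported architecture or hyperparameters. With bi-gram alphabet features over a 28-letter alphabet, `n_in` would be 28 × 28 = 784 (an assumption about the alphabet size).

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class MultiLabelFFN:
    """One hidden ReLU layer and one independent sigmoid output per label."""

    def __init__(self, n_in: int, n_hidden: int, n_labels: int, seed: int = 0):
        rng = random.Random(seed)
        self.W1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_hidden)) for _ in range(n_hidden)]
                   for _ in range(n_labels)]
        self.b2 = [0.0] * n_labels

    def forward(self, x: list[float]) -> tuple[list[float], list[float]]:
        """Return hidden activations and per-label probabilities."""
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        p = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
             for row, b in zip(self.W2, self.b2)]
        return h, p

    def loss(self, x: list[float], y: list[int]) -> float:
        """Binary cross-entropy averaged over labels."""
        _, p = self.forward(x)
        eps = 1e-12
        return -sum(yk * math.log(pk + eps) + (1 - yk) * math.log(1 - pk + eps)
                    for yk, pk in zip(y, p)) / len(y)

    def train_step(self, x: list[float], y: list[int], lr: float = 0.1) -> None:
        """One stochastic gradient-descent step on a single example."""
        h, p = self.forward(x)
        n_labels = len(y)
        # Gradient of the mean BCE w.r.t. each output pre-activation.
        d2 = [(pk - yk) / n_labels for pk, yk in zip(p, y)]
        # Backpropagate to the hidden layer (ReLU passes gradient only where h > 0).
        d1 = [sum(self.W2[k][j] * d2[k] for k in range(n_labels)) if h[j] > 0 else 0.0
              for j in range(len(h))]
        for k in range(n_labels):
            for j, hj in enumerate(h):
                self.W2[k][j] -= lr * d2[k] * hj
            self.b2[k] -= lr * d2[k]
        for j, g in enumerate(d1):
            for i, xi in enumerate(x):
                self.W1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g

    def predict(self, x: list[float], threshold: float = 0.5) -> list[int]:
        """Switch each label on independently when its probability >= threshold."""
        _, p = self.forward(x)
        return [1 if pk >= threshold else 0 for pk in p]
```

Typical use would be `net = MultiLabelFFN(n_in=784, n_hidden=64, n_labels=4)`, repeated `net.train_step(x, y)` calls over the training vectors, then `net.predict(x)` to obtain a 0/1 vector of aspect labels. Pure Python is only for readability here; a numerical library would be used at realistic scale.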
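The abstract reports an accuracy of about 75.2% with a standard deviation of 0.062 without defining either figure. A common definition of accuracy in multi-label evaluation is example-based accuracy: the per-instance Jaccard similarity between the true and predicted label sets, averaged over instances. The sketch below assumes that definition and assumes the standard deviation is taken over cross-validation fold scores; both are assumptions about the paper's protocol.

```python
import statistics


def example_based_accuracy(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Mean per-instance Jaccard similarity |Y & Z| / |Y | Z| of 0/1 label vectors.

    An instance with no true and no predicted labels counts as 1.0
    (one common convention; others exclude such instances).
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        inter = sum(1 for a, b in zip(t, p) if a and b)
        union = sum(1 for a, b in zip(t, p) if a or b)
        total += 1.0 if union == 0 else inter / union
    return total / len(y_true)


def summarize_folds(scores: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)
```

For example, with true labels `[[1, 0, 1], [0, 1, 0]]` and predictions `[[1, 0, 0], [0, 1, 0]]` the per-instance scores are 0.5 and 1.0, so `example_based_accuracy` returns 0.75. If a population rather than sample standard deviation is intended, `statistics.pstdev` can replace `statistics.stdev`.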
Multi-Label Annotation and Classification of Arabic Texts Based on Extracted Seed Keyphrases and Bi-Gram Alphabet Feed Forward Neural Networks Model