Abstract
Twitter is one of the most popular microblogging platforms, allowing people to post short messages, news, and thoughts. Its user community is growing rapidly: with an average of 328 million active accounts, it has become one of the most common channels for obtaining information during influential or important events. Because anyone can post freely, credibility checking is required, especially for events of high importance. Automatic rumor detection in Arabic tweets is challenging because of the variability in the structure and morphology of the Arabic language, which makes rumors harder to detect than in other languages. In this article, we propose an effective approach for rumor detection in Arabic tweets using an eXtreme Gradient Boosting (XGBoost) classifier. We conducted a set of experiments on a public dataset containing a large number of rumor and non-rumor tweets. The model uses a comprehensive set of content-based, user-based, and topic-based features, so that credibility can be assessed from different angles. The experimental results show that the proposed XGBoost-based approach achieves 97.18% accuracy when 60% of the dataset is used as the training set, the highest accuracy among the methods used in recent related work.
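To make the core idea concrete, the following is a minimal sketch of second-order gradient boosting with one-split trees (stumps), which is the kind of split scoring and leaf weighting that XGBoost is built on. It is a simplified stand-in written in plain Python, not the configuration used in the article. The three features (log follower count, number of question marks, topic score), the toy labeling rule, and all function names and hyperparameters are assumptions chosen only for illustration. Only the 60% training split comes from the abstract.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, grad, hess, lam=1.0):
    """Find the single split with the best second-order gain."""
    G, H = sum(grad), sum(hess)
    best = None  # (gain, feature, threshold, left_weight, right_weight)
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        GL = HL = 0.0
        for pos in range(len(order) - 1):
            i, nxt = order[pos], order[pos + 1]
            GL += grad[i]
            HL += hess[i]
            if X[i][j] == X[nxt][j]:
                continue  # no threshold fits between equal values
            GR, HR = G - GL, H - HL
            gain = GL * GL / (HL + lam) + GR * GR / (HR + lam) - G * G / (H + lam)
            if best is None or gain > best[0]:
                thr = (X[i][j] + X[nxt][j]) / 2.0
                # Newton-step leaf weights: -G / (H + lambda)
                best = (gain, j, thr, -GL / (HL + lam), -GR / (HR + lam))
    return best

def fit_boosted_stumps(X, y, rounds=30, lr=0.3, lam=1.0):
    """Boost stumps on the logistic loss; returns a list of scaled stumps."""
    scores = [0.0] * len(X)
    model = []
    for _ in range(rounds):
        p = [sigmoid(s) for s in scores]
        grad = [pi - yi for pi, yi in zip(p, y)]   # first derivative of log loss
        hess = [pi * (1.0 - pi) for pi in p]       # second derivative of log loss
        stump = fit_stump(X, grad, hess, lam)
        if stump is None:
            break
        _, j, thr, wl, wr = stump
        model.append((j, thr, lr * wl, lr * wr))
        scores = [s + (lr * wl if x[j] < thr else lr * wr)
                  for s, x in zip(scores, X)]
    return model

def predict(model, x):
    """Return 1 (rumor) if the summed stump scores are positive, else 0."""
    score = sum(wl if x[j] < thr else wr for j, thr, wl, wr in model)
    return 1 if score > 0 else 0

# Hypothetical toy data: [log10(followers), question marks, topic score].
random.seed(0)
rows = []
for _ in range(200):
    x = [random.uniform(0, 6), random.randint(0, 4), random.random()]
    label = 1 if x[0] < 3.0 else 0  # toy rule, not taken from the article
    rows.append((x, label))

random.shuffle(rows)
cut = len(rows) * 6 // 10  # 60% training split, as in the abstract
train, test = rows[:cut], rows[cut:]
model = fit_boosted_stumps([x for x, _ in train], [y for _, y in train])
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.3f}")
```

The toy labels depend on a single feature, so the sketch only demonstrates the mechanics of boosting and held-out evaluation. It says nothing about the accuracy reported for the real Arabic tweet dataset, where a full XGBoost implementation with deeper trees would be the natural choice.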
Index Terms
An Effective Approach for Rumor Detection of Arabic Tweets Using eXtreme Gradient Boosting Method