research-article

An Effective Approach for Rumor Detection of Arabic Tweets Using eXtreme Gradient Boosting Method

Published: 28 January 2022

Abstract

Twitter is currently one of the most popular microblogging platforms, allowing people to post short messages, news, and thoughts. Its user community is growing quickly: with an average of 328 million active accounts, it is one of the most common sources of information during influential or important events. Because the platform is freely open to the public, the credibility of its content needs to be checked, especially for events of high importance. Automatic rumor detection in Arabic tweets is challenging because of the structural and morphological variation of the Arabic language, which makes rumors harder to detect than in other languages. In this article, we propose an effective approach for rumor detection in Arabic tweets using an eXtreme Gradient Boosting (XGBoost) classifier. We conducted a set of experiments on a public dataset containing a large number of rumor and non-rumor tweets. The model uses a comprehensive set of content-based, user-based, and topic-based features, which allows credibility to be assessed from different angles. The experimental results show that the proposed XGBoost-based approach achieves 97.18% accuracy when 60% of the dataset is used as the training set, the highest accuracy among the methods used in recent related work.
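To make the setup in the abstract more concrete, the following is a minimal sketch of second-order (Newton) gradient boosting with one-split trees, the core idea behind XGBoost-style classifiers, evaluated on a 60/40 train/test split. It is not the authors' implementation and does not use the XGBoost library. The three feature columns (a content-based, a user-based, and a topic-based score), the synthetic data, the labeling rule, and all function names are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, g, h, lam=1.0):
    """Return the one-split tree with the highest second-order gain."""
    G, H = sum(g), sum(h)
    best = None  # (gain, feature, threshold, left_weight, right_weight)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            GL = sum(gi for row, gi in zip(X, g) if row[j] <= t)
            HL = sum(hi for row, hi in zip(X, h) if row[j] <= t)
            GR, HR = G - GL, H - HL
            gain = GL * GL / (HL + lam) + GR * GR / (HR + lam) - G * G / (H + lam)
            if best is None or gain > best[0]:
                # Leaf weights are the regularized Newton steps -G / (H + lambda).
                best = (gain, j, t, -GL / (HL + lam), -GR / (HR + lam))
    return best[1:]


def train(X, y, rounds=20, eta=0.3):
    """Boost stumps on the logistic loss; eta is the learning rate."""
    stumps, scores = [], [0.0] * len(X)
    for _ in range(rounds):
        p = [sigmoid(s) for s in scores]
        g = [pi - yi for pi, yi in zip(p, y)]   # first-order gradients
        h = [pi * (1.0 - pi) for pi in p]       # second-order gradients
        j, t, wl, wr = fit_stump(X, g, h)
        stumps.append((j, t, eta * wl, eta * wr))
        scores = [s + (eta * wl if row[j] <= t else eta * wr)
                  for s, row in zip(scores, X)]
    return stumps


def predict(stumps, row):
    """Return 1 (rumor) if the summed leaf weights are positive, else 0."""
    score = sum(wl if row[j] <= t else wr for j, t, wl, wr in stumps)
    return 1 if score > 0 else 0


# Synthetic stand-in data: [content_score, user_score, topic_score].
# Assumed labeling rule for the toy example: rumor when content_score > 0.5.
rng = random.Random(0)
rows = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]

# 60% of the data for training, the remaining 40% for testing.
cut = int(0.6 * len(rows))
train_rows, test_rows = rows[:cut], rows[cut:]
train_labels, test_labels = labels[:cut], labels[cut:]

model = train(train_rows, train_labels)
correct = sum(predict(model, r) == y for r, y in zip(test_rows, test_labels))
accuracy = correct / len(test_rows)
print(f"held-out accuracy: {accuracy:.2%}")
```

The accuracy printed here reflects only the synthetic toy data and says nothing about the results reported in the article. In practice, a library implementation would replace the hand-written stumps, and the feature columns would come from real content-based, user-based, and topic-based extractors.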


    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 21, Issue 1
      January 2022
      442 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3494068

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 28 January 2022
      • Accepted: 1 April 2021
      • Revised: 1 March 2021
      • Received: 1 November 2020


      Qualifiers

      • research-article
      • Refereed
