research-article

Arabic Fake News Detection: A Fact Checking Based Deep Learning Approach

Published: 19 January 2022

Abstract

Fake news stories can polarize society, particularly during political events, and they undermine confidence in the media in general. Current NLP systems still lack the ability to properly interpret and classify Arabic fake news. Given the high stakes involved, determining truth in social media has recently become an emerging research area that is attracting considerable attention. Our literature review indicates that applying state-of-the-art approaches to news content addresses only some of the challenges in detecting the characteristics of fake news, which often require auxiliary information to reach a clear determination. Moreover, ‘social-context-based’ and ‘propagation-based’ approaches can serve either as alternatives or as complements to content-based approaches. The main goal of our research is to develop a model capable of automatically determining the truthfulness of a given Arabic news item or claim. In particular, we propose a deep neural network approach that classifies news claims as fake or real by exploiting Convolutional Neural Networks (CNNs). Our approach tackles the problem from a fact-checking perspective, where the fact-checking task involves predicting whether a given news claim is factually authentic or fake. We build our model on a balanced Arabic corpus because it unifies stance detection, stance rationale, relevant document retrieval, and fact-checking. The model is trained on a set of carefully selected attributes. An extensive evaluation demonstrates the ability of the fact-checking formulation to detect Arabic fake news. When applied to the same Arabic dataset, our model outperforms state-of-the-art approaches, achieving a best accuracy of 91%.
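The abstract describes classifying a claim as fake or real with a convolutional neural network. As a rough illustration only, the sketch below shows the forward pass of a generic CNN sentence classifier in the style of Kim (2014): token embeddings, convolution filters of several widths, ReLU, max-over-time pooling, and a sigmoid output. The `TextCNN` class, the toy vocabulary, the dimensions, the filter widths, and the whitespace tokenization are all hypothetical; the weights are random and untrained, and this is not the authors' architecture or configuration.

```python
import math
import random

# Hypothetical hyperparameters for illustration only; the paper's actual
# settings are not stated in the abstract.
EMBED_DIM = 8
FILTER_WIDTHS = (2, 3, 4)   # n-gram window sizes, Kim (2014) style
FILTERS_PER_WIDTH = 4


def make_matrix(rows, cols, rng):
    """Small random matrix (list of lists) for an untrained model."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TextCNN:
    """Hypothetical, untrained forward pass of a CNN claim classifier."""

    def __init__(self, vocab, seed=0):
        rng = random.Random(seed)
        self.index = {tok: i for i, tok in enumerate(vocab)}
        self.unk = len(vocab)  # extra row reserved for unknown tokens
        self.embeddings = make_matrix(len(vocab) + 1, EMBED_DIM, rng)
        # Each filter is (width, width x EMBED_DIM weight window, bias).
        self.filters = [
            (w, make_matrix(w, EMBED_DIM, rng), rng.uniform(-0.1, 0.1))
            for w in FILTER_WIDTHS
            for _ in range(FILTERS_PER_WIDTH)
        ]
        self.out_w = [rng.uniform(-0.1, 0.1) for _ in range(len(self.filters))]
        self.out_b = 0.0

    def embed(self, tokens):
        """Look up one embedding vector per token (unknown -> UNK row)."""
        return [self.embeddings[self.index.get(t, self.unk)] for t in tokens]

    def features(self, tokens):
        """Slide each filter over the sequence, apply ReLU, then
        max-pool over time, giving one feature per filter."""
        x = self.embed(tokens)
        feats = []
        for width, weights, bias in self.filters:
            if len(x) < width:
                feats.append(0.0)  # sequence shorter than the filter
                continue
            activations = []
            for start in range(len(x) - width + 1):
                s = bias
                for k in range(width):
                    for d in range(EMBED_DIM):
                        s += weights[k][d] * x[start + k][d]
                activations.append(max(0.0, s))  # ReLU
            feats.append(max(activations))  # max-over-time pooling
        return feats

    def prob_fake(self, tokens):
        """Sigmoid of a linear layer, read as P(claim is fake)."""
        z = self.out_b + sum(w * f for w, f in zip(self.out_w, self.features(tokens)))
        return 1.0 / (1.0 + math.exp(-z))

    def classify(self, tokens, threshold=0.5):
        return "fake" if self.prob_fake(tokens) >= threshold else "real"


# Usage with a toy Arabic vocabulary (hypothetical).
vocab = ["الخبر", "عاجل", "صحيح", "كاذب"]
model = TextCNN(vocab, seed=42)
claim = "الخبر عاجل كاذب".split()  # whitespace tokenization, a simplification
print(round(model.prob_fake(claim), 4), model.classify(claim))
```

Because a one-token input is shorter than every filter width, all pooled features are zero and this untrained sketch returns exactly 0.5 for it. In a real system the embeddings would typically be initialised from pretrained vectors, Arabic text would go through a proper segmenter rather than whitespace splitting, and all weights would be learned from labelled claims.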



    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 4
      July 2022
      464 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3511099


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 19 January 2022
      • Accepted: 1 November 2021
      • Received: 1 August 2021
      Published in TALLIP Volume 21, Issue 4


      Qualifiers

      • research-article
      • Refereed
