research-article

A Transformer-Based Approach to Multilingual Fake News Detection in Low-Resource Languages

Published: 02 November 2021

Abstract

Fake news classification has attracted considerable attention from researchers in artificial intelligence, natural language processing, and machine learning (ML). Most existing work on fake news detection targets English, which limits its usability outside the English-literate population. Although multilingual web content has grown, fake news classification in low-resource languages remains challenging because annotated corpora and tools are unavailable. This article proposes an effective neural model based on multilingual Bidirectional Encoder Representations from Transformers (BERT) for domain-agnostic multilingual fake news classification. A wide variety of experiments, including language-specific and domain-specific settings, is conducted. The proposed model achieves high accuracy in both domain-specific and domain-agnostic experiments and outperforms the current state-of-the-art models. We perform experiments in zero-shot settings to assess the effectiveness of language-agnostic feature transfer across different languages, showing encouraging results. Cross-domain transfer experiments are also performed to assess language-independent feature transfer of the model. We also offer a multilingual, multidomain fake news detection dataset covering five languages and seven domains that could be useful for research and development in resource-scarce scenarios.
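The architecture the abstract describes, a multilingual BERT encoder topped with a classification head fine-tuned for binary fake/real prediction, can be sketched as follows. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: a small, randomly initialised Transformer encoder stands in for pretrained multilingual BERT (whose cased vocabulary has roughly 119,547 wordpieces) so the sketch runs without downloading pretrained weights, and all hyperparameters shown are placeholders.

```python
import torch
import torch.nn as nn


class FakeNewsClassifier(nn.Module):
    """BERT-style encoder with a binary classification head (sketch).

    In the paper the encoder is pretrained multilingual BERT; here a
    randomly initialised nn.TransformerEncoder stands in so the example
    is self-contained and runnable offline.
    """

    def __init__(self, vocab_size=119547, hidden=256, layers=2, heads=4, max_len=128):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, hidden)   # wordpiece embeddings
        self.pos_emb = nn.Embedding(max_len, hidden)      # learned positions
        enc_layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.classifier = nn.Linear(hidden, 2)            # fake vs. real logits

    def forward(self, input_ids):
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        x = self.tok_emb(input_ids) + self.pos_emb(positions)
        h = self.encoder(x)
        # Pool by taking the first ([CLS]-style) token representation.
        return self.classifier(h[:, 0])


# Forward pass on a dummy batch of 4 sequences, 32 token ids each.
batch = torch.randint(0, 119547, (4, 32))
logits = FakeNewsClassifier()(batch)
print(logits.shape)  # torch.Size([4, 2])
```

In practice one would load `bert-base-multilingual-cased` weights into the encoder and fine-tune the whole stack on the annotated corpus; the zero-shot cross-lingual results in the abstract rely on that shared multilingual pretraining rather than on the architecture alone.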



      • Published in

        cover image ACM Transactions on Asian and Low-Resource Language Information Processing
        ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 21, Issue 1
        January 2022
        442 pages
        ISSN:2375-4699
        EISSN:2375-4702
        DOI:10.1145/3494068
        Issue’s Table of Contents


        Publisher

        Association for Computing Machinery, New York, NY, United States

        Publication History

        • Received: 1 September 2020
        • Revised: 1 June 2021
        • Accepted: 1 June 2021
        • Published: 2 November 2021

        Published in TALLIP Volume 21, Issue 1
