research-article

An Intelligent Approach Based on Cleaning up of Inutile Contents for Extremism Detection and Classification in Social Networks

Published: 09 May 2023

Abstract

Extremism is a growing threat worldwide that presents a significant danger to public safety and national security. Social networks provide extremists with spaces to spread their ideas through comments or tweets, often in Asian English. In this paper, we propose an intelligent approach that cleans the text content, analyzes its sentiment, and extracts its features after converting it to numerical data for machine learning. We apply 16 machine learning classifiers for extremism detection and classification. The proposed artificial intelligence methods for Asian English language data are used to extract the essential features from the text. Our evaluation of the proposed model on an extremism dataset shows its effectiveness compared to standard classification models across various performance metrics. The proposed model achieves 93.6% accuracy for extremism detection and 97.0% for extremism classification.
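The clean-then-vectorize steps named in the abstract can be illustrated with a minimal sketch. The abstract does not list the paper's actual cleaning rules or feature extractor, so everything here is an assumption made for illustration: the regular expressions, the helper names `clean_text`, `build_vocabulary`, and `vectorize`, and the use of plain term counts as features.

```python
import re
from collections import Counter

# Hypothetical cleaning rules; the abstract does not specify the paper's own.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_text(text):
    """Lowercase the text, then strip URLs, user mentions, and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)  # also drops '#', digits, and punctuation
    return " ".join(text.split())       # collapse repeated whitespace


def build_vocabulary(docs):
    """Map each distinct token in the cleaned documents to a column index."""
    tokens = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(doc, vocab):
    """Turn one cleaned document into a term-count vector ordered by the vocabulary."""
    counts = Counter(doc.split())
    return [counts.get(tok, 0) for tok in vocab]


raw = ["Check THIS out!! https://t.co/abc @user #Topic 123", "This topic again"]
cleaned = [clean_text(t) for t in raw]
vocab = build_vocabulary(cleaned)
features = [vectorize(doc, vocab) for doc in cleaned]
```

Vectors like `features` are the kind of numerical input a classifier could be trained on; which of the 16 classifiers is used, and with what features, is left to the full paper.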

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 22, Issue 5
      May 2023
      653 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3596451
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 May 2023
      • Online AM: 19 January 2023
      • Accepted: 25 September 2022
      • Revised: 9 September 2022
      • Received: 1 May 2022
