research-article

Survey of Authorship Identification Tasks on Arabic Texts

Published: 12 April 2023

Abstract

Authorship identification is the process of extracting and analysing an author's writing style in order to determine who wrote a text. Writing style can reveal both the author and his or her characteristics, which is valuable in digital forensics and cyber investigations. In the literature, authorship identification tasks have been addressed on both long and short documents and in several languages, including English, Arabic, Chinese, and Greek. This survey reviews authorship identification tasks for the Arabic language and contributes to this area of research by examining the performance achieved on Arabic and the challenges it poses. A total of 27 prominent studies on Arabic, covering each authorship identification domain, were reviewed with respect to the data used, the features selected, the methods applied, and the results. The review concludes that the results of authorship identification tasks depend mostly on the selected features and the dataset used. Furthermore, the most effective features differ from one dataset to another, depending on the type of Arabic in the dataset. However, all authorship identification tasks involving the Arabic language face considerable challenges in data pre-processing because of the challenging Arabic concatenative morphology.

      • Published in

        ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 22, Issue 4
        April 2023
        682 pages
        ISSN:2375-4699
        EISSN:2375-4702
        DOI:10.1145/3588902

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 April 2023
        • Online AM: 22 September 2022
        • Accepted: 8 September 2022
        • Revised: 26 June 2022
        • Received: 5 January 2021