research-article

Combining a Novel Scoring Approach with Arabic Stemming Techniques for Arabic Chatbots Conversation Engine

Published: 20 January 2022

Abstract

Arabic is one of the world's major languages, and many efforts have been made to provide computing solutions that support it. Developing Arabic chatbots is still an evolving research field that requires extra effort because of the nature of the language. Stemming is a common step in natural language processing applications. It is important for chatbots because it helps pre-process the input data and can be involved in different phases of the chatbot development process. The aim of this article is to combine a scoring approach with Arabic stemming techniques to develop an Arabic chatbot conversation engine. Two experiments were conducted to evaluate the proposed solution. The first selected the most accurate stemmer for our solution, since our algorithm can support various stemmers. The second evaluated our proposed approach against various machine learning models. The results show that the ISRIS stemming algorithm is the best fit for our solution, with an accuracy of 78.06%. The results also indicate that our novel solution achieved an F1 score of 65.5%, while the other machine learning models achieved slightly lower scores. Our study presents a novel technique that combines scoring mechanisms with stemming processes to produce the best answer for every query sent by chatbot users compared to other approaches. This can be helpful for developing Arabic chatbots and can support many domains, such as education, business, and health. This technique is among the first developed purposefully to serve the development of Arabic chatbot conversation engines.
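
The abstract does not spell out the scoring formula, so the following is only a minimal sketch of the general idea: stem the user query and each stored question pattern, score the overlap, and return the answer attached to the highest-scoring pattern. The Jaccard overlap score, the `toy_stem` affix stripper, and the two-entry knowledge base are all assumptions made for illustration; they are not the article's algorithm, and a real system would plug in an actual Arabic stemmer in place of `toy_stem`.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical toy light stemmer (an assumption, not a real Arabic stemmer):
# strips at most one common prefix and one common suffix.
PREFIXES = ("ال", "و", "ب")
SUFFIXES = ("ات", "ون", "ين", "ة")


def toy_stem(word: str) -> str:
    for prefix in PREFIXES:
        # Keep at least three letters so short words are left alone.
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[:-len(suffix)]
            break
    return word


def stem_set(text: str, stem: Callable[[str], str]) -> Set[str]:
    """Whitespace-tokenize the text and stem every token."""
    return {stem(token) for token in text.split()}


def score(query: str, pattern: str, stem: Callable[[str], str]) -> float:
    """Jaccard overlap of stemmed tokens (an assumed scoring rule)."""
    q, p = stem_set(query, stem), stem_set(pattern, stem)
    if not q or not p:
        return 0.0
    return len(q & p) / len(q | p)


def best_answer(
    query: str,
    knowledge_base: List[Tuple[str, str]],
    stem: Callable[[str], str] = toy_stem,
) -> Tuple[str, float]:
    """Return the answer of the highest-scoring pattern and its score."""
    pattern, answer = max(
        knowledge_base, key=lambda pair: score(query, pair[0], stem)
    )
    return answer, score(query, pattern, stem)


# Hypothetical knowledge base of (question pattern, answer label) pairs.
KB = [
    ("مواعيد المكتبة", "LIBRARY_HOURS"),
    ("رسوم التسجيل", "REGISTRATION_FEES"),
]

answer, value = best_answer("ما هي الرسوم", KB)
print(answer, value)
```

Because the stemmer is passed in as a function, different stemmers can be swapped in and compared on the same data, which mirrors the kind of stemmer comparison the abstract describes. In this toy case the query word "الرسوم" only matches the stored "رسوم" after stemming; with an identity stemmer the overlap is empty.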

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 21, Issue 4
      July 2022
      464 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3511099
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 20 January 2022
      • Accepted: 1 December 2021
      • Revised: 1 August 2021
      • Received: 1 March 2021

      Qualifiers

      • research-article
      • Refereed