
Real-time Assistive Reader Pen for Arabic Language

Published: 31 March 2021

Abstract

Disability is an impairment that affects an individual's livelihood and independence. Assistive technology enables people with disabilities to overcome barriers to learning, access information, contribute to their communities, and live independently. This article proposes an assistive device that enables people with visual disabilities and learning disabilities to access printed Arabic material in real time and helps them participate in the education system and the professional workforce.

The proposed device combines Optical Character Recognition (OCR) with Text-to-Speech (TTS) conversion based on concatenative synthesis. OCR is performed through image processing, character extraction, and classification. Arabic speech is then produced by concatenative synthesis followed by Multi-Band Resynthesis Overlap-Add (MBROLA); in this second phase, waveform generation produces the audio output that the user hears.

OCR character and word accuracy were tested on nine Arabic fonts: six fonts were recognized with over 60% character accuracy, and two fonts with over 88% accuracy. A Mean Opinion Score (MOS) test of speech quality yielded an overall MOS of 3.53/5 and indicated that users were able to understand the synthesized speech. A real-time usability test with 10 subjects yielded an overall average agreement score of 3.9/5, indicating that the proposed Arabic reader pen meets its real-time constraints, is pleasant and satisfying to use, and can help make printed Arabic material accessible to visually impaired persons and people with learning disabilities.
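The abstract reports character accuracy, word accuracy, and MOS figures without stating how they are computed. The sketch below is a minimal illustration under stated assumptions, not the authors' evaluation code. It assumes the common edit-distance definition of accuracy, (N - errors) / N over reference characters or words, and a plain arithmetic mean of listener ratings on the 1-5 absolute category rating scale of ITU-T P.800. The function names are hypothetical, and the paper's exact protocol may differ.

```python
from collections.abc import Hashable, Sequence


def levenshtein(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Edit distance: fewest insertions, deletions, and substitutions turning ref into hyp.

    Works on strings (character level) and on lists of words (word level).
    """
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # ref item missing from hyp (deletion)
                curr[j - 1] + 1,         # extra item in hyp (insertion)
                prev[j - 1] + (r != h),  # substitution, free when items match
            ))
        prev = curr
    return prev[-1]


def _accuracy(ref_units: Sequence[Hashable], hyp_units: Sequence[Hashable]) -> float:
    # Assumption: accuracy = (N - edit errors) / N, with N = number of reference units.
    # The value can drop below 0 when the OCR output contains many spurious insertions.
    n = len(ref_units)
    if n == 0:
        raise ValueError("reference must be non-empty")
    return 100.0 * (n - levenshtein(ref_units, hyp_units)) / n


def character_accuracy(ref: str, hyp: str) -> float:
    """Percentage accuracy over Unicode code points (each diacritic counts separately)."""
    return _accuracy(ref, hyp)


def word_accuracy(ref: str, hyp: str) -> float:
    """Percentage accuracy over whitespace-separated words."""
    return _accuracy(ref.split(), hyp.split())


def mean_opinion_score(ratings: Sequence[int]) -> float:
    """Arithmetic mean of listener ratings on the 1-5 ACR scale (5 = excellent, 1 = bad)."""
    if not ratings or any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be a non-empty sequence of values in 1..5")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    # Hypothetical OCR outputs, not data from the paper.
    print(character_accuracy("مرحبا", "مرحا"))  # 80.0 (one of 5 letters lost)
    print(word_accuracy("ذهب الولد الى المدرسة", "ذهب الولد الي المدرسة"))  # 75.0
    print(mean_opinion_score([4, 3, 4, 3, 4]))  # 3.6
```

Because this sketch counts Unicode code points, each Arabic diacritic is a separate character. Whether the reference text is diacritized therefore changes the character-accuracy score, and any comparison between fonts should use the same reference convention throughout.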

