research-article

Automatically Building VoIP Speech Parallel Corpora for Arabic Dialects

Published: 05 October 2017

Abstract

This article discusses the process of automatically building Arabic multi-dialect speech corpora using Voice over Internet Protocol (VoIP). The Asterisk framework was adopted as the main connection between the two parties, for which two virtual machines were created: a sender and a receiver. The sender places a VoIP call to the receiver through Asterisk, and the receiver records the call automatically; this process is repeated for every audio file in the corpora. In this work, more than 67,000 automatic calls were made between the sender and receiver machines, generating VoIP Arabic corpora for four Arabic dialects. The resulting corpora can be considered the first Arabic VoIP parallel speech corpora and will be made freely available to researchers in Arabic NLP and speech recognition.
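The sender side of the call loop described in the abstract can be sketched with Asterisk call files: plain-text files that Asterisk picks up from its outgoing spool directory (typically /var/spool/asterisk/outgoing) and turns into calls. This is a minimal sketch under stated assumptions, not the authors' published code: the channel string `SIP/receiver`, the staging directory, and the helper names `build_call_file`, `queue_call`, and `queue_corpus` are hypothetical, and the audio is assumed to already be in a format Asterisk can play (for example 8 kHz mono WAV). Each call plays one corpus file to the receiver, which is assumed to answer and record it.

```python
import os
import time
from pathlib import Path


def build_call_file(channel: str, sound_path: Path, wait_time: int = 45) -> str:
    """Return the text of an Asterisk call file that plays one corpus file."""
    # Playback takes the sound name without its file extension.
    sound_no_ext = sound_path.with_suffix("")
    lines = [
        f"Channel: {channel}",      # hypothetical peer for the receiver VM
        "MaxRetries: 0",            # do not redial a failed call
        f"WaitTime: {wait_time}",   # seconds to wait for the receiver to answer
        "Application: Playback",    # play the file once the call is answered
        f"Data: {sound_no_ext}",
    ]
    return "\n".join(lines) + "\n"


def queue_call(call_text: str, name: str, spool_dir: Path, staging_dir: Path) -> Path:
    """Write a call file to a staging directory, then move it into the spool."""
    staging_dir.mkdir(parents=True, exist_ok=True)
    tmp_path = staging_dir / f"{name}.call"
    tmp_path.write_text(call_text)
    dest = spool_dir / f"{name}.call"
    # Renaming means Asterisk never sees a half-written file
    # (atomic only when both directories share a filesystem).
    os.replace(tmp_path, dest)
    return dest


def queue_corpus(audio_dir: Path, channel: str, spool_dir: Path,
                 staging_dir: Path, delay_s: float = 0.0) -> list[Path]:
    """Queue one call per WAV file in audio_dir, pausing delay_s between calls."""
    queued = []
    for wav in sorted(Path(audio_dir).glob("*.wav")):
        text = build_call_file(channel, wav)
        queued.append(queue_call(text, wav.stem, Path(spool_dir), Path(staging_dir)))
        if delay_s > 0:
            time.sleep(delay_s)  # crude pacing so calls are not all placed at once
    return queued
```

For example, `queue_corpus(Path("corpus/dialect_a"), "SIP/receiver", Path("/var/spool/asterisk/outgoing"), Path("/var/spool/asterisk/staging"), delay_s=30.0)` would queue one call per WAV file in a hypothetical per-dialect directory. The fixed `delay_s` pause is only a rough guard against placing thousands of simultaneous calls; a run at the scale of 67,000 calls would more plausibly wait for each recording to finish before queueing the next call.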

        • Published in

          ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 17, Issue 1
          March 2018
          152 pages
          ISSN: 2375-4699
          EISSN: 2375-4702
          DOI: 10.1145/3141228

          Copyright © 2017 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 5 October 2017
          • Revised: 1 July 2017
          • Accepted: 1 July 2017
          • Received: 1 April 2017

          Qualifiers

          • Research
          • Refereed