Abstract
This article describes how Arabic multi-dialect speech corpora can be built automatically over Voice over Internet Protocol (VoIP). The Asterisk framework connects the two parties, each running on its own virtual machine: a sender and a receiver. The sender places a VoIP call to the receiver through Asterisk, and the receiver records the call automatically; the process is repeated for every audio file in the corpora. In this work, more than 67,000 automatic calls were made between the sender and receiver machines, generating VoIP Arabic corpora for four Arabic dialects. The resulting corpora can be considered the first Arabic VoIP parallel speech corpora and will be made freely available to researchers in Arabic NLP and speech recognition.
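The abstract does not say how the calls were scripted. As a minimal sketch only, assuming the sender drives Asterisk through its call-file mechanism (a small text file per call, placed in the outgoing spool directory, typically /var/spool/asterisk/outgoing), the per-file loop could look like the following. The channel string `SIP/receiver/1000`, the function names, and the corpus layout are hypothetical and are not taken from the article; recording on the receiver side would be configured separately in its dialplan.

```python
from pathlib import Path


def build_call_file(audio_path, channel="SIP/receiver/1000"):
    """Return the text of an Asterisk call file that plays one audio file.

    `channel` is a hypothetical dial string; the real value depends on how
    the sender and receiver machines are configured.
    """
    # Asterisk's Playback application expects the file name without extension.
    sound = Path(audio_path).with_suffix("").as_posix()
    return (
        f"Channel: {channel}\n"
        "MaxRetries: 0\n"
        "WaitTime: 30\n"
        "Application: Playback\n"
        f"Data: {sound}\n"
    )


def plan_calls(corpus_dir):
    """Map each .wav file under corpus_dir to the call-file text for it.

    Keys are paths relative to corpus_dir, so files with the same name in
    different dialect subdirectories do not collide.
    """
    corpus_dir = Path(corpus_dir)
    return {
        wav.relative_to(corpus_dir).as_posix(): build_call_file(wav)
        for wav in sorted(corpus_dir.rglob("*.wav"))
    }
```

In such a setup, each generated text would be written to a temporary file and then moved into the spool directory, so that Asterisk never picks up a half-written call file; one call is placed per audio file, which mirrors the repetition described above.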
Automatically Building VoIP Speech Parallel Corpora for Arabic Dialects