Abstract
In this study, we evaluate and compare two approaches for multilingual phone recognition in code-switched and non-code-switched scenarios. The first approach uses a front-end Language Identification (LID) system to switch to a monolingual phone recognizer (LID-Mono), with one recognizer trained individually on each of the languages present in the multilingual dataset. In the second approach, a common multilingual phone-set derived from the International Phonetic Alphabet (IPA) transcription of the multilingual dataset is used to develop a Multilingual Phone Recognition System (Multi-PRS). The bilingual code-switching experiments are conducted using Kannada and Urdu. In the first approach, LID is performed using state-of-the-art i-vectors. Both the monolingual and multilingual phone recognition systems are trained using Deep Neural Networks (DNNs). The performance of the LID-Mono and Multi-PRS approaches is compared and analysed in detail. The Multi-PRS approach is found to outperform the more conventional LID-Mono approach in both code-switched and non-code-switched scenarios. For code-switched speech, the effect of the length of the segments used to perform LID on the performance of the LID-Mono system is studied by varying the window size from 500 ms to 5.0 s, and by using the full utterance. The LID-Mono approach depends heavily on the accuracy of the LID system, and errors made by the LID system cannot be recovered by the downstream recognizer. In contrast, the Multi-PRS system, because it requires no front-end LID switching and is built on a common multilingual phone-set derived from several languages, is not constrained by the accuracy of an LID system. It therefore performs effectively on both code-switched and non-code-switched speech, yielding lower Phone Error Rates (PERs) than the LID-Mono system.
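The contrast between the two approaches can be illustrated with a toy simulation. This is a minimal sketch, not the authors' implementation: the phone inventories, the `majority_lid` stand-in for the i-vector LID, and the rule-based "recognizers" are hypothetical simplifications in place of the DNN models. It shows how a hard front-end LID decision over fixed windows can route phones to a monolingual recognizer that lacks them, whereas a single recognizer over the common phone-set does not depend on LID. PER is computed here in the usual way, as the Levenshtein distance between the reference and hypothesis phone sequences divided by the reference length.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Toy stand-in for acoustic input: each "frame" is (true language, spoken phone).
Frame = Tuple[str, str]
Recognizer = Callable[[Sequence[Frame]], List[str]]
OOV = "<oov>"  # emitted when a recognizer has no unit for the spoken phone

# Illustrative phone inventories (hypothetical, NOT the paper's phone sets).
INVENTORY: Dict[str, Set[str]] = {
    "kannada": {"a", "k", "ʈ", "ɳ"},
    "urdu": {"a", "k", "q", "x"},
}
# Common multilingual phone-set: the union of the per-language IPA inventories.
COMMON: Set[str] = set().union(*INVENTORY.values())


def make_recognizer(inventory: Set[str]) -> Recognizer:
    """Toy phone recognizer: outputs a phone only if its inventory contains it."""
    def recognize(seg: Sequence[Frame]) -> List[str]:
        return [p if p in inventory else OOV for _, p in seg]
    return recognize


def majority_lid(seg: Sequence[Frame]) -> str:
    """Toy oracle LID: majority true language in the window (ties: first seen)."""
    return Counter(lang for lang, _ in seg).most_common(1)[0][0]


def lid_mono_decode(frames: Sequence[Frame], window: int,
                    lid: Callable[[Sequence[Frame]], str],
                    recognizers: Dict[str, Recognizer]) -> List[str]:
    """LID-Mono: label each fixed-size window, then decode it monolingually."""
    hyp: List[str] = []
    for start in range(0, len(frames), window):
        seg = frames[start:start + window]
        lang = lid(seg)  # hard decision; a wrong label cannot be undone later
        hyp.extend(recognizers[lang](seg))
    return hyp


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over phone tokens (subs + dels + ins)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER in percent; assumes a non-empty reference."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# A toy code-switched utterance: a Kannada stretch followed by an Urdu stretch.
kannada = [("kannada", p) for p in ["k", "a", "ʈ", "a", "ɳ", "a"]]
urdu = [("urdu", p) for p in ["q", "a", "x", "a"]]
utterance = kannada + urdu
reference = [p for _, p in utterance]

mono = {lang: make_recognizer(inv) for lang, inv in INVENTORY.items()}
multi = make_recognizer(COMMON)

if __name__ == "__main__":
    for window in (2, 4, len(utterance)):
        hyp = lid_mono_decode(utterance, window, majority_lid, mono)
        print(f"LID-Mono, window={window}: PER = {phone_error_rate(reference, hyp):.1f}%")
    print(f"Multi-PRS: PER = {phone_error_rate(reference, multi(utterance)):.1f}%")
```

With the full utterance as a single LID window, the Urdu phones `q` and `x` are sent to the Kannada recognizer and counted as errors, while the common-phone-set recognizer has no such dependency. Because the toy LID here is an oracle majority vote over true labels, shorter windows look strictly better; with a real LID system, accuracy typically drops on short segments, which is the trade-off that varying the window size probes.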
Approaches for Multilingual Phone Recognition in Code-switched and Non-code-switched Scenarios Using Indian Languages