
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective

Published: 12 November 2022

Abstract

Automatic spoken language identification (LID) is an important research field in the era of multilingual, voice-command-based human-computer interaction. A front-end LID module helps improve the performance of many speech-based applications in multilingual scenarios. India is a populous country with diverse cultures and languages, and the majority of the Indian population needs to use their respective native languages for verbal interaction with machines. The development of efficient Indian spoken language recognition systems is therefore essential for adopting smart technologies across every section of Indian society. The field of Indian LID has been gaining momentum since the early 2000s, mainly due to the development of several standard multilingual speech corpora for Indian languages. Even though significant research progress has already been made in this field, to the best of our knowledge, few attempts have been made to review this body of work collectively and analytically. In this work, we present one of the first comprehensive reviews of the Indian spoken language recognition research field. An in-depth analysis is presented to emphasize the unique challenges of low-resource data and mutual language influence for developing LID systems in Indian contexts. Several essential aspects of Indian LID research are discussed: a detailed description of the available speech corpora, the major research contributions, ranging from early attempts based on statistical modeling to recent approaches based on different neural network architectures, and future research trends. This review will help active researchers, as well as research enthusiasts from related fields, assess the present state of Indian LID research.

  117. [117] Rowe Bruce M. and Levine Diane P.. 2018. A Concise Introduction to Linguistics. Routledge.Google ScholarGoogle ScholarCross RefCross Ref
  118. [118] Kolipakam Vishnupriya, Jordan Fiona M., Dunn Michael, Greenhill Simon J., Bouckaert Remco, Gray Russell D., and Verkerk Annemarie. 2018. A Bayesian phylogenetic study of the Dravidian language family. Roy. Soc. Open Sci. 5, 3 (2018), 171504.Google ScholarGoogle ScholarCross RefCross Ref
  119. [119] Kamil Zvelebil. 1990. Dravidian Linguistics: An Introduction. Pondicherry Institute of Linguistics and Culture.Google ScholarGoogle Scholar
  120. [120] Bakshi Aarti and Kopparapu Sunil Kumar. 2021. Improving Indian spoken-language identification by feature selection in duration mismatch framework. SN Comput. Sci. 2, 6 (2021), 116.Google ScholarGoogle ScholarCross RefCross Ref
  121. [121] Besacier Laurent, Barnard Etienne, Karpov Alexey, and Schultz Tanja. 2014. Automatic speech recognition for under-resourced languages: A survey. Speech Commun. 56 (2014), 85100.Google ScholarGoogle ScholarDigital LibraryDigital Library
  122. [122] Martin Alvin F., Greenberg Craig S., Howard John M., Doddington George R., and Godfrey John J.. 2014. NIST language recognition evaluation-past and future. In Proceedings of Odyssey 2014: The Speaker and Language Recognition Workshop. ISCA, 145151.Google ScholarGoogle ScholarCross RefCross Ref
  123. [123] Grierson George Abraham. 1906. Linguistic Survey of India. Vol. 4. Office of the Superintendent of Government Printing, India.Google ScholarGoogle Scholar
  124. [124] Emeneau Murray B.. 1956. India as a lingustic area. Language 32, 1 (1956), 316.Google ScholarGoogle ScholarCross RefCross Ref
  125. [125] Vásquez-Correa Juan Camilo, Klumpp Philipp, Orozco-Arroyave Juan Rafael, and Nöth Elmar. 2019. Phonet: A tool based on gated recurrent neural networks to extract phonological posteriors from speech. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’19). ISCA, 549553.Google ScholarGoogle ScholarCross RefCross Ref
  126. [126] Wiltshire Caroline R. and Harnsberger James D.. 2006. The influence of Gujarati and Tamil L1s on Indian English: A preliminary study. World Engl. 25, 1 (2006), 91104.Google ScholarGoogle ScholarCross RefCross Ref
  127. [127] VijayaRajSolomon Sherlin Solomi, Parthasarathy Vijayalakshmi, and Thangavelu Nagarajan. 2017. Exploiting acoustic similarities between Tamil and Indian English in the development of an HMM-based bilingual synthesiser. IET Sign. Process. 11, 3 (2017), 332340.Google ScholarGoogle ScholarCross RefCross Ref
  128. [128] Maxwell Olga and Fletcher Janet. 2010. The acoustic characteristics of diphthongs in Indian English. World Engl. 29, 1 (2010), 2744.Google ScholarGoogle ScholarCross RefCross Ref
  129. [129] Hansen John H. L. and Bořil Hynek. 2018. On the issues of intra-speaker variability and realism in speech, speaker, and language recognition tasks. Speech Commun. 101 (2018), 94108.Google ScholarGoogle ScholarCross RefCross Ref
  130. [130] Sturm Bob L.. 2014. A simple method to determine if a music information retrieval system is a “horse.”IEEE Trans. Multimedia 16, 6 (2014), 16361644.Google ScholarGoogle ScholarCross RefCross Ref
  131. [131] Behravan Hamid, Hautamäki Ville, and Kinnunen Tomi. 2015. Factors affecting i-vector based foreign accent recognition: A case study in spoken Finnish. Speech Commun. 66 (2015), 118129.Google ScholarGoogle ScholarDigital LibraryDigital Library
  132. [132] Biadsy Fadi. 2011. Automatic Dialect and Accent Recognition and Its Application to Speech Recognition. Ph.D. Dissertation. Columbia University.Google ScholarGoogle Scholar
  133. [133] Gonzalez-Dominguez Javier, Lopez-Moreno Ignacio, Franco-Pedroso Javier, Ramos Daniel, Toledano Doroteo Torre, and Gonzalez-Rodriguez Joaquin. 2010. Multilevel and session variability compensated language recognition: AVS-UAM systems at NIST LRE 2009. IEEE J. Select. Top. Sign. Process. 4, 6 (2010), 10841093.Google ScholarGoogle ScholarCross RefCross Ref
  134. [134] Xiao R. Z., McEnery A. M., Baker J. P., and Hardie Andrew. 2004. Developing Asian language corpora: Standards and practice. In Asian Language Resources.Google ScholarGoogle Scholar
  135. [135] Muthusamy Yeshwant K., Cole Ronald A., and Oshika Beatrice T.. 1992. The OGI multi-language telephone speech corpus. In Proceedings of the International Conference on Spoken Language Processing (ICSLP’92). ISCA, 895898.Google ScholarGoogle ScholarCross RefCross Ref
  136. [136] Karen Jones, Graff David, Walker Kevin, and Strassel Stephanie. 2017. Multi-language conversational telephone speech 2011—South Asian LDC2017S14. Linguistic Data Consortium, Philadelphia, PA.Google ScholarGoogle Scholar
  137. [137] Nagrani Arsha, Chung Joon Son, Xie Weidi, and Zisserman Andrew. 2020. Voxceleb: Large-scale speaker verification in the wild. Comput. Speech Lang. 60 (2020), 101027.Google ScholarGoogle ScholarDigital LibraryDigital Library
  138. [138] Valk Jörgen and Alumäe Tanel. 2021. VoxLingua107: A dataset for spoken language recognition. In Proceedings of the Spoken Language Technology (SLT’21). IEEE, 895898.Google ScholarGoogle ScholarCross RefCross Ref
  139. [139] Basu Joyanta, Khan Soma, Roy Rajib, Basu Tapan Kumar, and Majumder Swanirbhar. 2021. Multilingual speech corpus in low-resource Eastern and Northeastern Indian languages for speaker and language identification. Circ. Syst. Sign. Process. (2021), 128.Google ScholarGoogle Scholar
  140. [140] Balleda Jyotsana, Murthy Hema A., and Nagarajan T.. 2000. Language identification from short segments of speech. In Proceedings of the International Conference on Spoken Language Processing (ICSLP’00). ISCA, 10331036.Google ScholarGoogle ScholarCross RefCross Ref
  141. [141] Kumar C. S. and Li Haizhou. 2004. Language identification for multilingual speech recognition systems. In Proceedings of the Conference Speech and Computer.Google ScholarGoogle Scholar
  142. [142] Mary Leena and Yegnanarayana B.. 2004. Autoassociative neural network models for language identification. In Proceedings of the International Conference on Intelligent Sensing and Information Processing. IEEE, 317320.Google ScholarGoogle ScholarCross RefCross Ref
  143. [143] Bhaskar B., Nandi Dipanjan, and Rao K. Sreenivasa. 2013. Analysis of language identification performance based on gender and hierarchial grouping approaches. In Proceedings of the International Conference on Natural Language Processing. 127.Google ScholarGoogle Scholar
  144. [144] Aarti Bakshi and Kopparapu Sunil Kumar. 2017. Spoken Indian language classification using artificial neural network—An experimental study. In Proceedings of the International Conference on Signal Processing and Integrated Networks (SPIN’17). IEEE, 424430.Google ScholarGoogle ScholarCross RefCross Ref
  145. [145] Bakshi Aarti and Kopparapu Sunil Kumar. 2021. A GMM supervector approach for spoken Indian language identification for mismatch utterance length. Bull. Electr. Eng. Inf. 10, 2 (2021), 11141121.Google ScholarGoogle ScholarCross RefCross Ref
  146. [146] Madhu Chithra, George Anu, and Mary Leena. 2017. Automatic language identification for seven Indian languages using higher level features. In Proceedings of the International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES’17). IEEE, 16.Google ScholarGoogle ScholarCross RefCross Ref
  147. [147] Bhanja Chuya China, Laskar Mohammad A., and Laskar Rabul H.. 2020. Cascade convolutional neural network-long short-term memory recurrent neural networks for automatic tonal and non-tonal preclassification-based Indian language identification. Expert Syst. 37, 5 (2020), e12544.Google ScholarGoogle Scholar
  148. [148] Bhanja Chuya China, Laskar Mohammad Azharuddin, and Laskar Rabul Hussain. 2021. Modelling multi-level prosody and spectral features using deep neural network for an automatic tonal and non-tonal pre-classification-based Indian language identification system. Lang. Resourc. Eval. (2021), 142.Google ScholarGoogle Scholar
  149. [149] Mukherjee Himadri, Ghosh Subhankar, Sen Shibaprasad, Md Obaidullah Sk, Santosh K. C., Phadikar Santanu, and Roy Kaushik. 2019. Deep learning for spoken language identification: Can we visualize speech signal patterns?Neural Comput. Appl. 31, 12 (2019), 84838501.Google ScholarGoogle ScholarCross RefCross Ref
  150. [150] Mukherjee Himadri, Obaidullah S. K. Md, Santosh K. C., Phadikar Santanu, and Roy Kaushik. 2020. A lazy learning-based language identification from speech using MFCC-2 features. Int. J. Mach. Learn. Cybernet. 11, 1 (2020), 114.Google ScholarGoogle ScholarCross RefCross Ref
  151. [151] Garain Avishek, Singh Pawan Kumar, and Sarkar Ram. 2021. FuzzyGCP: A deep learning architecture for automatic spoken language identification from speech signals. Expert Syst. Appl. 168 (2021), 114416.Google ScholarGoogle ScholarCross RefCross Ref
  152. [152] Basu Joyanta and Majumder Swanirbhar. 2021. Performance evaluation of language identification on emotional speech corpus of three Indian languages. In Intelligence Enabled Research. Springer, 5563.Google ScholarGoogle Scholar
  153. [153] Bakshi Aarti and Kopparapu Sunil Kumar. 2021. Feature selection for improving Indian spoken language identification in utterance duration mismatch condition. Bull. Electr. Eng. Inf. 10, 5 (2021), 25782587.Google ScholarGoogle ScholarCross RefCross Ref
  154. [154] Muralikrishna H., Gupta Shikha, Dinesh Dileep Aroor, and Rajan Padmanabhan. 2021. Noise-robust spoken language identification using language relevance factor based embedding. In Proceedings of the Spoken Language Technology Workshop (SLT’21). IEEE, 644651.Google ScholarGoogle Scholar
  155. [155] Muralikrishna H., Kapoor Shantanu, Dinesh Dileep Aroor, and Rajan Padmanabhan. 2021. Spoken language identification in unseen target domain using within-sample similarity loss. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’21). IEEE, 72237227.Google ScholarGoogle Scholar
  156. [156] Chakraborty Jaybrata, Chakraborty Bappaditya, and Bhattacharya Ujjwal. 2021. DenseRecognition of spoken languages. In Proceedings of the International Conference on Pattern Recognition (ICPR’21). IEEE, 96749681.Google ScholarGoogle ScholarCross RefCross Ref
  157. [157] Dey Spandan, Saha Goutam, and Sahidullah Md. 2021. Cross-corpora language recognition: A preliminary investigation with Indian languages. In Proceedings of the European Signal Processing Conference (EUSIPCO’21). IEEE, 546550.Google ScholarGoogle ScholarCross RefCross Ref
  158. [158] Ramesh G., Kumar C. Shiva, and Murty K. Sri Rama. 2021. Self-supervised phonotactic representations for language identification. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’21), 15141518.Google ScholarGoogle Scholar
  159. [159] Tank Vishal, Manavadaria Manthan, and Dudhat Krupal. 2022. A novel approach for spoken language identification and performance comparison using machine learning-based classifiers and neural network. In Proceedings of the International e-Conference on Intelligent Systems and Signal Processing. Springer, 547555.Google ScholarGoogle ScholarCross RefCross Ref
  160. [160] Biswas Mainak, Rahaman Saif, Ahmadian Ali, Subari Kamalularifin, and Singh Pawan Kumar. 2022. Automatic spoken language identification using MFCC based time series features. Multimedia Tools Appl. (2022), 131.Google ScholarGoogle Scholar
  161. [161] Paul Bachchu, Phadikar Santanu, and Bera Somnath. 2021. Indian regional spoken language identification using deep learning approach. In Proceedings of the International Conference on Mathematics and Computing. Springer, 263274.Google ScholarGoogle ScholarCross RefCross Ref
  162. [162] Trong Trung Ngo, Jokinen Kristiina, and Hautamäki Ville. 2019. Enabling spoken dialogue systems for low-resourced languages–end-to-end dialect recognition for North Sami. In Proceedings of the 9th International Workshop on Spoken Dialogue System Technology. Springer, 221235.Google ScholarGoogle ScholarCross RefCross Ref
  163. [163] Cerva Petr, Mateju Lukas, Kynych Frantisek, Zdansky Jindrich, and Nouza Jan. 2021. Identification of Scandinavian languages from speech using bottleneck features and X-vectors. In Proceedings of the International Conference on Text, Speech, and Dialogue. Springer, 371381.Google ScholarGoogle ScholarDigital LibraryDigital Library
  164. [164] Peché M., Davel M. H., and Barnard E.. 2009. Development of a spoken language identification system for South African languages. SAIEE Afr. Res. J. 100, 4 (2009), 97103.Google ScholarGoogle ScholarCross RefCross Ref
  165. [165] Woods Nancy and Babatunde Gideon. 2020. A robust ensemble model for spoken language recognition. Appl. Comput. Sci. 16, 3 (2020).Google ScholarGoogle Scholar
  166. [166] Wang Dong, Li Lantian, Tang Difei, and Chen Qing. 2016. AP16-OL7: A multilingual database for oriental languages and a language recognition baseline. In Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA’16). IEEE, 15.Google ScholarGoogle ScholarCross RefCross Ref
  167. [167] Monteiro João, Alam Jahangir, and Falk Tiago H.. 2019. Residual convolutional neural network with attentive feature pooling for end-to-end language identification from short-duration speech. Comput. Speech Lang. 58 (2019), 364376.Google ScholarGoogle ScholarDigital LibraryDigital Library
  168. [168] Duroselle Raphaël, Sahidullah Md., Jouvet Denis, and Illina Irina. 2021. Language recognition on unknown conditions: The LORIA-Inria-MULTISPEECH system for AP20-OLR challenge. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’21). ISCA, 32563260.Google ScholarGoogle ScholarCross RefCross Ref
  169. [169] Kong Tianlong, Yin Shouyi, Zhang Dawei, Geng Wang, Wang Xin, Song Dandan, Huang Jinwen, Shi Huiyu, and Wang Xiaorui. 2021. Dynamic multi-scale convolution for dialect identification. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’21). ISCA, 32613265.Google ScholarGoogle ScholarCross RefCross Ref
  170. [170] Plchot Oldrich, Matejka Pavel, Novotnỳ Ondrej, Cumani Sandro, Lozano-Diez Alicia, Slavicek Josef, Diez Mireia, Grézl Frantisek, Glembek Ondrej, Kamsali Mounika, et al. 2018. Analysis of BUT-PT submission for NIST LRE 2017. In Proceedings of Odyssey 2018: The Speaker and Language Recognition Workshop. ISCA, 4753.Google ScholarGoogle ScholarCross RefCross Ref
  171. [171] Basu Joyanta and Majumder Swanirbhar. 2020. Identification of seven low-resource North-Eastern languages: An experimental study. In Intelligence Enabled Research. Springer, 7181.Google ScholarGoogle Scholar
  172. [172] Arendale Brady, Zarandioon Samira, Goodwin Ryan, and Reynolds Douglas. 2020. Spoken language recognition on open-source datasets. SMU Data Sci. Rev. 3, 2 (2020), 3.Google ScholarGoogle Scholar
  173. [173] Ardila Rosana, Branson Megan, Davis Kelly, Kohler Michael, Meyer Josh, Henretty Michael, Morais Reuben, Saunders Lindsay, Tyers Francis, and Weber Gregor. 2020. Common Voice: A massively-multilingual speech corpus. In Proceedings of the Language Resources and Evaluation Conference (LREC’20). 42184222.Google ScholarGoogle Scholar
  174. [174] Basu Joyanta, Khan Soma, Bepari Milton Samirakshma, Roy Rajib, Pal Madhab, Nandi Sushmita, Arora Karunesh Kumar, Arora Sunita, Bansal Shweta, and Agrawal Shyam Sunder. 2018. Designing an IVR based framework for telephony speech data collection and transcription in under-resourced languages. In Proceedings of the Spoken Language Technologies for Under-Resourced Languages (SLTU’18). 4751.Google ScholarGoogle ScholarCross RefCross Ref
  175. [175] Wang Wei, Zheng Vincent W., Yu Han, and Miao Chunyan. 2019. A survey of zero-shot learning: Settings, methods, and applications. ACM Trans. Intell. Syst. Technol. 10, 2 (2019), 137.Google ScholarGoogle ScholarDigital LibraryDigital Library
  176. [176] Ravanelli Mirco, Zhong Jianyuan, Pascual Santiago, Swietojanski Pawel, Monteiro Joao, Trmal Jan, and Bengio Yoshua. 2020. Multi-task self-supervised learning for robust speech recognition. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP’20). IEEE, 69896993.Google ScholarGoogle ScholarCross RefCross Ref
  177. [177] Stafylakis Themos, Rohdin Johan, Plchot Oldřich, Mizera Petr, and Burget Lukáš. 2019. Self-supervised speaker embeddings. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’19). ISCA, 28632867.Google ScholarGoogle ScholarCross RefCross Ref
  178. [178] Baevski Alexei, Schneider Steffen, and Auli Michael. 2019. vq-wav2vec: Self-supervised learning of discrete speech representations. In Proceedings of the International Conference on Learning Representations (ICLR’19).Google ScholarGoogle Scholar
  179. [179] Paul D., Sahidullah M., and Saha G.. 2017. Generalization of spoofing countermeasures: A case study with ASVspoof 2015 and BTAS 2016 corpora. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP’17). IEEE, 20472051. http://dx.doi.org/10.1109/ICASSP.2017.7952516Google ScholarGoogle ScholarDigital LibraryDigital Library
  180. [180] Pandey Ashutosh and Wang DeLiang. 2020. On cross-corpus generalization of deep learning based speech enhancement. IEEE/ACM Trans. Aud. Speech Lang. Process. 28 (2020), 24892499.Google ScholarGoogle ScholarDigital LibraryDigital Library
  181. [181] Schuller B. et al. 2010. Cross-corpus acoustic emotion recognition: Variances and strategies. IEEE Trans. Affect. Comput. 1, 2 (2010), 119131. DOI: DOI: http://dx.doi.org/10.1109/T-AFFC.2010.8Google ScholarGoogle ScholarDigital LibraryDigital Library
  182. [182] Garcia-Romero Daniel, Sell Gregory, and McCree Alan. 2020. Magneto: X-vector magnitude estimation network plus offset for improved speaker recognition. In Proceedings of Odyssey 2020: The Speaker and Language Recognition Workshop. ISCA, 18.Google ScholarGoogle ScholarCross RefCross Ref
  183. [183] Ragni Anton, Knill Kate M., Rath Shakti P., and Gales Mark J. F.. 2014. Data augmentation for low resource languages. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’14). ISCA, 810814.Google ScholarGoogle ScholarCross RefCross Ref
  184. [184] Park Daniel S., Chan William, Zhang Yu, Chiu Chung-Cheng, Zoph Barret, Cubuk Ekin D., and Le Quoc V.. 2019. SpecAugment: A simple data augmentation method for automatic speech recognition. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’19). ISCA, 26132617.Google ScholarGoogle ScholarCross RefCross Ref
  185. [185] Hongyi Zhang, Moustapha Cissé, Yann N. Dauphin, and David Lopez-Paz. 2018. mixup: Beyond Empirical Risk Minimization. In International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, April 30 - May 3, 2018.Google ScholarGoogle Scholar
  186. [186] Borsos Zalán, Li Yunpeng, Gfeller Beat, and Tagliasacchi Marco. 2021. Micaugment: One-shot microphone style transfer. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP’21). IEEE, 34003404.Google ScholarGoogle ScholarCross RefCross Ref
  187. [187] Pappagari Raghavendra, Wang Tianzi, Villalba Jesus, Chen Nanxin, and Dehak Najim. 2020. X-vectors meet emotions: A study on dependencies between emotion and speaker recognition. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP’20). IEEE, 71697173.Google ScholarGoogle ScholarCross RefCross Ref
  188. [188] Latif Siddique, Rana Rajib, Younis Shahzad, Qadir Junaid, and Epps Julien. 2018. Transfer learning for improving speech emotion classification accuracy. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’18). ISCA, 257261.Google ScholarGoogle ScholarCross RefCross Ref
  189. [189] Wang Changhan, Pino Juan, and Gu Jiatao. 2020. Improving cross-lingual transfer learning for end-to-end speech recognition with speech translation. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH. ISCA, 47314735.Google ScholarGoogle ScholarCross RefCross Ref
  190. [190] Yi Jiangyan, Tao Jianhua, Wen Zhengqi, and Bai Ye. 2018. Language-adversarial transfer learning for low-resource speech recognition. IEEE/ACM Trans. Aud. Speech Lang. Process. 27, 3 (2018), 621630.Google ScholarGoogle ScholarDigital LibraryDigital Library
  191. [191] Deng Jun, Zhang Zixing, Eyben Florian, and Schuller Björn. 2014. Autoencoder-based unsupervised domain adaptation for speech emotion recognition. IEEE Sign. Process. Lett. 21, 9 (2014), 10681072.Google ScholarGoogle ScholarCross RefCross Ref
  192. [192] Sun Sining, Zhang Binbin, Xie Lei, and Zhang Yanning. 2017. An unsupervised deep domain adaptation approach for robust speech recognition. Neurocomputing 257 (2017), 7987.Google ScholarGoogle ScholarCross RefCross Ref
  193. [193] Sun Sining, Yeh Ching-Feng, Hwang Mei-Yuh, Ostendorf Mari, and Xie Lei. 2018. Domain adversarial training for accented speech recognition. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP’18). IEEE, 48544858.Google ScholarGoogle ScholarDigital LibraryDigital Library
  194. [194] Fang Xin, Zou Liang, Li Jin, Sun Lei, and Ling Zhen-Hua. 2019. Channel adversarial training for cross-channel text-independent speaker recognition. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP’19). IEEE, 62216225.Google ScholarGoogle ScholarCross RefCross Ref
  195. [195] Auer Peter. 2013. Code-Switching in Conversation: Language, Interaction and Identity. Routledge.Google ScholarGoogle ScholarCross RefCross Ref
  196. [196] Padhi Trideba, Biswas Astik, Wet Febe de, Westhuizen Ewald van der, and Niesler Thomas. 2020. Multilingual bottleneck features for improving ASR performance of code-switched speech in under-resourced languages. In Proceedings of the Workshop on Speech Technologies for Code-Switching in Multilingual Communities (WSTCSMC’20), 65.Google ScholarGoogle Scholar
  197. [197] Lyu Dau-Cheng, Chng Eng-Siong, and Li Haizhou. 2013. Language diarization for code-switch conversational speech. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP’13). IEEE, 73147318.Google ScholarGoogle ScholarCross RefCross Ref
  198. [198] Mishra Jagabandhu, Agarwal Ayush, and Prasanna S. R. Mahadeva. 2021. Spoken language diarization using an attention based neural network. In Proceedings of the National Conference on Communications (NCC’21). IEEE, 16.Google ScholarGoogle ScholarCross RefCross Ref
  199. [199] Spoorthy V., Thenkanidiyoor Veena, and Dinesh Dileep Aroor. 2018. SVM based language diarization for code-switched bilingual Indian speech using bottleneck features. In Proceedings of the Spoken Language Technologies for Under-Resourced Languages (SLTU’18). ISCA, 132136.Google ScholarGoogle Scholar
  200. [200] Wiesner Matthew, Sarma Mousmita, Arora Ashish, Raj Desh, Gao Dongji, Huang Ruizhe, Preet Supreet, Johnson Moris, Iqbal Zikra, Goel Nagendra, et al. 2021. Training hybrid models on noisy transliterated transcripts for code-switched speech recognition. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’21), 29062910.Google ScholarGoogle Scholar
  201. [201] Kumar Mari Ganesh, Kuriakose Jom, Thyagachandran Anand, A. Arun Kumar, Seth Ashish, Prasad Lodagala V. S. V. Durga, Jaiswal Saish, Prakash Anusha, and Murthy Hema A.. 2021. Dual script E2E framework for multilingual and code-switching ASR. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’21). ISCA, 24412445.Google ScholarGoogle ScholarCross RefCross Ref
  202. [202] Sailor Hardik, T. Kiran Praveen, Agrawal Vikas, Jain Abhinav, and Pandey Abhishek. 2021. SRI-B end-to-end system for multilingual and code-switching ASR challenges for low resource Indian languages. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’21). ISCA, 24562460.Google ScholarGoogle ScholarCross RefCross Ref
  203. [203] Rangan Pradeep, Teki Sundeep, and Misra Hemant. 2020. Exploiting spectral augmentation for code-switched spoken language identification. Proceedings of the Workshop on Speech Technologies for Code-Switching in Multilingual Communities (WSTCSMC’20), 36.Google ScholarGoogle Scholar
  204. [204] Nagarsheth J. A. C. Parav and Chandran Jehoshaph Akshay. 2020. Language identification for code-mixed Indian languages in the wild. Proceedings of the Workshop on Speech Technologies for Code-Switching in Multilingual Communities (WSTCSMC’20), 48.Google ScholarGoogle Scholar
  205. [205] Manjunath K. E.. 2022. Applications of multilingual phone recognition in code-switched and non-code-switched scenarios. In Multilingual Phone Recognition in Indian Languages. Springer, 6783.Google ScholarGoogle ScholarCross RefCross Ref
  206. [206] Moro-Velazquez Laureano, Villalba Jesus, and Dehak Najim. 2020. Using X-vectors to automatically detect Parkinson’s disease from speech. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP’20). IEEE, 11551159.Google ScholarGoogle ScholarCross RefCross Ref
  207. [207] Pulido María Luisa Barragán, Hernández Jesús Bernardino Alonso, Ballester Miguel Ángel Ferrer, González Carlos Manuel Travieso, Mekyska Jiří, and Smékal Zdeněk. 2020. Alzheimer’s disease and automatic speech analysis: A review. Expert Syst. Appl. 150 (2020), 113213.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. An Overview of Indian Spoken Language Recognition from Machine Learning Perspective

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 6 (November 2022), 372 pages
  ISSN: 2375-4699
  EISSN: 2375-4702
  DOI: 10.1145/3568970


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 8 May 2021
• Revised: 23 February 2022
• Accepted: 1 March 2022
• Online AM: 9 March 2022
• Published: 12 November 2022

          Qualifiers

          • research-article
          • Refereed
