Abstract
Tone information is essential for Thai speech recognition systems. Previous studies show that many acoustic cues are associated with the shapes of tones. Nevertheless, most Thai tone classification studies rely mainly on F0 values and their derivatives and do not consider other acoustic features. This article investigates other acoustic features for Thai tone classification. In the experiments, energy values and spectral information, represented by three spectral-based features (LPC-based, PLP-based, and MFCC-based), are applied to HCRF-based Thai tone classification, which has been reported as the best approach for Thai tone classification. The energy values provide an error rate reduction of 22.40% in the isolated-word scenario but only slight improvements in the continuous-speech scenario. In contrast, the spectral-based features contribute greatly to Thai tone classification in the continuous-speech scenario but slightly degrade performance in the isolated-word scenario. The best result in the continuous-speech scenario is obtained with the PLP-based feature, which yields an error rate reduction of 13.90%. The findings of this article are therefore that energy values are the main contributor to improved Thai tone classification performance in the isolated-word scenario, while spectral-based features, especially the PLP-based feature, are the main contributor in the continuous-speech scenario.
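As a rough illustration of what adding energy values to F0-based tone features could look like, the sketch below computes a per-frame log energy and pairs it with a per-frame F0 value. The abstract does not state how the energy values are computed or combined with F0, so the function names (`frame_log_energy`, `append_energy`), the frame length, the hop size, and the synthetic signal are all assumptions made for this example, not the authors' method.

```python
import math


def frame_log_energy(samples, frame_len, hop):
    """Return the log energy of each full frame of a mono signal.

    Assumption: energy is the sum of squared samples per frame;
    the abstract does not specify the exact definition used.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        # A small floor avoids log(0) on silent frames.
        energies.append(math.log(energy + 1e-10))
    return energies


def append_energy(f0_frames, energy_frames):
    """Pair each frame's F0 value with that frame's log energy."""
    if len(f0_frames) != len(energy_frames):
        raise ValueError("F0 and energy tracks must have the same number of frames")
    return [[f0, e] for f0, e in zip(f0_frames, energy_frames)]


# Synthetic input: 0.1 s of a 120 Hz sine wave sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 120 * n / sample_rate) for n in range(800)]

energy = frame_log_energy(signal, frame_len=200, hop=100)
f0 = [120.0] * len(energy)  # placeholder F0 track, not a real pitch estimate
features = append_energy(f0, energy)
```

In a real system the F0 track would come from a pitch tracker and the resulting frame-level vectors would be passed to the tone classifier; the placeholder track here only keeps the example self-contained.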