research-article

Acoustic Features for Hidden Conditional Random Fields-Based Thai Tone Classification

Published: 11 December 2015

Abstract

Tone information is necessary for Thai speech recognition systems. Previous studies show that many acoustic cues are associated with the shapes of tones. Nevertheless, most Thai tone classification studies have relied on fundamental frequency (F0) values and their derivatives without considering other acoustic features. This article investigates other acoustic features for Thai tone classification. In the experiments, energy values and spectral information, represented by three spectral-based features (the LPC-based, PLP-based, and MFCC-based features), are applied to hidden conditional random field (HCRF)-based Thai tone classification, which has been reported as the best approach for Thai tone classification. The energy values provide an error rate reduction of 22.40% in the isolated-word scenario but only slight improvements in the continuous-speech scenario. In contrast, the spectral-based features contribute greatly to Thai tone classification in the continuous-speech scenario but slightly degrade performance in the isolated-word scenario. The best result in the continuous-speech scenario is obtained with the PLP-based feature, which yields an error rate reduction of 13.90%. The findings are therefore that energy values and spectral-based features, especially the PLP-based feature, are the main contributors to improved Thai tone classification performance in the isolated-word and continuous-speech scenarios, respectively.

