Development and Analysis of Speech Recognition Systems for Assamese Language Using HTK

Published: 18 October 2017

Abstract

Language analysis is important for helping native speakers connect with the digital world, and Assamese is a relatively unexplored language. In this report, we analyze different aspects of speech-to-text processing for Assamese: building a speech corpus, defining syllable rules, and finally developing a speech search engine. We collected and transcribed about 20 hours of speech in three modes (read, extempore, and conversation), and we discuss some issues and challenges faced while developing the corpus. We developed an automatic syllabification model with 11 rules for Assamese, which achieved an accuracy of more than 95%. We found 12 different syllable patterns, of which 5 are the most frequent; the longest syllable found is four letters. Using the Hidden Markov Model Toolkit (HTK) 3.5, we built a deep-learning-based neural network speech recognition model that obtained 78.05% accuracy for automatic transcription of Assamese speech.
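
The abstract does not say how the 78.05% figure was computed. As a minimal sketch, assuming it is the accuracy measure that HTK's HResults tool reports, the two standard HTK scores can be written as follows. The function names and the example counts are hypothetical and are not taken from the paper.

```python
def percent_correct(n: int, deletions: int, substitutions: int) -> float:
    """HTK-style percent correct: (N - D - S) / N * 100.

    n is the number of labels in the reference transcription.
    """
    if n <= 0:
        raise ValueError("reference transcription must contain at least one label")
    return 100.0 * (n - deletions - substitutions) / n


def percent_accuracy(n: int, deletions: int, substitutions: int, insertions: int) -> float:
    """HTK-style accuracy: (N - D - S - I) / N * 100.

    Unlike percent correct, this also penalizes inserted labels,
    so it can never exceed percent correct.
    """
    if n <= 0:
        raise ValueError("reference transcription must contain at least one label")
    return 100.0 * (n - deletions - substitutions - insertions) / n


if __name__ == "__main__":
    # Hypothetical error counts, chosen only to illustrate the arithmetic.
    n, d, s, i = 100, 5, 10, 7
    print(f"Correct:  {percent_correct(n, d, s):.2f}%")
    print(f"Accuracy: {percent_accuracy(n, d, s, i):.2f}%")
```

Because insertions are subtracted only in the second measure, a system that over-generates words can score well on percent correct while scoring poorly on accuracy, which is why it matters which of the two a reported figure refers to.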

