Abstract
Language analysis tools are essential for native speakers to engage with the digital world, yet Assamese remains a relatively unexplored language. In this report, we analyze different aspects of speech-to-text processing for Assamese: building a speech corpus, defining syllabification rules, and developing a speech search engine. We collected and transcribed about 20 hours of speech in three modes (read, extempore, and conversation), and we discuss the issues and challenges faced during corpus development. We developed an automatic syllabification model with 11 rules for Assamese, which achieved an accuracy of more than 95%. We found 12 different syllable patterns, of which 5 are the most frequent; the longest syllable found is four letters. For speech recognition, we used a deep-learning-based neural network model built with the Hidden Markov Model Toolkit (HTK) 3.5 and obtained 78.05% accuracy for automatic transcription of Assamese speech.
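As a rough illustration of how syllable-pattern statistics of this kind can be tallied, the following minimal Python sketch maps already-syllabified units to consonant/vowel (C/V) patterns and counts them. It is not the 11-rule model from the report: the romanized vowel set `VOWELS`, the helper names `cv_pattern` and `pattern_counts`, and the sample syllables are all hypothetical, and a real system would operate on Assamese-script graphemes.

```python
from collections import Counter

# Assumption: a toy romanized vowel inventory, used only for illustration.
VOWELS = set("aeiou")

def cv_pattern(syllable: str) -> str:
    """Map each letter to 'V' (vowel) or 'C' (any other letter)."""
    return "".join("V" if ch in VOWELS else "C" for ch in syllable.lower())

def pattern_counts(syllables):
    """Count how often each C/V pattern occurs in a list of syllables."""
    return Counter(cv_pattern(s) for s in syllables)

# Hypothetical, already-syllabified input (not taken from the corpus).
sample = ["ma", "nuh", "a", "kas", "bo", "ram"]
counts = pattern_counts(sample)

# Length, in letters, of the longest syllable in the sample.
longest = max(len(s) for s in sample)

# List patterns from most to least frequent.
for pattern, n in counts.most_common():
    print(pattern, n)
print("longest syllable length:", longest)
```

Run over a full syllabified corpus, the same tally would give the number of distinct patterns, the most frequent ones, and the maximum syllable length.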
Development and Analysis of Speech Recognition Systems for Assamese Language Using HTK