Abstract
Text-to-speech (TTS) synthesis is an active area of research that aims to generate synthetic speech from underlying text. Compared with English and many European languages, TTS has yet to mature in Malayalam, the principal language of the South Indian state of Kerala. To emulate natural speech, each syllable must be uttered with proper durational and prosodic characteristics. Many Malayalam poems have an inherent rhythm, which is characterized by the Vruta [28] in which the poem is written. The Vruta determines the meter in which the poem is narrated, so it can provide vital cues about the durational and prosodic characteristics of the recited verses. This study identifies the features that determine the durational characteristics of a poem written in a particular Vruta and develops an algorithm to extract those features, in order to build a dataset for modeling the duration of syllable utterances for tuneful TTS in Malayalam. Poems written in three Vrutas, namely Kakali, Manjari, and Keka, are considered. Nineteen features extractable from the orthographic representation of a poem are identified for this purpose, and a standard dataset is built from these extracted features. Support vector machine and feed-forward neural network based estimators are then proposed to model the duration of Malayalam poem syllables for tuneful speech synthesis. The hyperparameters are optimized using GridSearchCV from the Scikit-learn machine learning library [15].
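As a rough illustration of the hyperparameter search mentioned in the abstract, the sketch below tunes a support vector regressor with Scikit-learn's GridSearchCV. It is not the authors' code: the data are random placeholders standing in for the nineteen extracted features and the syllable durations, and the parameter grid, the RBF kernel, the scaling step, and the helper names `make_placeholder_data` and `tune_svr` are all assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

N_FEATURES = 19  # the abstract reports nineteen extractable features


def make_placeholder_data(n_samples=200, seed=0):
    """Random stand-in for (features, duration) pairs; NOT the real dataset."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, N_FEATURES))
    weights = rng.normal(size=N_FEATURES)
    y = X @ weights + rng.normal(scale=0.1, size=n_samples)
    return X, y


def tune_svr(X, y):
    """Grid-search SVR hyperparameters with 3-fold cross-validation."""
    # Scaling the inputs is an assumption of this sketch, not a step
    # described in the abstract.
    pipeline = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    # Hypothetical grid; the values actually searched are not given.
    param_grid = {
        "svr__C": [1.0, 10.0],
        "svr__epsilon": [0.01, 0.1],
        "svr__gamma": ["scale", 0.1],
    }
    search = GridSearchCV(
        pipeline, param_grid, cv=3, scoring="neg_mean_squared_error"
    )
    search.fit(X, y)
    return search


X, y = make_placeholder_data()
search = tune_svr(X, y)
print(search.best_params_)
```

The same pattern would apply to a feed-forward network by swapping the estimator and the grid; only the support vector case is shown here to keep the sketch short.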
- [1] 2019. A gentle introduction to the rectified linear unit (ReLU). Machine Learning Mastery. Retrieved September 13, 2022 from https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/.
- [2] 2021. The Unicode Standard Version 13.0. (2021). Retrieved August 28, 2021 from http://www.unicode.org/charts/PDF/U0D00.pdf.
- [3] 2000. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge, UK.
- [4] 2018. Intonation rules for text reading. In Epoch Synchronous Overlap Add (ESOLA). Signals and Communication Technology. Springer, New York, NY, 135–176.
- [5] 1997. Support vector regression machines. Advances in Neural Information Processing Systems 9 (1997), 155–161.
- [6] 2017. A prediction of precipitation data based on support vector machine and particle swarm optimization (PSO-SVM) algorithms. Algorithms 10, 2 (2017), 57.
- [7] 2015. Adhyathma Ramayanam. DC Books, Kottayam, Kerala.
- [8] 2009. Duration Analysis and Modelling for Malayalam Text to Speech Synthesis Systems. Ph.D. Dissertation. University of Kerala, Thiruvananthapuram, Kerala.
- [9] 2006. Duration analysis for Malayalam text-to-speech systems. In Proceedings of the 9th International Conference on Information Technology (ICIT’06). IEEE, Los Alamitos, CA, 129–132.
- [10] 2008. Modeling of vowel duration in Malayalam speech using probability distribution. In Proceedings of the Conference on Speech Prosody. 6–9.
- [11] 2008. A hybrid duration model using CART and HMM. In Proceedings of the 2008 IEEE Region 10 Conference (TENCON’08). IEEE, Los Alamitos, CA, 1–4.
- [12] 1988. Artificial neural networks. IEEE Circuits and Devices Magazine 4, 5 (1988), 3–10.
- [13] 2015. Pause duration model for Malayalam TTS. In Proceedings of the 2015 International Conference on Advances in Computing, Communications, and Informatics (ICACCI’15). IEEE, Los Alamitos, CA, 2206–2210.
- [14] 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- [15] 2016. Scikit-learn. In Machine Learning for Evolution Strategies. Springer, New York, NY, 45–53.
- [16] 2004. Duration modeling of Indian languages Hindi and Telugu. In Proceedings of the 5th ISCA Workshop on Speech Synthesis. 197–202.
- [17] 2004. Duration modeling for Hindi text-to-speech synthesis system. In Proceedings of the 8th International Conference on Spoken Language Processing (ICSLP’04). 1–4.
- [18] 1989. Significance of durational knowledge for speech synthesis system in an Indian Language. In Proceedings of the 4th IEEE Region 10 International Conference (TENCON’89). IEEE, Los Alamitos, CA, 486–489.
- [19] 1964. Vrutha Shilpam. Mathrubhumi Printing and Publishing Co., Ernakulam, Kerala.
- [20] 2000. Vyloppilli Kavithakal. DC Books, Kottayam, Kerala.
- [21] 2021. Malayalam Poem Syllable Duration Dataset. Retrieved September 13, 2022.
- [22] 2020. Krishna Gadha. DC Books, Kottayam, Kerala.
- [23] 2011. Development of syllable-based text to speech synthesis system in Bengali. International Journal of Speech Technology 14, 3 (2011), 167–181.
- [24] 2020. Automatic multiclass document classification of Hindi poems using machine learning techniques. In Proceedings of the 2020 International Conference for Emerging Technology (INCET’20). IEEE, Los Alamitos, CA, 1–5.
- [25] 2020. Model for classification of poems in Hindi language based on Ras. In Smart Systems and IoT: Innovations in Computing. Springer, New York, NY, 655–661.
- [26] 1980. Swana Vijnanam. Keralabhasha Institute, Thiruvananthapuram, Kerala.
- [27] 2015. Duration modeling for text to speech synthesis system using festival speech engine developed for Malayalam language. In Proceedings of the 2015 International Conference on Circuits, Power, and Computing Technologies (ICCPCT’15). IEEE, Los Alamitos, CA, 1–5.
- [28] 1904. Vruthamanjari. Current Books, Kottayam, Kerala.
- [29] 1986. Keralapanineeyam. DC Books, Kottayam, Kerala.
- [30] 2010. Selection of suitable features for modeling the durations of syllables. Journal of Software Engineering and Applications 3, 12 (2010), 1107.
- [31] 2005. Modeling syllable duration in Indian languages using support vector machines. In Proceedings of 2005 International Conference on Intelligent Sensing and Information Processing. IEEE, Los Alamitos, CA, 258–263.
- [32] 2007. Modeling durations of syllables using neural networks. Computer Speech & Language 21, 2 (2007), 282–295.
- [33] 2014. Duration modeling by multi-models based on vowel production characteristics. In Proceedings of the 11th International Conference on Natural Language Processing (ICNLP’14). 39–47.
- [34] 2014. Duration modeling in Hindi. International Journal of Computer Applications 97, 6 (2014), 42–46.
- [35] 2016. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
- [36] 1986. Learning representations by back-propagating errors. Nature 323, 6088 (1986), 533–536.
- [37] 1986. Durational analysis of Kannada vowels. Journal of Acoustical Society of India 14, 2 (1986), 34–41.
- [38] 2015. Duration modelling using neural networks for Hindi TTS system considering position of syllable in a word. Procedia Computer Science 46 (2015), 60–67.
- [39] 2004. A tutorial on support vector regression. Statistics and Computing 14, 3 (2004), 199–222.
- [40] 2012. Clustering of duration patterns in speech for text-to-speech synthesis. In Proceedings of the 2012 Annual IEEE India Conference (INDICON’12). IEEE, Los Alamitos, CA, 1122–1127.
- [41] 2012. Lecture 6.5-RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 2 (2012), 26–31.
Index Terms
Identification and Extraction of Features from Malayalam Poems for Analyzing Syllable Duration Patterns