research-article

Identification and Extraction of Features from Malayalam Poems for Analyzing Syllable Duration Patterns

Published: 10 March 2023

Abstract

Text-to-speech (TTS) synthesis, the generation of synthetic speech from underlying text, is an active area of research. Compared to English and many European languages, TTS is yet to mature in Malayalam, the principal language of the South Indian state of Kerala. To emulate natural speech, a syllable has to be uttered with proper durational and prosodic characteristics. Many poems in Malayalam have an inherent rhythm, a property characterized by the Vruta [28] in which the poem is written. The Vruta decides the meter in which the poem is narrated, so it can give vital cues about the durational and prosodic characteristics of the recited verses. This study aims to identify the features that determine the durational characteristics of a poem written in a particular Vruta and to develop an algorithm that extracts those features, in order to build a dataset for modeling the duration of syllable utterances for tuneful TTS in Malayalam. Poems written in three Vrutas, namely Kakali, Manjari, and Keka, are considered. Nineteen features that can be extracted from the orthographic representation of a poem are identified for this purpose, and a standard dataset is built from them. Support vector machine and feed-forward neural network based estimators are then proposed to model the duration of Malayalam poem syllables for tuneful speech synthesis. Their hyperparameters are optimized using the GridSearchCV algorithm from the scikit-learn machine learning library [15].
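As a rough illustration of what extracting features from the orthographic representation can involve, the sketch below splits Malayalam text into orthographic units (aksharas) using code point ranges from the Unicode Malayalam block [2] and derives two toy attributes per unit. The function names `split_aksharas` and `unit_features`, the grouping rules, and the two attributes are assumptions made for this example; they are not the nineteen features or the extraction algorithm of the study, and orthographic units do not always coincide with spoken syllables.

```python
# Illustrative sketch only: simplified grouping rules, hypothetical names.
VIRAMA = "\u0d4d"  # chandrakkala, joins consonants into a conjunct

# Long vowels and diphthongs: independent letters, then dependent signs.
LONG_VOWELS = set(
    "\u0d06\u0d08\u0d0a\u0d0f\u0d10\u0d13\u0d14"
    "\u0d3e\u0d40\u0d42\u0d47\u0d48\u0d4b\u0d4c\u0d57"
)


def is_consonant(ch):
    return "\u0d15" <= ch <= "\u0d39"


def is_independent_vowel(ch):
    return "\u0d05" <= ch <= "\u0d14"


def is_dependent(ch):
    """Vowel signs, virama, anusvara, visarga, and chillu letters."""
    return (
        "\u0d3e" <= ch <= "\u0d4d"
        or ch in "\u0d02\u0d03\u0d57"
        or "\u0d7a" <= ch <= "\u0d7f"
    )


def split_aksharas(text):
    """Group code points into orthographic units (assumed rules)."""
    units, current = [], ""
    for ch in text:
        if is_consonant(ch) and current.endswith(VIRAMA):
            current += ch  # consonant after virama extends the conjunct
        elif is_consonant(ch) or is_independent_vowel(ch):
            if current:
                units.append(current)
            current = ch  # a new unit starts here
        elif is_dependent(ch):
            current += ch  # attaches to the unit in progress
        else:
            if current:  # spaces, punctuation, other scripts end a unit
                units.append(current)
            current = ""
    if current:
        units.append(current)
    return units


def unit_features(unit):
    """Two toy attributes per unit; not the features used in the study."""
    return {
        "unit": unit,
        "consonants": sum(is_consonant(c) for c in unit),
        "long_vowel": any(c in LONG_VOWELS for c in unit),
    }


if __name__ == "__main__":
    kakali = "\u0d15\u0d3e\u0d15\u0d33\u0d3f"  # the Vruta name Kakali
    for unit in split_aksharas(kakali):
        print(unit_features(unit))
```

Rows of this kind, paired with measured syllable durations, are the sort of input a duration estimator would be trained on; the durations themselves have to come from recorded recitations.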

REFERENCES

[1] Brownlee Jason. 2019. A gentle introduction to the rectified linear unit (ReLU). Machine Learning Mastery. Retrieved September 13, 2022 from https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/.
[2] The Unicode Consortium. 2021. The Unicode Standard Version 13.0. (2021). Retrieved August 28, 2021 from http://www.unicode.org/charts/PDF/U0D00.pdf.
[3] Nello Cristianini and John Shawe-Taylor. 2000. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge, UK.
[4] Datta Asoke Kumar. 2018. Intonation rules for text reading. In Epoch Synchronous Overlap Add (ESOLA). Signals and Communication Technology. Springer, New York, NY, 135–176.
[5] Drucker Harris, Burges Chris J. C., Kaufman Linda, Alex Smola, and Vladimir Vapnik. 1997. Support vector regression machines. Advances in Neural Information Processing Systems 9 (1997), 155–161.
[6] Du Jinglin, Liu Yayun, Yu Yanan, and Yan Weilan. 2017. A prediction of precipitation data based on support vector machine and particle swarm optimization (PSO-SVM) algorithms. Algorithms 10, 2 (2017), 57.
[7] Ezhuthachan Thunchath. 2015. Adhyathma Ramayanam. DC Books, Kottayam, Kerala.
[8] Gopinath Deepa P. 2009. Duration Analysis and Modelling for Malayalam Text to Speech Synthesis Systems. Ph.D. Dissertation. University of Kerala, Thiruvananthapuram, Kerala.
[9] Gopinath Deepa P., Sree J. Divya, Mathew Reshmi, Rekhila S. J., and Nair Achuthsankar S. 2006. Duration analysis for Malayalam text-to-speech systems. In Proceedings of the 9th International Conference on Information Technology (ICIT’06). IEEE, Los Alamitos, CA, 129–132.
[10] Gopinath Deepa P., Veena S., and Nair Achuthsankar S. 2008. Modeling of vowel duration in Malayalam speech using probability distribution. In Proceedings of the Conference on Speech Prosody. 6–9.
[11] Gopinath Deepa P., Vinod Chandra S. S., Veena S. G., and Achuthsankar S. Nair. 2008. A hybrid duration model using CART and HMM. In Proceedings of the 2008 IEEE Region 10 Conference (TENCON’08). IEEE, Los Alamitos, CA, 1–4.
[12] Hopfield J. J. 1988. Artificial neural networks. IEEE Circuits and Devices Magazine 4, 5 (1988), 3–10.
[13] James Jesin and Gopinath Deepa P. 2015. Pause duration model for Malayalam TTS. In Proceedings of the 2015 International Conference on Advances in Computing, Communications, and Informatics (ICACCI’15). IEEE, Los Alamitos, CA, 2206–2210.
[14] Kingma Diederik P. and Ba Jimmy. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[15] Kramer Oliver. 2016. Scikit-learn. In Machine Learning for Evolution Strategies. Springer, New York, NY, 45–53.
[16] Krishna N. Sridhar and Murthy Hema A. 2004. Duration modeling of Indian languages Hindi and Telugu. In Proceedings of the 5th ISCA Workshop on Speech Synthesis. 197–202.
[17] Krishna N. Sridhar, Talukdar Partha Pratim, Bali Kalika, and Ramakrishnan A. G. 2004. Duration modeling for Hindi text-to-speech synthesis system. In Proceedings of the 8th International Conference on Spoken Language Processing (ICSLP’04). 1–4.
[18] Kumar S. R. Rajesh and Yegnanarayana B. 1989. Significance of durational knowledge for speech synthesis system in an Indian Language. In Proceedings of the 4th IEEE Region 10 International Conference (TENCON’89). IEEE, Los Alamitos, CA, 486–489.
[19] Marar Kuttikrishna. 1964. Vrutha Shilpam. Mathrubhumi Printing and Publishing Co., Ernakulam, Kerala.
[20] Menon Vyloppilli Sreedhara. 2000. Vyloppilli Kavithakal. DC Books, Kottayam, Kerala.
[21] Jasir M. P. and Balakrishnan Kannan. 2021. Malayalam Poem Syllable Duration Dataset. Retrieved September 13, 2022 from
[22] Namboothiri Cherusseri. 2020. Krishna Gadha. DC Books, Kottayam, Kerala.
[23] Narendra N. P., Rao K. Sreenivasa, Ghosh Krishnendu, Vempada Ramu Reddy, and Maity Sudhamay. 2011. Development of syllable-based text to speech synthesis system in Bengali. International Journal of Speech Technology 14, 3 (2011), 167–181.
[24] Pal Kaushika and Patel Biraj V. 2020. Automatic multiclass document classification of Hindi poems using machine learning techniques. In Proceedings of the 2020 International Conference for Emerging Technology (INCET’20). IEEE, Los Alamitos, CA, 1–5.
[25] Pal Kaushika and Patel Biraj V. 2020. Model for classification of poems in Hindi language based on Ras. In Smart Systems and IoT: Innovations in Computing. Springer, New York, NY, 655–661.
[26] Prabodhachandran V. R. 1980. Swana Vijnanam. Keralabhasha Institute, Thiruvananthapuram, Kerala.
[27] Rajan Bindhu K., Rijoy V., Gopinath Deepa P., and George Nimmy. 2015. Duration modeling for text to speech synthesis system using festival speech engine developed for Malayalam language. In Proceedings of the 2015 International Conference on Circuits, Power, and Computing Technologies (ICCPCT’15). IEEE, Los Alamitos, CA, 1–5.
[28] Varma A. R. Rajaraja. 1904. Vruthamanjari. Current Books, Kottayam, Kerala.
[29] Varma A. R. Rajaraja. 1986. Keralapanineeyam. DC Books, Kottayam, Kerala.
[30] Rao Krothapalli S. and Koolagudi Shashidhar G. 2010. Selection of suitable features for modeling the durations of syllables. Journal of Software Engineering and Applications 3, 12 (2010), 1107.
[31] Rao K. Sreenivasa and Yegnanarayana B. 2005. Modeling syllable duration in Indian languages using support vector machines. In Proceedings of 2005 International Conference on Intelligent Sensing and Information Processing. IEEE, Los Alamitos, CA, 258–263.
[32] Rao K. Sreenivasa and Yegnanarayana B. 2007. Modeling durations of syllables using neural networks. Computer Speech & Language 21, 2 (2007), 282–295.
[33] Reddy V. Ramu, Sarkar Parakrant, and Rao K. Sreenivasa. 2014. Duration modeling by multi-models based on vowel production characteristics. In Proceedings of the 11th International Conference on Natural Language Processing (ICNLP’14). 39–47.
[34] Roy Somnath and Sinha Nishant. 2014. Duration modeling in Hindi. International Journal of Computer Applications 97, 6 (2014), 42–46.
[35] Ruder Sebastian. 2016. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
[36] Rumelhart David E., Hinton Geoffrey E., and Williams Ronald J. 1986. Learning representations by back-propagating errors. Nature 323, 6088 (1986), 533–536.
[37] Savithri S. R. 1986. Durational analysis of Kannada vowels. Journal of Acoustical Society of India 14, 2 (1986), 34–41.
[38] Shreekanth T., Udayashankara V., and Chandrika M. 2015. Duration modelling using neural networks for Hindi TTS system considering position of syllable in a word. Procedia Computer Science 46 (2015), 60–67.
[39] Smola Alex J. and Schölkopf Bernhard. 2004. A tutorial on support vector regression. Statistics and Computing 14, 3 (2004), 199–222.
[40] Sreelekshmi K. S. and Gopinath Deepa P. 2012. Clustering of duration patterns in speech for text-to-speech synthesis. In Proceedings of the 2012 Annual IEEE India Conference (INDICON’12). IEEE, Los Alamitos, CA, 1122–1127.
[41] Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 2 (2012), 26–31.

    • Published in

      ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 22, Issue 2
      February 2023
      624 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3572719

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 10 March 2023
      • Online AM: 7 September 2022
      • Accepted: 29 August 2022
      • Revised: 19 March 2022
      • Received: 17 April 2021