
Text-to-Speech Synthesis: Literature Review with an Emphasis on Malayalam Language

Published: 19 January 2022

Abstract

Text-to-Speech Synthesis (TTS) is an active area of research concerned with generating synthetic speech from underlying text. The identified syllables are uttered with appropriate duration and prosodic characteristics to emulate natural speech. TTS falls under Natural Language Processing (NLP), which aims to bridge the communication gap between humans and machines. As far as Western languages such as English are concerned, research on producing intelligible and natural synthetic speech has advanced considerably. But in a multilingual country like India, many regional languages, such as Malayalam, remain underexplored in NLP. In this article, we bring together the major research works in the area of TTS for English and the prominent Indian languages, with a special emphasis on the South Indian language Malayalam. This review intends to give the right direction to research activities on TTS in the language.

REFERENCES

  1. [1] Acharya Sudipta and Mandal Shyamal Kr Das. 2013. Prosody modeling: A review report on Indian language. In Mining Intelligence and Knowledge Exploration. Springer, 831842. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. [2] Agrawal Ameeta and An Aijun. 2012. Unsupervised emotion detection from text using semantic and syntactic relations. In Proceedings of the IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, Vol. 1. IEEE, 346353. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. [3] Ahmed Md Kausar and Islam Md Monirul. 2019. Syllable-based Bengali text-to-speech system. In Australian Journal of Science and Technology. AUJST.Google ScholarGoogle Scholar
  4. [4] Akhilaraj D. and Gopinath Deepa P.. 2010. Clustering of duration pattern in speech. https://www.researchgate.net/profile/Deepa-Gopinath/publication/229034822_Clustering_of_Duration_Pattern_in_Speech/links/0deec539fcdd84a76e000000/Clustering-of-Duration-Pattern-in-Speech.pdf.Google ScholarGoogle Scholar
  5. [5] Alam Firoj, Habib S. M., and Khan Mumit. 2008. Text Normalization System for Bangla. Technical Report. BRAC University.Google ScholarGoogle Scholar
  6. [6] Allen Donald R. and Strong William J.. 1985. A model for the synthesis of natural sounding vowels. J. Acoust. Soc. Amer. 78, 1 (1985), 5869.Google ScholarGoogle ScholarCross RefCross Ref
  7. [7] Allen Jonathan. 1976. Synthesis of speech from unrestricted text. Proc. IEEE 64, 4 (1976), 433442.Google ScholarGoogle ScholarCross RefCross Ref
  8. [8] Allen Jonathan, Hunnicutt M. Sharon, Klatt Dennis H., Armstrong Robert C., and Pisoni David B.. 1987. From Text-to-Speech: The MITalk System. Cambridge University Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. [9] Anumanchipalli Gopala Krishna, Prahallad Kishore, and Black Alan W.. 2011. Festvox: Tools for creation and analyses of large speech corpora. In Proceedings of the Workshop on Very Large Scale Phonetics Research. 70.Google ScholarGoogle Scholar
  10. [10] Aw AiTi, Zhang Min, Xiao Juan, and Su Jian. 2006. A phrase-based statistical model for SMS text normalization. In Proceedings of the COLING/ACL on Main Conference Poster Sessions. Association for Computational Linguistics, 3340. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. [11] Bahdanau Dzmitry, Cho Kyunghyun, and Bengio Yoshua. 2014. Neural machine translation by jointly learning to align and translate. Retrieved from https://arXiv:1409.0473.Google ScholarGoogle Scholar
  12. [12] Balyan Archana. 2018. Development of unit selection based speech synthesis system. In Proceedings of the 3rd International Conference on Internet of Things and Connected Technologies (ICIoTCT’18). 2627.Google ScholarGoogle ScholarCross RefCross Ref
  13. [13] Bank World. 2021. Land Area, Data Bank. Retrieved from https://data.worldbank.org/indicator/AG.LND.TOTL.K2.Google ScholarGoogle Scholar
  14. [14] Barros Julio and Diego Ramon I.. 2005. On the use of the hanning window for harmonic analysis in the standard framework. IEEE Trans. Power Delivery 21, 1 (2005), 538539.Google ScholarGoogle ScholarCross RefCross Ref
  15. [15] Basu Tulika and Saha Arup. 2013. Evaluation of prosody in text-to-speech synthesis system of Bangla. In Proceedings of the International Conference Oriental COCOSDA Held Jointly with the Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE’13). IEEE, 16.Google ScholarGoogle ScholarCross RefCross Ref
  16. [16] Begum Afruza, Askari S. Md S., and Sharma Utpal. 2019. Text-to-speech synthesis system for Mymensinghiya dialect of Bangla language. In Progress in Advanced Computing and Intelligent Engineering. Springer, 291303.Google ScholarGoogle Scholar
  17. [17] Bharati Akshar, Bendre Sushma, and Sangal Rajeev. 1998. Some observations on corpora of some Indian languages. Knowledge-Based Computer Systems, Tata McGraw-Hill.Google ScholarGoogle Scholar
  18. [18] Black Alan, Taylor Paul, Caley Richard, and Clark Rob. 1998. The festival speech synthesis system. version 1.4.2. Unpublished document available via http://www.cstr.ed.ac.uk/projects/festival.html 6 (2001), 365–377.Google ScholarGoogle Scholar
  19. [19] Black Alan W. and Campbell Nick. 1995. Optimising selection of units from speech databases for concatenative synthesis. https://era.ed.ac.uk/handle/1842/1279.Google ScholarGoogle Scholar
  20. [20] Black Alan W. and Taylor Paul A.. 1997. Automatically clustering similar units for unit selection in speech synthesis. In Proceedings of the Eurospeech 1997, Rhodes, Greece.Google ScholarGoogle Scholar
  21. [21] Black Alan W., Zen Heiga, and Tokuda Keiichi. 2007. Statistical parametric speech synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’07), Vol. 4. IEEE, IV–1229.Google ScholarGoogle ScholarCross RefCross Ref
  22. [22] Boothalingam Ramani, Solomi V. Sherlin, Gladston Anushiya Rachel, Christina S. Lilly, Vijayalakshmi P., Thangavelu Nagarajan, and Murthy Hema A.. 2013. Development and evaluation of unit selection and HMM-based speech synthesis systems for Tamil. In Proceedings of the National Conference on Communications (NC’13). IEEE, 15.Google ScholarGoogle ScholarCross RefCross Ref
  23. [23] Bouma Gerlof. 2009. Normalized (pointwise) mutual information in collocation extraction. Proceedings of the International Conference of the German Society for Computational Linguistics and Language Technology (GSCL’09). 3140.Google ScholarGoogle Scholar
  24. [24] Broad David J.. 1979. The new theories of vocal fold vibration. In Speech and Language. Vol. 2. Elsevier, 203256.Google ScholarGoogle Scholar
  25. [25] Brody Samuel and Lapata Mirella. 2009. Bayesian word sense induction. In Proceedings of the 12th Conference of the European Chapter of the ACL (EACL’09). 103111. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. [26] Bruckert Edward, Minow Martin, and Tetschner Walter. 1983. 3-Tiered Software and VLSI Aid Developmental System to Read Text Aloud. Electronics 56, 8 (1983), 133.Google ScholarGoogle Scholar
  27. [27] Bulyko Ivan and Ostendorf Mari. 2001. Joint prosody prediction and unit selection for concatenative speech synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’01), Vol. 2. IEEE, 781784. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. [28] Carlson Rolf and Granström Björn. 1975. A phonetically oriented programming language for rule description of speech. Speech Commun. 2 (1975), 245253.Google ScholarGoogle Scholar
  29. [29] Carlson Rolf and Granstrom B.. 1976. A text-to-speech system based entirely on rules. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’76), Vol. 1. IEEE, 686688.Google ScholarGoogle ScholarCross RefCross Ref
  30. [30] Carter John B., Ford Bryan, Hibler Mike, Kuramkote Ravindra, Law Jeffrey, Lepreau Jay, Orr Douglas B., Stoller Leigh, and Swanson Mark. 1993. FLEX: A tool for building efficient and flexible systems. In Proceedings of IEEE 4th Workshop on Workstation Operating Systems (WWOS’93). IEEE, 198202.Google ScholarGoogle ScholarCross RefCross Ref
  31. [31] Chaudhur Pamela and Kumar K. Vinod. 2010. Vowel classification based approach for Telugu text-to-speech system using symbol concatenation. In Proceedings of the International Conference (ACCTA’10), Vol. 1. 183187.Google ScholarGoogle Scholar
  32. [32] Chomsky Noam and Halle Morris. 1968. The sound pattern of English. Harper & Row Publishers. New York, Evanston, and London.Google ScholarGoogle Scholar
  33. [33] Clark Eleanor and Araki Kenji. 2011. Text normalization in social media: Progress, problems and applications for a pre-processing system of casual English. Procedia-Soc. Behav. Sci. 27 (2011), 211.Google ScholarGoogle ScholarCross RefCross Ref
  34. [34] Cooper Franklin S., Gaitenby Jane H., Mattingly Ignatius G., Nye Patrick W., and Sholes George N.. 1973. Audible outputs of reading machines for the blind. Haskins Labortories Status Report on Speech Research SRr29/30 (1972), 91–95.Google ScholarGoogle Scholar
  35. [35] Cortes Corinna. 1995. Support vector machine. Machine Learning 20, 3 (1995), 273–297. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. [36] Dash Niladri Sekhar. 2020. Pre-editing and textstandardization on a Bengali written text corpus. Aligarh J. Linguistics 10, 1 (2020), 13.Google ScholarGoogle Scholar
  37. [37] Datta Asoke Kumar. 2018. Intonation rules for text reading. In Epoch Synchronous Overlap Add. Springer, 135176.Google ScholarGoogle Scholar
  38. [38] A. V. D. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, and K. Kavukcuoglu. 2016. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.Google ScholarGoogle Scholar
  39. [39] Delattre Pierre C., Liberman Alvin M., and Cooper Franklin S.. 1955. Acoustic loci and transitional cues for consonants. J. Acoust. Soc. Amer. 27, 4 (1955), 769773.Google ScholarGoogle ScholarCross RefCross Ref
  40. [40] Dhananjaya M. S., Krupa B. Niranjana, and Sushma R.. 2016. Kannada Text-to-Speech conversion: A novel approach. In Proceedings of the International Conference on Electrical, Electronics, Communication, Computer and Optimization Techniques (ICEECCOT’16). IEEE, 168172.Google ScholarGoogle ScholarCross RefCross Ref
  41. [41] Dhanwal Swapnil, Dutta Hritwik, Nankani Hitesh, Shrivastava Nilay, Kumar Yaman, Li Junyi Jessy, Mahata Debanjan, Gosangi Rakesh, Zhang Haimin, Shah Rajiv, et al. 2020. An annotated dataset of discourse modes in Hindi stories. In Proceedings of the 12th Language Resources and Evaluation Conference. 11911196.Google ScholarGoogle Scholar
  42. [42] Dirac P. A. M.. 1953. The lorentz transformation and absolute time. Physica 19, 1–12 (1953), 888896. https://doi.org/10.1016/S0031-8914(53)80099-6Google ScholarGoogle ScholarCross RefCross Ref
  43. [43] Dunn Hugh K.. 1950. The calculation of vowel resonances, and an electrical vocal tract. J. Acoust. Soc. Amer. 22, 6 (1950), 740753.Google ScholarGoogle ScholarCross RefCross Ref
  44. [44] Dutoit Thierry and Leich Henri. 1993. MBR-PSOLA: Text-to-speech synthesis based on an MBE re-synthesis of the segments database. Speech Commun. 13, 3–4 (1993), 435440. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. [45] Ebden Peter and Sproat Richard. 2015. The kestrel TTS text normalization system. Natural Lang. Eng. 21, 3 (2015), 333.Google ScholarGoogle ScholarCross RefCross Ref
  46. [46] Egan James P.. 1948. Articulation testing methods. Laryngoscope 58, 9 (1948), 955991.Google ScholarGoogle ScholarCross RefCross Ref
  47. [47] Elias Isaac, Zen Heiga, Shen Jonathan, Zhang Yu, Ye Jia, Ryan R. J., and Wu Yonghui. 2021. Parallel tacotron 2: A non-autoregressive neural TTS model with differentiable duration modeling. Retrieved from https://arXiv:2103.14574.Google ScholarGoogle Scholar
  48. [48] Erber Norman P.. 1979. An approach to evaluating auditory speech perception ability. Volta Rev. 81, 1 (1979), 1624.Google ScholarGoogle Scholar
  49. [49] Fahad Md Shah, Singh Shreya, Gupta Shruti, Deepak Akshay, et al. 2019. Synthesis of emotional speech by prosody modification of vowel segments of neutral speech. In Proceedings of the 2nd International Conference on Advanced Computing and Software Engineering (ICACSE’19).Google ScholarGoogle ScholarCross RefCross Ref
  50. [50] FalDessai N. B., Naik Gaurav A., and Pawar Jyoti D.. 2017. Review of syllable based Text-to-Speech systems: Strategies for enhancing naturalness for Devanagari languages. International Journal of Computer Science and Applications 14, 2 (2017).Google ScholarGoogle Scholar
  51. [51] Fant G.. 1960. Acoustic Theory of Speech Production.’s Gravenhage, Mouton & Co.Google ScholarGoogle Scholar
  52. [52] Fant G., Lin Q. G., and Gobl C.. 1985. Notes on glottal flow interaction. KTH, Speech Transmission Laboratory, Quarterly Report 2–3. 21–45.Google ScholarGoogle Scholar
  53. [53] Fernandez Raul, Rendel Asaf, Ramabhadran Bhuvana, and Hoory Ron. 2015. Using deep bidirectional recurrent neural networks for prosodic-target prediction in a unit-selection text-to-speech system. In Proceedings of the 16th Annual Conference of the International Speech Communication Association.Google ScholarGoogle ScholarCross RefCross Ref
  54. [54] Feynman R. P. and Jr. F. L. Vernon1963. The theory of a general quantum system interacting with a linear dissipative system. Ann. Phys. 24 (1963), 118173. https://doi.org/10.1016/0003-4916(63)90068-XGoogle ScholarGoogle ScholarCross RefCross Ref
  55. [55] Flanagan J. L., Rabiner L. R., Schafer R. W., and Denman J. D.. 1972. Wiring telephone apparatus from computer-generated speech. Bell Labs Tech. J. 51, 2 (1972), 391397.Google ScholarGoogle ScholarCross RefCross Ref
  56. [56] Flint Emma, Ford Elliot, Thomas Olivia, Caines Andrew, and Buttery Paula. 2017. A text normalisation system for non-standard English words. In Proceedings of the 3rd Workshop on Noisy User-generated Text. 107115.Google ScholarGoogle ScholarCross RefCross Ref
  57. [57] Forney G. David. 1973. The viterbi algorithm. Proc. IEEE 61, 3 (1973), 268278.Google ScholarGoogle ScholarCross RefCross Ref
  58. [58] Gagnon R.. 1978. Votrax real time hardware for phoneme synthesis of speech. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’78), Vol. 3. IEEE, 175178.Google ScholarGoogle ScholarCross RefCross Ref
  59. [59] Geeta Sai and Muralidhara B. L.. 2017. Syllable as the basic unit for Kannada speech synthesis. In 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET). IEEE, 12051208.Google ScholarGoogle ScholarCross RefCross Ref
  60. [60] Gerstman Louis J. and Kelly John L., Jr. 1964. Synthesis of speech from code signals. U.S. Patent 3,158,685.Google ScholarGoogle Scholar
  61. [61] Gokul P., Thomas Neethu, Thomas Crisil, and Gopinath Deepa P.. 2015. Text normalization and unit selection for a memory based non uniform unit selection TTS in Malayalam. In Proceedings of the 12th International Conference on Natural Language Processing. 168.Google ScholarGoogle Scholar
  62. [62] Goldhor Richard S. and Lund Robert T.. 1983. University-to-industry advanced technology transfer: A case study. Res. Policy 12, 3 (1983), 121152.Google ScholarGoogle ScholarCross RefCross Ref
  63. [63] Gopinath Deepa P.. 2009. Duration analysis and modelling for Malayalam Text-to-Speech synthesis systems. (2009).Google ScholarGoogle Scholar
  64. [64] Gopinath Deepa P., Sheeba P. S., and Nair Achuthsankar S.. 2007. Emotional analysis for Malayalam Text-to-Speech synthesis systems. In Proceedings of the International Conference on Sciences of Electronic, Technologies of Information and Telecommunication (SETIT’07).Google ScholarGoogle Scholar
  65. [65] Gopinath Deepa P., Sree J. Divya, Mathew Reshmi, Rekhila S. J., and Nair Achuthsankar S.. 2006. Duration analysis for Malayalam text-to-speech systems. In Proceedings of the 9th International Conference on Information Technology (ICIT’06). IEEE, 129132. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. [66] Gopinath Deepa P., Veena S. G., and Nair Achuthsankar S.. 2008. Modeling of vowel duration in Malayalam speech using probability distribution. In Proceedings of the Speech Prosody Conference. 69.Google ScholarGoogle Scholar
  67. [67] Gopinath Deepa P., Vinod Chandra S. S., Veena S. G., and Achuthsankar S. Nair. 2008. A hybrid duration model using CART and HMM. In Proceedings of the IEEE Region 10 Conference (TENCON’08). IEEE, 14.Google ScholarGoogle ScholarCross RefCross Ref
  68. [68] Goubanova Olga and Taylor Paul. 2000. Using Bayesian belief networks for model duration in text-to-speech systems. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’00). 427430.Google ScholarGoogle Scholar
  69. [69] Greenspan Steven L., Bennett Raymond W., and Syrdal Ann K.. 1998. An evaluation of the diagnostic rhyme test. Int. J. Speech Technol. 2, 3 (1998), 201214.Google ScholarGoogle ScholarCross RefCross Ref
  70. [70] Griffin D. and Lim Jae. 1983. Signal estimation from modified short-time fourier transform. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’83), Vol. 8. 804807. https://doi.org/10.1109/ICASSP.1983.1172092Google ScholarGoogle ScholarCross RefCross Ref
  71. [71] Murthy B. Yegnanarayana and H. A.. 1991. Formant extraction from minimum phase group delay functions. Speech Commun. 1 (1991), 209221. Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. [72] Hawkins J. and Blakeslee S.. 2004. On Intelligence, St. Martin’s Griffin, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. [73] Haykin Simon. 2008. Communication Systems. John Wiley & Sons, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. [74] Hebb Donald O.. 1949. The first stage of perception: Growth of the assembly. Organiz. Behav. 4 (1949), 6078.Google ScholarGoogle Scholar
  75. [75] Hecker Michael H. L.. 1962. Studies of nasal consonants with an articulatory speech synthesizer. J. Acoust. Soc. Amer. 34, 2 (1962), 179187.Google ScholarGoogle ScholarCross RefCross Ref
  76. [76] Hodari Zack, Moinet Alexis, Karlapati Sri, Lorenzo-Trueba Jaime, Merritt Thomas, Joly Arnaud, Abbas Ammar, Karanasou Penny, and Drugman Thomas. 2020. CAMP: A two-stage approach to modelling prosody in context. Retrieved from https://arXiv:2011.01175.Google ScholarGoogle Scholar
  77. [77] Holmes John N., Mattingly Ignatius G., and Shearme John N.. 1964. Speech synthesis by rule. Lang. Speech 7, 3 (1964), 127143.Google ScholarGoogle ScholarCross RefCross Ref
  78. [78] Hönig Florian, Batliner Anton, Weilhammer Karl, and Nöth Elmar. 2010. Automatic assessment of non-native prosody for English as l2. In Proceedings of the 5th International Conference on Speech Prosody.Google ScholarGoogle Scholar
  79. [79] House Arthur S., Williams Carl E., Hecker Michael H. L., and Kryter Karl D.. 1965. Articulation-testing methods: Consonantal differentiation with a closed-response set. J. Acoust. Soc. Amer. 37, 1 (1965), 158166.Google ScholarGoogle ScholarCross RefCross Ref
  80. [80] Huang Lan, Zhuang Shunan, and Wang Kangping. 2020. A text normalization method for speech synthesis based on local attention mechanism. IEEE Access 8 (2020), 3620236209.Google ScholarGoogle ScholarCross RefCross Ref
  81. [81] Huang Xuedong, Acero Alex, Hon Hsiao-Wuen, and By-Reddy Raj Foreword. 2001. Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall PTR. Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. [82] Hunt Andrew J. and Black Alan W.. 1996. Unit selection in a concatenative speech synthesis system using a large speech database. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’96), Vol. 1. IEEE, 373376. Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. [83] Hurford James R.. 2011. The Linguistic Theory of Numerals. Vol. 16. Cambridge University Press, Cambridge, UK.Google ScholarGoogle Scholar
  84. [84] Hutchins W. John. 2004. The Georgetown-IBM experiment demonstrated in January 1954. In Proceedings of the Conference of the Association for Machine Translation in the Americas. Springer, 102114.Google ScholarGoogle ScholarCross RefCross Ref
  85. [85] Jalin A. Femina and Jayakumari J.. 2017. Text-to-Speech synthesis system for Tamil using HMM. In Proceedings of the IEEE International Conference on Circuits and Systems (ICCS’17). IEEE, 447451.Google ScholarGoogle ScholarCross RefCross Ref
  86. [86] James Jesin and Gopinath Deepa P.. 2015. Pause duration model for Malayalam TTS. In Proceedings of the International Conference on Advances in Computing, Communications and Informatics (ICACCI’15). IEEE, 22062210.Google ScholarGoogle ScholarCross RefCross Ref
  87. [87] P Kannan Balakrishnan and Jasir M.. 2014. A comprehensive survey on Text-to-Speech synthesis with a special emphasis to Indian languages. In Proceedings of the National Conference on Indian Language Computing (NCILC’14). 14.Google ScholarGoogle Scholar
  88. [88] Javaloy Adrián and García-Mateos Ginés. 2020. Text normalization using encoder–decoder networks based on the causal feature extractor. Appl. Sci. 10, 13 (2020), 4551.Google ScholarGoogle ScholarCross RefCross Ref
  89. [89] Jayakrishnan R., Gopal Greeshma N., and Santhikrishna M. S.. 2018. Multi-class emotion detection and annotation in Malayalam novels. In Proceedings of the International Conference on Computer Communication and Informatics (ICCCI’18). IEEE, 15.Google ScholarGoogle ScholarCross RefCross Ref
  90. [90] Jayakumari J. and Jalin A. Femina. 2019. An improved Text-to-Speech technique for Tamil language using hidden Markov model. In Proceedings of the 7th International Conference on Smart Computing and Communications (ICSCC’19). IEEE, 15.Google ScholarGoogle ScholarCross RefCross Ref
  91. [91] Jayasankar T. and Vijayaselvi J. Arputha. 2016. Prediction of syllable duration using structure optimised cuckoo search neural network (SOCNN) for text-to-speech. J. Comput. Theoret. Nanosci. 13, 10 (2016), 75387544.Google ScholarGoogle ScholarCross RefCross Ref
  92. [92] Jelinek Frederick. 1985. Markov source modeling of text generation. In The Impact of Processing Techniques on Communications. Springer, 569591.Google ScholarGoogle ScholarCross RefCross Ref
  93. [93] Joos Martin. 1948. Acoustic phonetics. Language 24, 2 (1948), 5136.Google ScholarGoogle ScholarCross RefCross Ref
  94. [94] Joshi Anusha, Chabbi Deepa, Suman M., and Kulkarni Suprita. 2015. Text-to-Speech system for Kannada language. In Proceedings of the International Conference on Communications and Signal Processing (ICCSP’15). IEEE, 19011904.Google ScholarGoogle ScholarCross RefCross Ref
  95. [95] Jun Sun-Ah. 2010. The implicit prosody hypothesis and overt prosody in English. Lang. Cogn. Process. 25, 7-9 (2010), 12011233.Google ScholarGoogle ScholarCross RefCross Ref
  96. [96] Kadiri Sudarsana Reddy and Yegnanarayana B.. 2015. Analysis of singing voice for epoch extraction using zero frequency filtering method. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’15). IEEE, 42604264.Google ScholarGoogle ScholarCross RefCross Ref
  97. [97] Kallimani Jagadish S., Srinivasa K. G., et al. 2012. Normalization of non standard words for Kannada speech synthesis. International Journal of Information Technology Infrastructure 1, 2 (2012).Google ScholarGoogle Scholar
  98. [98] Kannojia Shilpi, Singh Ghanapriya, and Mathur Sanjay. 2016. A Text-to-Speech synthesizer using acoustic unit based concatenation for any Indian language of devanagari script. In Proceedings of the 11th International Conference on Industrial and Information Systems (ICIIS’16). IEEE, 759763.Google ScholarGoogle ScholarCross RefCross Ref
  99. [99] Kawahara Hideki, Masuda-Katsuse Ikuyo, and Cheveigne Alain De. 1999. Restructuring speech representations using a pitch-adaptive time–frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. Speech Commun. 27, 3–4 (1999), 187207. Google ScholarGoogle ScholarDigital LibraryDigital Library
  100. [100] Jr. John L. Kelly and Gerstman Louis J.. 1961. An artificial talker driven from a phonetic input. J. Acoust. Soc. Amer. 33, 6 (1961), 835835.Google ScholarGoogle ScholarCross RefCross Ref
  101. [101] Kenter Tom, Wan Vincent, Chan Chun-An, Clark Rob, and Vit Jakub. 2019. CHiVE: Varying prosody in speech synthesis with a linguistically driven dynamic hierarchical conditional variational network. In Proceedings of the International Conference on Machine Learning. PMLR, 33313340.Google ScholarGoogle Scholar
  102. [102] Khan Rubeena A. and Chitode J. S.. 2016. Concatenative speech synthesis: A review. Int. J. Comput. Appl. 136, 3 (2016), 6.Google ScholarGoogle Scholar
  103. [103] Kingma Diederik P. and Ba Jimmy. 2014. Adam: A method for stochastic optimization. Retrieved from https://arXiv:1412.6980.Google ScholarGoogle Scholar
  104. [104] Kingma Diederik P. and Welling Max. 2013. Auto-encoding variational Bayes. Retrieved from https://arXiv:1312.6114.Google ScholarGoogle Scholar
  105. [105] Kishore S. P., Kumar Rohit, and Sangal Rajeev. 2002. A data driven synthesis approach for Indian languages using syllable as basic unit. In Proceedings of the International Conference on NLP (ICON’02). 311316.Google ScholarGoogle Scholar
  106. [106] Kishore S. Prahallad and Black Alan W.. 2003. Unit size in unit selection speech synthesis. In Proceedings of the 8th European Conference on Speech Communication and Technology.Google ScholarGoogle Scholar
  107. [107] Klatt Dennis H.. 1970. Synthesis of stop consonants in initial position. J. Acoust. Soc. Amer. 47, 1A (1970), 9394.Google ScholarGoogle ScholarCross RefCross Ref
  108. [108] Klatt Dennis H.. 1980. Software for a cascade/parallel formant synthesizer. the Journal of the Acoustical Society of America 67, 3 (1980), 971995.Google ScholarGoogle ScholarCross RefCross Ref
  109. [109] Klatt Dennis H.. 1987. Review of text-to-speech conversion for English. J. Acoust. Soc. Amer. 82, 3 (1987), 737793.Google ScholarGoogle ScholarCross RefCross Ref
  110. [110] Klein Gerwin. 2010. Jflex user’s manual. Available on-line at www. jflex. de. Accessed August (2010).Google ScholarGoogle Scholar
  111. [111] Koenig W., Dunn H. K., and Lacy L. Y.. 1946. The sound spectrograph. J. Acoust. Soc. Amer. 18, 1 (1946), 1949.Google ScholarGoogle ScholarCross RefCross Ref
  112. [112] Krishna N. Sridhar and Murthy Hema A.. 2004. Duration modeling of Indian languages Hindi and Telugu. In Proceedings of the 5th ISCA Workshop on Speech Synthesis.Google ScholarGoogle Scholar
  113. [113] Krishna Nemala Sridhar and Murthy Hema A.. 2004. A new prosodic phrasing model for Indian language Telugu. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’04).Google ScholarGoogle ScholarCross RefCross Ref
  114. [114] Krishna N. Sridhar, Talukdar Partha Pratim, Bali Kalika, and Ramakrishnan A. G.. 2004. Duration modeling for Hindi text-to-speech synthesis system. In Proceedings of the International Conference on Spoken Language Processing (ICSLP’04).Google ScholarGoogle Scholar
  115. [115] N. S. Krishna and H. A. Murthy. 2004. Duration modeling of Indian languages Hindi and Telugu. In SSW.Google ScholarGoogle Scholar
  116. [116] Kubat Miroslav. 1999. Neural networks: A comprehensive foundation by Simon Haykin. Knowledge Eng. Rev. 13, 4 (1999), 409412. Google ScholarGoogle ScholarDigital LibraryDigital Library
  117. [117] Kumar Naresh, Deepak Gerard, and Santhanavijayan A.. 2020. A novel semantic approach for intelligent response generation using emotion detection incorporating NPMI measure. Procedia Comput. Sci. 167 (2020), 571579.Google ScholarGoogle ScholarCross RefCross Ref
  118. [118] Kumar S. R. Rajesh and Yegnanarayana B.. 1989. Significance of durational knowledge for speech synthesis system in an Indian language. In Proceedings of the 4th IEEE Region 10 International Conference (TENCON’89). IEEE, 486489.Google ScholarGoogle ScholarCross RefCross Ref
  119. [119] Kurzweil Raymond. 1976. The Kurzweil reading machine: A technical overview. Science, Technology, and the Handicapped. 311.Google ScholarGoogle Scholar
  120. [120] Lakkavalli Vikram Ramesh, Arulmozhi P., and Ramakrishnan A. G.. 2010. Continuity metric for unit selection based text-to-speech synthesis. In Proceedings of the International Conference on Signal Processing and Communications (SPCOM’10). IEEE, 15.Google ScholarGoogle ScholarCross RefCross Ref
  121. [121] Lee Younggun and Kim Taesu. 2019. Robust and fine-grained prosody control of end-to-end speech synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’19). IEEE, 59115915.Google ScholarGoogle ScholarCross RefCross Ref
  122. [122] Lees Robert B. and Chomsky Noam. 1957. Syntactic structures. Language 33, 3 Part 1 (1957), 375408.Google ScholarGoogle ScholarCross RefCross Ref
  123. [123] Levine John R., Mason John, Mason Tony, Brown Doug, and Levine Paul. 1992. Lex & Yacc. O’Reilly Media, Inc.Google ScholarGoogle Scholar
  124. [124] Lieberman Philip. 1967. Intonation, perception, and language. MIT Research Monograph (1967).Google ScholarGoogle Scholar
  125. [125] Liljencrants Johan. 1985. Speech synthesis with a reflection-type line analog. DS Dissertation, Dept. Speech Commun. and Music Acoust., Royal Inst. Tech.Google ScholarGoogle Scholar
  [126] Liu Fei, Weng Fuliang, Wang Bingqing, and Liu Yang. 2011. Insertion, deletion, or substitution? Normalizing text messages without pre-categorization nor supervision. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 71–76.
  [127] Logan J., Pisoni D., and Greene B. 1985. Measuring the segmental intelligibility of synthetic speech: Results for eight text-to-speech systems. Research on Speech Perception Progress Report No. 11. Indiana University, Bloomington, IN.
  [128] Luce Paul A., Feustel Timothy C., and Pisoni David B. 1983. Capacity demands in short-term memory for synthetic and natural speech. Human Factors 25, 1 (1983), 17–32.
  [129] Luong Minh-Thang, Pham Hieu, and Manning Christopher D. 2015. Effective approaches to attention-based neural machine translation. Retrieved from https://arxiv.org/abs/1508.04025.
  [130] Mack Molly. 1982. Voicing-dependent vowel duration in English and French: Monolingual and bilingual production. J. Acoust. Soc. Amer. 71, 1 (1982), 173–178.
  [131] Madhavan Manu, Rehman O. Mujeeb, Raj P. C. Reghu, et al. 2013. Computing prosodic patterns for Malayalam. NCILC, DCA, CUSAT.
  [132] Madhukumar A. S., Rajendran S., and Yegnanarayana B. 1993. Intonation component of a text-to-speech system for Hindi. Comput. Speech Lang. 7, 3 (1993), 283–301.
  [133] Mahesh M., Prakash Jeena J., and Murthy Hema A. 2018. Resyllabification in Indian languages and its implications in text-to-speech systems. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’18). 212–216.
  [134] Makhija Piyush, Kumar Ankit, and Gupta Anuj. 2020. HinglishNorm—A corpus of Hindi-English code-mixed sentences for text normalization. Retrieved from https://arxiv.org/abs/2010.08974.
  [135] Manghat Sreeja, Manghat Sreeram, and Schultz Tanja. 2020. Malayalam-English code-switched: Grapheme to phoneme system. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’20). 4133–4137.
  [136] Manous Laura M., Pisoni David B., Dedina Michael J., and Nusbaum Howard C. 1986. Comprehension of natural and synthetic speech using a sentence verification task. J. Acoust. Soc. Amer. 79, S1 (1986), S25.
  [137] Mary Leena, Antony Anil P., Babu Ben P., and Prasanna S. R. Mahadeva. 2018. Automatic syllabification of speech signal using short time energy and vowel onset points. Int. J. Speech Technol. 21, 3 (2018), 571–579.
  [138] Mathew Mili Mary and Bhat Jayashree S. 2010. Aspects of emotional prosody in Malayalam and Hindi. Buckingham J. Lang. Linguist. 3 (2010), 25–34.
  [139] Mattingly Ignatius G. 1966. Synthesis by rule of prosodic features. Lang. Speech 9, 1 (1966), 1–13. DOI: 10.1177/002383096600900101
  [140] Mattingly Ignatius Gorsline. 1969. Synthesis by Rule of General American English.
  [141] Maxwell Olga and Fletcher Janet. 2009. Acoustic and durational properties of Indian English vowels. World Englishes 28, 1 (2009), 52–69.
  [142] McCarthy John. 1978. History of LISP. In History of Programming Languages. 173–185.
  [143] McCulloch Warren S. and Pitts Walter. 1943. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 4 (1943), 115–133.
  [144] Mehmood Khawar, Essam Daryl, Shafi Kamran, and Malik Muhammad Kamran. 2020. An unsupervised lexical normalization for Roman Hindi and Urdu sentiment analysis. Info. Process. Manage. 57, 6 (2020), 102368.
  [145] Ming Huaiping, Lu Yanfeng, Zhang Zhengchen, and Dong Minghui. 2017. A light-weight method of building an LSTM-RNN-based bilingual TTS system. In Proceedings of the International Conference on Asian Language Processing (IALP’17). IEEE, 201–205.
  [146] Moor James. 2006. The Dartmouth College artificial intelligence conference: The next fifty years. AI Mag. 27, 4 (2006), 87.
  [147] Mortimer J. Y. and Salathiel J. A. 1995. “Soundex” codes of surnames provide confidentiality and accuracy in a national HIV database. Communic. Disease Rep. CDR Rev. 5, 12 (1995), R183–R186.
  [148] Moulines Eric and Charpentier Francis. 1990. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Commun. 9, 5–6 (1990), 453–467.
  [149] Munson W. A. and Karlin J. E. 1962. Isopreference method for evaluating speech-transmission circuits. J. Acoust. Soc. Amer. 34, 6 (1962), 762–774.
  [150] Muralishankar R. and Ramakrishnan A. G. 2001. Human touch to Tamil speech synthesizer. In Proceedings of Tamilnet. 103–109.
  [151] Muralishankar R., Ramakrishnan A. G., and Prathibha P. 2004. Pitch modification using DCT in the source domain. Speech Commun. 42 (2004), 143–154.
  [152] Murthy H. A. 1997. The real root cepstrum and its applications to speech processing. In Proceedings of the National Conference on Communication. 180–183.
  [153] Nagarajan T., Prasad V. Kamakshi, and Murthy Hema A. 2003. Minimum phase signal derived from root cepstrum. Electron. Lett. 39, 12 (2003), 941–942.
  [154] Nagaraju D. and Ramasree R. J. 2014. Prosodic analysis for Telugu script. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4, 5 (2014), 1922–1925.
  [155] Nagaraju D. and Ramasree R. J. 2018. Design issues of Telugu emotional speech system. Int. J. Appl. Eng. Res. 13, 4 (2018), 1922–1925.
  [156] Namboodiri A., Narayanan P., and Jawahar C. 2007. On using classical poetry structure for Indian language post-processing. In Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR’07), Vol. 2. IEEE, 1238–1242.
  [157] Narasimhan Bhuvana, Sproat Richard, and Kiraz George. 2004. Schwa-deletion in Hindi text-to-speech synthesis. Int. J. Speech Technol. 7, 4 (2004), 319–333.
  [158] Narendra N. P., Rao K. Sreenivasa, Ghosh Krishnendu, Vempada Ramu Reddy, and Maity Sudhamay. 2011. Development of syllable-based text-to-speech synthesis system in Bengali. Int. J. Speech Technol. 14, 3 (2011), 167–181.
  [159] United Nations. 2019. World Population Prospects 2019. Retrieved from https://population.un.org/wpp/.
  [160] Tyson Na’im R. and Nagar Ila. 2009. Prosodic rules for schwa-deletion in Hindi text-to-speech synthesis. Int. J. Speech Technol. 12, 1 (2009), 15.
  [161] Nixon Charles W., Anderson Timothy R., and Moore Thomas J. 1986. The perception of synthetic speech in noise. In Basic and Applied Aspects of Noise-Induced Hearing Loss. Springer, 345–356.
  [162] Nye P., Hankins J., Rand T., Mattingly I., and Cooper F. 1973. A plan for the field evaluation of an automated reading system for the blind. IEEE Trans. Audio Electroacoust. 21, 3 (1973), 265–268.
  [163] Government of India. 2011. Data on Language and Mother Tongue. Retrieved from https://censusindia.gov.in/2011Census/Language_MTs.html.
  [164] Olinsky Craig and Black Alan W. 2000. Non-standard word and homograph resolution for Asian language text analysis. In Proceedings of the 6th International Conference on Spoken Language Processing.
  [165] Oord Aaron van den, Dieleman Sander, Zen Heiga, Simonyan Karen, Vinyals Oriol, Graves Alex, Kalchbrenner Nal, Senior Andrew, and Kavukcuoglu Koray. 2016. WaveNet: A generative model for raw audio. Retrieved from https://arxiv.org/abs/1609.03499.
  [166] Pammi Sathish and Charfuelan Marcela. 2013. HMM-based sCost quality control for unit selection speech synthesis. In Proceedings of the 8th ISCA Speech Synthesis Workshop (SSW’13). 53–57.
  [167] Panchapagesan K., Talukdar Partha Pratim, Krishna N. Sridhar, Bali Kalika, and Ramakrishnan A. G. 2004. Hindi text normalization. In Proceedings of the 5th International Conference on Knowledge Based Computer Systems (KBCS’04). 19–22.
  [168] Pandey Pramod. 2014. Akshara-to-sound rules for Hindi. Writing Syst. Res. 6, 1 (2014), 54–72.
  [169] Pandey Pramod and Roy Somnath. 2017. A generative model of a pronunciation lexicon for Hindi. Retrieved from https://arxiv.org/abs/1705.02452.
  [170] Patil Hemant A., Patel Tanvina B., Shah Nirmesh J., Sailor Hardik B., Krishnan Raghava, Kasthuri G. R., Nagarajan T., Christina Lilly, Kumar Naresh, Raghavendra Veera, et al. 2013. A syllable-based framework for unit selection synthesis in 13 Indian languages. In Proceedings of the International Conference Oriental COCOSDA held jointly with the Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE’13). IEEE, 1–8.
  [171] Peddinti Vijayaditya and Prahallad Kishore. 2011. Significance of vowel epenthesis in Telugu text-to-speech synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’11). IEEE, 5348–5351.
  [172] Pennell Deana L. and Liu Yang. 2010. Normalization of text messages for text-to-speech. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’10). IEEE, 4842–4845.
  [173] Peterson Gordon E. and Lehiste Ilse. 1960. Duration of syllable nuclei in English. J. Acoust. Soc. Amer. 32, 6 (1960), 693–703.
  [174] Pierrehumbert Janet and Nair Rami. 1996. Implications of Hindi prosodic structure. Curr. Trends Phonol.: Models Methods 2 (1996), 549–584.
  [175] Pisoni D. and Hunnicutt Sharon. 1980. Perceptual evaluation of MITalk: The MIT unrestricted text-to-speech system. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’80), Vol. 5. IEEE, 572–575.
  [176] Pisoni David B., Nusbaum Howard C., and Greene Beth G. 1985. Perception of synthetic speech generated by rule. Proc. IEEE 73, 11 (1985), 1665–1676.
  [177] Pols Louis C. W. and Olive J. P. 1983. Intelligibility of consonants in CVC utterances produced by dyadic rule synthesis. Speech Commun. 2, 1 (1983), 3–13.
  [178] Potter R. K., Kopp G. A., and Green H. C. 1947. Visible Speech. D. Van Nostrand Co., New York, NY, 28–56.
  [179] Potter Ralph K. 1946. Introduction to technical discussions of sound portrayal. J. Acoust. Soc. Amer. 18, 1 (1946), 1–3.
  [180] Nayar V. R. Prabodhachandran. 1980. Svana Vijnanam. Kerala Bhasha Institute, Thiruvananthapuram.
  [181] Prakash Anusha, Thomas A. Leela, Umesh S., and Murthy Hema A. 2019. Building multilingual end-to-end speech synthesisers for Indian languages. In Proceedings of the 10th ISCA Speech Synthesis Workshop (SSW’10). 194–199.
  [182] Prakash Jeena J. 2019. Transcription Correction and Rhythm Analysis for Applications in Text-to-Speech Synthesis for Indian Languages. Ph.D. Dissertation. Indian Institute of Technology, Madras.
  [183] Prakash Jeena J. and Murthy Hema A. 2019. Analysis of inter-pausal units in Indian languages and its application to text-to-speech synthesis. IEEE/ACM Trans. Audio, Speech, Lang. Process. 27, 10 (2019), 1616–1628.
  [184] Pravena D. and Govind D. 2016. Expressive speech analysis for epoch extraction using zero frequency filtering approach. In Proceedings of the IEEE Students’ Technology Symposium (TechSym’16). IEEE, 240–244.
  [185] Press William H. and Teukolsky Saul A. 1990. Savitzky-Golay smoothing filters. Comput. Phys. 4, 6 (1990), 669–672.
  [186] Quinlan J. Ross. 2014. C4.5: Programs for Machine Learning. Elsevier.
  [187] Rabiner Lawrence and Juang Biing-Hwang. 1986. An introduction to hidden Markov models. IEEE ASSP Mag. 3, 1 (1986), 4–16.
  [188] Rajan Bindhu K., Rijoy V., Gopinath Deepa P., and George Nimmy. 2015. Duration modeling for text-to-speech synthesis system using Festival speech engine developed for Malayalam language. In Proceedings of the International Conference on Circuits, Power and Computing Technologies (ICCPCT’15). IEEE, 1–5.
  [189] Rajaraja Varma A. R. 1980. Vruthamanjari. DC Books, Kottayam.
  [190] Rajaraja Varma A. R. 1986. Keralapanineeyam. DC Books, Kottayam.
  [191] Rajendran Vaibhavi and Kumar G. Bharadwaja. 2015. Text processing for developing unrestricted Tamil text-to-speech synthesis system. Indian J. Sci. Technol. 8, 29 (2015), 112–124.
  [192] Rajendran Vaibhavi and Kumar G. Bharadwaja. 2017. Prosody detection from text using aggregative linguistic features. In Proceedings of the International Conference on Next Generation Computing Technologies. Springer, 736–749.
  [193] Rajendran Vaibhavi and Kumar G. Bharadwaja. 2019. A robust syllable centric pronunciation model for Tamil text-to-speech synthesizer. IETE J. Res. 65, 5 (2019), 601–612.
  [194] Rajeswari K. C. and Uma M. P. 2012. Prosody modeling techniques for text-to-speech synthesis systems—A survey. Int. J. Comput. Appl. 39, 16 (2012), 8–11.
  [195] Raju Rajan Saha, Bhattacharjee Prithwiraj, Ahmad Arif, and Rahman Mohammad Shahidur. 2019. A Bangla text-to-speech system using deep neural networks. In Proceedings of the International Conference on Bangla Speech and Language Processing (ICBSLP’19). IEEE, 1–5.
  [196] Rao Krothapalli S. and Koolagudi Shashidhar G. 2010. Selection of suitable features for modeling the durations of syllables. J. Softw. Eng. Appl. 3, 12 (2010), 1107.
  [197] Rao K. Sreenivasa and Yegnanarayana B. 2004. Modeling syllable duration in Indian languages using neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’04), Vol. 5. IEEE, V-313.
  [198] Rao K. Sreenivasa and Yegnanarayana B. 2005. Modeling syllable duration in Indian languages using support vector machines. In Proceedings of the International Conference on Intelligent Sensing and Information Processing. IEEE, 258–263.
  [199] Rao K. Sreenivasa and Yegnanarayana B. 2007. Modeling durations of syllables using neural networks. Comput. Speech Lang. 21, 2 (2007), 282–295.
  [200] Rao K. Sreenivasa and Yegnanarayana B. 2009. Duration modification using glottal closure instants and vowel onset points. Speech Commun. 51, 12 (2009), 1263–1269.
  [201] Rao K. Sreenivasa and Yegnanarayana Bayya. 2009. Intonation modeling for Indian languages. Comput. Speech Lang. 23, 2 (2009), 240–256.
  [202] Rashid Muhammad Masud, Hussain Md Akter, and Rahman M. Shahidur. 2010. Text normalization and diphone preparation for Bangla speech synthesis. J. Multimedia 5, 6 (2010), 551–559.
  [203] Raux Antoine and Black Alan W. 2003. A unit selection approach to F0 modeling and its application to emphasis. In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU’03). IEEE, 700–705.
  [204] Ravi D. J. and Patilkulkarni Sudarshan. 2011. Text-to-speech synthesis system for Kannada language. Int. J. Adv. Res. Comput. Sci. 2, 1 (2011).
  [205] Ravi D. J. and Patilkulkarni Sudarshan. 2012. Evaluation of Kannada text-to-speech system. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2, 1 (2012).
  [206] Reddy M. Kiran and Rao K. Sreenivasa. 2018. DNN-based bilingual (Telugu-Hindi) polyglot speech synthesis. In Proceedings of the International Conference on Advances in Computing, Communications and Informatics (ICACCI’18). IEEE, 1808–1811.
  [207] Reddy V. Ramu and Rao K. Sreenivasa. 2011. Intonation modeling using FFNN for syllable-based Bengali text-to-speech synthesis. In Proceedings of the 2nd International Conference on Computer and Communication Technology (ICCCT’11). IEEE, 334–339.
  [208] Reddy V. Ramu and Rao K. Sreenivasa. 2013. Two-stage intonation modeling using feedforward neural networks for syllable-based text-to-speech synthesis. Comput. Speech Lang. 27, 5 (2013), 1105–1126.
  [209] Reddy V. Ramu and Rao K. Sreenivasa. 2016. Prosody modeling for syllable-based text-to-speech synthesis using feedforward neural networks. Neurocomputing 171 (2016), 1323–1334.
  [210] Reddy V. Ramu, Sarkar Parakrant, and Rao K. Sreenivasa. 2014. Duration modeling by multi-models based on vowel production characteristics. In Proceedings of the 11th International Conference on Natural Language Processing. 39–47.
  [211] Reichel Uwe D. and Pfitzinger Hartmut R. 2006. Text preprocessing for speech synthesis. In Proceedings of the TC-STAR Speech to Speech Translation Workshop, Barcelona.
  [212] Ren Yi, Hu Chenxu, Tan Xu, Qin Tao, Zhao Sheng, Zhao Zhou, and Liu Tie-Yan. 2020. FastSpeech 2: Fast and high-quality end-to-end text-to-speech. Retrieved from https://arxiv.org/abs/2006.04558.
  [213] Ritchie Sandy, Mahon Eoin, Heiligenstein Kim, Bampounis Nikos, Esch Daan van, Schallhart Christian, Mortensen Jonas, and Brard Benoit. 2020. Data-driven parametric text normalization: Rapidly scaling finite-state transduction verbalizers to new languages. In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU’20) and Collaboration and Computing for Under-Resourced Languages (CCURL’20). 218–225.
  [214] Ritchie Sandy, Sproat Richard, Gorman Kyle, Esch Daan van, Schallhart Christian, Bampounis Nikos, Brard Benoît, Mortensen Jonas Fromseier, Holt Millie, and Mahon Eoin. 2019. Unified verbalization for speech recognition & synthesis across languages. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’19). 3530–3534.
  [215] Roark Brian, Sproat Richard, Allauzen Cyril, Riley Michael, Sorensen Jeffrey, and Tai Terry. 2012. The OpenGrm open-source finite-state grammar software libraries. In Proceedings of the ACL System Demonstrations. 61–66.
  [216] Rohit H. P. and Kallimani Jagadish S. 2013. Text normalization in concatenative text-to-speech synthesis (TTS) for Kannada language. ICECIT, SIT, Tumkur.
  [217] Ronanki Srikanth. 2019. Prosody Generation for Text-to-Speech Synthesis. Ph.D. Dissertation. University of Edinburgh. Retrieved from https://era.ed.ac.uk/handle/1842/36396.
  [218] Rosen George. 1958. Dynamic analog speech synthesizer. J. Acoust. Soc. Amer. 30, 3 (1958), 201–209.
  [219] Roy Somnath. 2014. Prominence detection in Hindi: A mathematical perspective. In Proceedings of the International Conference on Computational Science and Computational Intelligence (CSCI’14), Vol. 2. IEEE, 119–124.
  [220] Roy Somnath and Sinha Nishant. 2014. Duration modeling in Hindi. Int. J. Comput. Appl. 97, 6 (2014).
  [221] Saha Shambhu Nath and Mandal Shyamal Kr Das. 2014. Phonetic and phonological interference of English pronunciation by native Bengali (L1-Bengali, L2-English) speakers. In Proceedings of the 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA’14). IEEE, 1–6.
  [222] Sahu Lakshmi and Dhole Avinash. 2012. Hindi and Telugu text-to-speech synthesis (TTS) and inter-language text conversion. Int. J. Sci. Res. Pub. 2, 4 (2012), 1–5.
  [223] Sangeetha J., Jothilakshmi S., Sindhuja S., and Ramalingam V. 2013. Text-to-speech synthesis system for Tamil. In Proceedings of the International Conference on Information Systems and Computing (ICISC’13).
  [224] Sarkar Parakrant, Haque Arijul, Dutta Arup Kumar, Reddy Gurunath, Harikrishna D. M., Dhara Prasenjit, Verma Rashmi, Narendra N. P., Kr S. B. Sunil, Yadav Jainath, et al. 2014. Designing prosody rule-set for converting neutral TTS speech to storytelling style speech for Indian languages: Bengali, Hindi, and Telugu. In Proceedings of the 7th International Conference on Contemporary Computing (IC3’14). IEEE, 473–477.
  [225] Sau Anindya, Amin Tarik Aziz, Barman Nabagata, and Pal Alok Ranjan. 2019. Word sense disambiguation in Bengali using sense induction. In Proceedings of the International Conference on Applied Machine Learning (ICAML’19). IEEE, 170–174.
  [226] Savithri S. R. 1986. Duration of stop consonants in Kannada. JASI 14, 2 (1986), 3–14.
  [227] Savithri S. R. 2005. Duration as a cue for stress perception in Kannada. J. Indian Speech Hear. Assoc. 19 (2005), 67.
  [228] SDL, IIT Madras. 2020. Writing Systems Followed in Indian Languages. Retrieved from http://www.acharya.gen.in:8080/linguistics/wrisys.php.
  [229] Shanmugam S. Aswin and Murthy Hema. 2014. A hybrid approach to segmentation of speech using group delay processing and HMM-based embedded reestimation. In Proceedings of the 15th Annual Conference of the International Speech Communication Association.
  [230] Shen Jonathan, Jia Ye, Chrzanowski Mike, Zhang Yu, Elias Isaac, Zen Heiga, and Wu Yonghui. 2020. Non-Attentive Tacotron: Robust and controllable neural TTS synthesis including unsupervised duration modeling. Retrieved from https://arxiv.org/abs/2010.04301.
  [231] Shreekanth T., Deeksha M. R., and Kaushik Karthikeya R. 2018. A novel data independent approach for conversion of hand punched Kannada Braille script to text and speech. Int. J. Image Graph. 18, 02 (2018), 1850010.
  [232] Shreekanth T., Udayashankara V., and Chandrika M. 2015. Duration modelling using neural networks for Hindi TTS system considering position of syllable in a word. Procedia Comput. Sci. 46 (2015), 60–67.
  [233] Sodimana Keshan, Silva Pasindu De, Sproat Richard, Theeraphol A., Li Chen Fang, Gutkin Alexander, Sarin Supheakmungkol, and Pipatsrisawat Knot. 2018. Text normalization for Bangla, Khmer, Nepali, Javanese, Sinhala, and Sundanese TTS systems. In Proceedings of the 6th International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU’18). ISCA, Gurugram, India, 147–151.
  [234] Soorajkumar R., Girish G. N., Ramteke Pravin B., Joshi Shreyas S., and Koolagudi Shashidhar G. 2017. Text-independent automatic accent identification system for Kannada language. In Proceedings of the International Conference on Data Engineering and Communication Technology. Springer, 411–418.
  [235] Sproat Richard, Black Alan W., Chen Stanley, Kumar Shankar, Ostendorf Mari, and Richards Christopher. 2001. Normalization of non-standard words. Comput. Speech Lang. 15, 3 (2001), 287–333.
  [236] Sreelekshmi K. S. and Gopinath Deepa P. 2012. Clustering of duration patterns in speech for text-to-speech synthesis. In Proceedings of the Annual IEEE India Conference (INDICON’12). IEEE, 1122–1127.
  [237] Stephenson Brooke, Hueber Thomas, Girin Laurent, and Besacier Laurent. 2021. Alternate endings: Improving prosody for incremental neural TTS with predicted future text input. Retrieved from https://arxiv.org/abs/2102.09914.
  [238] Stevens Kenneth N. 1977. Physics of laryngeal behavior and larynx modes. Phonetica 34, 4 (1977), 264–279.
  [239] Stevens Kenneth N. and House Arthur S. 1955. Development of a quantitative description of vowel articulation. J. Acoust. Soc. Amer. 27, 3 (1955), 484–493.
  [240] Stevens Kenneth N., Kasowski Stanley, and Fant C. Gunnar M. 1953. An electrical analog of the vocal tract. J. Acoust. Soc. Amer. 25, 4 (1953), 734–742.
  [241] Streijl Robert C., Winkler Stefan, and Hands David S. 2016. Mean opinion score (MOS) revisited: Methods and applications, limitations and alternatives. Multimedia Syst. 22, 2 (2016), 213–227.
  [242] Sudhakar B. and Bensraj R. 2015. Development of concatenative syllable-based text-to-speech synthesis system for Tamil. In Artificial Intelligence and Evolutionary Algorithms in Engineering Systems. Springer, 585–592.
  [243] Sudhakar B. and Bensraj R. 2016. Performance analysis of text-to-speech synthesis system using HMM and prosody features with parsing for Tamil language. Int. Res. J. Eng. Technol. 3, 06 (2016), 2233–2241.
  [244] Sun Guangzhi, Zhang Yu, Weiss Ron J., Cao Yuan, Zen Heiga, and Wu Yonghui. 2020. Fully hierarchical fine-grained prosody modeling for interpretable speech synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’20). IEEE, 6264–6268.
  [245] Sunitha K. V. N. and Devi P. Sunitha. 2018. Unit selection to improve naturalness in speech synthesis. Int. J. Appl. Eng. Res. 13, 21 (2018), 15011–15015.
  [246] Sutskever Ilya, Vinyals Oriol, and Le Quoc V. 2014. Sequence to sequence learning with neural networks. Retrieved from https://arxiv.org/abs/1409.3215.
  [247] Swain Philip H. and Hauska Hans. 1977. The decision tree classifier: Design and potential. IEEE Trans. Geosci. Electronics 15, 3 (1977), 142–147.
  [248] Taylor Paul. 2009. Text-to-Speech Synthesis. Cambridge University Press, Cambridge, UK.
  [249] Taylor Paul, Caley Richard, Black Alan W., and King Simon. 1999. Edinburgh speech tools library. System Documentation, Edition 1 (1999), 1994–1999.
  [250] Teranishi Ryunen and Umeda Noriko. 1968. Use of pronouncing dictionary in speech synthesis experiments. In Proceedings of the 6th International Congress on Acoustics, Vol. 2. 155–158.
  [251] Thennattil Jubin James and Mary Leena. 2016. Phonetic engine for continuous speech in Malayalam. IETE J. Res. 62, 5 (2016), 679–685.
  [252] Thomas Abraham and Gopinath Deepa P. 2012. Analysis of the chaotic nature of speech prosody and music. In Proceedings of the Annual IEEE India Conference (INDICON’12). IEEE, 210–215.
  [253] Thomas Neethu, Gokul P., Thomas Crisil, and Gopinath Deepa P. 2015. Non-uniform unit selection using fuzzy ARTMAP for memory-based Malayalam TTS. In Proceedings of the IEEE Recent Advances in Intelligent Computational Systems (RAICS’15). IEEE, 218–223.
  [254] Thomas Samuel, Rao M. Nageshwara, Murthy Hema A., and Ramalingam Coimbatore S. 2006. Natural sounding TTS based on syllable-like units. In Proceedings of the 14th European Signal Processing Conference. IEEE, 1–5.
  [255] Titze Ingo R. 1974. The human vocal cords: A mathematical model. Phonetica 29, 1–2 (1974), 1–21.
  [256] Tripathi Kumud, Sarkar Parakrant, and Rao K. Sreenivasa. 2016. Sentence-based discourse classification for Hindi story text-to-speech (TTS) system. In Proceedings of the 13th International Conference on Natural Language Processing. 46–54.
  [257] Tsuruoka Yoshimasa, Miyao Yusuke, et al. 2011. Learning with lookahead: Can history-based models rival globally optimized models? In Proceedings of the 15th Conference on Computational Natural Language Learning. 238–246.
  [258] Turney Peter D. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Retrieved from https://arxiv.org/abs/cs/0212032.
  [259] Van Santen Jan P. H. 1994. Assignment of segmental duration in text-to-speech synthesis. Comput. Speech Lang. 8, 2 (1994), 95–128.
  [260] Vekkot Susmitha and Gupta Deepa. 2019. Prosodic transformation in vocal emotion conversion for multi-lingual scenarios: A pilot study. Int. J. Speech Technol. 22, 3 (2019), 533–549.
  [261] Vel S. Sakthi, Mubarak D. Muhammad Noorul, and Aji S. 2015. A study on vowel duration in Tamil: Instrumental approach. In Proceedings of the IEEE International Conference on Computational Intelligence and Computing Research (ICCIC’15). IEEE, 1–4.
  [262] Vinodh M. V., Bellur Ashwin, Narayan K. Badri, Thakare Deepali M., Susan Anila, Suthakar N. M., and Murthy Hema A. 2010. Using polysyllabic units for text-to-speech synthesis in Indian languages. In Proceedings of the National Conference on Communications (NCC’10). IEEE, 1–5.
  [263] Wan Vincent, Agiomyrgiannakis Yannis, Silen Hanna, and Vit Jakub. 2017. Google’s next-generation real-time unit-selection synthesizer using sequence-to-sequence LSTM-based autoencoders. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’17). 1143–1147.
  [264] Wang Hongyan and Van Heuven V. J. 2006. Acoustical analysis of English vowels produced by Chinese, Dutch, and American speakers.
  [265] Wang Wern Jun, Campbell W. Nick, Iwahashi Naoto, and Sagisaka Yoshinori. 1993. Tree-based unit selection for English speech synthesis. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP’93), Vol. 2. IEEE, 191–194.
  [266] Wang Yuxuan, Skerry-Ryan R. J., Stanton Daisy, Wu Yonghui, Weiss Ron J., Jaitly Navdeep, Yang Zongheng, Xiao Ying, Chen Zhifeng, Bengio Samy, et al. 2017. Tacotron: Towards end-to-end speech synthesis. Retrieved from https://arxiv.org/abs/1703.10135.
  [267] Wei Xizi, Hunt Melvyn, and Skilling Adrian. 2019. Neural network-based modeling of phonetic durations. Retrieved from https://arxiv.org/abs/1909.03030.
  [268] Weizenbaum Joseph. 1966. ELIZA—A computer program for the study of natural language communication between man and machine. Commun. ACM 9, 1 (1966), 36–45.
  [269] Wright James T., Malsheen B. J., and Peet Margot. 1986. Comparison of segmental intelligibility and pronunciation accuracy for two commercial text-to-speech systems. In Proceedings of the Applied Voice Input Output Society (AVIOS’86). 235–261.
  [270] Yarowsky David. 1997. Homograph disambiguation in text-to-speech synthesis. In Progress in Speech Synthesis. Springer, 157–172.
  [271] Yegnanarayana B. 2009. Artificial Neural Networks. PHI Learning Pvt. Ltd.
  [272] Zhang Hao, Sproat Richard, Ng Axel H., Stahlberg Felix, Peng Xiaochang, Gorman Kyle, and Roark Brian. 2019. Neural models of text normalization for speech applications. Comput. Linguist. 45, 2 (2019), 293–337.
  [273] Zhang Zhilu and Sabuncu Mert R. 2018. Generalized cross entropy loss for training deep neural networks with noisy labels. Retrieved from https://arxiv.org/abs/1805.07836.
  [274] Zhou Xiao, Ling Zhen-Hua, and Dai Li-Rong. 2020. Extracting unit embeddings using sequence-to-sequence acoustic models for unit selection speech synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’20). IEEE, 7659–7663.
  [275] Zipf George Kingsley. 1949. Human Behaviour and the Principle of Least Effort. Addison-Wesley, Cambridge, MA.


• Published in

  ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 4
  July 2022, 464 pages
  ISSN: 2375-4699
  EISSN: 2375-4702
  DOI: 10.1145/3511099

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

  Publisher

  Association for Computing Machinery, New York, NY, United States

  Publication History

  • Received: 1 November 2020
  • Revised: 1 October 2021
  • Accepted: 1 November 2021
  • Published: 19 January 2022

  Published in TALLIP Volume 21, Issue 4
