Abstract
Text-to-Speech (TTS) synthesis is an active area of research concerned with generating synthetic speech from underlying text. The identified syllables are uttered with appropriate duration and prosody characteristics to emulate natural speech. TTS falls under Natural Language Processing (NLP), which aims to bridge the communication gap between humans and machines. For Western languages such as English, research on producing intelligible and natural synthetic speech has advanced considerably. In a multilingual country like India, however, many regional languages, Malayalam among them, remain underexplored in NLP. In this article, we bring together the major research works on TTS in English and the prominent Indian languages, with special emphasis on the South Indian language Malayalam. This review aims to give the right direction to TTS research in Malayalam.
Digital Library
- [200] . 2009. Duration modification using glottal closure instants and vowel onset points. Speech Commun. 51, 12 (2009), 1263–1269. Google Scholar
Digital Library
- [201] . 2009. Intonation modeling for Indian languages. Comput. Speech Lang. 23, 2 (2009), 240–256. Google Scholar
Digital Library
- [202] . 2010. Text normalization and diphone preparation for Bangla speech synthesis. J. Multimedia 5, 6 (2010), 551–559.Google Scholar
Cross Ref
- [203] . 2003. A unit selection approach to F0 modeling and its application to emphasis. In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU’03). IEEE, 700–705.Google Scholar
Cross Ref
- [204] . 2011. Text-to-speech synthesis system for Kannada language. Int. J. Adv. Res. Comput. Sci. 2, 1 (2011).Google Scholar
- [205] . 2012. Evaluation of Kannada text-to-speech system. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2, 1 (2012).Google Scholar
- [206] . 2018. DNN-based bilingual (Telugu-Hindi) polyglot speech synthesis. In Proceedings of the International Conference on Advances in Computing, Communications and Informatics (ICACCI’18). IEEE, 1808–1811.Google Scholar
Cross Ref
- [207] . 2011. Intonation modeling using FFNN for syllable based Bengali Text-to-Speech synthesis. In Proceedings of the 2nd International Conference on Computer and Communication Technology (ICCCT’11). IEEE, 334–339.Google Scholar
Cross Ref
- [208] . 2013. Two-stage intonation modeling using feedforward neural networks for syllable based text-to-speech synthesis. Comput. Speech Lang. 27, 5 (2013), 1105–1126. Google Scholar
Digital Library
- [209] . 2016. Prosody modeling for syllable based text-to-speech synthesis using feedforward neural networks. Neurocomputing 171 (2016), 1323–1334. Google Scholar
Digital Library
- [210] . 2014. Duration modeling by multi-models based on vowel production characteristics. In Proceedings of the 11th International Conference on Natural Language Processing. 39–47.Google Scholar
- [211] Uwe D. Reichel and Hartmut R. Pfitzinger. 2006. Text preprocessing for speech synthesis. In TC-Star Speech to Speech Translation Workshop, Barcelona.Google Scholar
- [212] . 2020. Fastspeech 2: Fast and high-quality end-to-end Text-to-Speech. Retrieved from https://arXiv:2006.04558.Google Scholar
- [213] . 2020. Data-driven parametric text normalization: Rapidly scaling finite-state transduction verbalizers to new languages. In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU’20) and Collaboration and Computing for Under-Resourced Languages (CCURL’20). 218–225.Google Scholar
- [214] . 2019. Unified verbalization for speech recognition & synthesis across languages. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’19). 3530–3534.Google Scholar
Cross Ref
- [215] . 2012. The OpenGrm open-source finite-state grammar software libraries. In Proceedings of the ACL System Demonstrations. 61–66. Google Scholar
Digital Library
- [216] . 2013. Text normalization in concatenative Text-to-Speech synthesis (TTS) for Kannada language. ICECIT.SIT, Tumkur.Google Scholar
- [217] . 2019. Prosody generation for text-to-speech synthesis. https://era.ed.ac.uk/handle/1842/36396.Google Scholar
- [218] . 1958. Dynamic analog speech synthesizer. J. Acoust. Soc. Amer. 30, 3 (1958), 201–209.Google Scholar
Cross Ref
- [219] . 2014. Prominence detection in Hindi: A mathematical perspective. In Proceedings of the International Conference on Computational Science and Computational Intelligence (CSCI’14), Vol. 2. IEEE, 119–124. Google Scholar
Digital Library
- [220] . 2014. Duration modeling in Hindi. Int. J. Comput. Appl. 97, 6 (2014).Google Scholar
- [221] . 2014. Phonetic and phonological interference of English pronunciation by native Bengali (L1-Bengali, L2-English) speakers. In Proceedings of the 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA’14). IEEE, 1–6.Google Scholar
- [222] . 2012. Hindi and Telugu text-to-Speech synthesis (TTS) and inter-language text conversion. Int. J. Sci. Res. Pub. 2, 4 (2012), 1–5.Google Scholar
- [223] . 2013. Text-to-Speech synthesis system for Tamil. Proceedings of the International Conference on Information Systems and Computing (ICISC’13).Google Scholar
- [224] . 2014. Designing prosody rule-set for converting neutral TTS speech to storytelling style speech for Indian languages: Bengali, Hindi, and Telugu. In Proceedings of the 7th International Conference on Contemporary Computing (IC3’14). IEEE, 473–477.Google Scholar
Cross Ref
- [225] . 2019. Word sense disambiguation in Bengali using sense induction. In Proceedings of the International Conference on Applied Machine Learning (ICAML’19). IEEE, 170–174.Google Scholar
Cross Ref
- [226] S. R. Savithri. 1986. Duration of stop consonants in Kannada. JASI 14, 2 (1986), 3–14.Google Scholar
- [227] . 2005. Duration as a Cue for Stress Perception in Kannada. J. Indian Speech Hear. Assoc. 19 (2005), 67.Google Scholar
- [228] . 2020. Writing Systems followed in Indian languages. Retrieved from http://www.acharya.gen.in:8080/linguistics/wrisys.php.Google Scholar
- [229] . 2014. A hybrid approach to segmentation of speech using group delay processing and HMM based embedded reestimation. In Proceedings of the 15th Annual Conference of the International Speech Communication Association.Google Scholar
Cross Ref
- [230] . 2020. Non-attentive tacotron: Robust and controllable neural TTS synthesis including unsupervised duration modeling. Retrieved from https://arXiv:2010.04301.Google Scholar
- [231] . 2018. A novel data independent approach for conversion of hand punched Kannada braille script to text and speech. Int. J. Image Graph. 18, 02 (2018), 1850010.Google Scholar
Cross Ref
- [232] . 2015. Duration modelling using neural networks for Hindi TTS system considering position of syllable in a word. Procedia Comput. Sci. 46 (2015), 60–67.Google Scholar
Cross Ref
- [233] Keshan Sodimana, Pasindu De Silva, Richard Sproat, A. Theeraphol, Chen Fang Li, Alexander Gutkin, Supheak mungkol Sarin, and Knot Pipatsrisawat. 2018. Text normalization for Bangla, Khmer, Nepali, Javanese, Sinhala, and Sundanese TTS systems. In Proceedings of the 6th International Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU-2018), International Speech Communication Association (ISCA), 29–31 August, Gurugram, India. 147–151.Google Scholar
- [234] . 2017. Text-independent automatic accent identification system for Kannada language. In Proceedings of the International Conference on Data Engineering and Communication Technology. Springer, 411–418.Google Scholar
Cross Ref
- [235] . 2001. Normalization of non-standard words. Comput. Speech Lang. 15, 3 (2001), 287–333. Google Scholar
Digital Library
- [236] . 2012. Clustering of duration patterns in speech for text-to-speech synthesis. In Proceedings of the Annual IEEE India Conference (INDICON’12). IEEE, 1122–1127.Google Scholar
Cross Ref
- [237] . 2021. Alternate endings: Improving prosody for incremental neural TTS with predicted future text input. Retrieved from https://arXiv:2102.09914.Google Scholar
- [238] . 1977. Physics of laryngeal behavior and larynx modes. Phonetica 34, 4 (1977), 264–279.Google Scholar
Cross Ref
- [239] . 1955. Development of a quantitative description of vowel articulation. J. Acoust. Soc. Amer. 27, 3 (1955), 484–493.Google Scholar
Cross Ref
- [240] . 1953. An electrical analog of the vocal tract. J. Acoust. Soc. Amer. 25, 4 (1953), 734–742.Google Scholar
Cross Ref
- [241] . 2016. Mean opinion score (MOS) revisited: Methods and applications, limitations and alternatives. Multimedia Syst. 22, 2 (2016), 213–227. Google Scholar
Digital Library
- [242] . 2015. Development of concatenative syllable-based Text-to-Speech synthesis system for Tamil. In Artificial Intelligence and Evolutionary Algorithms in Engineering Systems. Springer, 585–592.Google Scholar
- [243] . 2016. Performance analysis of Text-to-Speech synthesis system using HMM and prosody features with parsing for Tamil language. Int. Res. J. Eng. Technol. 3, 06 (2016), 2233–2241.Google Scholar
- [244] . 2020. Fully hierarchical fine-grained prosody modeling for interpretable speech synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’20). IEEE, 6264–6268.Google Scholar
Cross Ref
- [245] . 2018. Unit selection to improve naturalness in speech synthesis. International J. Appl. Eng. Res. 13, 21 (2018), 15011–15015.Google Scholar
- [246] . 2014. Sequence to sequence learning with neural networks. Retrieved from https://arXiv:1409.3215.Google Scholar
- [247] . 1977. The decision tree classifier: Design and potential. IEEE Trans. Geosci. Electronics 15, 3 (1977), 142–147.Google Scholar
Cross Ref
- [248] . 2009. Text-to-speech Synthesis. Cambridge University Press, Cambridge, UK.Google Scholar
Cross Ref
- [249] . 1999. Edinburgh speech tools library. Syst. Document. Ed. 1 (1999), 1994–1999.Google Scholar
- [250] . 1968. Use of pronouncing dictionary in speech synthesis experiments. In Proceedings of the 6th International Congress on Acoustics, Vol. 2. 155–158.Google Scholar
- [251] . 2016. Phonetic engine for continuous speech in Malayalam. IETE J. Res. 62, 5 (2016), 679–685.Google Scholar
Cross Ref
- [252] . 2012. Analysis of the chaotic nature of speech prosody and music. In Proceedings of the Annual IEEE India Conference (INDICON’12). IEEE, 210–215.Google Scholar
Cross Ref
- [253] . 2015. Non-uniform unit selection using fuzzy ARTMAP for memory based Malayalam TTS. In Proceedings of the IEEE Recent Advances in Intelligent Computational Systems (RAICS’15). IEEE, 218–223.Google Scholar
Cross Ref
- [254] . 2006. Natural sounding TTS based on syllable-like units. In Proceedings of the 14th European Signal Processing Conference. IEEE, 1–5.Google Scholar
- [255] . 1974. The human vocal cords: A mathematical model. Phonetica 29, 1-2 (1974), 1–21.Google Scholar
Cross Ref
- [256] . 2016. Sentence based discourse classification for Hindi story text-to-speech (TTS) system. In Proceedings of the 13th International Conference on Natural Language Processing. 46–54.Google Scholar
- [257] . 2011. Learning with lookahead: Can history-based models rival globally optimized models?. In Proceedings of the 15th Conference on Computational Natural Language Learning. 238–246. Google Scholar
Digital Library
- [258] . 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. Retrieved from https://cs/0212032. Google Scholar
Digital Library
- [259] . 1994. Assignment of segmental duration in text-to-speech synthesis. Comput. Speech Lang. 8, 2 (1994), 95–128.Google Scholar
Cross Ref
- [260] . 2019. Prosodic transformation in vocal emotion conversion for multi-lingual scenarios: A pilot study. Int. J. Speech Technol. 22, 3 (2019), 533–549.Google Scholar
Digital Library
- [261] . 2015. A study on vowel duration in Tamil: Instrumental approach. In Proceedings of the IEEE International Conference on Computational Intelligence and Computing Research (ICCIC’15). IEEE, 1–4.Google Scholar
Cross Ref
- [262] . 2010. Using polysyllabic units for Text-to-Speech synthesis in Indian languages. In Proceedings of the National Conference on Communications (NCC’10). IEEE, 1–5.Google Scholar
Cross Ref
- [263] . 2017. Google’s next-generation real-time unit-selection synthesizer using sequence-to-sequence LSTM-based autoencoders. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’17). 1143–1147.Google Scholar
Cross Ref
- [264] . 2006. Acoustical analysis of English vowels produced by Chinese, Dutch, and American speakers.Google Scholar
- [265] . 1993. Tree-based unit selection for English speech synthesis. In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP’93), Vol. 2. IEEE, 191–194. Google Scholar
Digital Library
- [266] . 2017. Tacotron: Towards end-to-end speech synthesis. Retrieved from https://arXiv:1703.10135.Google Scholar
- [267] . 2019. Neural network-based modeling of phonetic durations. Retrieved from https://arXiv:1909.03030.Google Scholar
- [268] . 1966. ELIZA—A computer program for the study of natural language communication between man and machine. Commun. ACM 9, 1 (1966), 36–45. Google Scholar
Digital Library
- [269] . 1986. Comparison of segmental intelligibility and pronunciation accuracy for two commercial text-to-speech systems. In Proceedings of the Applied Voice Input Output Society (AVIOS’86). 235–261.Google Scholar
- [270] . 1997. Homograph disambiguation in text-to-speech synthesis. In Progress in Speech Synthesis. Springer, 157–172.Google Scholar
Cross Ref
- [271] . 2009. Artificial Neural Networks. PHI Learning Pvt. Ltd. Google Scholar
Digital Library
- [272] . 2019. Neural models of text normalization for speech applications. Comput. Linguist. 45, 2 (2019), 293–337. Google Scholar
Digital Library
- [273] . 2018. Generalized cross entropy loss for training deep neural networks with noisy labels. Retrieved from https://arXiv:1805.07836. Google Scholar
Digital Library
- [274] . 2020. Extracting unit embeddings using sequence-to-sequence acoustic models for unit selection speech synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’20). IEEE, 7659–7663.Google Scholar
Cross Ref
- [275] . 1949. Human Behaviour and the Principle of Least-effort. Addison-Wesley, Cambridge, MA.Google Scholar