Abstract
Emotions, the building blocks of the human intellect, play a vital role in Artificial Intelligence (AI). For an AI-based machine to be robust, it must understand human emotions. COVID-19 has introduced the world to touch-free intelligent systems, and with the resulting influx of users it is critical to build devices that can communicate in the local language. A multilingual system is essential in a country like India, with its large population and wide diversity of languages. Given the importance of multilingual emotion recognition, this research introduces BERIS, an emotion recognition system for Indian languages. From an Indian-language speech recording, BERIS extracts both acoustic and textual features. Textual features are extracted with multilingual Bidirectional Encoder Representations from Transformers (mBERT). For acoustics, BERIS computes Mel-frequency cepstral coefficients, linear prediction coefficients, and pitch. The extracted features are merged into a single linear array. Since the dialogues vary in length, the data are normalized so that all arrays have equal length. Finally, we split the data into training and validation sets to build a predictive model that can predict the emotion of new input. Quantitative and qualitative evaluations on all the datasets considered show that the proposed algorithm outperforms state-of-the-art approaches.
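The abstract outlines a concrete pipeline: mBERT text embeddings plus MFCC/LPC/pitch acoustics, fused into one vector, normalized, and split for training. The following is a minimal sketch of that pipeline, not the authors' published implementation; the library choices (librosa, HuggingFace transformers, scikit-learn), the "bert-base-multilingual-cased" checkpoint, and all function and parameter names below are assumptions made for illustration.

```python
# Hypothetical sketch of the BERIS-style feature pipeline; library and model
# choices are assumptions, not the paper's published implementation.
import numpy as np
import librosa
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.model_selection import train_test_split

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
mbert = AutoModel.from_pretrained("bert-base-multilingual-cased")

def textual_features(transcript: str) -> np.ndarray:
    """Utterance-level mBERT embedding (mean of last hidden states)."""
    inputs = tokenizer(transcript, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = mbert(**inputs).last_hidden_state   # (1, tokens, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

def acoustic_features(wav_path: str, lpc_order: int = 12) -> np.ndarray:
    """MFCCs, LPC coefficients, and pitch statistics for one utterance."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    lpc = librosa.lpc(y, order=lpc_order)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)
    pitch = np.array([np.nanmean(f0), np.nanstd(f0)])
    return np.concatenate([mfcc, lpc, pitch])

def fused_vector(wav_path: str, transcript: str) -> np.ndarray:
    """Merge textual and acoustic features into one linear array."""
    return np.concatenate([textual_features(transcript),
                           acoustic_features(wav_path)])

def prepare(dataset):
    """dataset: list of (wav_path, transcript, label) triples.

    Per-utterance averaging above already yields fixed-length vectors for
    dialogues of varied length; z-score normalization then puts all features
    on a common scale before the train/validation split.
    """
    X = np.stack([fused_vector(w, t) for w, t, _ in dataset])
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    y = np.array([lbl for _, _, lbl in dataset])
    return train_test_split(X, y, test_size=0.2, random_state=0)
```

The resulting training and validation arrays can be fed to any standard classifier to predict the emotion of a new recording; the abstract does not specify the predictive model, so that choice is left open here.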