BERIS: An mBERT-based Emotion Recognition Algorithm from Indian Speech

Published: 29 April 2022

Abstract

Emotions, the building blocks of human intellect, play a vital role in Artificial Intelligence (AI). For a robust AI-based machine, it is important that the machine understand human emotions. COVID-19 has introduced the world to no-touch intelligent systems, and with an influx of users it is critical to create devices that can communicate in the local dialect. A multilingual system is essential in a country like India, with its large population and diverse range of languages. Given the importance of multilingual emotion recognition, this research introduces BERIS, an emotion detection system for Indian languages. From an Indian speech recording, BERIS extracts both acoustic and textual features. Textual features are obtained with Multilingual Bidirectional Encoder Representations from Transformers (mBERT); for acoustics, BERIS computes Mel Frequency Cepstral Coefficients, Linear Prediction Coefficients, and pitch. The extracted features are merged into a single linear array. Because the dialogues vary in length, the arrays are normalized to equal length. Finally, the data are split into training and validation sets to construct a predictive model that can predict emotions from new input. Quantitative and qualitative evaluations on all the datasets presented show that the proposed algorithm outperforms state-of-the-art approaches.
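The abstract's fusion step (merge textual and acoustic features into one linear array, then normalize to a fixed length) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `fuse_and_normalize`, the 768-dimensional sentence embedding, the per-frame feature counts (13 MFCCs + 12 LPCs + 1 pitch value), and the target length of 1024 are all assumed for the example.

```python
import numpy as np

def fuse_and_normalize(text_emb, acoustic_feats, target_len=1024):
    """Concatenate textual and acoustic features into one linear array,
    then zero-pad or truncate to target_len so that utterances of
    different durations yield equal-length inputs for the model.
    (Hypothetical helper; dimensions are assumptions, not from the paper.)"""
    merged = np.concatenate([np.ravel(text_emb), np.ravel(acoustic_feats)])
    if merged.size >= target_len:
        return merged[:target_len]          # truncate long utterances
    return np.pad(merged, (0, target_len - merged.size))  # zero-pad short ones

# Assumed shapes: a 768-d mBERT sentence embedding, plus 13 MFCCs,
# 12 LPCs, and 1 pitch value per frame over 20 frames (26 per frame).
text_emb = np.random.randn(768)
acoustic = np.random.randn(20, 26)
x = fuse_and_normalize(text_emb, acoustic)
print(x.shape)  # (1024,)
```

The zero-padding choice is one common way to equalize lengths; the paper does not specify its normalization scheme, so interpolation or statistical pooling would be equally plausible readings.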


Published in

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 5, September 2022, 486 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3533669

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 July 2021
• Revised: 1 December 2021
• Accepted: 1 February 2022
• Online AM: 23 March 2022
• Published: 29 April 2022

Published in TALLIP Volume 21, Issue 5
