Research Article

Emotion Recognition Robust to Indoor Environmental Distortions and Non-targeted Emotions Using Out-of-distribution Detection

Published: 20 December 2021

Abstract

The rapid development of machine learning for acoustic signal processing has produced many solutions for detecting emotions from speech. Early work targeted clean, acted speech and a fixed set of emotions; importantly, the datasets and solutions assumed that a person exhibits only one of these emotions. More recent work has added realism to emotion detection by considering issues such as reverberation, de-amplification, and background noise, but often one dataset at a time, and still assuming that all emotions are accounted for in the model. We significantly improve the realism of emotion detection by (i) assessing a more comprehensive range of situations, combining five common publicly available datasets into one and enhancing the combined dataset with data augmentation that models reverberation and de-amplification, (ii) incorporating 11 typical home noises into the acoustics, and (iii) recognizing that in real situations a person may exhibit emotions that are not currently of interest, which should neither be forced into a pre-fixed category nor be improperly labeled. Our novel solution combines a CNN with out-of-distribution detection. It increases the range of situations in which emotions can be effectively detected and outperforms a state-of-the-art baseline.
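The paper's implementation is not reproduced on this page; the sketch below illustrates, under stated assumptions, the two ingredients the abstract describes: (a) augmenting clean speech with reverberation, de-amplification, and additive home noise, and (b) rejecting non-targeted emotions with an out-of-distribution check on the classifier's output. It assumes NumPy/SciPy; the function names, the default gain/SNR/threshold values, and the choice of a max-softmax confidence score (a standard OOD baseline) are illustrative assumptions, not the authors' exact method.

    import numpy as np
    from scipy.signal import fftconvolve

    def augment(speech, rir, noise, gain_db=-6.0, snr_db=10.0):
        """Distort a dry speech clip the way an indoor room would:
        reverberation, de-amplification, and additive background noise."""
        # Reverberation: convolve the speech with a room impulse response.
        wet = fftconvolve(speech, rir)[: len(speech)]
        # De-amplification: attenuate to mimic a distant or quiet speaker.
        wet = wet * 10.0 ** (gain_db / 20.0)
        # Background noise: tile or trim the noise clip, then scale it so
        # the mixture hits the target signal-to-noise ratio.
        noise = np.resize(noise, len(wet))
        speech_power = np.mean(wet ** 2)
        noise_power = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
        return wet + scale * noise

    def in_distribution(logits, threshold=0.9):
        """Flag utterances whose emotion the classifier was not trained on,
        using a max-softmax confidence score (Hendrycks & Gimpel, 2016) as
        a simple stand-in for the paper's OOD detector."""
        probs = np.exp(logits - np.max(logits))
        probs /= probs.sum()
        return np.max(probs) >= threshold

In deployment, an utterance failing the in_distribution check would be rejected as a non-targeted emotion rather than forced into one of the trained emotion categories.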


Published in

ACM Transactions on Computing for Healthcare, Volume 3, Issue 2 (April 2022), 292 pages
ISSN: 2691-1957
EISSN: 2637-8051
DOI: 10.1145/3505188

        Publisher

        Association for Computing Machinery

        New York, NY, United States

Publication History

• Received: 1 June 2020
• Revised: 1 July 2021
• Accepted: 1 July 2021
• Published: 20 December 2021

