Research Article

A Multimodal Framework for Large-Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

Published: 04 March 2022

Abstract

Considerable attention has been paid to physiological-signal-based emotion recognition in the field of affective computing. Thanks to its reliability and user-friendly acquisition, electrodermal activity (EDA) offers a great advantage in practical applications. However, EDA-based emotion recognition across large-scale subject pools remains a difficult problem: traditional well-designed classifiers with hand-crafted features produce poor results because of their limited representation ability, while deep learning models with automatic feature extraction suffer from overfitting caused by large-scale individual differences. Since music is strongly correlated with human emotion, the static music stimulus can serve as an external benchmark that constrains the diverse dynamic EDA signals. In this article, we fuse each subject's individual EDA features with the external features of the evoking music, and propose an end-to-end multimodal framework, the one-dimensional residual temporal and channel attention network (RTCAN-1D). For the EDA branch, we are the first to introduce a channel-temporal attention mechanism into EDA-based emotion recognition, mining the temporal and channel-wise dynamic and steady features. Comparisons with state-of-the-art single-EDA models on the DEAP and AMIGOS datasets demonstrate the effectiveness of RTCAN-1D in mining EDA features. For the music branch, we simply process the music signal with the open-source toolkit openSMILE to obtain external feature vectors. We conducted systematic and extensive evaluations; the experiments on PMEmo, currently the largest music emotion dataset, validate that fusing EDA and music is a reliable and efficient solution for large-scale emotion recognition.
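
The abstract describes a channel-temporal attention mechanism over 1-D EDA features but does not spell out the architecture here. The PyTorch sketch below is only a hedged illustration of the idea, not the authors' RTCAN-1D: the module name ChannelTemporalAttention1D, the squeeze-and-excitation-style channel branch, the reduction ratio, and the residual wiring are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ChannelTemporalAttention1D(nn.Module):
    """Hedged sketch of channel + temporal attention over a (batch, channels,
    time) feature map; not the authors' exact RTCAN-1D block."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel branch: squeeze (global average over time), then excite.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        # Temporal branch: a 1x1 convolution scores every time step.
        self.temporal_conv = nn.Conv1d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, T)
        w_c = self.channel_fc(x.mean(dim=2))               # (B, C) channel weights
        y = x * w_c.unsqueeze(-1)                          # reweight channels
        w_t = torch.sigmoid(self.temporal_conv(y))         # (B, 1, T) time weights
        return y * w_t + x                                 # residual connection

# Toy EDA feature map: batch of 8 windows, 16 channels, 256 time steps.
feat = torch.randn(8, 16, 256)
print(ChannelTemporalAttention1D(channels=16)(feat).shape)  # torch.Size([8, 16, 256])
```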
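
For the music branch, the abstract states only that openSMILE produces external feature vectors. A minimal sketch using audEERING's opensmile Python wrapper could look as follows; the ComParE 2016 functionals feature set, the clip file name, and the 128-dimensional EDA placeholder are assumptions, since the paper's exact configuration is not reproduced here.

```python
import opensmile  # pip install opensmile (audEERING's official wrapper)
import torch

# Extract one functionals vector per music clip with openSMILE.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,    # assumed feature set
    feature_level=opensmile.FeatureLevel.Functionals,
)
music_df = smile.process_file("clip_0001.wav")        # hypothetical file name
music_vec = torch.tensor(music_df.values, dtype=torch.float32)  # (1, 6373)

# Late-fusion sketch: concatenate the music vector with an EDA embedding
# (a random placeholder here) before a shared classifier head.
eda_vec = torch.randn(1, 128)                         # placeholder EDA features
fused = torch.cat([eda_vec, music_vec], dim=1)        # (1, 128 + 6373)
print(fused.shape)
```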

REFERENCES

  1. [1] Machot Fadi Al, Elmachot Ali, Ali Mouhannad, Machot Elyan Al, and Kyamakya Kyandoghere. 2019. A deep-learning model for subject-independent human emotion recognition using electrodermal activity sensors. Sensors 19, 7 (2019), 1659.Google ScholarGoogle ScholarCross RefCross Ref
  2. [2] Alexander David M., Trengove Chris, Johnston P., Cooper Tim, August J. P., and Gordon Evian. 2005. Separating individual skin conductance responses in a short interstimulus-interval paradigm. Journal of Neuroscience Methods 146, 1 (2005), 116123.Google ScholarGoogle ScholarCross RefCross Ref
  3. [3] Aljanaki Anna, Yang Yi-Hsuan, and Soleymani Mohammad. 2017. Developing a benchmark for emotional analysis of music. PloS One 12, 3 (2017), e0173392.Google ScholarGoogle ScholarCross RefCross Ref
  4. [4] Anusha A. S., Preejith S. P., Akl Tony J., Joseph Jayaraj, and Sivaprakasam Mohanasankar. 2018. Dry electrode optimization for wrist-based electrodermal activity monitoring. In Proceedings of the 2018 IEEE International Symposium on Medical Measurements and Applications. IEEE, 16.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. [5] Baltrušaitis Tadas, Ahuja Chaitanya, and Morency Louis-Philippe. 2018. Multimodal machine learning: A survey and taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 2 (2018), 423443.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. [6] Becker Judith. 2004. Deep Listeners: Music, Emotion, and Trancing, Vol. 1. Indiana University Press.Google ScholarGoogle Scholar
  7. [7] Benedek Mathias and Kaernbach Christian. 2010. A continuous measure of phasic electrodermal activity. Journal of Neuroscience Methods 190, 1 (2010), 8091.Google ScholarGoogle ScholarCross RefCross Ref
  8. [8] Benedek Mathias and Kaernbach Christian. 2010. Decomposition of skin conductance data by means of nonnegative deconvolution. Psychophysiology 47, 4 (2010), 647658.Google ScholarGoogle Scholar
  9. [9] Boucsein Wolfram. 2012. Electrodermal Activity. Springer Science & Business Media.Google ScholarGoogle ScholarCross RefCross Ref
  10. [10] Buades Antoni, Coll Bartomeu, and Morel J.-M.. 2005. A non-local algorithm for image denoising. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2. IEEE, 6065.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. [11] Chiliguano Paulo and Fazekas Gyorgy. 2016. Hybrid music recommender using content-based and social information. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 26182622.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. [12] Correa Juan Abdon Miranda, Abadi Mojtaba Khomami, Sebe Niculae, and Patras Ioannis. 2018. Amigos: A dataset for affect, personality and mood research on individuals and groups. IEEE Transactions on Affective Computing 12, 2 (2018), 479–493.Google ScholarGoogle Scholar
  13. [13] Damasio Antonio R.. 1994. Descartes’ error: Emotion, reason, and the human brain. Optometry and Vision Science 72, 11 (1994), 847–848.Google ScholarGoogle Scholar
  14. [14] Dawson Michael E., Schell Anne M., and Filion Diane L.. 2007. The electrodermal system. Handbook of Psychophysiology 2 (2007), 200223.Google ScholarGoogle Scholar
  15. [15] Ekman Paul. 1992. An argument for basic emotions. Cognition & Emotion 6, 3–4 (1992), 169200.Google ScholarGoogle ScholarCross RefCross Ref
  16. [16] Ayadi Moataz El, Kamel Mohamed S., and Karray Fakhri. 2011. Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition 44, 3 (2011), 572587.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. [17] Frantzidis Christos A., Bratsas Charalampos, Klados Manousos A., Konstantinidis Evdokimos, Lithari Chrysa D., Vivas Ana B., Papadelis Christos L., Kaldoudi Eleni, Pappas Costas, and Bamidis Panagiotis D.. 2010. On the classification of emotional biosignals evoked while viewing affective pictures: An integrated data-mining-based approach for healthcare applications. IEEE Transactions on Information Technology in Biomedicine 14, 2 (2010), 309318.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. [18] Ganapathy Nagarajan, Veeranki Yedukondala Rao, and Swaminathan Ramakrishnan. 2020. Convolutional neural network based emotion classification using electrodermal activity signals and time-frequency features. Expert Systems with Applications 159 (2020), 113571.Google ScholarGoogle ScholarCross RefCross Ref
  19. [19] Greco Alberto, Valenza Gaetano, Citi Luca, and Scilingo Enzo Pasquale. 2016. Arousal and valence recognition of affective sounds based on electrodermal activity. IEEE Sensors Journal 17, 3 (2016), 716725.Google ScholarGoogle ScholarCross RefCross Ref
  20. [20] Greco Alberto, Valenza Gaetano, Lanata Antonio, Scilingo Enzo Pasquale, and Citi Luca. 2015. cvxEDA: A convex optimization approach to electrodermal activity processing. IEEE Transactions on Biomedical Engineering 63, 4 (2015), 797804.Google ScholarGoogle Scholar
  21. [21] Guo Rui, Li Shuangjiang, He Li, Gao Wei, Qi Hairong, and Owens Gina. 2013. Pervasive and unobtrusive emotion sensing for human mental health. In Proceedings of the 2013 7th International Conference on Pervasive Computing Technologies for Healthcare and Workshops. IEEE, 436439.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. [22] Hamann Stephan. 2012. Mapping discrete and dimensional emotions onto the brain: Controversies and consensus. Trends in Cognitive Sciences 16, 9 (2012), 458466.Google ScholarGoogle ScholarCross RefCross Ref
  23. [23] He Kaiming, Zhang Xiangyu, Ren Shaoqing, and Sun Jian. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770778.Google ScholarGoogle ScholarCross RefCross Ref
  24. [24] Hu Jie, Shen Li, and Sun Gang. 2018. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 71327141.Google ScholarGoogle ScholarCross RefCross Ref
  25. [25] Huang Gao, Liu Zhuang, Maaten Laurens Van Der, and Weinberger Kilian Q.. 2017. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 47004708.Google ScholarGoogle ScholarCross RefCross Ref
  26. [26] Izard Carroll E.. 2007. Basic emotions, natural kinds, emotion schemas, and a new paradigm. Perspectives on Psychological Science 2, 3 (2007), 260280.Google ScholarGoogle ScholarCross RefCross Ref
  27. [27] Jaimes Alejandro and Sebe Nicu. 2007. Multimodal human–computer interaction: A survey. Computer Vision and Image Understanding 108, 1–2 (2007), 116134.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. [28] Jerritta S., Murugappan M., Nagarajan R., and Wan Khairunizam. 2011. Physiological signals based human emotion recognition: A review. In Proceedings of the 2011 IEEE 7th International Colloquium on Signal Processing and its Applications. IEEE, 410415.Google ScholarGoogle ScholarCross RefCross Ref
  29. [29] Katsis Christos D., Katertsidis Nikolaos, Ganiatsas George, and Fotiadis Dimitrios I.. 2008. Toward emotion recognition in car-racing drivers: A biosignal processing approach. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans 38, 3 (2008), 502512.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. [30] Kelsey Malia, Akcakaya Murat, Kleckner Ian R., Palumbo Richard Vincent, Barrett Lisa Feldman, Quigley Karen S., and Goodwin Matthew S.. 2018. Applications of sparse recovery and dictionary learning to enhance analysis of ambulatory electrodermal activity data. Biomedical Signal Processing and Control 40, 2 (2018), 5870.Google ScholarGoogle ScholarCross RefCross Ref
  31. [31] Kim Jonghwa. 2007. Bimodal emotion recognition using speech and physiological changes. Robust Speech Recognition and Understanding 265 (2007), 280.Google ScholarGoogle Scholar
  32. [32] Kim Jonghwa and André Elisabeth. 2008. Emotion recognition based on physiological changes in music listening. IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 12 (2008), 20672083.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. [33] Kim Yelin and Provost Emily Mower. 2015. Emotion recognition during speech using dynamics of multiple regions of the face. ACM Transactions on Multimedia Computing, Communications, and Applications 12, 1s (2015), 123.Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. [34] Koelstra Sander, Muhl Christian, Soleymani Mohammad, Lee Jong-Seok, Yazdani Ashkan, Ebrahimi Touradj, Pun Thierry, Nijholt Anton, and Patras Ioannis. 2011. Deap: A database for emotion analysis; using physiological signals. IEEE Transactions on Affective Computing 3, 1 (2011), 1831.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. [35] Koelstra Sander and Patras Ioannis. 2013. Fusion of facial expressions and EEG for implicit affective tagging. Image and Vision Computing 31, 2 (2013), 164174.Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. [36] Lang Peter J., Greenwald Mark K., Bradley Margaret M., and Hamm Alfons O.. 1993. Looking at pictures: Affective, facial, visceral, and behavioral reactions. Psychophysiology 30, 3 (1993), 261273.Google ScholarGoogle ScholarCross RefCross Ref
  37. [37] Langhorne Peter, Bernhardt Julie, and Kwakkel Gert. 2011. Stroke rehabilitation. The Lancet 377, 9778 (2011), 16931702.Google ScholarGoogle ScholarCross RefCross Ref
  38. [38] Lawrence Steve and Giles C. Lee. 2000. Overfitting and neural networks: Conjugate gradient and backpropagation. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium, Vol. 1. IEEE, 114119.Google ScholarGoogle ScholarCross RefCross Ref
  39. [39] Lawrence Steve, Giles C. Lee, and Tsoi Ah Chung. 1997. Lessons in neural network training: Overfitting may be harder than expected. In Proceedings of the AAAI/IAAI. Citeseer, 540545.Google ScholarGoogle Scholar
  40. [40] Lin Wenqian, Li Chao, and Sun Shouqian. 2017. Deep convolutional neural network for emotion recognition using EEG and peripheral physiological signal. In Proceedings of the International Conference on Image and Graphics. Springer, 385394.Google ScholarGoogle ScholarCross RefCross Ref
  41. [41] Lin Yu-Ching, Yang Yi-Hsuan, and Chen Homer H.. 2011. Exploiting online music tags for music emotion classification. ACM Transactions on Multimedia Computing, Communications, and Applications 7, 1 (2011), 116.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. [42] Liu Jiamin, Su Yuanqi, and Liu Yuehu. 2017. Multi-modal emotion recognition with temporal-band attention based on lstm-rnn. In Proceedings of the Pacific Rim Conference on Multimedia. Springer, 194204.Google ScholarGoogle Scholar
  43. [43] Nittala Aditya Shekhar, Khan Arshad, Kruttwig Klaus, Kraus Tobias, and Steimle Jürgen. 2020. PhysioSkin: Rapid fabrication of skin-conformal physiological interfaces. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 110.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. [44] Pei Wenjie, Baltrusaitis Tadas, Tax David M. J., and Morency Louis-Philippe. 2017. Temporal attention-gated model for robust sequence classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 67306739.Google ScholarGoogle ScholarCross RefCross Ref
  45. [45] Picard Rosalind W.. 2000. Affective Computing. MIT press.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. [46] Picard Rosalind W., Vyzas Elias, and Healey Jennifer. 2001. Toward machine emotional intelligence: Analysis of affective physiological state. IEEE Transactions on Pattern Analysis & Machine Intelligence23, 10 (2001), 11751191.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. [47] Plutchik Robert. 1982. A psychoevolutionary theory of emotions. Social Science Information 21 (1982), 529553.Google ScholarGoogle ScholarCross RefCross Ref
  48. [48] Russell James A.. 1980. A circumplex model of affect. Journal of Personality and Social Psychology 39, 6 (1980), 1161.Google ScholarGoogle ScholarCross RefCross Ref
  49. [49] Sano Akane and Picard Rosalind W.. 2013. Stress recognition using wearable sensors and mobile phones. In Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction. IEEE, 671676.Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. [50] Santamaria-Granados Luz, Munoz-Organero Mario, Ramirez-Gonzalez Gustavo, Abdulhay Enas, and Arunkumar N. J. I. A.. 2018. Using deep convolutional neural network for emotion detection on a physiological signals dataset. IEEE Access 7 (2018), 5767.Google ScholarGoogle ScholarCross RefCross Ref
  51. [51] Selvaraju Ramprasaath R., Cogswell Michael, Das Abhishek, Vedantam Ramakrishna, Parikh Devi, and Batra Dhruv. 2017. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision. 618626.Google ScholarGoogle ScholarCross RefCross Ref
  52. [52] Sharma Vivek, Prakash Neelam R., and Kalra Parveen. 2019. Audio-video emotional response mapping based upon electrodermal activity. Biomedical Signal Processing and Control 47 (2019), 324333.Google ScholarGoogle ScholarCross RefCross Ref
  53. [53] Shukla Jainendra, Barreda-Angeles Miguel, Oliver Joan, Nandi G. C., and Puig Domenec. 2019. Feature extraction and selection for emotion recognition from electrodermal activity. IEEE Transactions on Affective Computing 12, 4 (2019), 857–869.Google ScholarGoogle Scholar
  54. [54] Simonyan Karen and Zisserman Andrew. 2014. Very deep convolutional networks for large-scale image recognition. In the 3rd International Conference on Learning Representations (ICLR’15).Google ScholarGoogle Scholar
  55. [55] Srivastava Nitish, Hinton Geoffrey, Krizhevsky Alex, Sutskever Ilya, and Salakhutdinov Ruslan. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15, 1 (2014), 19291958.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. [56] Thammasan Nattapong, Fukui Ken-ichi, and Numao Masayuki. 2017. Multimodal fusion of eeg and musical features in music-emotion recognition. In Proceedings of the 31st AAAI Conference on Artificial Intelligence.Google ScholarGoogle ScholarCross RefCross Ref
  57. [57] Torres Cristian A., Orozco Álvaro A., and Álvarez Mauricio A.. 2013. Feature selection for multimodal emotion recognition in the arousal-valence space. In Proceedings of the 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 43304333.Google ScholarGoogle ScholarCross RefCross Ref
  58. [58] Vaswani Ashish, Shazeer Noam, Parmar Niki, Uszkoreit Jakob, Jones Llion, Gomez Aidan N., Kaiser Łukasz, and Polosukhin Illia. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems. 59986008.Google ScholarGoogle Scholar
  59. [59] Wang Xiaolong, Girshick Ross, Gupta Abhinav, and He Kaiming. 2018. Non-local neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 77947803.Google ScholarGoogle ScholarCross RefCross Ref
  60. [60] Wundt Wilhelm Max. 1921. Vorlesungen über die menschen- und tierseele. American Journal of Psychology 32, 1 (1921), 151.Google ScholarGoogle ScholarCross RefCross Ref
  61. [61] Xiong Ning and Svensson Per. 2002. Multi-sensor management for information fusion: Issues and approaches. Information Fusion 3, 2 (2002), 163186.Google ScholarGoogle ScholarCross RefCross Ref
  62. [62] Yin Guanghao, Sun Shouqian, Zhang Hui, Yu Dian, Li Chao, Zhang Kejun, and Zou Ning. 2019. User independent emotion recognition with residual signal-image network. In Proceedings of the 2019 IEEE International Conference on Image Processing. IEEE, 32773281.Google ScholarGoogle ScholarCross RefCross Ref
  63. [63] Zhang Kejun, Zhang Hui, Li Simeng, Yang Changyuan, and Sun Lingyun. 2018. The PMEmo dataset for music emotion recognition. In Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval. ACM, 135142.Google ScholarGoogle ScholarDigital LibraryDigital Library

Published in

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 18, Issue 3 (August 2022), 478 pages.
ISSN: 1551-6857  EISSN: 1551-6865  DOI: 10.1145/3505208


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Received: 1 January 2021
• Revised: 1 September 2021
• Accepted: 1 October 2021
• Published: 4 March 2022

Published in TOMM Volume 18, Issue 3
