
Alone versus In-a-group: A Multi-modal Framework for Automatic Affect Recognition

Published: 10 June 2019

Abstract

Recognition and analysis of human affect have been researched extensively within the field of computer science over the past two decades. However, most past research on automatic analysis of human affect has focused on the recognition of affect displayed by people in individual settings, and little attention has been paid to the analysis of affect expressed in group settings. In this article, we first analyze the affect expressed by each individual in terms of the arousal and valence dimensions in both individual and group videos, and then propose methods to recognize the contextual information, i.e., whether a person is alone or in-a-group, by analyzing their face and body behavioral cues. For affect analysis, we first devise affect recognition models separately for individual and group videos, and then introduce a cross-condition affect recognition model that is trained by combining the two types of data. We conduct a set of experiments on two datasets that contain both individual and group videos. Our experiments show that (1) the proposed Volume Quantized Local Zernike Moments Fisher Vector outperforms other unimodal features in affect analysis; (2) the temporal learning model, Long Short-Term Memory networks, works better than the static learning model, Support Vector Machines; (3) decision fusion helps to improve affect recognition, indicating that body behaviors carry emotional information that is complementary rather than redundant to the emotional content of facial behaviors; and (4) it is possible to predict the context, i.e., whether a person is alone or in-a-group, using their non-verbal behavioral cues.
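Findings (2) and (3) can be illustrated with a minimal sketch: an LSTM classifier over per-frame visual features (e.g., Fisher-Vector-encoded descriptors), with face and body predictions combined by decision-level fusion. This is not the authors' implementation; the feature dimension, number of classes, and equal fusion weights are illustrative assumptions.

    # Minimal sketch (assumptions noted above), in PyTorch.
    import torch
    import torch.nn as nn

    class AffectLSTM(nn.Module):
        """LSTM classifier mapping a sequence of frame features to affect classes."""
        def __init__(self, feat_dim=128, hidden_dim=64, num_classes=3):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
            self.fc = nn.Linear(hidden_dim, num_classes)

        def forward(self, x):               # x: (batch, time, feat_dim)
            _, (h_n, _) = self.lstm(x)      # h_n: (1, batch, hidden_dim)
            return self.fc(h_n[-1])         # class scores for the whole sequence

    face_model, body_model = AffectLSTM(), AffectLSTM()

    # Decision fusion: average the class probabilities from the two modalities.
    face_feats = torch.randn(4, 50, 128)    # 4 clips, 50 frames, 128-D face features (dummy data)
    body_feats = torch.randn(4, 50, 128)    # matching body features (dummy data)
    p_face = face_model(face_feats).softmax(dim=-1)
    p_body = body_model(body_feats).softmax(dim=-1)
    p_fused = 0.5 * p_face + 0.5 * p_body   # equal weights; in practice these could be tuned
    pred = p_fused.argmax(dim=-1)           # fused affect class per clip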

