Abstract
Recognition and analysis of human affect have been researched extensively in computer science over the past two decades. However, most past research on automatic affect analysis has focused on recognizing the affect displayed by people in individual settings, and little attention has been paid to the affect expressed in group settings. In this article, we first analyze the affect expressed by each individual, in terms of the arousal and valence dimensions, in both individual and group videos, and then propose methods to recognize the contextual information, i.e., whether a person is alone or in-a-group, by analyzing their face and body behavioral cues. For affect analysis, we first devise separate affect recognition models for individual and group videos, and then introduce a cross-condition affect recognition model trained by combining the two types of data. We conduct a set of experiments on two datasets that contain both individual and group videos. Our experiments show that (1) the proposed Volume Quantized Local Zernike Moments Fisher Vector representation outperforms other unimodal features in affect analysis; (2) the temporal learning model, Long Short-Term Memory networks, performs better than the static learning model, Support Vector Machines; (3) decision fusion improves affect recognition, indicating that body behaviors carry emotional information that is complementary rather than redundant to the emotional content of facial behaviors; and (4) it is possible to predict the context, i.e., whether a person is alone or in-a-group, from their non-verbal behavioral cues.
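The decision-fusion finding in (3) can be sketched with a minimal example. The snippet below illustrates one common decision-level fusion scheme, weighted averaging of per-class scores from independently trained face and body models; the weight, class set, and scores here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def decision_fusion(face_probs, body_probs, w_face=0.5):
    """Late (decision-level) fusion: combine per-class scores from
    independently trained face and body models by weighted averaging,
    then pick the class with the highest fused score."""
    face_probs = np.asarray(face_probs, dtype=float)
    body_probs = np.asarray(body_probs, dtype=float)
    fused = w_face * face_probs + (1.0 - w_face) * body_probs
    return int(np.argmax(fused)), fused

# Hypothetical per-class scores, e.g. for low/neutral/high arousal.
face = [0.2, 0.5, 0.3]   # face model alone prefers class 1
body = [0.1, 0.3, 0.6]   # body model alone prefers class 2
label, fused = decision_fusion(face, body, w_face=0.5)
```

Because each modality's errors need not coincide, the fused scores can favor a class that only one modality ranked first, which is the sense in which body cues are complementary rather than redundant to facial cues.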
Alone versus In-a-group: A Multi-modal Framework for Automatic Affect Recognition