
Understanding the Dynamics of Social Interactions: A Multi-Modal Multi-View Approach

Published: 17 February 2019

Abstract

In this article, we address the problem of understanding human-to-human interactions as a fundamental component of social event analysis. Inspired by the recent success of multi-modal visual data in many recognition tasks, we propose a novel approach to modeling dyadic interactions by means of features extracted from synchronized 3D skeleton coordinates, depth, and Red Green Blue (RGB) sequences. From skeleton data, we extract new view-invariant proxemic features, named Unified Proxemic Descriptor (UProD), which incorporate both intrinsic and extrinsic distances between the two interacting subjects. A novel key frame selection method is introduced to identify the salient instants of an interaction sequence based on the joints’ energy. From Red Green Blue Depth (RGBD) videos, more holistic features are extracted by applying adaptive pre-trained Convolutional Neural Networks (CNNs) to optical flow frames. To better understand the dynamics of interactions, we expand the boundaries of dyadic interaction analysis by proposing a fundamentally new model for the previously untreated problem of discerning the active interactor from the passive one. Extensive experiments have been carried out on four multi-modal, multi-view interaction datasets. The experimental results demonstrate the superiority of the proposed techniques over state-of-the-art approaches.
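As a rough illustration of the key frame selection idea, the following sketch scores each frame of a skeleton sequence by its joints’ energy and keeps the highest-energy frames. The energy definition used here (summed squared joint displacement between consecutive frames) and the function name are assumptions for illustration only, not the paper’s exact formulation.

```python
import numpy as np

def select_key_frames(skeletons, k=5):
    """Pick the k most salient frames of a skeleton sequence.

    skeletons: array of shape (T, J, 3) -- T frames, J joints, 3D coords.
    A frame's "energy" is approximated here by the summed squared
    displacement of all joints relative to the previous frame.
    """
    skeletons = np.asarray(skeletons, dtype=float)
    diffs = np.diff(skeletons, axis=0)           # (T-1, J, 3) joint displacements
    energy = (diffs ** 2).sum(axis=(1, 2))       # (T-1,) per-frame energy
    energy = np.concatenate([[0.0], energy])     # first frame has no predecessor
    k = min(k, len(energy))
    idx = np.argsort(energy)[::-1][:k]           # indices of highest-energy frames
    return np.sort(idx)                          # return them in temporal order

# toy example: 10 frames of a 15-joint skeleton following a random walk
rng = np.random.default_rng(0)
seq = np.cumsum(rng.normal(size=(10, 15, 3)), axis=0)
print(select_key_frames(seq, k=3))
```

In practice, the selected frame indices would index into the synchronized RGB and depth streams as well, so that the CNN features are computed only at the salient instants.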


Published in ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 1s: Special Section on Deep Learning for Intelligent Multimedia Analytics and Special Section on Multi-Modal Understanding of Social, Affective and Subjective Attributes of Data. January 2019, 265 pages.
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3309769

Copyright © 2019 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

Publication History
• Received: 1 October 2017
• Revised: 1 October 2018
• Accepted: 1 November 2018
• Published: 17 February 2019


Qualifiers: research-article (Research, Refereed)
