
Modeling Dyadic and Group Impressions with Intermodal and Interperson Features

Published: 24 January 2019

Abstract

This article proposes a novel feature-extraction framework for inferring impressions of personality traits, emergent leadership skills, communicative competence, and hiring decisions. The framework extracts multimodal features describing each participant’s nonverbal activities, capturing intermodal and interperson relationships in interactions: how the target interactor generates nonverbal behavior while the other interactors are also generating nonverbal behavior. These intermodal and interperson patterns are identified as frequently co-occurring events by clustering multimodal sequences. The framework is applied to the SONVB corpus, an audiovisual dataset collected from dyadic job interviews, and the ELEA audiovisual corpus, a dataset collected from group meetings. We evaluate the framework on binary classification tasks over 15 impression variables from the two corpora. The experimental results show that a model trained with the co-occurrence features is more accurate than previous models for 14 of the 15 traits.
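The co-occurrence feature idea sketched in the abstract can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the synthetic event streams, the choice of k-means as the clustering method, and the cluster count are all hypothetical stand-ins for the paper's actual pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical binary nonverbal event streams: rows = time frames,
# columns = (speaking, nodding) for two interactors A and B.
# Each frame is thus a joint snapshot of intermodal + interperson activity.
frames = rng.integers(0, 2, size=(200, 4)).astype(float)

# Cluster the frames into a small codebook of multimodal "events";
# frequent clusters play the role of co-occurring behavior patterns.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(frames)

# Co-occurrence feature vector for the whole interaction: a normalized
# histogram of cluster labels, i.e., how often each joint pattern occurs.
hist = np.bincount(kmeans.labels_, minlength=5).astype(float)
features = hist / hist.sum()

# One fixed-length descriptor per interaction, suitable as input to a
# binary classifier over an impression variable.
print(features.shape)
```

In this sketch, each interaction yields one fixed-length histogram regardless of its duration, which is what allows a standard classifier to be trained across interactions of different lengths.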



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 1s
  Special Section on Deep Learning for Intelligent Multimedia Analytics and Special Section on Multi-Modal Understanding of Social, Affective and Subjective Attributes of Data
  January 2019
  265 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3309769

          Copyright © 2019 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 24 January 2019
          • Revised: 1 August 2018
          • Accepted: 1 August 2018
          • Received: 1 October 2017
Published in TOMM Volume 15, Issue 1s


          Qualifiers

          • research-article
          • Research
          • Refereed
