Abstract
This article proposes a novel feature-extraction framework for inferring impressions of personality traits, emergent leadership skills, communicative competence, and hiring decisions. The framework extracts multimodal features describing each participant's nonverbal activity and captures intermodal and interperson relationships in an interaction: how a target interactor generates nonverbal behavior while the other interactors are also generating nonverbal behavior. These intermodal and interperson patterns are identified as frequently co-occurring events discovered by clustering the multimodal sequences. The framework is applied to the SONVB corpus, an audiovisual dataset collected from dyadic job interviews, and to the ELEA audiovisual corpus, a dataset collected from group meetings. We evaluate the framework on a binary classification task involving 15 impression variables from the two corpora. The experimental results show that a model trained with the co-occurrence features is more accurate than previous models for 14 of the 15 traits.
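The pattern-discovery step described above can be illustrated with a small sketch: aligned binary event streams (one per person and modality) are summarized over short windows, the window vectors are clustered so that frequently co-occurring event combinations become cluster centers, and the normalized cluster histogram serves as a co-occurrence feature vector. All names, the window length, and the toy k-means loop here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input: binary per-frame event streams (1 = behavior active)
# for two interactors ("A", "B") across two nonverbal modalities.
streams = {
    ("A", "speaking"): rng.integers(0, 2, 200),
    ("A", "nodding"):  rng.integers(0, 2, 200),
    ("B", "speaking"): rng.integers(0, 2, 200),
    ("B", "nodding"):  rng.integers(0, 2, 200),
}

def cooccurrence_vectors(streams, window=10):
    """Slide a window over the time-aligned streams and record, for each
    window, how active every (person, modality) event is. Each row thus
    encodes which events co-occur within that window."""
    keys = sorted(streams)
    length = len(next(iter(streams.values())))
    vecs = []
    for start in range(0, length - window + 1, window):
        vecs.append([streams[k][start:start + window].mean() for k in keys])
    return np.array(vecs), keys

def cluster_histogram(vecs, k=4, iters=20):
    """Toy k-means: cluster centers play the role of frequent co-occurring
    event patterns; the normalized histogram of cluster assignments is the
    resulting feature vector."""
    centers = vecs[:k].copy()
    for _ in range(iters):
        dists = ((vecs[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = vecs[labels == j].mean(0)
    hist = np.bincount(labels, minlength=k).astype(float)
    return hist / hist.sum()

vecs, keys = cooccurrence_vectors(streams)
features = cluster_histogram(vecs)
```

In the actual framework such feature vectors would be computed per participant and fed to a binary classifier for each impression variable; this sketch only shows the windowing-then-clustering idea.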
Index Terms
Modeling Dyadic and Group Impressions with Intermodal and Interperson Features