Expanding the Role of Affective Phenomena in Multimodal Interaction Research

In recent decades, the field of affective computing has made substantial progress in advancing the ability of AI systems to recognize and express affective phenomena, such as affect and emotions, during human-human and human-machine interactions. This paper describes our examination of research at the intersection of multimodal interaction and affective computing, with the objective of observing trends and identifying understudied areas. We examined over 16,000 papers from selected conferences in multimodal interaction, affective computing, and natural language processing: ACM International Conference on Multimodal Interaction, AAAC International Conference on Affective Computing and Intelligent Interaction, Annual Meeting of the Association for Computational Linguistics, and Conference on Empirical Methods in Natural Language Processing. We identified 910 affect-related papers and present our analysis of the role of affective phenomena in these papers. We find that this body of research has primarily focused on enabling machines to recognize or express affect and emotion; there has been limited research on how affect and emotion predictions might, in turn, be used by AI systems to enhance machine understanding of human social behaviors and cognitive states. Based on our analysis, we discuss directions to expand the role of affective phenomena in multimodal interaction research.


INTRODUCTION
In recent decades, research in psychology and neuroscience has highlighted the importance of affective phenomena in understanding, explaining, and predicting how humans behave and think during real-world social interactions [20,21]. This body of research has demonstrated explanatory relationships among affective phenomena (e.g., affect and emotion), cognitive processes (e.g., memory, attention, perception, and decision-making), and behavioral processes (e.g., habits, adaptation, stimulus-response actions) [20]. In addition, affective states have been shown to regulate the dynamics of human social behaviors (e.g., communicative social signals) and cognitive states (e.g., attitudes) during social interactions [15,29,45].
In parallel to the aforementioned progress in the affective sciences, recent decades of computer science research have laid foundations in affective computing [19,49,52], with substantial progress in advancing the ability of AI systems to estimate affective phenomena in humans. After affective phenomena have been predicted by an AI system, we believe those predictions can be used to enhance the system's understanding of human social behaviors and cognitive states, towards more socially-intelligent AI. We were, therefore, motivated to explore the question: How, and to what extent, have affective phenomena been used by AI systems in multimodal interaction research to enhance machine understanding of human social behaviors and cognitive states?
To begin answering this question, we scoped a study to examine trends in how multimodal interaction research has treated the role of affect and emotion in over 16,000 papers selected from premier conferences that represent communities in multimodal interaction, affective computing, and natural language processing (NLP): ACM International Conference on Multimodal Interaction (ICMI), AAAC International Conference on Affective Computing and Intelligent Interaction (ACII), Annual Meeting of the Association for Computational Linguistics (ACL), and Conference on Empirical Methods in Natural Language Processing (EMNLP).
Our paper makes four contributions. First, we identify 910 papers related to affect and emotion from past decades of proceedings at ICMI, ACII, ACL, and EMNLP and categorize the role of affect and emotion in these papers. Second, we quantify the extent to which affect and emotion have been used by AI systems in these 910 papers to enhance machine understanding of human social behaviors and cognitive states. Third, we analyze trends in how affect and emotion have been used to enhance machine understanding of human social behaviors and cognitive states. Fourth, based on our analysis, we offer insights into future directions to expand the role of affect and emotion in multimodal interaction research.

AFFECTIVE PHENOMENA, SOCIAL BEHAVIORS, AND COGNITIVE STATES
This section provides a brief overview of relationships among affective phenomena, social behaviors, and cognitive states. This is a growing research area in psychology and neuroscience [20]. Findings suggest that affective phenomena (e.g., affect, emotion) drive human social behavior and social cognition [1,5,31,47].
Affective processes can influence how people remember social information [39], make decisions [21,30], and perceive others during interpersonal social interactions; for example, a person's affective state before an interpersonal interaction can influence whether or not they end up liking or favorably judging another person (the reinforcement-affect model) [12,14]. Prominent models backed by empirical findings include the affect-as-information theory (the perspective that humans directly query their affective state when making judgments) [13,16], the affect priming theory (the perspective that affect primes connections across concepts during reasoning and choice of behaviors in social situations) [34], and the affect infusion model (a model of the social contexts in which affect influences the choice of social behavior) [22]. These relationships among affective phenomena, social behaviors, and cognitive states can be leveraged by AI systems during real-world interactions. As conceptualized in Figure 1, an AI system that uses behavioral cues to predict a human's affective state can then use those predictions to model the human's social behaviors and cognitive states.

SELECTING AND CATEGORIZING PAPERS
This study was designed to capture and analyze broad trends in premier conference venues, selected to reflect communities in multimodal interaction (ICMI), affective computing (ACII), and NLP (ACL and EMNLP). We included the NLP venues in our analysis because of the dialogue, discourse, and interaction research in that community. Our inclusion of ACL also encompassed prior papers at the regional ACL conferences NAACL, EACL, and AACL. Data were accessed from the available online conference proceedings; ICMI proceedings began in 2002. We applied an initial filter to select papers in which the title or abstract contained at least one of the following keywords: affective, affect-aware, valence, arousal, positive affect, negative affect, emotion, emotions, emotion-aware, emotional. We chose this filtering approach on the assumption that papers disseminating findings applicable to affective phenomena during social interactions will include at least one of these terms in the title or abstract, so filtering in this way should effectively capture affect-related research (papers that address affect and emotion) for further analysis. This yielded a total of 910 papers (129 ICMI papers, 547 ACII papers, 234 ACL/EMNLP papers).
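The keyword filter described above can be sketched as follows. This is an illustrative reconstruction, not the study's actual code; the exact tokenization and case-handling rules are assumptions, since the paper specifies only the keyword list and the title/abstract scope.

```python
import re

# Keyword list taken verbatim from the paper; matching rules below are assumed.
AFFECT_KEYWORDS = {
    "affective", "affect-aware", "valence", "arousal",
    "positive affect", "negative affect",
    "emotion", "emotions", "emotion-aware", "emotional",
}

def is_affect_related(title: str, abstract: str) -> bool:
    """Return True if the title or abstract contains any affect keyword.

    Multi-word keywords are matched as substrings; single-word keywords
    are matched as whole tokens to avoid false hits such as 'affectation'.
    """
    text = f"{title} {abstract}".lower()
    tokens = set(re.findall(r"[a-z-]+", text))
    for kw in AFFECT_KEYWORDS:
        if " " in kw:
            if kw in text:
                return True
        elif kw in tokens:
            return True
    return False
```

Applied to each paper's metadata, a filter of this shape would partition the 16,000+ papers into affect-related and unrelated sets before manual categorization.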
We examined the 910 papers and categorized them into the following 7 groups, based on the primary focus of each paper in its treatment of affect and emotion.
(1) Recognizing Affect and Emotion: This category includes papers on modeling efforts to predict affect and emotion. For example, a paper that proposes a method to predict valence and arousal labels for speakers in a video dataset would be in this category.
(2) Expressing Affect and Emotion: This category includes papers that focus on techniques to enable virtual and embodied AI agents to express affect and emotion. For example, a paper that proposes a method to express facial emotions in virtual human avatars would be in this category.
(3) Recognizing and Expressing Affect and Emotion: This category includes papers that perform both recognition and expression of affect and emotion, which warranted a separate category. For example, a paper that proposes a method to recognize emotional states in a human speaker and uses that method to inform a virtual avatar's expressed emotion would be in this category.
(4) Using Affect and Emotion for Enhanced Machine Understanding of Human Social Behaviors and Cognitive States: This category includes papers that explore the role of affect and emotion in enhancing machine understanding of human social behaviors and cognitive states during interactions. For example, a paper that uses the outputs of an affect prediction model to predict human social behaviors would be in this category.
(5) Affect and Emotion Frameworks and Analysis: This category includes conceptual work and psychology studies of humans during interactions. For example, a paper that analyzes students' emotions while playing a game would be in this category.
(6) Tools, Interfaces, and Datasets: This category includes papers that discuss data collection tools, papers on interfaces for facilitating interactions, and papers that introduce datasets.
(7) Miscellaneous: This category includes papers that did not fit into prior categories. For example, a paper that focused on techniques for video retrieval and emphasized that the video's social context involved emotion would be in this category.

DISTRIBUTION OF RESEARCH FOCUS IN THE AFFECT-RELATED PAPERS
All 910 affect-related papers were published between 1994 and 2022.
The number of these papers published across time at the studied venues is visualized in Figure 2. We observe a substantial increase in the number of papers during the past decade at ICMI and ACII, and during the past 5 years at ACL/EMNLP. We note that this increased interest in studying affective phenomena complements the acceleration of research activity in affective phenomena across psychology, neuroscience, humanities, and social sciences [20].
The distribution of research focus in the 910 papers is visualized in Figure 3. We find that across these papers, the largest proportion focused on techniques for machines to recognize and express affect and emotion (41% and 9%, respectively); an additional 1% focused on both these challenges. Only 6% (52 papers) discussed research that investigated using affect and emotion to enhance machine understanding of human social behaviors and cognitive states. We analyze this subset of 52 papers in Section 5 for insights about this understudied area. Among the remaining papers, 17% focused on affect-related frameworks and analysis, 9% on new tools, interfaces, and datasets, and 17% on miscellaneous topics (e.g., video retrieval in papers that happened to mention emotion).

USE OF AFFECT AND EMOTION FOR ENHANCED MACHINE UNDERSTANDING
We analyzed the 52 papers that used affect and emotion to enhance machine understanding of social behaviors and cognitive states. We find that these papers used affective phenomena in three primary ways: as a feature, in an auxiliary task, and as a latent state. All 52 papers were published between 2009 and 2022; 54% of them were published in the last 4 years. The accelerating increase in papers on this topic is visualized in Figure 4. This trend demonstrates a growing interest across the multimodal interaction, affective computing, and NLP communities in this understudied research area. We observed a steady increase in papers that used affective phenomena as a feature and a slower increase in papers that used affective phenomena as a latent state. In the past 5 years, we also observed a sharp increase in the number of papers that used affective phenomena in an auxiliary task. We further examine these categories.
Affective Phenomena as Features: 29 of the 52 papers used affective phenomena as features to predict human social behaviors and cognitive states. In these papers, affective phenomena were used to predict the following social behaviors and social signal dynamics in human-human and human-machine interaction: head nods [37], humor [73], idiom and metaphor expression [32,51], noncooperative behavior [68], deception [42], fake communication [9], self-disclosure [3], intimacy [43], dialogue acts [7], and negotiation dialogue dynamics [24]. Affective phenomena were also used to predict cognitive states such as personality traits [6,66], working memory [23], cognitive task performance (especially on tasks that impose high cognitive load) [33], and perceptions of other individuals [44,65]. We find that the main application domains were healthcare and education. The papers with healthcare applications used affective phenomena to predict depression severity [63], suicidal ideation [59-61], schizophrenic behavior [27], patient satisfaction with doctor communication styles [64], and multimodal distress in patients during dyadic patient-clinician interactions, taking into account the affective context of clinician questions [25]. The papers with education applications used affective phenomena to predict children's pronunciation ability in an educational reading context [67] and cognitive strategies during learning [17]. Additional papers focused on using affective features to predict presentation proficiency [55], speaker reliability [50], and other affective states [53].
Affective Phenomena in an Auxiliary Task: 13 of the 52 papers used affective phenomena in auxiliary modeling tasks (e.g., emotion prediction) to improve performance in downstream tasks predicting human social behaviors and cognitive states. 10 of these 13 papers were from NLP venues. Affective phenomena were used in auxiliary tasks during pretraining, multi-task learning, and fine-tuning; models with these auxiliary tasks achieved improved performance in predicting the following states: stress [69,74], dialogue acts [57,58,62], abusive behavior (e.g., harassment) [54], stance [76], sarcasm [8], metaphor expression [18], group cohesion [41], rhetorical behavior (e.g., critical, discriminative, supportive rhetoric) [28], formality [10], frustration [10], and politeness [10]. We also found one paper that used multi-task learning for both emotion shift prediction and dialogue act recognition to improve emotion recognition in multi-party conversations [56].
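To make the auxiliary-task setup concrete, the following is a minimal sketch, not any surveyed paper's method, of a multi-task model with a shared representation, a main dialogue-act head, and a down-weighted auxiliary emotion head. All dimensions, weights, and the loss weighting are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 16-d input features, 8-d shared representation,
# 4 emotion classes (auxiliary task), 3 dialogue-act classes (main task).
D_IN, D_SHARED, N_EMO, N_ACT = 16, 8, 4, 3

W_shared = rng.normal(size=(D_IN, D_SHARED)) * 0.1
W_emo = rng.normal(size=(D_SHARED, N_EMO)) * 0.1
W_act = rng.normal(size=(D_SHARED, N_ACT)) * 0.1

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    """Shared encoder feeding both task heads."""
    h = np.tanh(x @ W_shared)
    return softmax(h @ W_emo), softmax(h @ W_act)

def multitask_loss(x, y_emo, y_act, aux_weight=0.3):
    """Main-task cross-entropy plus a down-weighted auxiliary emotion loss."""
    p_emo, p_act = forward(x)
    ce = lambda p, y: -np.log(p[np.arange(len(y)), y] + 1e-12).mean()
    return ce(p_act, y_act) + aux_weight * ce(p_emo, y_emo)
```

Because gradients of the joint loss flow through the shared encoder, the auxiliary emotion signal shapes the representation used by the main task; this is the general mechanism behind the improvements reported in the papers above.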
Affective Phenomena as Latent States: 10 of the 52 papers treated affective phenomena as latent states in models of human social behaviors and cognitive states. Several papers modeled group interaction dynamics by treating affective phenomena as latent states. Affective information was used as a latent variable in a partially observable Markov decision process to model dyadic human interactions, with applications in tutoring systems [26]. Other papers defined interpersonal emotion networks as graphs capturing relationship dynamics in multi-party settings [35] and modeled emotion under the premise that emotion states modulate dyadic human behavior [72]. Additional papers viewed affective phenomena as latent states in models of personality [70], moral conflicts [38], creative performance [46], human sarcasm perception [48], toxicity perception [36], and decision-making [4], as well as engagement, interactivity, impatience, reflectivity, and cognitive learning outcomes during online education [2].
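The latent-state view above can be illustrated with a small worked example. The sketch below (illustrative only; the states, behaviors, and all probabilities are invented, and it is far simpler than the POMDP and graph models cited) treats a hidden affective state as the latent variable of a hidden Markov model and filters a posterior over it from observed social behaviors.

```python
import numpy as np

# Hypothetical latent affect states: {negative, neutral, positive}.
# Hypothetical observed behaviors:   {withdraw, neutral-talk, engage}.
T = np.array([[0.7, 0.2, 0.1],   # transition probabilities between affect states
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])
E = np.array([[0.6, 0.3, 0.1],   # P(observed behavior | affect state)
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])
pi = np.full(3, 1 / 3)           # uniform prior over the initial affect state

def filter_affect(observations):
    """HMM forward filtering: posterior over the latent affect state
    after each observed behavior in the interaction."""
    belief = pi.copy()
    posteriors = []
    for obs in observations:
        belief = belief @ T          # predict: propagate through dynamics
        belief = belief * E[:, obs]  # update: weight by observation likelihood
        belief /= belief.sum()
        posteriors.append(belief.copy())
    return np.array(posteriors)
```

After a run of "engage" behaviors, the filtered belief concentrates on the positive affect state; a downstream model of social behavior or cognition can then condition on this belief rather than on any single hard label.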
The 52 papers are listed in the Appendix Table 1.

SUMMARY AND FUTURE DIRECTIONS
In this paper, we explore how, and to what extent, research at the intersection of multimodal interaction and affective computing has treated the role of affective phenomena in AI systems. In our sample of over 16,000 papers from ICMI, ACII, ACL, and EMNLP, we identify 910 papers related to affective phenomena (affect and emotion) and find that this body of research has primarily focused on enabling machines to recognize and express affect and emotion. We find that the use of affect and emotion to enhance machine understanding of human social behaviors and cognitive states has been understudied (52 of the 910 papers); this role is visualized in Figure 1. However, we observe an emerging interest in this direction (Figure 4). From our analysis, we offer insights into future directions to expand the role of affective phenomena in AI systems.
(1) Expanding the Roles of Affective Phenomena in Multimodal Interaction Models: We identify an emerging area of research using affective phenomena as features, in auxiliary tasks, and as latent states to improve downstream models of social behaviors and cognitive states. We acknowledge that there may be a selection bias favoring the publication of papers with positive results. We recommend that future research efforts replicate and validate past empirical findings on the usefulness of affective phenomena in these modeling contexts. We found that 10 of the 13 papers that used affective phenomena in auxiliary tasks reported experiments in unimodal, text-only settings. We recommend that future research explore affective signals in auxiliary tasks to include affective context in multimodal models of human social behaviors and cognitive states during different stages of the modeling process (e.g., pretraining, fine-tuning, co-learning [75]). In addition, we suggest that future multimodal interaction research explore the inclusion of explicit and implicit affective signals as rewards under the reinforcement learning from human feedback framework [11,40], to adapt and expand AI system understanding of human social behaviors and cognitive states.
(2) Cognitively and Neurally-Inspired Models that Reflect the Complex Interaction of Affect, Behavior, and Cognition: The use of affective phenomena in AI models of human social behaviors and cognitive states can be motivated by existing relationships among these three constructs in humans [1,5,20,31,47] (overview in Section 2). The existence of these theorized and empirically-validated relationships in psychology and neuroscience (e.g., the affect-as-information, affect priming, and affect infusion theories [13,16,22,34]) has the potential to inform the science of multimodal interaction research. We, therefore, recommend that future research efforts explore how to leverage these theories to build computational models that reflect the complex interaction among these three constructs in humans. We believe that cognitively and neurally-inspired models have the potential to advance the ability of AI systems to use affective phenomena to better understand human social behaviors and cognitive states.
(3) Expanding Multimodal Social Interaction Contexts: We find that a majority of the papers in our sample focus on monadic contexts, with limited focus on challenges present in dyadic and multi-party contexts. We recommend that future research efforts explore how AI systems can better integrate affective state predictions from multiple people and multiple contexts to inform models of social behaviors and cognitive states in group-level dynamics. For example, how can an AI system integrate its affect predictions of both people in a dyad in order to predict their synchrony (a group-level behavior) [71]? How can affective state predictions of one person in a multi-party interaction inform an AI system's understanding of behavior and cognition in other participants?
(4) Expanding Application Areas: We find that several of the papers successfully used affective phenomena to predict social behaviors and cognitive states in the domains of healthcare (e.g., modeling depression, doctor-patient communication dynamics) [25, 59-61, 63, 64, 69, 74] and education (e.g., modeling student learning dynamics) [2,17,67]. We recommend that future research efforts further explore how affective phenomena can be used as features, auxiliary tasks, and latent states to improve AI systems that support human health, education, and well-being through applications and empirical validation in additional populations and social contexts. Since affective phenomena might not usefully inform models in all settings, we also recommend that future research explore techniques to enable AI systems to efficiently estimate when they need to query affective state information from their environment in order to improve their understanding of the social behaviors and cognitive states of the people around them.

Figure 1: Conceptualization of an AI system that predicts affect and emotion in a human and then uses those predictions in models of human social behaviors and cognitive states.

Figure 3: Distribution of research focus in affect-related papers.

Figure 4: Accumulation of papers that used affective phenomena for enhanced machine understanding, split across use as a feature, in auxiliary tasks, and as a latent state.

Table 1: The 52 papers that use affect or emotion to enhance machine understanding of social behaviors and cognitive states.

Table 1 (continued): The 52 papers that use affect or emotion to enhance machine understanding of social behaviors and cognitive states.