Simulating Emotions With an Integrated Computational Model of Appraisal and Reinforcement Learning

Predicting users’ emotional states during interaction is a long-standing goal of affective computing. However, traditional methods based on sensory data alone fall short due to the interplay between users’ latent cognitive states and emotional responses. To address this, we introduce a computational cognitive model that simulates emotion as a continuous process, rather than a static state, during interactive episodes. This model integrates cognitive-emotional appraisal mechanisms with computational rationality, utilizing value predictions from reinforcement learning. Experiments with human participants demonstrate the model’s ability to predict and explain the emergence of emotions such as happiness, boredom, and irritation during interactions. Our approach opens the possibility of designing interactive systems that adapt to users’ emotional states, thereby improving user experience and engagement. This work also deepens our understanding of the potential of modeling the relationship between reward processing, reinforcement learning, goal-directed behavior, and appraisal.


INTRODUCTION
Emotions have a significant influence on interpersonal dynamics and outcomes in daily interactions. Similar effects are also present in human-computer interaction (HCI) [3], where users exhibit emotions akin to those in face-to-face interactions [46]. Consequently, emotions shape perceptions of interactive systems and impact the success of interactions [5,11,13]. It is therefore a long-standing goal of HCI to understand and predict a user's emotions. This is a challenging problem because, while humans have an innate ability to recognize emotions in others and to make inferences and reason about them [38], computers lack this capacity. They require an explicit emotion model in order to make sense of and adapt to users' emotions.
Many attempts to enhance computers' emotion detection focus on analyzing psychophysiological signals stemming from the user's autonomic nervous system [18,19,34]. However, automated emotion detection is difficult due to the interplay between emotions and cognition [2,41]. Cognitive processes are unobservable, limiting machines to interpreting emotions based on observable behavior and physiological changes. Yet, if humans can deduce emotions from minimal observations, why can't machines? This paper posits that discerning user emotions requires a theory bridging cognition and emotions. Models accomplishing this implement psychological theories of human cognitive-emotional processes, aiming to deduce emotions from sparse data using model-informed biases [12,28,43,45]. While several such models have emerged recently, their integration into HCI remains limited.
The main contribution of our paper is the adaptation of the temporal difference reinforcement learning model of appraisal [54] to HCI. We assess the model's predictive capabilities in an interactive task, and expand it to capture the dynamic nature of emotions during interactions. The model's key innovation is merging a reward processing mechanism with appraisal theory, using a unified reinforcement learning (RL) framework. Yet, it has so far not been adapted to interactive tasks, nor assessed with any real-life tasks involving human emotions. In this study, our focus is on examining and modeling three emotions: happiness, boredom, and irritation. These were selected due to their frequent occurrence in HCI and their substantial impact on user behavior, engagement, and the overall user experience [25]. The selected emotions represent a spectrum from positive (happiness), via neutral (boredom), to negative (irritation) [20,21,26].
Figure 1 illustrates the model at work: 'Lucy' strives to achieve her objectives in an interactive task. Each progressive step elicits positive feedback, leading to positive value estimates. As the task advances, Lucy gains confidence in her goal attainment. When prompted about her emotions after the task, she expresses happiness, but also a hint of boredom due to the task's simplicity. She doesn't feel frustrated. The bar graphs in Figure 1 display human self-report results from a relatively straightforward, rewarding task alongside the model's predictions. The alignment between the two stems from a computational cognitive emotion model that estimates the user's likely emotions given interactive events during the task. Predictions are made by applying a computationally grounded cognitive model, not by observing physiological signals or learning a model from human responses.
Existing computational cognitive emotion models fall short in predicting the scenarios we present here, primarily because they do not incorporate a simulation of an autonomous agent capable of evaluating and selecting actions to optimize anticipated outcomes. We foresee multiple applications of this approach. First, affective computing researchers could integrate our work with existing models of physiological signals, improving the accuracy of emotion detection. Second, machines equipped with a model-based understanding of their users' emotions can simulate, in silico, alternative courses of action, deciding on the one that is best predicted to achieve the desired emotional outcome [14,43].

BACKGROUND
Understanding and predicting the user's emotions is a long-standing objective in HCI. To that end, affective computing studies and develops systems designed to recognize, interpret, simulate, and respond to human emotions [51]. Since the founding of this field [33], the main research lines have revolved around the detection and interpretation of affective and social signals from humans [50], modeling the different facets of human-agent interaction [23,35,44], as well as computational simulation of emotion processes based on psychological theories [7], especially appraisal theories [4,10,24]. Affective computing has produced a number of key techniques that use sensors to infer emotional states, often based on either basic emotion [15] or core affect theories [36]. In contrast, appraisal theory stands out as a promising foundation for computational cognitive emotion models due to its dedication to explaining emotions within integrated cognitive-affective processes inherent to humans [43].
Cognitive models of appraisal delineate an evaluative process (appraisal) by which specific situations evoke particular emotional responses, given the subject's goals [29,31,42,47]. For instance, the component process model (CPM) proposes a set of sequential cognitive checks, which assess situational stimuli based on characteristics like novelty, intrinsic pleasantness, goal relevance, and coping [30,37]. The CPM predicts that the collective effect of these evaluations results in an emotion-specific outcome profile. The most commonly occurring profiles are called 'modal', and are associated with an emotion word, such as happiness, joy, or anger [40]. Such appraisal models provide a detailed and empirically verifiable account of the cognitive mechanics underpinning the appraisal process and its associated emotion [41]. Moreover, these models enable their specifics to be formalized in computationally implementable terms [27,37], making them suitable for creating machines that understand their users' emotions. While this allows for a clear step-by-step analysis of how a specific emotion may have been elicited by a given situational stimulus, especially for computational implementations of this model, it only provides a framework for the static, momentary assessment of emotion elicitation, i.e., at a specific moment in time. Yet, most scenarios, especially interactive tasks, encompass an extended temporal context and repeated situational evaluations, underscoring the need for a more continuous account of computational appraisal.
Reward processing models have been used in affective computing and HCI to estimate and predict user responses, allowing systems to adapt their behaviors [6,9,53]. At its core, reward processing modeling seeks to understand decision-making based on anticipated rewards, with the ultimate aim of maximizing these rewards over time [1]. The operating principle is that positive outcomes reinforce behaviors, encouraging their repetition. Yet, the approach has limitations in modeling emotions: it often oversimplifies motivations by assuming agents act purely for rewards, overlooking aspects such as cognitive processing or behavioral constraints. Furthermore, there is still a considerable gap between a reward-processing model of emotion and a realistic model of human emotions.
Computational rationality is an approach that has recently been used in modeling a variety of interactive tasks [32]. It posits that humans can be modeled as agents whose decision-making and behavior are optimal within the bounds imposed by information, computational resources, and expected outcome utility [22]. This approach has an interesting connection to the reward processing model of emotion: in computationally rational modeling, RL is used to derive bounded optimal behavior policies. At the heart of computational rationality in HCI is implementing a simulated user's goals as a reward function [8,16,17]. This facilitates the integration of emotion into computational rationality, thereby implementing emotion as part of an emerging modeling paradigm in HCI.
However, what stands in the way of implementing a model of the user's emotions within computational rationality is the aforementioned gap between reward processing models of emotion and a more realistic understanding of human emotions. It has been recently suggested that appraisal theory is a promising candidate for bridging this gap [28,54]. In many ways, appraisal theory is well suited for modeling emotion within the computational rationality framework, if it is implemented via the reward processing carried out in bounded optimal agents. This is because both appraisal theory and computational rationality embrace the importance of goals in making predictions, and note that it is not merely the events of the environment that shape behavior, but also cognition. In a demonstration of this, a recent model integrates appraisal theory with RL [54]. While the model is promising, it is limited in having been evaluated only with vignettes (textual descriptions) of everyday situations. Our goal with this paper is to review the applicability of this model in interactive tasks, and to evaluate and develop it further to fit this goal.

MODELING
In this section we review the recent model that formalizes emotional appraisal using RL [54], and develop it further. In section 3.1 we outline the foundations of RL; in sections 3.2 and 3.3 we describe the existing model that integrates appraisal into RL; and in section 3.4 we build on this model.

Sequential Decision-Making
The model's interactive episodes are formalized via a Markov decision process (MDP), a mathematical framework for modeling decision-making problems in stochastic environments [48]. It is a tuple $\langle S, A, T, R, \gamma \rangle$, where $S$ denotes the set of states and $A$ represents the set of actions that the agent can take. The state transition function $T(s, a, s')$ describes the probability of transitioning from state $s \in S$ to state $s' \in S$ when taking action $a \in A$. The reward function $R(s, a, s')$ defines the immediate reward an agent receives when transitioning from state $s$ to state $s'$ by performing action $a$. The discount factor $\gamma$ discounts future rewards when calculating the value of actions.
In order to maximize the long-term rewards of a sequential decision-making task described with an MDP, an RL agent interacts with the environment, encoding the state transition probabilities and the reward function. The problem of RL is to derive an optimal policy $\pi^*$, which maps states to action probabilities such that behavior according to it maximizes the expected cumulative reward over time. The value function of a state $s$ under a policy $\pi$, denoted as $V_\pi(s)$, is the expected return when starting in state $s$ and following policy $\pi$ thereafter. The function $V_\pi(s)$ is the state-value function for policy $\pi$:

$$V_\pi(s) = \mathbb{E}_\pi\left[G_t \mid s_t = s\right], \tag{1}$$

where $G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$ represents the discounted return, and $\mathbb{E}_\pi$ denotes the expected value under the policy. The value of performing an action $a \in A$ while in a state $s \in S$ is defined as:

$$Q_\pi(s, a) = \mathbb{E}_\pi\left[G_t \mid s_t = s, a_t = a\right]. \tag{2}$$

The agent learns the optimal policy by interacting with the environment, receiving feedback in the form of rewards, and updating its value estimates for state-action pairs. In temporal difference (TD) learning, the value estimates are based on the difference between the expected and the observed value:

$$V(s) \leftarrow V(s) + \alpha\left[r' + \gamma V(s') - V(s)\right], \tag{3}$$

where $\alpha$ is the learning rate, $r'$ is the reward received after moving to the new state, and $V(s')$ is the estimated value for the new state. This operation updates the value $V(s)$ associated with a state $s$ as soon as the new state $s'$ is reached, by computing the difference between predicted and observed values. Combining equations 2 and 3 results in a form of TD learning called Q-learning [48], which can be expressed as

$$Q(s, a) \leftarrow Q(s, a) + \alpha\left[r' + \gamma \max_{a'} Q(s', a') - Q(s, a)\right]. \tag{4}$$
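As an illustration of the tabular Q-learning update, the following minimal sketch trains an agent on a toy three-step MDP. The dictionary-based MDP encoding, the state and action names, the reward magnitudes, and all hyperparameter values are assumptions made for this example and are not taken from the model's published code.

```python
import random


def q_learning(transitions, rewards, start, terminals, episodes=5000,
               alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a small MDP given as dictionaries.

    transitions[(s, a)] -> list of (next_state, probability)
    rewards[next_state] -> reward received on entering next_state
    """
    rng = random.Random(seed)
    q = {state_action: 0.0 for state_action in transitions}
    actions = {}
    for (s, a) in transitions:
        actions.setdefault(s, []).append(a)

    for _ in range(episodes):
        s = start
        while s not in terminals:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.choice(actions[s])
            else:
                a = max(actions[s], key=lambda act: q[(s, act)])
            # Sample the next state from the transition distribution.
            next_states, probs = zip(*transitions[(s, a)])
            s_next = rng.choices(next_states, weights=probs)[0]
            r = rewards.get(s_next, 0.0)
            # Terminal states have no actions, so their value is 0.
            best_next = max((q[(s_next, b)] for b in actions.get(s_next, [])),
                            default=0.0)
            td_error = r + gamma * best_next - q[(s, a)]
            q[(s, a)] += alpha * td_error
            s = s_next
    return q


# Hypothetical deterministic toy task: start -> question -> goal or error.
toy_transitions = {
    ("S0", "start"): [("S1", 1.0)],
    ("S1", "a1"): [("G", 1.0)],
    ("S1", "a2"): [("E", 1.0)],
}
toy_rewards = {"G": 1.0, "E": -1.0}
q = q_learning(toy_transitions, toy_rewards, start="S0", terminals={"G", "E"})
```

In this toy task the learned value of the correct action approaches the goal reward, and the value of the start action approaches that reward discounted by one step.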

Appraisal Calculation
Several appraisals are discussed in the literature, including relevance, implication, coping potential, and normative significance [39]. In the RL appraisal model [54], four appraisals were considered: suddenness, goal relevance, conduciveness, and power. This choice was made because these appraisals have distinct representational capacities (regarding real-life episodes) and minimal inter-correlation, and they are suitable for integration into an RL model.
Suddenness is part of the novelty assessment of an event during appraisal. Specifically, it quantifies the frequency with which a transition to state $s'$ occurs after action $a$ is taken in a prior state $s$ by the agent. Following the RL appraisal model [54], suddenness is defined in terms of a world model $\hat{T}$, which approximates the true transition function $T$ and is learned by the agent during interaction. The intuition of $\hat{T}$ is that the agent learns to expect certain state-action-state transitions, and therefore encountering such a transition triggers a suddenness appraisal: how expected was this transition?
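One way to make the suddenness appraisal concrete is sketched below. Both the count-based world model and the mapping from transition probability to suddenness (one minus the estimated probability) are assumptions chosen for illustration; the exact definition used by the model is given in [54]. The class and function names are hypothetical.

```python
class CountWorldModel:
    """Hypothetical world model: estimates T-hat from observed transition counts."""

    def __init__(self):
        self.counts = {}

    def observe(self, s, a, s_next):
        """Record one observed (s, a, s_next) transition."""
        outcomes = self.counts.setdefault((s, a), {})
        outcomes[s_next] = outcomes.get(s_next, 0) + 1

    def prob(self, s, a, s_next):
        """Relative frequency of s_next after (s, a); 0.0 if (s, a) is unseen."""
        outcomes = self.counts.get((s, a), {})
        total = sum(outcomes.values())
        if total == 0:
            return 0.0
        return outcomes.get(s_next, 0) / total


def suddenness(model, s, a, s_next):
    """Assumed mapping: rarer transitions are appraised as more sudden."""
    return 1.0 - model.prob(s, a, s_next)
```

Under this reading, a transition that the agent has seen in eight of ten attempts receives a suddenness of about 0.2, while the remaining two-in-ten outcome receives about 0.8.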
Goal relevance checks how relevant an event is, given the agent's current goal. The more goal-relevant an event is, the stronger the emotional reactions will probably be [42]. Goal relevance is operationalized as the magnitude of the TD error observed during value prediction updates [54].

Conduciveness appraisal in the CPM evaluates whether an event aids the agent's goal attainment. Conducive events generally elicit positive emotions, while obstructive ones invoke negative ones [37]. In the RL appraisal model, conduciveness is likened to both the direction and magnitude of the discrepancy between expected and actual outcomes. This concept is quantified by standardizing its values between 0 (highly unconducive) and 1 (very conducive), with 0.5 marking neutral events that meet expectations. The intrinsic conduciveness of an event relies on the agent's cognitive value update, informed by prior expectations and goals.

It is worth mentioning that goal relevance and conduciveness in emotional appraisal are not inherently correlated. Events that are goal-relevant may still be unconducive, as observed in negative emotions like despair, irritation, and sadness. Conversely, conducive events can have low goal relevance, exemplified by scenarios eliciting boredom.
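The two TD-error-based appraisals can be sketched as follows. Goal relevance as the absolute TD error follows the description above; the conduciveness scaling (a linear map of the signed TD error onto the interval from 0 to 1, using a hypothetical normalizing constant `max_abs_delta`) is an assumption for illustration and may differ from the formulation in [54].

```python
def td_error(reward, gamma, value_next, value_current):
    """Signed discrepancy between the observed and the predicted value."""
    return reward + gamma * value_next - value_current


def goal_relevance(delta):
    """Magnitude of the TD error."""
    return abs(delta)


def conduciveness(delta, max_abs_delta):
    """Assumed scaling: 0 = highly unconducive, 0.5 = as expected, 1 = very conducive.

    max_abs_delta is a hypothetical normalizing constant, e.g. the largest
    absolute TD error that the task can produce.
    """
    if max_abs_delta <= 0:
        return 0.5
    scaled = max(-1.0, min(1.0, delta / max_abs_delta))
    return 0.5 + 0.5 * scaled
```

The sketch also shows why the two appraisals need not move together: a large negative TD error yields high goal relevance but low conduciveness, whereas a TD error near zero yields low goal relevance and neutral conduciveness.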
Power appraisal is part of the more general coping evaluation, asking how much an agent influences an event's outcome. For instance, an experienced user possesses power due to their knowledge, while a novice lacks this. Power appraisal provides a means to explain why a particular event, such as an error message, might cause widely different emotions in different users (e.g., confusion or even fear in novice users, and irritation in experienced users). In the model, power reflects the agent's ability to discern between beneficial and non-beneficial actions. When the values for various actions differ, the agent is believed to have power. Conversely, identical values or a singular action option denote no power. Power is accordingly quantified from the differences between the values of the actions available in a state [54]. This formulation underscores that the essence of the agent's power lies in its ability to identify which actions to pursue and which to avoid. In the simulations below, we standardize the power appraisal by dividing by the highest absolute value.
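A minimal sketch of a power appraisal with the properties described above is given below. Using the spread between the best and worst action value is an assumption made for this example (the exact formula is in [54]); it satisfies the stated constraints that identical values or a single available action yield no power. The standardization step divides by the highest absolute value, as described in the text.

```python
def power(action_values):
    """Assumed measure: spread between the best and the worst action value."""
    values = list(action_values)
    if len(values) < 2:
        return 0.0  # a single available action means no power
    return max(values) - min(values)


def standardize(raw_values):
    """Divide each value by the highest absolute value in the collection."""
    values = list(raw_values)
    peak = max((abs(v) for v in values), default=0.0)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]
```

For example, a state whose two actions are valued at 0.6 and -1.0 yields a raw power of 1.6, whereas a state with two equally valued actions yields 0.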

Classifier
In order to predict modal emotions (emotion words such as 'happiness' or 'irritation') from the computed appraisals, the vector of the four appraisals needs to be classified. In the model, this classifier is created by connecting modal emotions to particular values or profiles of such vectors. These profiles are shown in Table 1, which summarizes textual descriptions connecting appraisal profiles and modal emotions [39]. Details of how this was done are reported in [54]. For this study, we employed simulated data to train and test our classifier. The simulated data were generated by transforming nominal appraisals from Table 1 into a range of quantitative values (Table 2). We used a linear Support Vector Machine (SVM) for classification, focusing on the penalty parameter $C$ to balance maximizing the margin and minimizing classification errors. Our goal was to approximate human performance in the classifier. We tested 100 SVM classifiers with varying $C$ values (0.0035 to 0.006) against the simulated data. The classifier's precision closely matched human performance at a $C$ value of 0.0049, with a variance of 0.0004. To account for individual differences, we trained an SVM classifier for each participant, each with a $C$ value sampled from a normal distribution (mean = 0.0049, variance = 0.0004), reflecting the variance in human precision, thereby ensuring that our model not only matched average human performance but also captured individual variability. Importantly, the parameter $C$ was not fitted to minimize the model's prediction error against human emotion ratings, but to the same rating precision level as found in the human data. The goal of this procedure was to bring the variance of our modal emotion predictions more in line with human self-reports: with a $C$ value too large, only the most intense emotion would be predicted; by lowering the $C$ value, the model also predicts other, less intense emotions. This reflects how humans are able to experience various emotions simultaneously. With the classifier, the computational appraisal model is able to predict emotion words from value computations of an RL agent, via the equations for different appraisals.

Extending The Model for Sequential Emotions
While the model presented above bridges appraisal theory and a general computational approach to modeling interactive behavior, it does not capture the episodic nature of interaction: within a single episode, there are bound to be different emotional reactions. For instance, encountering an error multiple times during interaction should not result in multiple 'snapshot' instances of irritation, but rather a continuously growing feeling of irritation.
In other words, emotions do not appear and disappear, but linger and interact.
To that end, we augment the model. Initially, in this paper, we implement a simple moving window average (SMA), which considers not merely the present state evaluation, but also those that precede it:

$$\mathrm{SMA}_e(t) = \frac{1}{n} \sum_{i=t-n+1}^{t} P_e(i), \tag{9}$$

where $P_e(t)$ is the prediction from the classification of a given emotion $e$ at time step $t$, and $n$ is the length of the window. In experiment 2 of this paper, we set $n = 2$, but this number depends on the abstraction level of the simulation. In the future, we also envision a discounting factor, making emotions that occurred further in the past have less impact on the present emotional state.
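The moving window average can be sketched in a few lines. The function name and the handling of the first time steps, where fewer than $n$ predictions exist and the window is simply shortened, are assumptions made for this example.

```python
def moving_average(predictions, n):
    """Average each prediction with up to n - 1 predictions that precede it.

    predictions: per-time-step classifier outputs for one emotion
    n: window length (n = 2 averages the current and the previous step)
    """
    if n < 1:
        raise ValueError("window length must be at least 1")
    smoothed = []
    for t in range(len(predictions)):
        # Shorten the window at the start of the episode.
        window = predictions[max(0, t - n + 1): t + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical irritation intensities over three appraisal stages.
irritation = moving_average([0.2, 0.4, 0.8], n=2)
```

With a window of two, a rising sequence of snapshot predictions produces a smoother, lagged trajectory, which is the lingering effect that the extension is meant to capture.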

EVALUATION

General Method
Given the goal of this paper, adapting an RL-based appraisal model to predict emotion in interactive tasks, we focus our evaluation on users within an interactive environment. This approach differs from the vignette-based method used previously in validation. This paper introduces two original studies and assesses the model's predictions based on their outcomes. Our focus is on three common emotions in HCI: happiness, boredom, and irritation [20,21,26]. Happiness reflects the fulfillment of a user's goals or desires, and can lead to increased user engagement. Boredom signifies a lack of stimulation, possibly due to a system's failure to maintain the user's interest.
Irritation is typically associated with frustrating events that may be due to system errors, poor design, or a failure to meet user expectations. The experimental tasks derive from appraisal theory principles, reflecting the targeted emotions' appraisal profiles (see Table 1). For example, the happiness task featured low-suddenness and high goal-conduciveness events, while irritation involved goal-obstructive events where participants had some power. Having formalized these appraisals computationally, we implemented identical manipulations in the computational task designs. The first experiment tests the original single-appraisal model against data collected from real emotional experiences. In the second, we test the idea of averaging emotions over longer sequences.

Materials:
We constructed six online tasks, three for each experiment. The material was a text paragraph (about 220 words) in the English language sourced from Wikipedia, and the participants had to answer questions about the text. Multiple questions were designed from the same source text. To influence participants' emotions, we made specific design alterations. For the happiness task, the questions were meaningful, correct answers resulted in positive feedback, and in the end the participant received a message congratulating them for good performance (Figure 2a). The boredom task featured a large number of monotonous, simple questions, and intentionally neutral feedback both for an individual task and at the end of the experiment (Figure 2b). Finally, the irritation task incorporated multiple system errors, leading to incorrect selections irrespective of user decisions, culminating in task failure and negative feedback (Figure 2c). The text and all questions are presented in full in Appendix B.

[...] second experiment, $N = 45$ participants were recruited, 15 for each task (average age 29 (SD = 7.3), 15 men and 30 women). All participants were sourced online through Prolific, and were required to be native English speakers. With this requirement we aimed to eliminate potential biases or variations in the comprehension of the text, allowing participants to concentrate primarily on the test's structure.

Procedure:
Participants evaluated their emotions using a rating scale that was part of the online experiment design. This scale encompassed emotion words, asking participants to report on their current feelings (0 indicating "not experiencing this emotion at all" and 10 denoting "experiencing this emotion intensely"). Besides the primary emotions of interest (happiness, boredom, irritation), we introduced two other emotions (joy, sadness) to divert concentrated attention from the manipulated emotions. In the first experiment, the self-report was administered once post-task. In the second, evaluations occurred four times: initially, twice during the tasks, and upon completion. Correlations between the self-reported emotions are reported in Appendix C. A between-subjects design was used, with participants engaging solely in one of the three emotional conditions. Different participants took part in the two experiments.

Data Analysis:
In analyzing the self-reports of the three targeted emotions, we first standardized them to reduce the impact of individuals interpreting the scale differently. For each participant, we normalized their ratings by dividing their rating for an emotion by the total sum of their ratings for all three emotions. This ensured that the rating for each emotion ranged between 0 and 1, with the combined ratings always summing up to 1. The rationale for this was that we expected the participants to hold a different internal standard for how strong a particular rating for an emotion is. However, what could be assumed to be common to all participants is how they rate the emotions in relation to each other.
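The per-participant normalization can be sketched as follows. The function name and the dictionary layout are assumptions for this example; the text does not specify how a participant who rates all three emotions as 0 is handled, so this sketch raises an error in that case.

```python
def normalize_ratings(ratings):
    """Divide each emotion rating by the sum of the participant's ratings.

    ratings: mapping from emotion word to a 0-10 self-report rating.
    Returns a mapping whose values lie in [0, 1] and sum to 1.
    """
    total = sum(ratings.values())
    if total == 0:
        # Assumption: the all-zero case is undefined and left to the caller.
        raise ValueError("cannot normalize ratings that sum to zero")
    return {emotion: value / total for emotion, value in ratings.items()}


# Hypothetical participant in the happiness condition.
example = normalize_ratings({"happiness": 6, "boredom": 3, "irritation": 1})
```

Two participants who report (6, 3, 1) and (10, 5, 5/3) on the raw scale therefore receive the same normalized profile, since only the ratings relative to each other are retained.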
For model predictions, we designed six simulated environments to represent each task using the MDP formalism. The formalized tasks are shown in Figure 3 (experiment 1) and Figure 5 (experiment 2). An RL agent was trained via tabular Q-learning to converge on an optimal policy separately for each task. From the converged models, we computed four appraisal measures using the appraisal definitions of the previous section. The resulting appraisal vectors were classified into modal emotion probabilities using an SVM, which was calibrated as described in section 3.3. An overview of the data processing flow is shown in Appendix A.

Experiment 1
The first experiment aimed to elicit three emotions in three between-subjects tasks: happiness, boredom, and irritation. The participants carried out tasks that manipulated these target emotions based on appraisal theory. Table 1 shows the appraisal profiles.
The MDPs that formalize the three tasks are illustrated in Fig. 3. For the MDP of the happiness task, $G$ serves as the goal state with positive rewards, while a separate error state carries negative rewards. Starting from the initial state, the agent can perform a single action, transitioning to $S_1$. This action represents the task's start, and $S_1$ represents the state where the participant is shown a question with two options. These choices are represented by two actions: $a_1$ for the correct selection and $a_2$ for the incorrect one. Choosing $a_1$ offers an 80% likelihood for the agent to land in the goal state, contrasted with a 20% chance of ending up in the error state. This probabilistic outcome recognizes the real-world scenario where, despite intending to choose correctly, participants might inadvertently err due to incorrect knowledge or confusion. The appraisal analysis occurs at the onset of the goal state, when the agent transitions from $S_1$ to $G$. The numerical appraisal patterns for the experiment, generated by these models, are shown in Table 3. Note the discrepancy in goal relevance of irritation between Tables 1 and 3. The reason for this is that in the experiment, we wanted to emphasize the irritation in human participants by making the obstructing task very goal-relevant.
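To illustrate the value computation at the question state of the happiness task, the sketch below works through the expected values by hand. The 80%/20% outcome probabilities come from the task description; the reward magnitudes (+1 for the goal, -1 for the error state), the assumption that the incorrect action always leads to the error state, and the variable names are hypothetical choices for this example.

```python
def expected_action_value(outcomes, gamma=1.0):
    """Expected value of an action.

    outcomes: list of (probability, reward, value_of_next_state) tuples.
    """
    return sum(p * (r + gamma * v_next) for p, r, v_next in outcomes)


R_GOAL, R_ERROR = 1.0, -1.0  # hypothetical reward magnitudes

# Correct choice: 80% goal, 20% error; both outcomes end the episode.
q_a1 = expected_action_value([(0.8, R_GOAL, 0.0), (0.2, R_ERROR, 0.0)])
# Incorrect choice: assumed to always end in the error state.
q_a2 = expected_action_value([(1.0, R_ERROR, 0.0)])

# TD error when the agent actually reaches the goal after the correct choice.
td_error_at_goal = R_GOAL + 0.0 - q_a1
```

Under these assumptions the correct action is valued at about 0.6 and the incorrect one at -1, so reaching the goal produces a positive TD error of about 0.4: a mildly better-than-expected outcome from which the appraisals at the goal state are computed.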
The boredom MDP shares a similar configuration in its initial states, the start state and $S_1$. The distinction lies in the reward values: [...] is set to -1. This design choice makes the task less rewarding, both positively and negatively, implying that the outcome is less important. Appraisal analysis is conducted at $S_2$. In the irritation task, we introduce a high likelihood of reaching the problematic state of the system even when opting for the correct choice $a_1$. This represents the frustrating event when a certainly correct action results in an unwanted state due to system errors. The appraisal analysis happens accordingly when the problem state is encountered.

The average standardized self-rated emotions from the participants after the tasks and the model predictions are presented in Fig. 4. Overall, our model achieved a reasonable degree of fit to the data, $R^2 = 0.78$, RMSE = 0.13. The manipulations proved effective for both human evaluations and model outcomes. In every task, both humans and the model rated the target emotion with the highest intensity.

Experiment 2
Our second experiment aimed to expose the process nature of emotion and show that a static snapshot of an emotional state, either via a self-report or a model-based prediction, does not provide a full understanding of emotions during the interaction. To that end, the participants again interacted with the tasks designed to elicit one of the three emotions, but now they self-reported their emotions four times: at the beginning, twice during the tasks, and at the end of the experiment. While these separate self-reports are still static measurements alone, the progression of these self-reports over time can be used to evaluate the process nature of emotion and how well our model captures that. Figure 5 illustrates the MDPs utilized in the second experiment. They bear a resemblance to those from the first experiment, but are extended to three appraisal stages. Unlike the participants, who were expected to already have some emotional experiences upon starting the experiment, our model does not have the first emotion measurement. That complicates how the SMA (Eq. 9) is implemented. Thus, the initial emotions of the model were set to the average initial values obtained from human participants when they rated their emotions before the task started. At each time stage, we extracted appraisals from the model. Using the same trained SVM classifier as in the first experiment, these appraisals were then transformed into predictions of modal emotion intensities. We did not recalibrate the SVM classifier's parameters for this new experiment. Unlike in the first experiment, we used the average of the current and previous emotion predictions to capture the process nature of emotion.
We performed a regression analysis on the human emotion values against our model's predictions, with each experiment as a fixed term. The results yielded an $R^2 = 0.86$ and an RMSE of 0.19, a reasonable fit between predictions and responses. From Figure 6, it is evident that the targeted emotions increase over time, while the non-targeted ones decrease. This trend is consistent in both human evaluations and model predictions, and is particularly pronounced in the happiness and irritation tasks.
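The two fit statistics can be computed as sketched below. This is a simplified illustration: it computes $R^2$ as the squared Pearson correlation between human ratings and model predictions, which corresponds to a simple linear regression without the additional fixed term used in the analysis above. The function names are assumptions for this example.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Squared Pearson correlation, i.e. R^2 of a simple linear regression."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("R^2 is undefined for constant input")
    return (cov * cov) / (var_o * var_p)
```

Note that the two statistics answer different questions: $R^2$ measures how well the predictions track the pattern of the ratings, while RMSE measures how far the predicted intensities are from the ratings on the normalized 0 to 1 scale.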
However, the results from the boredom task warrant further discussion. The boredom rating sees an uptick, but there is also a slight increase in participants' irritation ratings. This can be attributed to the inherent challenge in designing a universally neutral interactive task. In our boredom task, the repetitiveness of the simple questions, combined with the sheer volume and limited feedback, likely led to participants becoming impatient and consequently irritated. Furthermore, the self-reported happiness levels in the boredom task were higher than boredom ratings, potentially due to participant response bias, where the participants want to provide good feedback to the experimenters and are generally favorable toward their task designs [49]. This might have resulted in over-reporting happiness in a task designed primarily to elicit boredom. This is also possible for the happiness task in the first experiment, where the participants did not report as much boredom as predicted by the model. However, a more general view of all the graphs over these time stages clearly shows a decline in happiness and a rise in boredom. This points to the need for a more in-depth data analysis, focusing on the temporal shifts in emotion ratings as a contextual reference for participants' emotional responses.

DISCUSSION AND CONCLUSION
Empirical evaluation: The goal of the paper was to adapt and demonstrate a computational cognitive emotion model that simulates emotion in response to goal-driven interaction events. With some exceptions, the model closely mirrored emotional self-reports collected from human participants engaging in interactive tasks. This accuracy stemmed from incorporating appraisal theory into an RL computational framework, facilitated by a theory linking reward prediction errors with emotional responses [28,54]. Furthermore, our findings support the validity of appraisal theory, as both the human and computational experiment task designs were rooted in appraisal-centric hypotheses.
While our validating experiments included human participants self-reporting their emotions, future research should extend its scope to more emotions and tasks. The designed interventions intentionally exerted a pronounced emotional impact, and the tasks do not represent the wide range of tasks typically performed by humans on computers. In the future, adapting the model for more involved interactive scenarios beyond the simple interactions discussed here is crucial. The MDP framework utilized in this paper has been used to simulate various interactions, such as multitasking while driving [17], touchscreen typing [16], and GUI-based decision-making [8]. Given that these tasks elicit emotions, our model should be able to predict them. Furthermore, while happiness, boredom, and irritation are prevalent emotions in interaction, there are other emotions relevant in HCI. We limited the number of emotions to these key ones to focus on testing the model. However, there is a broader spectrum of emotions that the model should be able to predict. The challenge for future research lies in either controlling the experiments carefully to elicit targeted emotions, or collecting a large naturalistic dataset that considers a wide range of lived human emotions. Finally, this paper used a simple moving window average to capture the persistence of emotion throughout an episode. In the future, more complicated formulas should be considered and the complex time-dependent dynamics of emotion investigated.
Implications: In the future, we foresee our model being used in interactive systems that anticipate and adapt to their users' states [14], including emotional responses. Even in its current theoretical form, the model can provide designers with insights by allowing them to examine how variations in task progression or user goals influence emotional outcomes. This implies that the model discussed herein should be individually adjustable and tailored to a specific user's objectives and proficiency. By inferring a user's underlying cognitive states, the model's alignment with the human user can be enhanced, potentially boosting the accuracy and validity of its predictions.
Conclusion: With the increase in automated and intelligent machines that interact with their users, it becomes imperative that collaborative agents possess an understanding of their human users. A crucial aspect of this understanding is emotion. Alignment between humans and technologies embedded with artificial intelligence is at risk if the latter cannot predict their users' emotional responses to interactive events. With computational cognitive models like those developed here, it becomes possible to implement an explicit understanding of emotion in artificial agents. This understanding is not merely an ability to predict emotions from observed behavior or physiological signals, but also an ability to give reasons for the predicted emotions and to internally simulate various 'what if' experiments that facilitate fluent interaction with the user. In the interest of open science, as recommended by [52], we make the model code and the data from the experiments freely available at https://gitlab.jyu.f/zhangjy/simulating_emotions_chi.

B EXPERIMENTAL SETUP (EXPERIMENTS 1 AND 2)
The text used in the tasks (from https://en.wikipedia.org/wiki/English_language):
Modern English has spread around the world since the 17th century as a consequence of the worldwide influence of the British Empire and the United States of America. Through all types of printed and electronic media of these countries, English has become the leading language of international discourse and the lingua franca in many regions and professional contexts such as science, navigation and law. English is the most spoken language in the world and the third-most spoken native language in the world, after Standard Chinese and Spanish. It is the most widely learned second language and is either the official language or one of the official languages in 59 sovereign states. There are more people who have learned English as a second language than there are native speakers. As of 2005, it was estimated that there were over 2 billion speakers of English. English is the majority native language in the United Kingdom, the United States, Canada, Australia, New Zealand and the Republic of Ireland, and is widely spoken in some areas of the Caribbean, Africa, South Asia, Southeast Asia, and Oceania. It is a co-official language of the United Nations, the European Union and many other world and regional international organisations. It is the most widely spoken Germanic language, accounting for at least 70% of speakers of this Indo-European branch.

Figure 2: Screenshots from the experiments, each of which targeted one of the three emotions. The happiness task was designed to be encouraging and rewarding, the boredom task contained many repetitive tasks and neutral feedback, and the irritation task was riddled with errors and ended in task failure.
(a) The MDP model for Happiness (b) The MDP model for Boredom (c) The MDP model for Irritation

Figure 3: MDP models for various emotions. Circles are states and arrows are transitions caused by actions. Transition probabilities and rewards are also denoted where relevant. In the Happiness model, selecting action 'a1' from state 'S1' results in a 20% probability of encountering an error 'E', mirroring the anticipated error rate among participants. Conversely, in the Irritation model, choosing 'a1' at 'S1' leads to error 'E' with an 80% likelihood, reflecting the high probability of system errors occurring during the task.
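To make the role of the error probabilities concrete, the following is a minimal sketch of TD(0) value estimation for state 'S1' under action 'a1'. It is not the paper's implementation: treating both outcomes as terminal, the reward values (+1 for success, -1 for the error 'E'), the 1/n step size, and the function name `td0_value_of_s1` are all assumptions chosen for illustration. Only the 20% and 80% error probabilities come from the figure.

```python
import random


def td0_value_of_s1(p_error, episodes=20000, seed=0,
                    reward_success=1.0, reward_error=-1.0):
    """Estimate V(S1) with TD(0) when action a1 fails with probability p_error.

    Both outcomes are treated as terminal, so the TD target is just the reward.
    The reward values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    value = 0.0
    for n in range(1, episodes + 1):
        reward = reward_error if rng.random() < p_error else reward_success
        # delta = r + gamma * V(terminal) - V(S1), with V(terminal) = 0
        td_error = reward - value
        # Step size 1/n, so `value` tracks the sample mean of the rewards.
        value += td_error / n
    return value


for name, p_error in [("Happiness", 0.2), ("Irritation", 0.8)]:
    v = td0_value_of_s1(p_error)
    print(f"{name}: V(S1) ~ {v:.2f}, "
          f"TD error on success {1.0 - v:+.2f}, on error {-1.0 - v:+.2f}")
```

Under these assumed rewards the learned value approaches 1 - 2 * p_error, so the same event yields different TD errors in the two models: an error is a larger negative surprise when the error probability is 20% than when it is 80%.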

Figure 4: A comparison of human and model predictions for emotional ratings in each task. Error bars indicate 95% confidence intervals.
(a) The MDP model for Happiness in series (b) The MDP model for Boredom in series (c) The MDP model for Irritation in series

Figure 5: MDP models in series. Circles are states and arrows are transitions caused by actions. Transition probabilities and rewards are also denoted where relevant. The error rates in the Irritation model escalate from 60% to 70%, and finally to 80%, reflecting the true task failure rates.

Figure 6: A comparison of human and model predictions for emotional ratings in each task at each time stage. Targeted emotion magnitudes are reported with numbers.

Figure 7: A reinforcement learning agent is trained in a Markov Decision Process-based task environment. Learning signals (TD errors from the converged model) are converted into appraisal predictions, which are classified into modal emotion predictions using a pre-trained SVM. The classifier is fine-tuned to have a human-like spread between emotion ratings.

Table 3: Appraisal profiles generated by our model.
How did English become the world's leading language? a. Through word of mouth of the British Empire and the US. b. Through printed and electronic media of the British Empire and the US.
In which professional contexts is English the leading language? a. Art, music, and sports. b. Science, navigation, and law.
Question 19: Which of the following countries has English as a majority native language? a. The Republic of Ireland. b. The Republic of Congo.
Question 20: How many sovereign states have English as an official

Questions for the Boredom task:
Question 1: When has Modern English spread around the world? a. 17th century. b. 18th century.
Question 2: Which of the following countries influenced the spread of modern English? a. The Great Britain. b. France.
Question 3: How did English become the world's leading language? a. Through word of mouth of the British Empire and the US. b. Through printed and electronic media of the British Empire and the US.
Question 4: Is English the most spoken language in the world? a. Yes, it is. b. No, it isn't.
Question 5: Is English the most spoken native language in the world? a. Yes, it is. b. No, it isn't.
Question 6: What is the most spoken native language in the world? a. English. b. Chinese.
Question 7: What is the most learned second language in the world? a. English. b. Spanish.
Question 8: Which of the following categories has a larger number? a. People who speak English as a native language. b. People who learn English as a second language.
Question 9: How many people speak English as of 2005? a. 2 billion. b. 3 billion.
Question 10: Which of the following countries has English as a majority native language? a. India. b. New Zealand.
Question 11: Which of the following languages belongs to the official languages of the United Nations?
Which of the following countries influenced the spread of modern English? a. Australia. b. The United States of America.
Question 16: Which of the following statements about English is true? a. It is the most spoken native language. b. It is the most learned second language.
Question 17: Which of the following statements about English is true? a. It is the official language in many professions. b. It is a leading language of international discourse.
Question 18: When has Modern English spread around the world? a. 16th century. b. 17th century.
Question 19: Which of the following countries has English as a majority native language? a. The Republic of Ireland. b. The Republic of Congo.
Question 20: How many sovereign states have English as an official language? a. 39. b. 59.
Question 21: How did English become the world's leading language? a. Through word of mouth of the British Empire and the US. b. Through printed and electronic media of the British Empire and the US.
Question 22: Is English a co-official language of the United Nations? a. Yes, it is. b. No, it isn't.
Question 23: Is English the most spoken Germanic language? a. Yes, it is. b. No, it isn't.
Question 24: Does English have the most speakers? a. Yes, it does. b. No, it doesn't.
Question 25: Does the US influence the spread of modern English? a. Yes, it does. b. No, it doesn't.
Question 26: Is English widely used in many professions? a. Yes, it is. b. No, it isn't.
Question 27: Is English the native language in Australia? a. Yes, it is. b. No, it isn't.
Question 28: Is English widely spoken in South America? a. Yes, it is. b. No, it isn't.
Question 29: Is English a Latin language? a. Yes, it is. b. No, it isn't.
Question 30: Which type of language does English belong to? a. Germanic language.