Assessing Human Reactions in a Virtual Crowd Based on Crowd Disposition, Perceived Agency, and User Traits

Immersive virtual environments populated by real and virtual humans provide valuable insights into human decision-making processes under controlled conditions. Existing literature indicates elevated comfort, higher presence, and a more positive user experience when virtual humans exhibit rich behaviors. Based on this knowledge, we conducted a web-based, interactive study, in which participants were embodied within a virtual crowd with complex behaviors driven by an underlying psychological model. While participants interacted with a group of autonomous humanoid agents in a shopping scenario similar to Black Friday, the platform recorded their non-verbal behaviors. In this independent-subjects study, we investigated behavioral and emotional variances across participants with diverse backgrounds focusing on two conditions: perceived agency and the crowd’s emotional disposition. For perceived agency, one group of participants was told that the other crowd members were avatars controlled by humans, while another group was told that they were artificial agents. For emotional disposition, the crowd behaved either in a docile or hostile manner. The results suggest that the crowd’s disposition and specific participant traits significantly affected certain emotions and behaviors. For instance, participants collected fewer items and reported a higher increase of negative emotions when placed in a hostile crowd. However, perceived agency did not yield any statistically significant effects.


INTRODUCTION
The growing integration of digital technologies into our daily lives necessitates an in-depth understanding of our interactions within virtual environments. Exploring our relationship with the virtual humans and crowds that populate these environments presents opportunities and challenges across various domains such as social psychology, human-computer interaction, emergency management, and gaming. One such opportunity is the ability to conduct controlled studies that would be unattainable or non-replicable in real-life settings. For instance, when examining crowd events characterized by emergent behaviors such as panic, stampedes, or riots, the consequences of irrational behavior are discernible, yet the underlying mechanisms can only be approximated through retrospective analyses or field observations. Given the realistic responses of people in immersive virtual worlds [Slater et al. 2020; Slater and Sanchez-Vives 2016] and the high ecological validity of virtual social interactions [Bombari et al. 2015], we can gain critical insights into human decision-making processes through controlled virtual crowd experiments in a low-risk and efficient manner.
Existing approaches that integrate humans into virtual crowds predominantly focus on crowd or human movement characteristics [Kim et al. 2016; Moussaïd et al. 2016; Nelson et al. 2020], or consider a limited set of other parameters such as crowd density [Dickinson et al. 2019], agents' eye gaze [Narang et al. 2016], or character appearance [Nelson et al. 2020]. However, a holistic approach needs to consider nuanced behaviors rooted in the complexities of human psychology. How users react when embodied in a crowd simulation that portrays the psychological nuances of its agents remains a promising area for exploration. Such an understanding can elucidate human psychology and decision-making patterns in virtual or real crowds, depending on the level of immersion.
In this paper, we explore the impact of a virtual crowd's emotional disposition and the perceived agency of crowd members on human emotions and non-verbal behaviors. We assess emotional disposition on two levels: hostile versus docile crowds. Perceived agency also involves two levels, indicating whether users believe crowd members to be human-controlled avatars or autonomous agents. The literature has not reached a consensus on the effects of perceived agency [Oh et al. 2018]. While some works suggest that avatars and agents elicit similar responses in users [Von der Pütten et al. 2010], others report stronger emotional responses toward avatars [Fox et al. 2015; Kothgassner et al. 2017]. However, existing studies only involve interactions with individual virtual humans and do not extend to crowds.
We introduce an independent-subjects study in which users were embodied in a web-based 3D platform featuring emotionally expressive virtual crowds with complex behaviors. The web-based nature of the study allowed us to efficiently recruit a diverse set of participants. We designed a scenario mirroring a shopping event similar to Black Friday, in which participants were told to collect as many items as they could from a virtual store. We presented the study as a game but incorporated a reward strategy mapped directly from real life: participants earned additional compensation based on the number of items they collected. While each person engaged with the environment as a crowd member, the platform collected data on their simulation behaviors and affective states.
In addition to crowd disposition and perceived agency, we explored the impact of individual differences across participants on their emotional and behavioral responses. Specifically, we considered their age, gender, personality traits, and familiarity with 3D virtual environments. This work poses the following research questions:
RQ1. How does the virtual crowd's emotional disposition towards aggression affect user emotions and behavior?
RQ2. How does perceived agency affect user emotions and behavior? Specifically, do user behaviors and emotions vary when they perceive the other crowd members as avatars controlled by real individuals as opposed to artificial agents?
RQ3. How do users' demographics affect their emotions and behavior when interacting with virtual crowds?
RQ4. Are there any correlations between users' personality traits and their emotions and behaviors in the simulation?
By expanding upon traditional approaches to crowd simulation research, our work has the potential to advance the understanding of human behavior in social environments and to contribute to more effective and adaptable crowd simulation strategies.
with virtual humans naturally as if they were real humans [de Borst and de Gelder 2015; Nass and Reeves 2003]. Virtual humans increase the sense of immersion in virtual worlds [Llobera et al. 2010; Pelechano et al. 2008], provide standardization for social experiments, and offer low-cost solutions for training applications [Bombari et al. 2015].
Besides individual exchanges with virtual humans, understanding human interactions in multi-agent scenarios has significant potential to advance both our knowledge of human psychology and the state of the art in crowd simulation research [Pelechano and Allbeck 2016]. Since Pelechano et al.'s seminal work evaluating user presence in a virtual crowd [2008], many studies have explored human behaviors when embodied in virtual crowds, especially in VR environments. These studies have predominantly examined the steering or walking behaviors of human participants in crowds. Nelson et al. [2019] evaluated walking within a virtual crowd while participants physically walked in a motion capture studio. They found that an extremely dense virtual crowd significantly altered participants' movement in terms of speed and walking time. This finding was also supported by Koilias et al. [2020a], who reported a high impact of density, low speed, and diagonal direction on participants' speed, deviation, and trajectory lengths. In a separate work, Nelson et al. found participants' movement speed, deviation, and interpersonal distance to be significantly affected by the appearance of the virtual characters [2020]. Bruneau et al. [2015] studied the interactions between individual walkers and groups by applying the Principle of Minimum Energy (PME) and found that humans behaved as predicted by the PME; for instance, participants preferred to pass through large, sparse groups and to go around small, dense groups. The study took place in a CAVE environment where participants navigated with a joystick. The work demonstrated the promise of VR experiments for guiding the improvement of crowd simulations by directly designing an algorithm to imitate human behavior. Olivier et al. [2017] also performed CAVE experiments to show that collision avoidance trajectories and levels of perception in VR were sufficiently realistic for human locomotion studies. Jiang et al. [2018] investigated the coordination of joint actions in road crossing using VR, which provides a safe environment for such tests. They observed that participants' impulses to move in coordination when crossing a virtual road were consistent with real-world studies. Koilias et al. [2020b] evaluated participants' movement coordination with virtual crowd members and found significant differences between the movements of participants and crowd agents. In particular, participants moved more slowly, followed longer paths, performed less smooth motions, and kept larger interpersonal distances. Although the authors found moderate associations, the crowd's influence was not strong enough to make participants become part of it.
Studies show proxemics to be an influential factor in user experience. In an early work by Llobera et al. [2010], virtual characters approached a stationary participant in groups of one to four agents. The authors measured electrodermal activity to evaluate participant arousal and found that skin conductance levels increased more the closer the virtual characters approached. Christou et al. [2015] also evaluated proxemics and found similar results by measuring electrodermal activity, again with a stationary participant and groups of agents; additionally, they showed declined cognitive performance under close proximity. Unlike these earlier works, Dickinson et al. [2019] designed an experiment in which participants navigated the environment naturally using a head-mounted display and dynamically tracked handheld controllers. They found that high density made the given task more difficult for participants and negatively affected their affective states, as measured by PANAS self-reports.
Instead of employing crowds of virtual humans, Moussaïd et al. [2016] conducted a study with multiple human participants, demonstrating the efficacy of virtual platforms for social experiments. They replicated a high-stress emergency situation from a real-life experiment in a screen-based study with multiple human subjects participating simultaneously. An important observation they report is that herding resulted from high density and not from a change in individuals' tendency to imitate their neighbors.
Gaze and gestures positively affect user experience by increasing perceived realism and feelings of comfort. Narang et al. [2016] introduced PedVR, a VR framework that couples crowd simulations with realistic 3D body movements and gestures for a large number of virtual characters. Their within-subjects experiments indicated that agents' gaze behavior, through eye contact with the user, improved perceived believability. Kyrikou et al. [2017] suggested that, in addition to collision avoidance with the participant, interactions such as verbal salutations, gaze, waving, and other gestures should be employed to enhance the sense of presence. Volonte et al. [2020] developed an agent-based crowd model with rich behaviors including eye gaze, facial expressions, body motion, and verbal and non-verbal behaviors. They evaluated the impact of a crowd of emotional virtual humans on users' affective and non-verbal behaviors in a VR setting and found that participants interacted with positive emotional crowds more than with negative ones. Our work shares similarities with this study in exploring complex virtual human behaviors and emphasizing affective and non-verbal behaviors. In contrast, our crowd simulation system is controlled by a multi-layered parametric psychological model that incorporates personality and emotions to drive both low- and high-level agent behaviors. Additionally, the experimental settings, scenarios, and research questions differ significantly, which demonstrates the applicability and breadth of virtual social interactions for understanding human reactions.
Although undoubtedly more immersive, VR with head-mounted displays or in a CAVE is more restrictive than desktop environments regarding participant recruitment. The aforementioned studies typically involved a small number of participants, ranging from 13 [Bruneau et al. 2015] to 18 [Nelson et al. 2019, 2020]. Given our primary emphasis on high-level behavioral parameters and emotional outcomes rather than steering behaviors, precise locomotion control was not a pressing requirement for our work. We therefore chose a browser-based environment, which gave us access to a broader participant base.

Perception of Aggression in Crowds and Virtual Humans
Several studies have examined the perception of emotions in crowds. For example, Hansen and Hansen [1988] demonstrated an anger superiority effect: angry faces were detected more easily in crowds of happy faces than happy faces in crowds of angry ones. Bucher and Voss [2019] asked participants to rate the overall mood of a crowd and found that happy faces were more likely to be attended to, and that their predominance was assessed more accurately than that of angry faces. Mihalache [2021] evaluated anger bias towards crowds with varying ratios of angry to happy faces and found that anger bias emerges particularly under perceptual uncertainty, i.e., with low-intensity expressions. Importantly, in these studies, crowds were composed of static images of real or computer-generated faces rather than real or simulated crowds, and participants were merely observers.
Studies involving individual virtual humans also confirm anger bias. For instance, when presented with angry faces, participants took longer to recognize patterns [Rapuano et al. 2023], tended to increase spatial distances, and reacted with increased emotional arousal [Ruggiero et al. 2021].

Perceived Agency
Perceived agency, or “agency belief”, is a concept whose effects lack a clear consensus in the literature. Many studies found no differences between agents and avatars in social judgments [Nowak 2004; Von der Pütten et al. 2010]. This can be explained by the concept of “ethopoeia”, introduced by Nass and Moon [2000], which holds that systems elicit social responses as long as they provide social cues. Von der Pütten et al. [2010] later argued that the strength of reactions depends on the amount or strength of the social cues and demonstrated that higher behavioral realism led to stronger social effects.
Contrary to these works, Guadagno et al. [2007] found that subjects assessed a virtual human's behaviors as more realistic and reported higher levels of social presence when they believed it was controlled by a human. The authors also found interaction effects of agency with the gender of the virtual human, indicating in-group favoritism and stereotypical effects. Other studies also support this relationship between social presence and agency, reporting stronger effects when others were perceived as avatars rather than agents [Appel et al. 2012; Fox et al. 2015; Gajadhar et al. 2008; Weibel et al. 2008]. Poinsot et al. [2022] discussed the interaction effects of emotional communication and perceived agency, reporting that emotional communication led computer-controlled opponents to elicit a stronger sense of co-presence than human-controlled ones.
In the opposite direction, Williams and Clippinger [2002] found higher levels of aggression toward computer opponents than toward human ones. Lim and Reeves [2010] later showed interaction effects of agency with the type of game activity (cooperative vs. competitive). Thus, the apparent contradictions in research results on agency may stem from other factors such as visual and behavioral fidelity, type of interaction, and in-group effects.

METHOD
3.1 Environmental Setup
We built an interactive, browser-based platform using Unity and WebGL on top of an existing crowd simulation system comprising autonomous humanoid agents with controllable psychological parameters [Durupinar et al. 2016]. We extended this system to incorporate the human user as a crowd member with a 3D body who can navigate the environment from a first-person perspective. We deployed the platform's Unity WebGL build on a public-facing server running Express, a Node.js application framework. User data was stored in a MySQL database.
We deployed the web-based study on Amazon Mechanical Turk (MTurk). Users accessed the study's website, which was embedded in an HTML inline frame (iFrame) as part of the MTurk Human Intelligence Task (HIT). Worker IDs and HIT information were automatically transmitted to our server (Figure 1). As a quality assurance measure, we hid the “submit” button until the server sent a study completion token.
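The paper does not describe how the completion token was generated or checked. One common approach, sketched here in Python under that assumption (the study's own server ran on Express/Node.js), derives the token from the worker and HIT identifiers with a keyed hash, so it cannot be guessed without the server's secret. The function names and the `SECRET_KEY` value are hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice, load it from private configuration.
SECRET_KEY = b"replace-with-a-private-secret"


def issue_completion_token(worker_id: str, hit_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a token bound to one worker and one HIT (HMAC-SHA256, truncated)."""
    message = f"{worker_id}:{hit_id}".encode("utf-8")
    # Truncated to 16 hex characters so workers can copy it easily; without the key,
    # producing a valid token for a given worker/HIT pair is impractical.
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:16]


def verify_completion_token(worker_id: str, hit_id: str, token: str,
                            key: bytes = SECRET_KEY) -> bool:
    """Recompute the expected token and compare in constant time."""
    expected = issue_completion_token(worker_id, hit_id, key)
    return hmac.compare_digest(expected, token)
```

During HIT review, the requester would recompute the token from the submitted worker and HIT IDs and reject submissions whose token does not match.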

Stimuli Creation
We created a scenario featuring a big sale event at a store offering discounts on iPads. The store contained 50 iPads across five rows of shelves. The scene involved 20 virtual agents with diverse appearances and one human user, each with the goal of purchasing iPads. Both the human user and the computer agents could navigate the store, collect iPads from the shelves, and pay at the counter. We used freely available Mixamo and Adobe Fuse virtual human models with facial blendshapes.
Participants were informed that they would receive bonuses based on the number of iPads they collected, which mapped the incentive directly onto the scenario's theme. Users could start fights with virtual humans, with the winner collecting the loser's iPads. Users interacted with the environment through keyboard controls: the arrow keys for navigation, 'C' for collecting iPads, 'F' for fighting, and 'P' for paying at the counter (Figure 2). We preferred keyboard controls over mouse or touchpad input for accessibility. Each press of 'F' played a punching animation on the user's avatar along with a punching sound. To enhance immersion, we also added footstep sounds to the participant's walking.
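The control scheme and the fight rule can be summarized as a small input map plus an item transfer. This is a hypothetical Python sketch, not the platform's Unity code: the action names are invented, the four arrow keys are collapsed into a single navigation action, and the fight winner is taken as given because the paper does not state how winners are decided.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical action names for the key bindings described in the study.
KEY_ACTIONS = {
    "ArrowUp": "navigate",
    "ArrowDown": "navigate",
    "ArrowLeft": "navigate",
    "ArrowRight": "navigate",
    "C": "collect",
    "F": "punch",
    "P": "pay",
}


@dataclass
class Shopper:
    """Any crowd member, human-controlled or autonomous."""
    name: str
    ipads: int = 0


def action_for_key(key: str) -> Optional[str]:
    """Map a key name to an action; letter keys are matched case-insensitively."""
    if not key.startswith("Arrow"):
        key = key.upper()
    return KEY_ACTIONS.get(key)


def resolve_fight(winner: Shopper, loser: Shopper) -> int:
    """Move all of the loser's iPads to the winner and return how many moved."""
    moved = loser.ipads
    winner.ipads += moved
    loser.ipads = 0
    return moved


if __name__ == "__main__":
    user, agent = Shopper("user", ipads=2), Shopper("agent_07", ipads=3)
    print(resolve_fight(user, agent), user.ipads, agent.ipads)  # prints: 3 5 0
```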

Agent Behaviors
The parametric crowd simulation system allows scenarios to be authored by initializing agents with social roles, cognition, and personalities, which determine the agents' emotions, decision-making, and actions. Agents are assigned personalities that remain constant throughout the simulation. The system uses the Five-Factor model of personality [Goldberg 1990], and personality directly controls low-level dynamics parameters such as velocities, forces, and local steering choices. For instance, high extroversion leads agents to move faster and keep smaller personal spaces. Personality indirectly influences decision-making and behavior choices by determining emotional dispositions. Emotions, unlike personality, change in response to internal or external stimuli. The system uses the cognition-based OCC (Ortony, Clore, Collins) emotion model [Ortony et al. 1988], which attributes emotion elicitation to a person's subjective interpretation of their environment. The OCC model describes a hierarchy that classifies 22 emotions as valenced reactions to an individual's interpretation of three factors: goals regarding the consequences of events, standards regarding the actions of individuals, and attitudes towards aspects of objects. Relevant emotions are elicited by traversing the hierarchy based on these factors' valence (positive or negative), desirability (desirable or undesirable), confirmation status (confirmed, unconfirmed, or disconfirmed), focus (on consequences for oneself or others), and approval status (approving or disapproving). Agents are imbued with a cognition module initialized with scenario-based goals, standards, and attitudes. These, together with their personalities and social roles, determine their emotions and consequently their behaviors. For instance, agents have positive attitudes towards discounted items, for which they feel love, leading them to the closest available iPad in the store. Peaceful agents initially experience hope because they anticipate the consequence of the sales event for themselves to be positive. Hostile agents, on the other hand, experience fear because they anticipate that consequence to be negative. An agent's cognition module is responsible for updating its goals, standards, and attitudes throughout the scenario. For example, during the shopping spree, hostile agents develop disapproving standards towards others who attempt to get the same iPads, potentially leading to reproach and anger.
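The traversal of the OCC hierarchy described above can be illustrated with a small subset of its branches: prospect-based emotions for events, attribution emotions for actions, and attraction emotions for objects. The Python sketch below is a simplification for illustration only, not the appraisal code of Durupinar et al. [2016]; the `Appraisal` fields are hypothetical, intensities are ignored, and compound emotions such as anger (reproach combined with distress) are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Appraisal:
    """Hypothetical appraisal record for one stimulus."""
    kind: str                         # "event", "action", or "object"
    desirable: Optional[bool] = None  # events: is the consequence desirable for oneself?
    status: str = "unconfirmed"       # events: "unconfirmed", "confirmed", or "disconfirmed"
    by_self: bool = False             # actions: performed by oneself or by another agent
    approving: Optional[bool] = None  # actions: does the action meet one's standards?
    liking: Optional[bool] = None     # objects: attitude towards the object


# Prospect-based branch: (desirable, confirmation status) -> emotion.
PROSPECT = {
    (True, "unconfirmed"): "hope",
    (False, "unconfirmed"): "fear",
    (True, "confirmed"): "satisfaction",
    (False, "confirmed"): "fears-confirmed",
    (True, "disconfirmed"): "disappointment",
    (False, "disconfirmed"): "relief",
}


def elicit(a: Appraisal) -> List[str]:
    """Return the emotions elicited by one appraisal (a subset of the OCC hierarchy)."""
    if a.kind == "event":
        return [PROSPECT[(bool(a.desirable), a.status)]]
    if a.kind == "action":
        if a.by_self:
            return ["pride" if a.approving else "shame"]
        return ["admiration" if a.approving else "reproach"]
    if a.kind == "object":
        return ["love" if a.liking else "hate"]
    raise ValueError(f"unknown appraisal kind: {a.kind}")
```

For example, a peaceful agent appraising the sale as an unconfirmed desirable event gets `['hope']`, a hostile agent appraising it as undesirable gets `['fear']`, and a discounted iPad appraised with `liking=True` yields `['love']`.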
The system also employs an epidemiological emotional contagion model, which causes emotions to propagate through the crowd. Depending on their susceptibility levels (controlled by personality), agents can “contract” the emotions of nearby agents after a certain duration of exposure. Emotions also decay over time. Because agents experience multiple emotions simultaneously and emotion intensities change quickly, the system computes an average emotional state to control behaviors. For this purpose, it uses the Pleasure-Arousal-Dominance (PAD) model [Mehrabian 1996], which links FFM personality and OCC emotions. Agents' elicited emotions and average emotional states are updated at regular intervals. The PAD values, in combination with cognition, determine high-level behaviors such as starting a fight. For instance, an agent with negative pleasure, high arousal, and high dominance that holds disapproving standards towards another agent in its proximity can start a fight. The PAD values also determine facial expressions and body postures, which are controlled by interpolating model blendshapes and joint rotations, respectively. Details of this system can be found in Durupinar et al. [2016].
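The interplay of contagion, decay, PAD averaging, and the fight condition can be sketched as follows. Everything here is an illustrative assumption rather than the published model: the exposure threshold, decay rate, and fight thresholds are invented, the PAD vectors in `EMOTION_PAD` are placeholders chosen only to have plausible signs, and in the actual system susceptibility is derived from personality.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Placeholder PAD vectors (pleasure, arousal, dominance); NOT the model's values.
EMOTION_PAD: Dict[str, Tuple[float, float, float]] = {
    "hope": (0.5, 0.2, 0.1),
    "fear": (-0.6, 0.6, -0.4),
    "anger": (-0.5, 0.6, 0.3),
    "reproach": (-0.3, 0.1, 0.4),
}


@dataclass
class Agent:
    susceptibility: float  # in (0, 1]; personality-driven in the real system
    emotions: Dict[str, float] = field(default_factory=dict)  # emotion -> intensity
    exposure: Dict[str, float] = field(default_factory=dict)  # emotion -> seconds exposed

    def expose(self, emotion: str, intensity: float, dt: float,
               base_threshold: float = 2.0) -> bool:
        """Accumulate exposure to a nearby agent's emotion; return True once contracted."""
        self.exposure[emotion] = self.exposure.get(emotion, 0.0) + dt
        # More susceptible agents need a shorter exposure before contracting the emotion.
        if self.exposure[emotion] >= base_threshold / self.susceptibility:
            self.emotions[emotion] = max(self.emotions.get(emotion, 0.0), intensity)
            return True
        return False

    def decay(self, dt: float, rate: float = 0.5) -> None:
        """Exponentially decay every emotion intensity."""
        for name in self.emotions:
            self.emotions[name] *= math.exp(-rate * dt)

    def pad(self) -> Tuple[float, float, float]:
        """Intensity-weighted average PAD state over all current emotions."""
        total = sum(self.emotions.values())
        if total == 0:
            return (0.0, 0.0, 0.0)
        p, a, d = (
            sum(w * EMOTION_PAD[e][k] for e, w in self.emotions.items()) / total
            for k in range(3)
        )
        return (p, a, d)


def may_start_fight(agent: Agent, disapproves_of_neighbor: bool,
                    arousal_min: float = 0.3, dominance_min: float = 0.1) -> bool:
    """Negative pleasure, high arousal, high dominance, and a disapproved neighbor."""
    p, a, d = agent.pad()
    return disapproves_of_neighbor and p < 0 and a > arousal_min and d > dominance_min
```

Calling `expose` once per simulation step for each nearby emotional agent, and `decay` once per step, gives the qualitative behavior described: emotions spread after sustained exposure and fade otherwise.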

Study Design
We conducted a study with a 2×2 factorial, independent-subjects design on MTurk. Each task consisted of six scenes: a warm-up scenario, a demographic survey, a participant personality assessment, a pre-study participant emotional state assessment, the main scenario, and a post-study participant emotional state assessment (Figure 3). The warm-up scene displayed the environment without any virtual humans, and participants were instructed to collect iPads. We started with a warm-up scene so that workers could decide whether to continue the study. We assessed personality with a brief Five-Factor measure, the Ten Item Personality Inventory (TIPI) [Gosling et al. 2003]. For the emotional state assessment, we used a short version of PANAS [Watson et al. 1988] for brevity: the International Positive and Negative Affect Schedule Short-Form (I-PANAS-SF) [Karim et al. 2011]. At the end of the study, we presented a questionnaire with the following questions:
• Please rate your overall experience during the iPad collection task on a scale of 1 to 7, where 1 indicates "Not at all like interacting with real persons" and 7 indicates "Exactly like interacting with real persons."
• Did you feel that the behaviors of others during the task were consistent and predictable? Please explain your answer.
• How would you describe the personalities of others based on your interactions? (e.g., friendly, helpful, competitive, cooperative)
The study conditions were the emotional disposition of the crowd (docile vs. hostile) and whether participants were told that the others in the crowd were virtual agents or avatars controlled by other MTurk workers. In the docile scenario, crowd agents were assigned neutral personalities (all five factors set to 0, e.g., neither introverted nor extroverted), so they only walked around the store collecting iPads, without showing strong emotions or fighting (Fig. 4(a)). In the hostile scenario, agents were assigned unconscientious, extroverted, disagreeable, and neurotic personalities (O = 0, C = −1, E = 1, A = −1, N = 1), so that they would be more assertive and prone to starting fights with the human participant and other agents, displaying negative emotions such as angry facial expressions, and exiting the store without paying (Fig. 4(b)). During each scenario, we recorded the total time, average speed, the number of fights, the number of punches, the number of iPads grabbed from others, and the total number of iPads (collected from shelves and grabbed in fights). The University Institutional Review Board approved the experiment protocol.
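The two manipulated factors can be written down as a small configuration. The Python sketch below encodes the reported crowd personalities on the five factors (O, C, E, A, N) and shows one way to obtain 30 participants per cell by balanced random assignment; the paper does not say how participants were allocated, so `balanced_assignment` is a hypothetical helper. In the study, perceived agency is manipulated through what participants are told rather than through the crowd's configuration.

```python
import random
from itertools import product
from typing import Dict, Hashable, Iterable, Tuple

FACTORS = ("O", "C", "E", "A", "N")

# Crowd personalities reported for the two disposition levels.
CROWD_PERSONALITY: Dict[str, Dict[str, int]] = {
    "docile": dict.fromkeys(FACTORS, 0),
    "hostile": {"O": 0, "C": -1, "E": 1, "A": -1, "N": 1},
}

# The four cells of the 2x2 design: (crowd disposition, perceived agency).
CONDITIONS = list(product(("docile", "hostile"), ("agents", "avatars")))


def balanced_assignment(participant_ids: Iterable[Hashable], per_cell: int = 30,
                        seed: int = 0) -> Dict[Hashable, Tuple[str, str]]:
    """Randomly assign participants so that every cell receives exactly per_cell."""
    ids = list(participant_ids)
    if len(ids) != per_cell * len(CONDITIONS):
        raise ValueError("need exactly per_cell participants for every condition")
    rng = random.Random(seed)
    rng.shuffle(ids)
    return {pid: CONDITIONS[i // per_cell] for i, pid in enumerate(ids)}
```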

Participants
We required participants to have an acceptance rate above 95%, experience with more than 100 HITs, and the Masters qualification. To ensure quality, we placed attention-checking questions in the questionnaires and discarded the responses of participants who did not pass them. Targeting a medium effect size (Cohen's f = 0.25) for both main effects and their interaction, with a power of 0.80 at a significance level of 0.05, we collected 30 participant responses per group (120 in total).
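The sample-size reasoning can be checked with a quick power calculation. For a 1-degree-of-freedom effect in a fixed-effects ANOVA, the noncentrality parameter is λ = f²N (the convention used by G*Power), and for large denominator degrees of freedom √F behaves approximately like the absolute value of a normal variable with mean √λ and unit variance. The Python sketch below relies only on that normal approximation, which can slightly overstate the exact noncentral-F power; an exact value would use a noncentral F distribution (for example `scipy.stats.ncf`). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power_1df(f: float, n_total: int, alpha: float = 0.05) -> float:
    """Approximate power of a 1-df ANOVA effect (e.g., a main effect in a 2x2 design).

    With lambda = f**2 * N, sqrt(F) is treated as |Z + sqrt(lambda)|, where Z is
    standard normal; this ignores the finite denominator degrees of freedom.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    delta = math.sqrt(f ** 2 * n_total)
    # Reject H0 when |Z + delta| > z_crit.
    return (1 - z.cdf(z_crit - delta)) + z.cdf(-z_crit - delta)


if __name__ == "__main__":
    for n in (80, 120, 160):
        print(n, round(approx_power_1df(0.25, n), 3))
```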
Of the 120 unique participants (83 M / 37 F / 0 other), the mean age was 37.083 ± 11.41 years. The ethnic distribution was 81 White, 25 Asian, 6 Black, 7 Hispanic/LatinX, and 1 other. We also asked about familiarity with first-person-view video games on a scale of 0 (“not familiar at all”) to 5 (“highly familiar”). The mean familiarity level was 1.533, with a standard deviation of 1.66.

Analysis
3.6.1 The Effects of Study Conditions on Participant Emotions and Behaviors. We collected participants' emotional states before and after the study using a short, international version of PANAS. We collected scores for ten emotions, five positive and five negative, on a 5-point Likert scale. The positive emotion items were active, determined, attentive, inspired, and alert, and the negative emotion items were afraid, nervous, upset, hostile, and ashamed. To test for differential effects, we subtracted the pre-study scores from the post-study scores. Figure 5 shows the frequency histograms of the positive and negative affect difference scores, indicating approximately normal distributions.
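The difference scores can be computed per participant as below. This Python sketch assumes each questionnaire is stored as a mapping from I-PANAS-SF item name to its 1-5 rating; the function name and data layout are hypothetical. Composites follow the paper's definition as item means, so each composite change equals the mean of the corresponding item changes.

```python
from statistics import mean
from typing import Dict

POSITIVE_ITEMS = ("active", "determined", "attentive", "inspired", "alert")
NEGATIVE_ITEMS = ("afraid", "nervous", "upset", "hostile", "ashamed")
ALL_ITEMS = POSITIVE_ITEMS + NEGATIVE_ITEMS


def affect_differences(pre: Dict[str, int], post: Dict[str, int]) -> Dict[str, float]:
    """Post-minus-pre change for each item and for the two composite scores."""
    for scores in (pre, post):
        missing = set(ALL_ITEMS) - set(scores)
        if missing:
            raise ValueError(f"missing items: {sorted(missing)}")
        if any(not 1 <= scores[item] <= 5 for item in ALL_ITEMS):
            raise ValueError("ratings must be on the 1-5 scale")
    diffs = {item: float(post[item] - pre[item]) for item in ALL_ITEMS}
    # Composite scores are item means, as in the paper's analysis.
    for label, items in (("positive_affect", POSITIVE_ITEMS),
                         ("negative_affect", NEGATIVE_ITEMS)):
        diffs[label] = float(mean(post[i] for i in items) - mean(pre[i] for i in items))
    return diffs
```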
As behavioral parameters, we collected the number of fights each participant was involved in (fightCnt), punches administered (punchCnt), iPads collected from the shelves (collectedItemCnt), and iPads stolen from others (stolenItemCnt), as well as the time spent (timeSpent), average speed (avgSpeed), and total distance covered during the simulation (totalDist). Figure 6 shows the frequency histograms of the behavior parameters for the whole study.
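The paper does not specify how totalDist and avgSpeed were derived. A natural assumption, sketched below in Python, is to sum the straight-line distances between consecutive sampled positions on the ground plane and divide by timeSpent; averaging instantaneous speeds instead would give a different value when sampling intervals are uneven. The function name and the (x, z) sampling format are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x, z) ground-plane position sample


def path_metrics(positions: Sequence[Point], time_spent: float) -> Tuple[float, float]:
    """Return (total_distance, average_speed) for one participant's trajectory."""
    if time_spent <= 0:
        raise ValueError("time_spent must be positive")
    total = sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))
    return total, total / time_spent


if __name__ == "__main__":
    print(path_metrics([(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)], time_spent=5.5))  # (11.0, 2.0)
```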
To analyze the effects of perceived agency (whether participants believe the others are humans or agents) and crowd disposition (docile vs. hostile) on emotional state changes, we treated these two factors as independent variables with two levels each and the affect score differences as response variables. As Levene's test showed no evidence of unequal variances across conditions, we employed a two-way independent-subjects analysis of variance (ANOVA) to test the effects of the study conditions on the affect differences.
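For a balanced design such as 30 participants in each of the four cells, the two-way ANOVA F statistics and Levene's test have simple closed forms. The Python sketch below computes them with the standard library only. The function names are hypothetical, Levene's statistic uses mean-centering (the original formulation; the median-centered Brown-Forsythe variant is also common), and p-values would come from the F distribution, e.g., `scipy.stats.f.sf(F, dfn, dfd)`.

```python
from statistics import mean
from typing import Dict, List, Sequence


def two_way_anova_balanced(cells: List[List[Sequence[float]]]) -> Dict[str, float]:
    """F statistics for a balanced a x b independent-subjects design.

    cells[i][j] holds the n responses in level i of factor A (e.g., crowd
    disposition) and level j of factor B (e.g., perceived agency).
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    if any(len(c) != n for row in cells for c in row):
        raise ValueError("this shortcut requires equal cell sizes")
    cell_m = [[mean(c) for c in row] for row in cells]
    row_m = [mean(r) for r in cell_m]
    col_m = [mean(cell_m[i][j] for i in range(a)) for j in range(b)]
    grand = mean(row_m)  # valid because every cell has the same size
    ss_a = b * n * sum((m - grand) ** 2 for m in row_m)
    ss_b = a * n * sum((m - grand) ** 2 for m in col_m)
    ss_ab = n * sum((cell_m[i][j] - row_m[i] - col_m[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((x - cell_m[i][j]) ** 2
               for i in range(a) for j in range(b) for x in cells[i][j])
    df_e = a * b * (n - 1)
    ms_e = ss_e / df_e
    return {
        "F_A": ss_a / (a - 1) / ms_e,
        "F_B": ss_b / (b - 1) / ms_e,
        "F_AB": ss_ab / ((a - 1) * (b - 1)) / ms_e,
        "df_error": df_e,
    }


def levene_w(groups: Sequence[Sequence[float]]) -> float:
    """Levene's W with mean-centering; compare against F(k - 1, N - k)."""
    k = len(groups)
    z = []
    for g in groups:
        m = mean(g)
        z.append([abs(x - m) for x in g])
    n_total = sum(len(zi) for zi in z)
    z_means = [mean(zi) for zi in z]
    z_grand = mean(v for zi in z for v in zi)
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within
```

With 30 participants per cell, `df_error` is 4 × 29 = 116, matching the F(1, 116) statistics reported in this section.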
The ANOVA returned a statistically significant main effect of crowd disposition on afraid (F(1, 116) = 4.545, p = 0.035), upset (F(1, 116) = 4.474, p = 0.037), hostile (F(1, 116) = 12.865, p = 0.0005), and ashamed (F(1, 116) = 9.901, p = 0.002). We did not find any main effects of agency or any interaction effects. To control for potential inflation of the Type I error rate, we applied the Benjamini-Hochberg (BH) procedure for False Discovery Rate (FDR) control. After FDR correction, only hostile and ashamed remained statistically significant. Figure 7 illustrates the comparative distributions of the affect score differences between post- and pre-study responses across the study conditions. We also calculated the composite positive and negative affect scores as the means of the positive and negative items, respectively, as prescribed by PANAS. ANOVA yielded a significant main effect of crowd disposition on the post- minus pre-study negative affect score difference (F(1, 116) = 13.954, p = 0.0003). The effect remained statistically significant after the BH procedure. Table 1 summarizes the statistically significant results after ANOVA and FDR correction.
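The Benjamini-Hochberg adjustment can be written in a few lines. This Python sketch returns BH-adjusted p-values in the input order, so a result counts as significant when its adjusted value is at most the chosen FDR level (0.05 here); `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` provides an equivalent, well-tested implementation. The function name is hypothetical and the example p-values are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """BH-adjusted p-values (q-values), returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```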
To measure the effects of the simulation conditions on participant behaviors, we again took participant belief and crowd disposition as independent variables. Because ANOVA is robust given independent and sufficiently large samples (typically n = 30), we again ran a two-way independent-subjects ANOVA to test the effects of the study conditions on the behavior parameters. The ANOVA yielded statistically significant main effects of crowd type on fightCnt (F(1, 116) = 9.405, p = 0.003), collectedItemCnt (F(1, 116) = 15.917, p = 0.0001), timeSpent (F(1, 116) = 5.299, p = 0.023), and totalDist (F(1, 116) = 4.696, p = 0.032). We did not find any main effects of agency. Table 2 summarizes the significant results after ANOVA and FDR correction. Figure 8 illustrates the comparative distributions of behavior parameters across the study conditions.
Among the demographic parameters, we discarded ethnicity due to its disproportionate distribution. Additionally, a least-squares linear regression of emotional changes on age, a continuous variable, did not return any statistically significant effects.
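The age analysis amounts to a simple least-squares regression of an affect difference score on age. The Python sketch below (hypothetical function name) returns the slope, the intercept, and the slope's t statistic, which would be compared against a t distribution with N − 2 degrees of freedom; Python 3.10+ also offers `statistics.linear_regression` for the point estimates.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, intercept, t statistic of the slope); the t statistic has
    len(x) - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, slope / se_slope
```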
Treating gender as a nominal variable and familiarity as an ordinal one, we assessed their effects on participant emotions using a two-way ANOVA. Familiarity was recorded on a 7-point Likert scale ranging from “not at all” to “highly familiar”. We binned familiarity into two groups, low (n = 32) and high (n = 88), corresponding to the ranges [−3, 0] and [1, 3]. We thus treated it as a binary variable with sufficient sample sizes per category and explored its interaction effects with gender.
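The familiarity binning is a simple threshold on the −3 to 3 rating. In this hypothetical Python helper, ratings of 0 and below map to the low group and ratings of 1 and above to the high group. Because the resulting gender × familiarity cells differ in size, an ANOVA on these factors would typically use sums of squares that account for imbalance (for instance Type II, as offered by `statsmodels.stats.anova.anova_lm(model, typ=2)`) rather than balanced-design shortcut formulas.

```python
from collections import Counter
from typing import Dict, Iterable


def familiarity_group(rating: int) -> str:
    """Map a -3..3 familiarity rating to the 'low' or 'high' bin."""
    if not -3 <= rating <= 3:
        raise ValueError("familiarity rating must lie in [-3, 3]")
    return "low" if rating <= 0 else "high"


def bin_counts(ratings: Iterable[int]) -> Dict[str, int]:
    """Number of participants per familiarity bin."""
    return dict(Counter(familiarity_group(r) for r in ratings))
```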
We ran a 2×2 ANOVA to test the effects of gender and familiarity on the differences in participants' emotions between post- and pre-study responses. The ANOVA returned a statistically significant main effect of familiarity on the differential affect scores of afraid (F(1, 116) = 9.158, p = 0.003) and ashamed (F(1, 116) = 6.810, p = 0.01). There was an interaction effect of gender and familiarity on the differential affect score of attentive (F(1, 116) = 6.194, p = 0.014). However, none of these effects remained statistically significant after BH correction. We also tested the composite affect scores, where familiarity yielded a significant main effect on differential […]. Table 3 summarizes the significant results after ANOVA and FDR correction.
Figure 9 shows the distributions of affect score differences for gender and familiarity, respectively. We ran a 2×2 independent-subjects ANOVA to test the effects of gender and familiarity on participant behavior parameters. The ANOVA returned a statistically significant main effect of familiarity on fightCnt (F(1, 116) = 8.677, p = 0.004), timeSpent (F(1, 116) = 13.423, p = 0.0004), and avgSpeed (F(1, 116) = 9.886, p = 0.002). We also found an interaction effect of gender and familiarity on totalDist (F(1, 116) = 3.961, p = 0.049). After BH correction, only the main effects remained statistically significant. Table 4 summarizes the significant results after ANOVA and FDR correction.
We computed Spearman rank-order correlations between personality factors and participants' affect score differences between post- and pre-study responses, applying FDR correction with the BH procedure to control for Type I errors. We evaluated the composite positive and negative affect scores separately, again correcting for FDR, because they depend on the individual item scores. After FDR correction, the results indicate a moderate negative correlation between extroversion and the change in active (ρ = −0.456, p < 0.001), a low negative correlation between neuroticism and the change in active (ρ = −0.305, p < 0.001), a low negative correlation between extroversion and the change in overall positive affect score (ρ = −0.29, p = 0.0013), and a low negative correlation between conscientiousness and the change in overall positive affect score (ρ = −0.25, p = 0.006). Figure 11 (a) shows the correlations between participants' differential affect scores and their personality traits, with statistically significant correlations highlighted along with their adjusted p-values.
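Spearman's ρ is the Pearson correlation of ranks, with tied values receiving the average of the ranks they span; this matters here because TIPI scores and integer affect differences are likely to contain many ties. The Python sketch below (hypothetical function names) computes ρ only; `scipy.stats.spearmanr` returns both ρ and a p-value, and the resulting p-values would then be adjusted with the BH procedure.

```python
import math
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation with average ranks for ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```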
To further explore the relationship between initial emotions and personality, we also calculated Spearman rank-order correlations between personality factors and participants' pre-study affect scores and applied FDR correction with the BH procedure. We observe slight to moderate positive correlations between positive affect scores (individual and composite) and the personality factors of openness, conscientiousness, extroversion, and agreeableness. In contrast, negative affect scores are correlated with neuroticism. Figure 11 (b) shows the correlations between pre-study affect scores and personality traits. The statistically significant correlations are highlighted with their corresponding adjusted p-values.
Although Spearman rank-order correlations between personality factors and participant behavior parameters returned low positive correlations between neuroticism and punchCnt (ρ = 0.202, p = 0.027) and between openness and timeSpent (ρ = 0.187, p = 0.041), none of these values remained statistically significant after FDR correction using the BH procedure. Figure 12 shows the correlation matrix of personality factors and behaviors, where statistically significant correlations are highlighted with their corresponding adjusted p-value.
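Raw p-values such as 0.027 and 0.041 can lose significance once the whole family of tests is taken into account. A minimal sketch of the Benjamini-Hochberg (BH) step-up adjustment shows why. Each sorted p-value p₍ₖ₎ is scaled by m/k, where m is the number of tests and k is its rank. A running minimum taken from the largest rank downward keeps the adjusted values monotone. The helper names are hypothetical. The p-values in the usage line are made up for illustration and are not the study's data.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # scaling by m / rank and enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def bh_reject(pvalues, alpha=0.05):
    """True where the hypothesis is rejected at FDR level alpha."""
    return [q <= alpha for q in bh_adjust(pvalues)]

if __name__ == "__main__":
    print(bh_reject([0.01, 0.04, 0.03, 0.2]))  # -> [True, False, False, False]
```

In this toy family of four tests, raw p-values of 0.04 and 0.03 are both adjusted to about 0.053 and are no longer significant at 0.05.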

3.6.4 Post-Study Questionnaire. The mean ratings for q1 and q5 were μ = 3.306 ± 1.952 and μ = 2.953 ± 1.731, respectively. These indicate that the participants had neutral experiences and found the character communication to be neutral. Most participants found the characters consistent and predictable.
All participants who were told that the others were artificial agents stated that they never doubted this. In the avatar condition, more participants believed the characters to be genuinely human-controlled in the hostile setting than in the docile one. Around 40% explicitly stated that they never suspected the characters to be artificial, and 16% stated that they became suspicious when agents did not attack others who had more iPads or when the agents quickly gathered so many iPads. Others who did not believe the characters were avatar-controlled mentioned robotic movements, the improbability of a simulation starting simultaneously for several MTurk workers, and the game-like nature of the study. In the docile-human condition, only 25% believed the others to be humans.
One notable response came from a participant in the docile-human condition who believed the others were humans: “I felt pretty ashamed because I just started swinging, trying to get what I needed. I felt like I didn't have much of a choice because of the other players. It seemed like one of those Black Friday videos but happening in a virtual world. It was crazy.”

DISCUSSION
RQ1 explores the effect of the crowd's emotional disposition on participant emotions and behaviors. We calculated the differences between the post- and pre-study affect scores of participants to understand the emotional changes. Thus, the analyses compare these differential scores rather than the absolute emotions before or after the study. In the hostile crowd condition, there was a higher increase in the scores of hostility, shame, and overall negative emotions after the study, compared to the docile crowd condition. Additionally, the hostile condition yielded a slightly higher increase in fear and feeling upset, although these results did not remain statistically significant after the FDR correction. Nevertheless, all these negative emotions have likely contributed to the combined negative affect score. There were no statistically significant differences between conditions for the changes in positive emotions.
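The differential scores can be made concrete with a small sketch for one participant. Each item's change is the post-study rating minus the pre-study rating. The positive and negative composites are computed from the five items named in this paper: active, alert, attentive, determined, inspired; and afraid, ashamed, hostile, nervous, upset. Treating each composite as the sum of its five items is our assumption, as are the function and key names. Because a sum of differences equals the difference of sums, the composite change is the same whether one sums first or differences first.

```python
# Item sets as named in the text; summing them into composites is an assumption.
POSITIVE = ("active", "alert", "attentive", "determined", "inspired")
NEGATIVE = ("afraid", "ashamed", "hostile", "nervous", "upset")

def differential_scores(pre, post):
    """Per-item and composite change scores (post minus pre) for one participant.

    pre and post map item name -> rating (e.g., on a 1-5 scale).
    """
    diff = {item: post[item] - pre[item] for item in POSITIVE + NEGATIVE}
    # Hypothetical composite keys: sum of the five item changes per scale.
    diff["positive_affect"] = sum(diff[i] for i in POSITIVE)
    diff["negative_affect"] = sum(diff[i] for i in NEGATIVE)
    return diff

if __name__ == "__main__":
    pre = {item: 2 for item in POSITIVE + NEGATIVE}
    post = dict(pre, hostile=4, active=1)
    d = differential_scores(pre, post)
    print(d["negative_affect"], d["positive_affect"])  # -> 2 -1
```

Condition comparisons such as hostile versus docile would then be run on these per-participant change scores rather than on the raw pre- or post-study ratings.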
The results for emotions are in line with the participant behavior differences, which show a higher fight count in the hostile crowd condition than in the docile crowd condition. Yet, there was no significant difference in the number of punches participants administered. Participants were attacked by crowd agents in the hostile setting, so they had to engage in fights even when they did not initiate them. The lack of a significant difference in the number of punches between the two conditions suggests that participants generally fought back only when needed, which is also supported by their responses to open-ended questions. The number of collected iPads was lower in the hostile condition. This is also an expected result of the simulation setting, as the hostile agents were able to attack the participants and steal their iPads. The participants spent slightly more time and traveled longer distances in the store in the hostile setting. However, reaching a definitive conclusion is challenging because these results did not retain statistical significance after the FDR correction. We can speculate that some participants might have extended their stay to recover the iPads that were stolen from them.
The results indicate that by imbuing agents with emotional behaviors dictated by personality factors, we can influence users' behaviors and affective states. The increase in negative emotions, especially hostility, confirms the effectiveness of the simulation conditions in eliciting emotional responses.
In response to RQ2, we did not find any statistically significant effects of agency on human emotions or behaviors.
The answers to open-ended questions suggest that the main reason was the failure of the instructions and simulation to convince participants that the agents were human-controlled. Although a considerable number of participants believed that narrative, others were skeptical, as a multi-user study that starts simultaneously on all computers is unusual on a crowdsourcing platform. Because the participants were recruited from MTurk “Masters”, i.e., highly experienced workers, deceiving them was especially challenging. Participants in the hostile crowd condition had a higher belief in the stated agency, possibly because the agents' actions were less predictable. The literature has mixed findings on the impact of perceived agency. Therefore, factors other than the participants' beliefs may have played a role in the lack of a difference between the avatar and agent conditions. One explanation might be the effect of anonymity on MTurk. Because the experiment was short and involved no interaction beyond the simulation, the participants may have distanced themselves from others, depersonalizing them regardless of their agency [Chen and Dang 2022]. Another explanation may simply be ethopoeia, the tendency to respond to computers as social actors even when one knows they are machines, as suggested by Nass and Moon [2000].
Regarding RQ3, we did not find any significant effects of age and gender on participants' emotions or behaviors. Although high familiarity suggested higher increases in fear, shame, and negative emotions, these effects were not statistically significant after controlling for Type I errors. Regarding behaviors, familiarity was associated with a lower number of fights, less time spent, and a higher average speed, with statistically significant effects after FDR correction. That more experienced participants completed the study in a shorter time and at a higher speed is an expected outcome. The lower number of fights was likely a result of the shorter time spent in the simulation. The lack of a significant effect of familiarity on punch count may be attributed to more familiar participants fighting more effectively.
RQ4 explored potential correlations between the dependent variables and participants' personality traits. Although no significant correlations between personality factors and behavior parameters were observed after applying the FDR correction, we found slight positive correlations between neuroticism and punch count, and between openness and time spent. Neuroticism is associated with anger, nervousness, and emotionality [Goldberg 1992], which could explain the higher punch count. Similarly, openness is described in terms of curiosity and inquisitiveness [Goldberg 1992], which could explain the increased time spent in the environment. As these results are in line with the literature [Durupinar et al. 2011], we postulate that significant effects might be achieved with a larger sample size.
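Fisher's z approximation offers a rough sense of what "a larger sample size" could mean. It gives the number of participants needed to detect a correlation r with a two-sided test, using n ≈ ((z₁₋α/₂ + z₁₋β) / atanh r)² + 3. This formula is derived for Pearson correlations, so applying it to Spearman's ρ is only an approximation. It also ignores the BH correction, which would raise the requirement further. The function name is hypothetical, and the sketch uses only Python's standard library.

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N to detect correlation r (two-sided) via Fisher's z.

    Uncorrected alpha; no adjustment for multiple comparisons.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

if __name__ == "__main__":
    # Effect sizes reported for neuroticism-punchCnt and openness-timeSpent.
    for r in (0.202, 0.187):
        print(r, n_for_correlation(r))
```

Under these assumptions, correlations of about 0.19 to 0.20 call for roughly 190 to 225 participants at 80% power. That is noticeably more than the approximately 120 participants implied by the ANOVAs' 116 error degrees of freedom.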
Although our main focus was the change in emotional states, we examined correlations between the participants' emotional states before the study and their personality traits to test the coherence of the collected data with the literature. Openness, conscientiousness, extroversion, and agreeableness, which can be considered the personality factors with “positive” connotations, were correlated with the participants' initial positive affect scores, i.e., being alert, determined, attentive, and inspired. In contrast, neuroticism, which is described as a tendency to experience negative affect, was correlated with the initial scores of afraid, nervous, upset, hostile, and ashamed. These affinities align with the trait descriptions of the personality factors in the literature [Watson and Clark 1992; Watson et al. 1992].
We found statistically significant correlations between some differential affect scores and personality traits. For instance, extroversion was negatively correlated and neuroticism was positively correlated with the change in active emotion. In other words, higher extroversion and higher emotional stability (low neuroticism) were linked to larger decreases in feeling active after the study. In addition, higher extroversion and conscientiousness were correlated with larger decreases in positive affect scores. One explanation could be the computer-based nature of the study, which may have led to lower activity and less excitement for extroverted and emotionally stable individuals. The unpredictable nature of the simulation might have caused greater distress in more conscientious individuals. Although the correlations between individual positive emotions and conscientiousness were not statistically significant, the correlation with their combination was. Moreover, all the positive emotions, i.e., activity, determination, attentiveness, inspiration, and alertness, were negatively correlated with conscientiousness.

LIMITATIONS
Despite its convenience, the web-based approach introduces several limitations. One drawback is its failure to simulate the stress of high crowd density and the possible stampedes that often occur during actual Black Friday events. Due to computational constraints, our scenario included only twenty shoppers, far fewer than the thousands who might enter a store during such events. Additionally, we had to limit the length of the study to ensure that we could recruit and retain participants on MTurk, which further distanced the scenario from its real-life counterparts.
Although the web-based platform allows an easy and convenient experimental setup, it is not as immersive as a VR platform that supports multiple modalities of communication. For instance, participants could interact with the system only through a mouse and keyboard. A higher-fidelity environment capable of accurately representing users' movements (e.g., through hand and body trackers), capturing their emotions via facial recognition, and enabling verbal communication with others would enhance the study's ecological validity.
We used a limited set of animations for character movements.Interestingly, although the participants' own actions were restricted to moving in the environment, picking up objects, and starting ights, they reported disbelief that others were humans, based on their limited actions.As a result, the study's emotional impact was diminished, which complicated the analysis of results from tests designed to gauge the perception of agency.

CONCLUSION AND FUTURE WORK
This paper presents a web-based study exploring the effects of perceived agency and the emotional disposition of virtual crowds on a human participant embodied as a crowd member. The system recorded each participant's actions during the simulation. We also collected data about their personalities and emotional states before and after the study, in addition to their demographics. We found statistically significant effects of crowd disposition and familiarity with first-person 3D games on certain behavioral parameters and differential emotion scores, but did not observe any effects of perceived agency.
The findings provide a foundation and direction for the next phase of crowd simulation research. By incorporating the insights gained from these results, researchers can work towards more detailed, realistic, and adaptive virtual environments that mirror the intricacies of human behavior and decision-making in crowds. The clear impact of crowd disposition on individual emotions and behaviors emphasizes the promise of incorporating emotional temperaments in simulations. The absence of significant effects of perceived agency suggests that the distinction between avatars and agents might not be crucial in certain contexts. However, it also opens the door to more detailed research, perhaps exploring subtler nuances or different scenarios where perceived agency might have a more pronounced impact. Because not all the participants believed that the crowd was human-controlled, a more convincing study is needed before concluding that perceived agency has no significant effect. More diverse animations, perhaps with deliberately erratic behaviors, and a more convincing platform for multi-user studies may yield different results.
The findings about the effects of participant familiarity and personality on behaviors and emotions can guide researchers and animators in designing personalized simulations based on user profiles. This can increase the accuracy of predicted simulation effects and provide customized virtual experiences.
Our future plans include repeating the study in a VR environment. Although VR headsets provide higher immersion, their use has limitations in terms of efficiency, diversity of the participant pool, and participant comfort. Thus, we preferred a browser-based environment over a lab-based VR study for this exploratory study. We expect to gather more detailed data and more pronounced differences in a more immersive setting. Such a setting will also permit recording physiological responses in addition to self-reported measures. We will explore additional scenarios that analyze people's attitudes toward virtual agents with diverse personalities and behaviors. Our study involved a single user, but the platform can easily be extended to incorporate multiple participants, allowing the design of more varied scenarios.
Another interesting direction would be exploring the effect of conversation. In the present setup, neither the agents nor the user can use dialogue to interact, which ignores a vital channel of communication. Advances in large language models have made it possible to equip agents with conversational abilities. Agents expressing their emotions verbally, and even vocally, could be more convincing and more effective in eliciting emotional responses.

Fig. 2. A first-person view of the environment at the beginning of the study.

Fig. 5. Frequency histograms for the differences of the post- and pre-study affect scores

Table 2.
Fig. 7. Box plot diagrams for the differences between the post- and pre-study affect scores by study condition

Fig. 8. Box plot diagrams for the participant behaviors by study condition
Figure 10 depicts the distributions of behavior parameter values by gender and familiarity.
3.6.3 Correlations of Participant Personality Factors with Their Emotions and Behaviors. We computed the participant personality scores for the Five-Factor personality model, which defines personality on five orthogonal

Fig. 9. Box plot diagrams for the differences between the post- and pre-study affect scores by gender and familiarity

Fig. 10. Box plot diagrams for the participant behaviors by gender and familiarity

Table 3. Two-way ANOVA results for gender and familiarity on participants' post- minus pre-study affect differences.