Using Speech Patterns to Model the Dimensions of Teamness in Human-Agent Teams

Teamness is a newly proposed multidimensional construct aimed to characterize teams and their dynamic levels of interdependence over time. Specifically, teamness is deeply rooted in team cognition literature, considering how a team’s composition, processes, states, and actions affect collaboration. With this multifaceted construct being recently proposed, there is a call to the research community to investigate, measure, and model dimensions of teamness. In this study, we explored the speech content of 21 human-human-agent teams during a remote collaborative search task. Using self-report surveys of their social and affective states throughout the task, we conducted factor analysis to condense the survey measures into four components closely aligned with the dimensions outlined in the teamness framework: social dynamics and trust, affect, cognitive load, and interpersonal reliance. We then extracted features from teams’ speech using Linguistic Inquiry and Word Count (LIWC) and performed Epistemic Network Analyses (ENA) across these four teamwork components as well as team performance. We developed six hypotheses of how we expected specific LIWC features to correlate with self-reported team processes and performance, which we investigated through our ENA analyses. Through quantitative and qualitative analyses of the networks, we explore differences of speech patterns across the four components and relate these findings to the dimensions of teamness. Our results indicate that ENA models based on selected LIWC features were able to capture elements of teamness as well as team performance; this technique therefore shows promise for modeling of these states during CSCW, to ultimately design intelligent systems to promote greater teamness using speech-based measures.


INTRODUCTION
A challenge that has long faced team science research is how to measure essential, multifaceted teamwork constructs at the team level, as teamwork unfolds over time, rather than simply aggregating individual-level measurements. This challenge has been exacerbated by recent advancements in human-agent teaming (HAT) that have blurred the line between human and agent roles, relationships, and dependencies. As Cooke describes, rather than measuring how individuals in a team function collectively, there is a need for objective, longitudinal measurements that reflect changes in team cognition, composition, and team processes as tasks unfold over time [7].
Teamness, a term recently coined by Cooke, is proposed to span the dimensions of team composition, role heterogeneity, diversity of shared goals and identity, authority structure, and degrees of interdependence [7]. While this teamness framework takes a crucial step toward developing a better understanding of today's complex HATs, the teamness authors note that these dimensions require further development, measurement, and testing [7]. We posit that the dimensions of teamness may be measured, in part, through sub-facets of individual and team-level measures of affective states, degrees of trust, mental workload levels, and team processes, which we describe next.

Bridging teamwork measures with teamness
To bridge the teamness dimensions with the aforementioned measurements, we refer to the 'ABCs' of teamwork literature, which describes why a team meets its objectives given certain affective states, behavioral processes, and cognitive states of team members [3]. The ABCs are a validated framework of measurable teamwork mechanisms that connect our measurement methods to the teamness dimensions below.
Both individual and team affective states (i.e., valence and arousal) are affected by affect disposition (which makes team composition important) and by behaviors like self- and co-regulation of attitudes, emotion and mood, cooperation, and ingroup identity [21]. Affective states are measured using surveys and physiological data, while their associated behaviors can be studied using various methods like speech and facial expression.
Trust is crucial to teamness, especially in HATs, with both affective- and cognitive-based trust playing an important role in interdependence via overall trust and reliance [26]. Trusting behaviors, along with a mix of affective and cognitive states, engender team processes like cohesion, communication, and secure team identity.
Cognitive states also have a reciprocal relationship with trusting behaviors. Behaviors like information sharing and willingness to adapt strategies increase the chances for developing new cognitive states that can positively impact performance [9]. Due to trust's connection with the affective and cognitive states, teams with members exhibiting less trust are more susceptible to team process breakdowns that lead to poor performance [9]. While a lack of trust does not guarantee team performance failure, performance benefits from creativity, cooperation, and coordination are harder to achieve without it.
Mental workload is connected to teamness through cognitive states and behavior. Cognitive states include the team-level knowledge structure and the perception and acquisition of information (e.g., shared mental models) [33]. The shared cognitive state may include a shared goal, which team workload research has shown to decrease mental workload for individual members and for the team overall [5]. Individual members bring their own measurable cognitive abilities, knowledge, and skills. Typically, more of these attributes improve the team's cognitive state since they contribute to the facilitation of teamwork, an impact of team composition. Role differentiation, in which roles are assigned based on the strengths of each member, and the subsequent role heterogeneity are shown to decrease cognitive workload [33]. Agents also contribute to their teammates' cognitive state through their own informational participation, but depending on an agent's characteristics, its deeper mechanisms can be harder for humans to intuit, so shared states look different.
The foundational definition of team processes comes from Marks et al. [23], who describe these processes as how members work interdependently to share resources and organize task work to yield a meaningful outcome. While these processes describe stages of collaboration over time, they may present differently in HATs compared to human teams. Introducing a non-human entity, especially one without clear affective and cognitive processes, to a team has several implications not present in human teams. Human perceptions of an agent (i.e., trust, reliability, fear, suspicion) directly influence team dynamics [2]. Agents typically lack the intelligence, emotion, and other characteristics of their human teammates, which can have negative implications such as lack of trust and higher mental workload [8,10,35]. Although currently challenging, agents should be thoughtfully designed to reduce the mental workload of their teammates [42].
In recent years, there has also been a radical shift in how teams are distributed, causing an increased reliance on virtual communications spanning time and space [27]. Temporally and spatially dispersed teams may have different perspectives on the shared goal and identity that are naturally established in co-located teams [7]. The impact of virtual communication on dispersed team dynamics and communication patterns is not yet fully understood [11]. The level of interdependence in virtual teams is also difficult to measure as compared to physical teams, where the execution of sequential and interdependent tasks requires frequent communication [16]. Naturalistic dialogue is especially important in remote teams due to the lack of physical nonverbal indicators (i.e., body language) that contribute to team dynamics [24]. Team cognition has previously been measured dynamically using speech-based measures [20], suggesting speech as an effective measure of teamness.

Candidate Measures
Surveys and behavioral measures of teamwork can be subjective and obtrusive if they interrupt teams in real time. With teamwork evolving, it is imperative to take a more naturalistic, multimodal approach to measuring dimensions of teamness using less obtrusive measures that can be applied beyond traditional teams. While noninvasive physiological measurements such as electroencephalogram (EEG), functional near-infrared spectroscopy (fNIRS), galvanic skin response (GSR), eye-tracking, and heart rate variability are telling of one's physical and cognitive states during collaboration, these measurements cannot be applied to artificial agents nor in many real-world environments [28]. However, speech is a rich, multidimensional, team-level metric that can be measured unobtrusively in most team types. Natural dialogue is complex yet informative on its own. In this paper, we analyze speech patterns by counting words into distinct, psychologically meaningful categories [41], described in Section 3.2. We note that this use of "speech patterns" differs from that in similar multimodal literature that focuses on explicit speech behaviors like question-asking, argument, reasoning, and initiation style [29,32] and on prosodic features of speech including pitch and rate [12,45]. This use of speech patterns, in combination with other multimodal measures, shows great potential to capture the dynamic nature of teamness.

Current study and contributions.
In this paper, we aim to evaluate teamness in virtual, human-agent teams using combined survey and speech measures. We posit that four components derived from a combination of affective state, trust, team processes, and workload measures are highly interconnected with Cooke's proposed dimensions of teamness [7]. In this study, we used four components (social dynamics and trust, affect, cognitive load, and interpersonal reliance) to split teams into high- or low-component groups for comparison. Speech data was analyzed using Linguistic Inquiry and Word Count (LIWC) to parse out linguistic features of particular interest to teamwork. Epistemic networks were constructed to compare speech patterns of teams for each component. This method of visualizing speech patterns allowed for the comparison of co-occurrences of LIWC features in high vs. low component teams. Our results suggest that naturalistic speech in teams can be used to model affective state, trust, workload, and team processes, all of which contribute to measuring the dynamic dimensions of teamness as teams collaborate over time.

METHODS
2.1 Participants
Forty-two students, with an average age of 22 years old and 48% female, participated in the study at a large public university. They worked in teams of two human participants and one agent on an experimental task, with a total of 21 teams/sessions. The participants were compensated with a monetary payment of $15/hour, along with a variable cash bonus based on their task score. The recruitment and experimental procedures were approved by the university's Institutional Review Board, and the participants provided informed consent before starting the task.

Experimental Testbed
The experimental task was conducted using the Computer-Human Allocation of Resources Testbed (CHART) [4]. CHART allows teams of two humans and one rule-based agent to collaborate remotely on a spatiotemporal mapping task, where participants are tasked with searching through historical data overlaid on a map to identify trends in unlawful activity. Specifically, given these trends, teams are instructed to allocate a limited number of 'crime prevention' resources throughout the city, and each crime caught adds to the score. The interface consists of two displays: an interactive map that allows participants to visualize data from specific past dates and categories of offenses, and a shared map where both participants and the agent place their resources, represented as pins. The shared map displays the team's current task score as well as each individual's contribution to the team's score (Fig. 1).

Survey Measures
Following each round, participants were given a battery of state surveys, described next. Within the framework proposed by Marks et al. [23], team processes were measured using items from the established Team Processes survey from Mathieu [25]. We specified that the team consisted of both the other human and the agent. The survey included the items with the highest factor loadings and adaptability to our scenario for the processes of coordination, conflict management, goal monitoring, strategy formulation, and cohesion (from "Affect Management" items), with a Cronbach's alpha of 0.93. The participants recorded the extent of their agreement on a Likert scale from Strongly Disagree to Strongly Agree. Participants were also given a visual analog scale to report their emotional valence ("very negative" to "very positive") and arousal ("very sleepy" to "very active"), based on Russell's classic circumplex model of affective states [34]. Three items from the NASA-TLX were presented via an on-screen slider to assess mental demand, temporal demand, and perceived performance [14]. Cognition-based trust, affect-based trust, and teammate-monitoring behavior were measured with the highest factor-loading items from McAllister [26].
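To make the reported reliability statistic concrete, the following is a minimal sketch of how Cronbach's alpha can be computed from item-level ratings. The function name and the toy ratings are illustrative assumptions; the study's actual survey data and analysis software are not shown here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose so each tuple holds every respondent's rating of one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings (3 respondents x 3 items), not data from the study.
toy_ratings = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy_ratings)
```

When every item moves in lockstep, as in the toy ratings, alpha reaches its maximum of 1; values near the reported 0.93 indicate that the selected items vary together closely.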

Task Performance
Task score was calculated using the number of events that were 'caught' within the radius of a team's pins. Since real data was used for the task, finding a true solution (optimal pin placements) is computationally intractable for any date. The maximum score obtained by a team was 21 events 'caught' in a single round, and 80 events across all 8 rounds.
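As an illustration of the scoring rule, the sketch below counts events that fall within a fixed radius of any pin. It assumes planar coordinates, Euclidean distance, and that each event is counted at most once; none of these details are specified in the text, and the function name is hypothetical.

```python
import math


def count_caught(events, pins, radius):
    """Count events lying within `radius` of at least one pin.

    events, pins: lists of (x, y) tuples in the same planar coordinate system.
    Assumption: an event covered by several pins still counts only once.
    """
    return sum(
        1
        for ex, ey in events
        if any(math.hypot(ex - px, ey - py) <= radius for px, py in pins)
    )


# Hypothetical coordinates, not data from CHART.
score = count_caught(
    events=[(0, 0), (3, 4), (10, 10)],
    pins=[(0, 1), (9, 9)],
    radius=1.5,
)
```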

Speech Measurement
Audio of speech was recorded using Zoom, transcribed using Whisper, and processed through the LIWC-22 application. Whisper is an open-source, multilingual automatic speech recognition model developed by OpenAI that also supports speech translation and language identification [31]. Whisper was chosen given its potential use in future intelligent systems (i.e., conversational agents) to capture user speech and correspondingly apply models to determine interventions. Along with Whisper, the stable-ts library was used, which provides timestamp stabilization and thus improves segment-level timestamps [17]. A subset of Whisper-generated transcripts was compared to human-generated transcripts to validate Whisper's accuracy. We observed a Word Error Rate (WER) of 2.15%. LIWC-22 computes over 100 features per utterance based on a series of predefined dictionaries [6]. These features indicate characteristics of utterances, including the number of words spoken and the percent of words related to a predefined category. The features from LIWC-22 extracted for the present study are listed in Section 3.2.
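For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a reference transcript and the automatic transcript, divided by the number of reference words. The sketch below is one plain-Python way to compute it; the lowercasing and whitespace tokenization are simplifying assumptions, and the text normalization used in the study is not described.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Levenshtein distance over words, keeping only the previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,  # delete a reference word
                    current[j - 1] + 1,  # insert a hypothesis word
                    previous[j - 1] + substitution_cost,
                )
            )
        previous = current
    return previous[-1] / len(ref)


# Hypothetical transcripts, not taken from the study.
wer = word_error_rate("place a pin near the station", "place the pin near station")
```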

ANALYSIS
Grounded in the teamness framework, our hypothesis testing pipeline went as follows: we derived four teamwork components through survey feature selection, split each component into comparison groups, selected speech features using LIWC characterization, and finally conducted ENA of those LIWC speech features for each teamwork component.

Survey Feature Selection
To reduce the total number of survey measures while minimizing information loss, principal component factor analysis (PCA) identified four combined measures from valence, arousal, cognitive trust, affective trust, teammate monitoring, mental demand, temporal demand, perceived performance, and team processes (coordination, conflict management, goal monitoring, strategy, and cohesion). A Kaiser-Meyer-Olkin test deemed the data well suited to factor analysis, with a value of 0.85. Subsequent PCA returned 4 components with eigenvalues greater than 1. Varimax rotation with a loading cutoff of 0.3 identified the following composition of factors (Table 1), explaining a cumulative total of 76% of the variance. We selected these four components as variables of interest relating to several dimensions of teamness.
We define component 1 as social dynamics and trust, as it is composed of every team process measure (loadings 0.86 to 0.92), valence (0.48), cognitive trust (0.63), and affective trust (0.66). This describes the positive social perceptions of one's team, along with their levels of trust. Component 2 represents emotional valence (0.68), arousal (0.76), and perceived performance (0.80). Thus, we label this as affect, given its strong link to both affective measures as well as a team's sense of accomplishment. Component 3, composed of temporal demand (0.87), mental demand (0.68), and arousal (0.37), straightforwardly captures teams' cognitive workload. Lastly, cognitive trust (0.47), conflict management (0.36), teammate monitoring (0.87), and inverse mental demand (-0.57) make up component 4. Cognitive trust and monitoring behavior are directly based on one's judgment of their teammate's ability in the task; it follows that as a teammate's competence increases, individuals can rely on them and decrease their own mental demand. Thus, we refer to this component as interpersonal reliance. A median split was performed on each component to divide teams into high and low teams per component: Social Dynamics and Trust (median = 0.2265), Affect (median = -0.1409), Cognitive Load (median = -0.0938), and Interpersonal Reliance (median = 0.1437). Along with the survey measures, team success (0, 1) was an outcome measure. A median split was used for team score (median = 1.8) to divide teams into high-performing (1) or low-performing (0) teams. The median was then subtracted to center the data around 0.
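The median split and centering steps can be sketched as follows. The team labels and scores are hypothetical, and the handling of scores exactly equal to the median (assigned to the low group here) is an assumption, since the text does not state how ties were resolved.

```python
from statistics import median


def median_split(scores):
    """Center scores on their median and label each team 'high' or 'low'.

    scores: dict mapping a team identifier to its component (or task) score.
    Assumption: a score exactly at the median is labeled 'low'.
    """
    cutoff = median(scores.values())
    centered = {team: value - cutoff for team, value in scores.items()}
    groups = {team: "high" if value > 0 else "low" for team, value in centered.items()}
    return centered, groups


# Hypothetical component scores, not values from the study.
centered, groups = median_split({"A": 0.5, "B": -0.2, "C": 0.1, "D": 0.9})
```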

LIWC Feature Selection
Linguistic content offers insight into a team's processes, affect, and even specific collaborative problem-solving skills [40]. Six LIWC features were selected out of the more than 100 available, based on prior work using LIWC to model team processes. Specifically, LIWC is commonly used for analyzing speech data from multi-party conversations, and prior work has identified several features most reflective of team processes, including Analytic Language, Drives Language, Positive Tone, Negative Tone, Cognitive Processes, and Past Tense, as described in Table 2.

Epistemic Network Analysis
Epistemic networks [38] have the potential to meaningfully unpack real-time, conversational speech data during collaborative activities in teams. The epistemic networks are constructed using an optimization routine that accounts for the co-occurrence of features across utterances within conversations. This results in a network with connections between nodes (i.e., LIWC features) weighted to reflect how frequently features co-occur within each conversation. Conversations can be grouped to make comparisons between patterns in discourse associated with particular outcomes (e.g., successful versus unsuccessful task performance). ENA is a valuable modeling approach as it allows for understanding connections between features in discourse, as well as quantitatively and qualitatively comparing patterns in discourse related to specific outcomes [38]. Additionally, the networks that emerge can be used to evaluate whether the model features are able to successfully capture the component of interest and distinguish between outcomes for the construct.
In the present study, the six LIWC features were used as the nodes in the network to compare patterns in discourse in relation to our four team components and team success. Specifically, networks were compared for team conversations according to: High Social Dynamics and Trust versus Low Social Dynamics and Trust, High Affect versus Low Affect, High Cognitive Load versus Low Cognitive Load, and High Interpersonal Reliance versus Low Interpersonal Reliance. All conversations for high versus low groups were determined using a median split as explained in Section 3.1 (resulting in n = 65 groups in each network; n = 130 total group conversations). The stanza size for the analysis in all networks was set to a moving window of 4, to best capture patterns occurring within conversations. The networks were compared first quantitatively with a t-test examining differences in the mean centroids of each network. The networks were then compared qualitatively based on the difference in weighted connections between the two networks for each outcome (i.e., the subtracted network).
As a measurement check, we expected to see alignment between participants' self-reported affect and the values of the LIWC features representing affect. We therefore ran a Pearson correlation of the Valence factor from the surveys (ranging from 1 = very negative to 5 = very positive) against relevant LIWC features: positive tone and positive emotion. Valence was positively correlated with both positive tone (r = 0.37, p < 0.001) and positive emotion (r = 0.34, p < 0.001), confirming that these LIWC features are indeed aligned with participants' perceived emotional valence.
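The co-occurrence idea can be illustrated with a simplified accumulation step. The sketch below assumes each utterance has already been reduced to the set of features judged present in it (LIWC reports percentages, so some thresholding would be needed; that step is an assumption and is not shown). It counts, for each utterance, the pairs formed between its features and the features appearing anywhere in a trailing window of utterances. It is not the full ENA pipeline, which also normalizes the resulting vectors and projects them into a low-dimensional space.

```python
from collections import Counter


def window_cooccurrences(utterances, window_size=4):
    """Count feature pairs that co-occur within a trailing window of utterances.

    utterances: list of sets of feature names, in conversational order.
    window_size: number of utterances in the window, including the current one.
    Each pair is counted at most once per current utterance.
    """
    counts = Counter()
    for end, current in enumerate(utterances):
        start = max(0, end - window_size + 1)
        context = set().union(*utterances[start:end + 1])
        pairs = {
            tuple(sorted((a, b)))
            for a in current
            for b in context
            if a != b
        }
        counts.update(pairs)
    return counts


# Hypothetical coded utterances, not data from the study.
toy_conversation = [{"analytic"}, {"cogproc"}, set(), {"focuspast", "analytic"}]
weights = window_cooccurrences(toy_conversation, window_size=4)
```

Shrinking the window drops links between features that are several turns apart, which is why the window size shapes the resulting network.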
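The quantitative comparison can be sketched from summary statistics alone. The function below computes a two-sample t statistic assuming unequal variances (Welch's test), the Welch-Satterthwaite degrees of freedom, and Cohen's d with a pooled standard deviation. The choice of pooled-SD formula for d is an assumption, and the example inputs are the rounded Component 1 values reported in the results, so the outputs only approximate the published statistics.

```python
import math


def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, degrees of freedom, Cohen's d) from group summary statistics."""
    se1_sq = sd1 ** 2 / n1
    se2_sq = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    # Assumption: Cohen's d uses the pooled standard deviation.
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    d = (mean1 - mean2) / pooled_sd
    return t, df, d


# Rounded Component 1 values (low vs. high Social Dynamics and Trust).
t, df, d = welch_from_summary(0.16, 0.64, 65, -0.16, 0.28, 65)
```

With these rounded inputs the sketch gives t of about 3.69 on about 87.6 degrees of freedom and d of about 0.65, close to the reported t(87.79) = 3.66 and d = 0.64; the small gaps are consistent with rounding of the published means and standard deviations.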
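The measurement check relies on the Pearson correlation coefficient, which can be computed as below. The paired values are made up purely to exercise the function; they are not the study's valence ratings or LIWC scores.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical paired observations: survey valence (1 to 5) and a tone score.
valence = [1, 2, 3, 4, 5]
tone = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson_r(valence, tone)
```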

RESULTS AND INTERPRETATION
An ENA comparison of LIWC features is reported for each component, with its relation to the specified hypotheses, followed by the same process for team performance. Afterwards, the implications of these results for teamness evaluation are discussed.

Component 1: Social Dynamics and Trust
We first developed an ENA model to compare speech patterns associated with high vs. low levels of the Social Dynamics and Trust component (Fig. 2). A two-sample t-test assuming unequal variance showed that the network for Low Social Dynamics and Trust teams (mean = 0.16, SD = 0.64, N = 65) was statistically significantly different at the α = 0.05 level from the network for High Social Dynamics and Trust teams (mean = -0.16, SD = 0.28, N = 65; t(87.79) = 3.66, p < 0.001, Cohen's d = 0.64). Qualitative analysis of the subtracted network revealed that teams with higher social dynamics and trust used significantly greater co-occurrences of analytic, cognitive processes, and past tense language (see Figure 2). This finding suggests that these three LIWC features can explain some of the differences in outcomes regarding how teams dealt with conflict as well as level of trust, validating hypotheses H1, H5, and H6. Notably, the link

Component 2: Afect
An ENA model to compare speech patterns associated with high vs. low levels of the Affect component was constructed (Fig. 3). A two-sample t-test assuming unequal variance showed that Low Affect teams (mean = -0.19, SD = 0.54, N = 65) were statistically significantly different at the α = 0.05 level from High Affect teams (mean = 0.19, SD = 0.29, N = 65; t(98.66) = -4.97, p < 0.001, Cohen's d = 0.87). Groups that rated themselves with greater affect (more positive and energetic) exhibited increased use of analytical thinking, cognitive processes, and past focus language (H1, H5, H6).

Component 3: Cognitive Load
The analysis revealed that teams with increased cognitive load are most strongly differentiated by increased co-occurrence of cognitive processes language with past focus, analytical thinking, and positive tone language (H1, H3, H5, H6). The strength of the connection between past focus and analytical thinking specifically suggests that teams with higher cognitive workload tended

Analytic
The "Analytic" feature is a summary variable used to measure logical or abstract thinking (through increased article use) and cognitive complexity (through increased preposition use) [30]. Analytic words have been positively correlated with increased team member effectiveness scores [1], revealing higher levels of interdependence between team members. H1: We expect that frequent presence of analytic talk will correspond with higher team processes and with better team performance because logical thinking is necessary to coordinate team activities and successfully complete the CHART task.

Drives
The "Drives" dimension includes words of achievement, affiliation, power, reward, and risk through use of first-person pronouns like "we", "us", and "our" [6]. Drives language is highly correlated with the teamness dimensions of having shared goals and role hierarchy. Drives words have been correlated with the collaborative problem-solving facet of maintaining team function [40]. It is important for tasks to simulate real-world risks, as echoed by Cooke [7]. Because CHART mimics real-world risks, we hypothesize the following. H2: Frequent presence of drives will correspond with higher team processes and with better team performance because of the role hierarchy and level of interdependence required of the CHART task.

Positive Tone
Compared to previous versions, LIWC-22 further classifies positive and negative emotions into tone categories. These categories now reflect sentiment, rather than emotion, by incorporating words related to certain emotions [6]. Assents and positive emotion words measure levels of agreement [41]. When group members express positive sentiment, it tends to facilitate group functioning [18,23]. Positive tone words have been positively correlated with higher peer ratings of team effectiveness [1]. Therefore, positive tone language may be associated with a higher level of teamness throughout a task. H3: We expect that frequent presence of positive tone will correspond with higher team processes and with better team performance, as demonstrated in the literature.

Negative Tone
Negative affective tone has been associated with poor team performance, decreased group identification [19,21], and decreased team cooperation [21]. H4: We expect that the increased presence of negative tone will correspond with poorer team processes and with lower team performance due to lack of group cohesion.

Cognitive Processes
Cognitive processes words represent causation, discrepancy, differentiation, and insight [6]. This measure can evaluate the degree to which group members engage in reflective thinking. For instance, van Swol et al. (2016) found that groups that had a member with an extreme opinion used less cognitive process language than groups without such members [44]. This may have resulted in a reduced interest in meaningful conversation. Additionally, van Swol et al. (2021) observed that group members who engaged in more perspective-taking utilized more cognitive process language [43]. Both reflective thinking and perspective-taking coincide with the teamness dimensions of role hierarchy and heterogeneity. H5: We expect that frequent presence of Cognitive Processes will correspond with higher team processes and with better team performance due to healthy levels of interdependence and heterogeneity of the team (humans and agent).

Past tense
The "Focuspast" feature refers to words spoken in the past tense. Because of the nature of the CHART task, we propose that more successful teams and those with higher team processes will have more frequent use of words in the past tense. H6: Because successful task completion requires frequent reference to the historical data, we expect increased past tense language to correspond more frequently with higher team processes and team performance.
to communicate more about past events, perhaps referencing the historical data in the CHART task more frequently. Conversely, teams that maintained a lower cognitive load were more likely to include drives and analytical thinking together in their discussion.

Component 4: Interpersonal Reliance
An ENA model to compare speech patterns associated with high vs. low levels of the Interpersonal Reliance component was constructed (Fig. 5). A two-sample t-test assuming unequal variance showed that Low Interpersonal Reliance teams (mean = 0.08, SD = 0.50, N = 65) were statistically significantly different at the α = 0.05 level from High Interpersonal Reliance teams (mean = -0.08, SD = 0.29, N = 65; t(101.85) = 2.36, p = 0.02, Cohen's d = 0.41). Teams with lower interpersonal reliance displayed higher instances of negative tone (H4) and analytic thinking. This effect seems to follow from instances when individuals could rely less on their teammate and thus had to take on more of the task load themselves. Teams with higher interpersonal reliance had frequent co-occurrences of cognitive process language with drives and analytic language (H1, H2, H5), as well as more use of past tense language (H6).

Team Performance
An ENA model to compare speech patterns associated with high vs. low performing teams was constructed using each team's score (Fig. 6). A two-sample t-test assuming unequal variance showed that Low Performance teams (mean = 0.09, SD = 0.45, N = 57) were statistically significantly different at the α = 0.05 level from High Performance teams (mean = -0.07, SD = 0.30, N = 73; t(93.92) = 2.34, p = 0.02, Cohen's d = 0.43). High-performing groups had more co-occurrences of positive tone and cognitive processes, confirming hypotheses H3 and H5. Low-performing teams had more instances of past tense, drives, and analytic language, contrary to hypotheses H1, H2, and H6. However, low-performing teams used more negative tone, confirming H4.

Interpretations
A few LIWC feature comparison results stood out for their unexpected relationships or their potential significance in the effort to evaluate teamness. First, increased co-occurrences of drives and analytic language were observed in low affect, low cognitive load, and low performing teams. This suggests that drives language may not be indicative of ideal teamwork language (contrary to H2). First-person plural pronouns (identified as drives language) can indicate increased team cohesion, better performance, and a greater sense of group identity [36,39,46], as they decrease hierarchical challenges and promote group communication [46]. However, use of first-person plural pronouns can also indicate use of the Royal We. The Royal We refers to the use of 'we' language by a superior figure to really mean 'you' rather than 'us', signifying role hierarchical issues [41]. This negative relationship between first-person plural language and group cohesiveness has been observed in prior work [13]. The presence of drives language in low workload and low affect groups also supports drives as an indicator of poor teamness. This is because teams typically perform better with a healthy amount of cognitive workload and moderate affect. While drives language may not be the best measurement of optimal teamwork, it was used more frequently among teams with higher interpersonal reliance. This suggests that teammates relying too heavily on each other may translate to degraded team processes or performance. This aligns with our theory that drives language may be representative of role hierarchy and interdependence of a team. More drives language will be present in groups that show higher, unhealthy levels of interdependence.
Increased co-occurrences of analytic, cognitive processes, and past tense language were observed in high social dynamics and trust, high affect, and high cognitive load groups (confirming H1, H5, and H6). This suggests a noteworthy relationship of these features with higher team processes. We posit that use of cognitive language represents the role hierarchy and heterogeneity dimensions of teamness, while analytic language demonstrates interdependence. This implies that teams with higher social dynamics and trust likely engaged in reflective thinking more openly than teams with lower trust. High affect teams, motivated by having a shared goal, used these speech features to maintain team function and role hierarchy. The presence of these features in high cognitive workload groups also demonstrates that use of these linguistic features may help groups manage cognitive load in an effective way. As for past tense language use, future work could investigate whether this language is important to quantifying teamness or whether it is specifically important to our task.
Regarding significant differences in use of tone, higher performing teams used more positive tone language than lower performing teams (confirming H3), while groups with lower interpersonal reliance used more negative tone language than highly reliant teams (rejecting H4). Tone is indicative of team identification, which describes how one self-identifies using the entire team's characteristics and is closely related to performance outcomes [21]. Frequent positive tone leads to higher levels of social integration and subsequently better performance [22], while negative tone is associated with weakened team identification and cooperation [21]. Our results of greater use of positive tone language in higher-performing groups are consistent with previous findings and suggest a greater sense of team identity [22]. Because the CHART task requires high levels of interpersonal reliance through information sharing, greater negative tone usage is consistent with degraded team identification among teams with low interpersonal reliance. High performing teams likely used more encouragement and social language, which is consistent with better performance [9], whereas groups with less reliance on each other likely used more negative tone words due to their lack of unity. A more balanced approach to reliance is needed in teams when performing a collaborative task.
A few noteworthy interpretations arise from these results that should be considered when designing real-time teamness-rooted interventions:
• Drives language may not be a well-suited measure of positive teamness or high performance. However, it may indicate too much reliance between team members, leading to nonideal levels of interdependence.
• Cognitive processes and analytic language were observed in groups with higher team processes. While we argue that these two features relate to the teamness dimensions of role hierarchy, heterogeneity, and interdependence, this requires further validation in future work.
• Although positive and negative tone can offer some general information about how a team is interacting, tone was not a very informative measure of the dimensions of teamness.

CONCLUSIONS

5.1 Study Limitations
While our study provides valuable insights into using speech to model the dimensions of teamness, several limitations must be acknowledged. First, Whisper is a new transcription system and some of its features may be unstable, which may have affected some of the LIWC results. Second, while speech is a valuable measure of team communication, our findings could be further strengthened by incorporating additional multimodal measures (e.g., eye gaze, gesture, physiological data) of team interaction. Future research should explore the potential benefits of using multiple measures in combination with speech to better understand the complexities of team collaboration. Third, teamness is a novel construct that requires further development through the identification and measurement of interaction-based dimensions [7]. While our study provides a valuable starting point for this research, further validation is required to capture the full range of teamness dimensions. We encourage future work to expand upon our initial findings relating teamness dimensions to associated LIWC features, as not all of our hypotheses were supported. Finally, surveys are subjective and obtrusive, as they often interrupt the simulation of a real-world task. While we took steps to minimize disruption by distributing surveys between task rounds, future studies should consider alternative data collection methods that minimize the impact on team members and do not interfere with team collaboration.

Conclusions and Future Work
This study marks a significant step toward the quantification of teamness in HATs, with implications for the development of technology to support team processes in real time. Analysis of epistemic networks comparing speech patterns characterized by LIWC features revealed significant differences between high and low teamwork component groups. When measured in real time, these speech-based measures can inform the design of intelligent systems that support team processes longitudinally. For example, co-occurrences of analytic and negative tone language indicating low interpersonal reliance could trigger a real-time intervention to increase such reliance. A reliance-building intervention may prompt or encourage teammates to increase transparency by communicating their knowledge and the reasoning behind an action. The findings suggest that incorporating multimodal data (e.g., physiology, eye gaze, gesture) into discourse analysis could further improve the accuracy of performance predictions and deepen our understanding of team collaboration dynamics. With recent advances in natural language processing, such as the development of conversational agents and the proliferation of ChatGPT [15,37], the information we can glean from human speech and the ability to use an agent's speech for more teamwork-related functions will continue to grow.
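As an illustration of how such a speech-based trigger might be structured, the following is a minimal sketch and not an implementation from this study. It assumes that each transcribed utterance has already been reduced to a set of LIWC-derived labels; the label names, window size, history length, threshold, and prompt text are all hypothetical. Co-occurrence is simplified here to both labels appearing anywhere within the most recent window of utterances.

```python
from collections import deque
from typing import Deque, Optional, Set

# Hypothetical labels: LIWC scores are assumed to be thresholded upstream
# into a set of category labels per utterance.
TRIGGER_PAIR = ("analytic", "negative_tone")


class RelianceMonitor:
    """Flags repeated analytic + negative-tone co-occurrence in recent speech."""

    def __init__(self, window: int = 4, history: int = 10, threshold: int = 3) -> None:
        self.threshold = threshold  # windows containing the pair needed to fire (assumed)
        self.recent: Deque[Set[str]] = deque(maxlen=window)  # moving window of utterances
        self.hits: Deque[bool] = deque(maxlen=history)  # did each recent window hold the pair?

    def update(self, codes: Set[str]) -> Optional[str]:
        """Add one coded utterance; return a prompt string if the trigger fires."""
        self.recent.append(codes)
        seen = set().union(*self.recent)  # every label present in the current window
        self.hits.append(all(label in seen for label in TRIGGER_PAIR))
        if sum(self.hits) >= self.threshold:
            self.hits.clear()  # reset so the prompt is not repeated on every utterance
            return "Consider sharing the reasoning behind your last action."
        return None
```

In use, a system would call `update` once per transcribed utterance and surface the returned prompt, if any, to the team. The threshold and history length would need to be calibrated against data such as the self-reported interpersonal reliance scores before any deployment.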

Figure 1: Shared map view of CHART featuring individual scores and a timer reporting the remaining time left in the round.

Figure 2: Subtracted ENA network for High Social Dynamics and Trust (blue) - Low Social Dynamics and Trust (red) teams. Blue connections between Cognitive Processes, Past Tense, and Analytic Language translate to greater co-occurrences of these features in High Social Dynamics and Trust teams.

Figure 3: Subtracted ENA network for High Affect (blue) - Low Affect (red) teams. High Affect teams showed greater usage of Cognitive Processes, Past Tense, and Analytic Language, while Low Affect teams used Drives language more often with Analytic Language.

Figure 4: Subtracted ENA network for High Cognitive Load (blue) - Low Cognitive Load (red) teams. Teams with High Cognitive Load more often used Cognitive Processes, Past Tense, and Analytic Language. Teams with Low Cognitive Load frequently used Drives and Analytic Language.

Figure 5: Subtracted ENA network for High Interpersonal Reliance (blue) - Low Interpersonal Reliance (red) teams. Highly reliant teams more often used Negative Tone with Analytic Language, while less reliant teams used Cognitive Processes and Past Tense along with Analytic Language.

Figure 6: Subtracted ENA network for High Performing (blue) - Low Performing (red) teams. Teams with higher performance used Cognitive Processes language with Positive Tone, while lower performing teams typically used Drives with Analytic Language.
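The idea behind the subtracted networks in Figures 2 to 6 can be illustrated with a simplified sketch. This is not the ENA algorithm itself, which additionally normalizes and projects networks into a low-dimensional space; the sketch only accumulates code co-occurrences within a moving window, averages them per group, and subtracts the group means. The function names, the window size, and the representation of each utterance as a set of LIWC-derived labels are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Edge = FrozenSet[str]
Team = List[Set[str]]  # one set of (assumed) LIWC-derived labels per utterance


def cooccurrence(utterances: Team, window: int = 4) -> Dict[Edge, float]:
    """Share of moving windows in which each pair of labels appears together."""
    counts: Counter = Counter()
    for end in range(1, len(utterances) + 1):
        # Labels present in the window ending at this utterance.
        seen = set().union(*utterances[max(0, end - window):end])
        for a, b in combinations(sorted(seen), 2):
            counts[frozenset((a, b))] += 1
    total = max(len(utterances), 1)
    return {edge: n / total for edge, n in counts.items()}


def mean_network(teams: List[Team], window: int = 4) -> Dict[Edge, float]:
    """Average the per-team edge weights across a group of teams."""
    nets = [cooccurrence(team, window) for team in teams]
    edges = set().union(*nets) if nets else set()
    return {e: sum(n.get(e, 0.0) for n in nets) / len(nets) for e in edges}


def subtracted_network(high: List[Team], low: List[Team], window: int = 4) -> Dict[Edge, float]:
    """Positive weights are stronger in `high`; negative weights are stronger in `low`."""
    h, l = mean_network(high, window), mean_network(low, window)
    return {e: h.get(e, 0.0) - l.get(e, 0.0) for e in set(h) | set(l)}
```

Under this simplification, a positive edge weight corresponds to a blue connection in the figures (more frequent in the high group) and a negative weight to a red connection (more frequent in the low group).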

Table 1: Components, factors, and loadings yielded by factor analysis

... between cognitive processes and analytical thinking is the most prominent in this network, highlighting the contribution of these features to social dynamics.

Table 2: Selected LIWC features and associated hypotheses