Beyond Dyadic Interactions: Assessing Trust Networks in Multi-Human-Robot Teams

Many HRI applications (such as search and rescue; SAR) require multiple humans to interact with robot agents, making it essential to understand and evaluate both trust in robots and trust in teams when robots are embedded into such team structures. In the present study, we utilized a virtual urban search and rescue task to compare individual and team trust, and the associated team performance, between (1) all-human teams and multi-human-robot teams (mHRTs) with reliable robot behavior, and (2) mHRTs with reliable and unreliable robot behaviors. The team structure included a mission specialist (human), a navigator (human or robot), and a safety officer (human). Using a priori pairwise comparisons, we found that the human navigator was trusted more by teammates than the reliable robot navigator, and that trust in the robot navigator declined when it performed unreliably. Interestingly, team trust remained comparable between all-human teams and mHRTs under reliable conditions, but mHRT team trust declined under unreliable robot conditions. Trust between the human dyads was not affected by the actions of the third agent (whether human or robot). Finally, while introducing a reliable robot teammate did not improve team performance, robot unreliability significantly improved performance on the SAR task. The study captures changes in trust networks between human teammates when robots with varying performance are introduced.


INTRODUCTION
With advancements in AI, robot applications in unstructured real-world contexts, such as search and rescue (SAR), have steadily grown [20,22], and robots have been integrated into human teams, creating unique teaming structures with multiple humans and an integrated robot. Robots are far from intelligent and often fail in unstructured settings [11]. However, robots and humans have complementary skills and can work together to achieve a common goal. For such a collaboration to succeed, humans need to trust the robot. Trust is affected by several robot factors, such as anthropomorphism and agent behaviors (performance, reliability, transparency), as well as human factors (e.g., prior experience with robots [25], human states [14,30]) and environmental factors (e.g., time pressure [24]). Most work in human-robot teaming focuses on a dyad of one robot and one human. Murphy [21] highlights the need for a 2:1 human-to-robot ratio in emergency response where sending animals or humans is hard or dangerous. Research focusing on multi-human autonomy teams has typically relied on subjective measures and performance [19,24]. In such scenarios, it is important to understand trust within the human dyads, trust between human and robot agents, and team trust, to examine how such trust networks impact team and task performance.
Lee and See [16] defined trust as "the attitude that an agent will help achieve an individual's goals in a situation characterized by uncertainty and vulnerability." Overtrust in the robot can lead to human complacency, while undertrust can lead to a loss of efficiency [6]; both can result in unsafe work operations for human team members. In human teams, intrateam trust has been shown to significantly impact team performance [5]. Beyond human-only interactions, a few studies have examined and modeled trust networks in teams containing multiple humans and one or more autonomy agents, particularly in high-risk and shared-space environments [17]. Prior work on capturing trust in embodied autonomy agents (e.g., robots) has focused on civilian trust in robots guiding evacuations during emergencies [24]. While informative to our work, these studies do not focus on teaming (i.e., interdependencies) and do not address how trust evolves and spreads in multi-human-autonomy teams.
This study utilized a virtual simulation of SAR exercises to compare trust networks (dyadic and team) between all-human teams and multi-human-robot teams (mHRTs). The study also examines how varying robot performance impacts the trust networks of mHRTs. Finally, the study aims to capture accompanying changes in team performance, perceived workload, situation awareness, and fatigue.

METHODS

Participants
The study recruited 46 participants (20 females, 26 males) from the local student population at a public university, forming 23 teams (7 male-male, 4 female-female, and 12 male-female). The local Institutional Review Board approved the study. The mean age of the participants was 24.14 ± 3.05 years. On average, participants spent 4.73 ± 5.55 hours per week playing computer games.

Experiment Protocol
Figure 1: Experimental protocol. The experiment followed a within-subjects design, and the participants performed the task in all three team configurations: all-human teams (HHH), reliable multi-human-robot teams (HRR), and unreliable multi-human-robot teams (HRU).
The experimental setting consisted of a virtual environment created in Unity 3D in the format of a game. The scene depicted a SAR scenario in which participants were tasked with locating victims in a burning building. The building was two stories high, including a lobby area with left and right wings on each floor. Each participant was positioned in front of a monitor displaying the game to complete the objective. The participants could move freely around the environment by pressing the keyboard's arrow keys and could pan their perspective using the mouse.
The team was asked to work collaboratively in three distinct task roles: mission specialist, safety officer, and navigator. Participant 1 always assumed the role of the mission specialist, Participant 2 the role of the safety officer, and Participant 3 or the robot assumed the role of the navigator, depending on the condition. Using the Wizard of Oz method [3], the robot followed the exact behavior of the Participant 3 navigator. These roles were assigned randomly after confirming that the participants did not know each other before the start of the experiment. Familiarization between team members can significantly impact team dynamics and trust; Javaid and Estivill-Castro [15] used greetings and short introductions for familiarization between members.
Participants were asked to complete a team familiarization task before the experimental trials.The task included two engagement activities where the participants had to (1) introduce themselves to the team and (2) plan a picnic event with the team.
The mission specialist was required to coordinate with the safety officer and navigator to lead the entire team. When a victim was discovered, the mission specialist logged the finding. The safety officer monitored air quality using the nitrogen oxide levels throughout the environment. The safety officer's screen displayed low, medium, and high hazard levels depending on their location in the building. The safety officer was tasked with evaluating the risk in particular areas of the building and determining whether it was safe for the team to proceed. The navigator's responsibility was to aid the mission specialist and safety officer by giving directions to the locations of the victims. The participants were told that the navigator had access to thermal imaging cameras and could guide them to the victims' locations. However, the role of the navigator was performed by a confederate who was provided with a map of the environment revealing the locations of four bodies, two on each floor. Participants 1 and 2 were permitted to request directional recommendations by asking, "Navigator, do you have any suggestions?" The navigator was instructed to lead the team to the victims' locations sequentially. Before the experiment, seven voice commands were prerecorded so the robot could communicate with Participants 1 and 2.
Participants 1 and 2 could ask the robot for directional guidance, just as when the navigator was a human. The study used a within-subjects design, and the human dyads went through all three experimental conditions, with five 3-minute trials in each condition, as shown in Figure 1. The HHH and HRR conditions were counterbalanced to minimize order or learning effects on the experimental outcomes; however, the HRU condition was always presented last. The confederate (Participant 3) was introduced as a normal participant and was only present during the familiarization and HHH conditions. After Participant 3 left the site, the human dyad was introduced to the robot's functions and communication. The robot functioned normally during the five trials of the HRR condition and had two failures within the five trials of the HRU condition. The failure modes included stopping against a wall, repeating the same direction recommendation, or announcing "system malfunction" and stopping. The experimenter controlling the robot conveyed these responses, when prompted, by accessing prerecorded instructions (e.g., "I suggest taking a U-turn at the next intersection", "I do not have any information from my sensors").
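The Wizard of Oz setup above can be sketched as a small dispatch table. This is a minimal illustrative sketch, not the study's actual implementation: the clip identifiers and the trigger function are hypothetical, and only three of the seven prerecorded responses (those quoted in the text) are shown.

```python
# Hypothetical Wizard-of-Oz dispatch: the hidden experimenter selects a
# prerecorded clip, which the "robot" navigator plays back to the dyad.
PRERECORDED = {
    "u_turn": "I suggest taking a U-turn at the next intersection",
    "no_info": "I do not have any information from my sensors",
    "malfunction": "System malfunction",
}

def trigger(clip_id, log):
    """Experimenter picks a clip; the robot 'speaks' it and it is logged."""
    line = PRERECORDED[clip_id]
    log.append(line)  # keep a transcript of robot utterances per trial
    return line

log = []
trigger("u_turn", log)
```

Restricting the robot to a fixed set of clips keeps its communication style identical across teams, which matters because trust comparisons assume the only manipulated factor is reliability.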

Measurements
2.3.2 Performance outcomes. Team performance was quantified using two metrics: the total distance traveled by the team and the total number of victims located. The total distance was computed by combining the distances traveled by Participants 1 and 2 in the virtual environment in each trial. The number of victims located was calculated as the cumulative sum of all victims logged in each trial.
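The two performance metrics can be sketched as follows. This is an illustrative Python sketch under assumed data formats (per-trial position logs as (x, z) coordinate pairs and per-trial victim counts); the study's actual logging code is not described in the text.

```python
from math import dist

def total_distance(path):
    """Sum of straight-line segments between consecutive (x, z) positions."""
    return sum(dist(p, q) for p, q in zip(path, path[1:]))

# Hypothetical per-trial position logs for the two human participants:
p1_path = [(0.0, 0.0), (3.0, 4.0), (3.0, 8.0)]
p2_path = [(0.0, 0.0), (0.0, 5.0)]
team_distance = total_distance(p1_path) + total_distance(p2_path)

# Hypothetical victims logged per trial, accumulated across the condition:
victims_per_trial = [1, 0, 2, 1, 0]
victims_cumulative = [sum(victims_per_trial[:i + 1])
                      for i in range(len(victims_per_trial))]
```

Summing straight-line segments between sampled positions slightly underestimates curved paths, so the distance metric depends on the position sampling rate.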

Statistical Analysis
The data were first tested for normality using the Shapiro-Wilk test. Upon confirming normality, two a priori paired t-tests were performed to compare HHH-HRR and HRR-HRU on the various study measures. A non-parametric Wilcoxon signed-rank test was employed if the data were not normally distributed. The threshold for statistical significance was set to 0.05. All statistical analyses and visualizations were performed using R version 4.3.2 [23]. Effect size is reported as Cohen's d using the 'effsize' R package. The statistical analysis results are also indicated within the plots, where "ns" denotes not significant (p > 0.05), "*" indicates significance (0.01 < p < 0.05), "**" denotes high significance (0.001 < p < 0.01), and "***" signifies very high significance (p < 0.001).
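The core of each a priori comparison can be sketched as below. This is an illustrative Python sketch (the paper's analysis was run in R 4.3.2 with the 'effsize' package); the normality screen (Shapiro-Wilk) and the Wilcoxon fallback are omitted here, and the sample ratings are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_and_cohens_d(x, y):
    """Paired t statistic (df = n - 1) and Cohen's d for matched samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    sd = stdev(diffs)                 # sample SD of the pairwise differences
    t = mean(diffs) / (sd / sqrt(n))  # paired t statistic
    d = mean(diffs) / sd              # Cohen's d for paired data
    return t, d

# Hypothetical 7-point trust ratings for the same dyads in HHH vs. HRR:
hhh = [6, 5, 7, 6, 5, 6]
hrr = [5, 4, 6, 5, 5, 5]
t, d = paired_t_and_cohens_d(hhh, hrr)
```

Because the same dyads experience every condition (within-subjects design), the paired form is the appropriate test: it compares each team against itself rather than pooling across teams.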

DISCUSSION AND LIMITATIONS
Integrating robots into human teams presents significant challenges, with trust between team members being critical. We found that people trust humans more than robots [8], even though both provide the same information with a similar communication style. This is in line with previous studies in simulated emergency scenarios reporting that, as the number of autonomy agents in a three-member team increases, trust in the agent decreases [1,26].
The introduction of a robot teammate gave rise to new team behaviors. For example, the team traveled less with a robot navigator than with a human navigator. Similar findings were reported by [26], who observed that team performance increased with the number of autonomy agents in the team. Although not significant, we found that this was also associated with a trend toward lower perceived fatigue when working with a reliable robot. The decline in situation awareness in the reliable mHRT condition suggests that people relied more on the robot for navigation, reducing their operational awareness, even though they reported less trust in the robot than in humans.
We expected the teams to perform poorly with an unreliable robot [2,12]. However, the teams located more victims (and covered more distance) when working with an unreliable robot. The increase in the number of victims located in the unreliable condition came at the cost of significantly higher perceived fatigue. Based on preliminary observations from the screen recordings and experiment videos, the human dyads often slowed down or stopped altogether while anticipating a suggestion from the reliable robot. Once the robot started showing signs of unreliable behavior, the dyads focused on continuously exploring the environment independently, communicating with the robot and using it as a secondary input in the background. This shift provides insight into how teams adopt emergent behaviors to compensate for poor robot performance.
A noteworthy observation in the present study is that perceived team trust remained intact even when a reliable yet less trusted robot substituted for a human team member. However, team trust diminished when the robot's performance declined. This implies that the robot's performance not only affects trust in the individual agent but also influences the overall trust network within the team, similar to what was reported by McNeese et al. [18]. Prior work that specifically probed reliance on autonomy agents reported findings deviating from those observed here, in that reliance on the team did not change despite lowered reliance on the autonomy agent [9]. Future research may benefit from different perspectives [13,31] and more granular metrics of team trust [29] to establish how trust networks change with varying robot performance.
The virtual environment was designed to simulate urban SAR missions. However, it abstracts away the complexity and unpredictability of real-world environments, and future research is needed to bridge this gap by comparing emergent HRIs in physical and virtual settings. The data for the present study were collected from university students and thus cannot be generalized to emergency responders, yet they may still provide insight into emergent mHRT behaviors. Another important consideration is the sex distribution within teams; it is known that people may perform differently in the presence of different sexes [27]. The preliminary analysis presents distance traveled as a performance metric; however, the task was complex, and distance alone may not capture the team's true performance. Future analyses will focus on team and task metrics that include the number of victims identified and the characterization of the path itself, as well as a deeper examination of communication behaviors.

2.3.1 Subjective responses. At the beginning of the experiment, participants completed a background questionnaire to capture their demographics. At the end of each trial, participants completed a set of five questions, each on a 7-point scale. Three of these captured trust metrics: trust between the human dyads ("I trust the safety officer/mission specialist"), trust in the navigator ("I trust the navigator"), and trust in the team ("I trust my team"); the remaining two captured fatigue ("What is your fatigue level?") and mental effort ("What is your mental effort?"). At the end of each condition, participants completed the NASA Task Load Index (TLX) questionnaire [10], the situation awareness rating technique (SART) [28], and a modified team trust questionnaire [4] on a 7-point Likert scale (6 of 21 total questions were administered, based on [7]).

Figure 2: Trust metrics across the three team configurations: all-human team (HHH), multi-human-robot team with a reliable robot (HRR), and multi-human-robot team with an unreliable robot (HRU). (a) Trust in human-human dyads across all trials. (b) Trust in the navigator, where the navigator could be a robot or a human. (c) Team trust ratings captured at the end of each trial. (d) Team trust ratings captured at the end of each condition using six sub-scales.

Figure 3: Situation awareness and fatigue across the different team configurations.

Figure 4: Team performance metrics across the different team configurations.