"Don't Judge a Book by its Cover": Exploring Discriminatory Behavior in Multi-User-Robot Interaction

Multi-user scenarios in human-robot interaction raise the problem of predisposed and unfair robot treatment of users due to biases. This study therefore explores whether individuals recognize discrimination by a social robot and whether it evokes feelings of exclusion. As a social consequence, we focus on the influence of robot discrimination on the perception of interaction partners and the attribution of blame. In a VR-based multi-user lab experiment simulating a library task, participants experienced discrimination by a robot. Results suggest that discriminated individuals felt more discriminated against, albeit not significantly more ostracized. Moreover, discrimination influenced participants' self-attribution of blame and observers' evaluations of the discriminated user's competence. This work highlights the complex social impact of robot discrimination on human interactions and team dynamics.


INTRODUCTION
Through the integration of social robots into various spheres of society, they are taking on a role in different social contexts, for example as assistants in healthcare [4], city institutions [9], or shopping centers [19]. This means that interaction between humans and social robots often takes place in public spaces used by multiple people at the same time. Consequently, the applied robot needs to be able to interact with a variety of different users. However, research has pointed out that algorithmic biases [14, 17, 18] exist, resulting in unequal functionality, interaction, and treatment of users. Also, as soon as the interaction involves more than two people, the question arises of how the robot allocates its resources among the interaction partners. Among other things, this process can be influenced by the goal of efficiency or be the result of a bias. As a result, a person or group of people can be discriminated against [10, 17]. This effect is even stronger when different user groups interact with the robot and one group is affected by technical biases. While much effort is invested in avoiding these kinds of biases [11, 17], there is still a way to go, and biases unintentionally occur in interactions with AI systems such as robots [18]. Thus, the human reaction to biased interactions needs to be investigated to assess its potential psychological consequences. This study accordingly addresses the questions of (a) whether people recognize discrimination by a social robot in group interactions, (b) how this socially affects the perception of the interaction partners (robot and human), and (c) who will be blamed for the problems caused by a biased interaction.

RELATED WORK
Due to the different areas of robot presence in public spaces, human-robot interaction also takes place in groups, which is likely to become more relevant in the future [22]. Since humans react to social robots with social behaviors [2], it is relevant to investigate not only technological but also social challenges. When allocating its resources to interaction partners, the robot's behavior can be influenced by a bias. Algorithmic bias is defined as "the outputs of an algorithm [that] benefit or disadvantage certain individuals or groups more than others without a justified reason for such unequal impacts" ([14], p. 1). Biases as systematic unequal treatment can be caused, for example, by a data set that is based on discriminatory patterns [17]. While discrimination in a social sense is typically linked to specific characteristics of the user, in this study we understand discrimination as unequal treatment by the system. The potential risk of bias in the robot's behavior should be viewed against the background of the objectivity regularly attributed to technologies. This makes biases in a robot's behavior harder to detect, as its behavior will be considered objective and fair. This was demonstrated in a study by Hitron et al. [10]. They also showed that stereotypical behavior of the robot strengthens people's stereotypical thinking [10], making the social consequences of biased interaction visible. Another social consequence is that discrimination by robots can lead to feelings of ostracism. Ostracism describes the feeling of exclusion [21]. When people were discriminated against by the robot NAO in receiving positive feedback during a triadic human-robot interaction, discriminated people felt more excluded than non-discriminated people [20]. Hence, we assume the following: H1: Robot-discriminated users state a higher feeling of ostracism than users that receive no discrimination from the robot.
Research has already focused on whether biased interaction and robot discrimination affect the evaluation of the robot. In the study by Spisak and Indurkhya [20], discriminated users did not evaluate the robot more negatively in terms of likability and perceived intelligence. However, Büttner et al. [5] created a situation in which people were disadvantaged by a robot in the allocation of resources required for task fulfillment. Here, disadvantaged individuals rated the robot more negatively in terms of likability and perceived intelligence. As the evidence is ambivalent, this study will further investigate the following question: RQ1: How does discrimination by a robot in multi-user settings affect users' evaluation of the robot?
Discrimination by robots also has an influence on the team relationship. People in a team in which one person is disadvantaged by the robot in the allocation of resources show lower satisfaction with their team relationship and poorer performance in solving the task than teams without discrimination. When reflecting on the human-robot interaction, some people state that they were prioritized due to better performance [12]. When disadvantaged by a chatbot, some disadvantaged individuals say that they are to blame for the disadvantage because they asked inaccurate questions [8]. These statements, and the fact that objectivity and neutrality are usually attributed to technologies, suggest that discrimination by robots can influence attitudes towards people, leading to the following research question: RQ2: How does discrimination by a robot affect observers' evaluation of the discriminated user? Furthermore, people attribute appreciation and blame to their interaction partners when successes and failures occur during interactions. Regarding blame attribution in human-robot teams, study results are ambivalent. While some findings support that higher autonomy leads to stronger blame attribution towards the robot [13], other results do not support that AI-based systems are more often blamed for interaction failures [1]. Moreover, it was found that people blame human interaction partners as well as robots less for failures than they blame themselves [15]. At the same time, the robot receives the most appreciation for success [15]. The attribution of blame in the context of biased interactions that lead to discrimination of users has not been studied so far. Accordingly, the following research question is addressed in this work: RQ3: How do (discriminated) users attribute blame for the performance in discriminating human-robot cooperation?

METHOD
In order to investigate the described research agenda, a VR-based lab experiment was conducted. In a virtual library (see Figure 1 (a)), two participants, following the procedure simultaneously, were asked to sort ten books into four different categories with the help of a robot. Within this sorting task, the robot had a systematic bias against one person (book scanning was not successful, see Figure 1 (b)), while it worked well for the other one.

The VR-based Stimulus Material
We use Virtual Reality (VR) as it has emerged as a prevalent tool for exploring human-robot interactions in research studies. Despite its limitations in replicating physical interactions, VR offers distinct advantages, particularly in laboratory settings. With regard to algorithmic biases, such discrimination scenarios cannot be realized in the field under real-world conditions, as a safe and controlled environment is necessary. Since studies have evidenced that VR setups can closely simulate real-world interactions with robots [16], VR's secure, adaptable, and replicable nature allows examining human responses to robots in public spaces. For the experiment, we used a VR-based platform serving as a framework for studying human-robot interaction in public spaces (developed in the context of the RuhrBots Competence Center). Our VR application, built on Unity 3D (LTS 2022.3.7f1), authentically replicates a library environment. Using LIDAR and photogrammetry scans, we recreated the setting's bespoke assets. The VR application integrates a detailed, scale-accurate 3D model of the Pepper social robot by Softbank Robotics, with gesture and verbal expression capabilities using QiSDK for voice synthesis. Essential features include an online multi-user cooperative component via the Photon Network SDK. Interaction mechanics, designed with the OpenXR SDK, enable participants to manipulate virtual objects with touch controllers, while haptic feedback through vibration enhances the sense of realism during interactions.
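To illustrate the core mechanic the application implements, the following minimal sketch reproduces the scan-and-feedback loop in Python. The application itself is written in Unity C#; class and method names such as ScanStation and attempt_scan are illustrative assumptions, not the project's API.

```python
import random

class ScanStation:
    """Sketch of the book-scan mechanic; names are illustrative, not the project's API."""

    def __init__(self, success_rate, seed=None):
        self.success_rate = success_rate  # per-attempt probability of a successful scan
        self.rng = random.Random(seed)

    def attempt_scan(self):
        """One scan attempt; every retry draws independently with the same odds."""
        success = self.rng.random() < self.success_rate
        if not success:
            # In the study, Pepper reported the failure without giving a reason
            print("Pepper: The scan did not work. Please try again.")
        return success
```

The success rate is a per-participant parameter, so the same interaction loop produces either the discriminated or the non-discriminated experience (see the success rates in the study procedure below).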

Study Procedure
The procedure started with an introductory meeting between the study supervisor and both participants. The experimenter told the participants that their task was to evaluate a library employee training session in VR. Each participant then proceeded to a separate room to complete the consent form and pre-questionnaire. Following this, the study supervisor provided VR hardware instructions and safety guidelines to each participant. After entering the VR application, they independently underwent a tutorial in a test room, guided by Pepper, to practice the interaction procedure of scanning a book before entering the actual library setting and to create a unique participant ID for data matching. Upon concluding the tutorial, both participants joined a network session. The session host had a 90% success rate for book scans (not discriminated), while the peer participant had a 20% success rate (discriminated). Pepper assisted in sorting ten books per participant, scanning books, and assigning shelves. Participants interacted with Pepper to initiate scans, and upon failure, Pepper notified them without specifying the reason (see Figure 1 (b)). Participants could attempt the process multiple times with the same odds on every attempt. Successful book sorting required communication between the participants and Pepper. Once all twenty books were sorted correctly, the task was accomplished. After completing the task, participants removed the VR hardware and proceeded to the post-questionnaire and a debriefing led by the study supervisor. As there is no ethics committee at the research institution, no ethical approval exists. To the best of our knowledge and belief, we have taken all available ethical measures (voluntary and informed consent, specific debriefing on the discriminatory behavior at the end of the study, compliance with data protection) to protect the well-being of the participants.
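Because every retry draws independently with the same odds, the number of attempts per book follows a geometric distribution. The following back-of-the-envelope calculation (our illustration, not reported in the study) shows how strongly the manipulation differed in required effort:

```python
# Expected scan attempts under independent retries (geometric distribution):
# the expected number of attempts per book is 1 / p.
for role, p in {"host (not discriminated)": 0.90, "peer (discriminated)": 0.20}.items():
    print(f"{role}: {1 / p:.1f} expected attempts per book, "
          f"~{10 / p:.0f} for all ten books")
# host: 1.1 attempts per book (~11 in total); peer: 5.0 per book (~50 in total)
```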

Sample
After excluding participants (n = 3) with severe technical breakdowns from the sample, the final sample consists of N = 52 people (n = 26 in the discriminated condition and n = 26 in the not discriminated condition). As women and men are in general evaluated differently due to stereotypical perceptions (e.g., with regard to warmth and competence; [6]), we decided for this initial study to investigate male same-sex dyads. Otherwise, it would be hard to separate the effects of general gender stereotypes from the effect of the robot discrimination on the perception of the interaction partner (see the discussion for an ethical and methodological reflection). Thus, all 52 participants were male. All participants were students in computer science. In n = 4 cases (n = 2 in the discriminated condition and n = 2 in the not discriminated condition) the second participant did not show up, so these four participants interacted with the experimenter of the second room. However, these cases experienced the same interaction and experiment as the other participants. Of the remaining 24 dyadic teams, 15 teams stated that they did not know each other or knew each other only by sight, and 9 teams said that they know each other on a personal level or are even friends. The average age was 22 (M = 21.92, SD = 2.83). Most of the participants had no prior experience with the robot Pepper (n_never = 32, n_rarely = 4, n_sometimes = 2) or did not even know what the Pepper robot is (n = 14). In addition, prior experience with VR was rather low for the majority (n_never = 14, n_rarely = 26, n_sometimes = 9), and only 3 participants stated that they use VR more frequently (n_on a regular basis = 1, n_often = 2). The sample is of particular interest, as male participants with a computer science background are usually a group that is not a likely target of algorithmic biases and discriminating experiences with technology.

Measures
Self-reported questionnaire data were collected. Based on items used by Kim and Hinds [13], a 5-point Likert scale was used to evaluate the attribution of blame and credit for the task outcome. Overall, the following six sub-dimensions were measured: blaming the robot (2 items, α = 0.737, e.g. "The robot was responsible for most of the problems that occurred during the collaboration."), blaming oneself (2 items, α = 0.634, e.g. "I was responsible for most of the problems that occurred during the collaboration."), blaming the other person (2 items, α = 0.650, e.g. "The other person was responsible for all the mistakes that were made during the collaboration."), attribution of credit to the robot (2 items, α = 0.604, e.g. "The success of the collaboration was largely dependent on what the robot did."), attribution of credit to oneself (2 items, α = 0.391, e.g. "The success of the collaboration was largely dependent on what I did."), and attribution of credit to the other person (2 items, α = 0.642, e.g. "The success of the collaboration was largely dependent on what the other person did."). Despite its low reliability (α = 0.391), the dimension of credit to oneself is used because the analysis of blame attribution is a central topic of the present study. A scale adapted from Fiske et al. [7] was used to assess the perception of the robot regarding competence (8 items, α = 0.884, e.g. "Reliable") and warmth (8 items, α = 0.870, e.g. "Friendly"). The scale was also used to determine the perception of competence (8 items, α = 0.935, e.g. "Competent") and warmth (8 items, α = 0.893, e.g. "Warm") of the human interaction partner. Furthermore, as a manipulation check, participants were asked: "How disadvantaged did you feel during the interaction in the library?" and "How excluded did you feel during the interaction in the library?" on a 10-point scale (1 = not at all, 10 = very much). In addition, socio-demographic variables, simulator sickness (13 items, e.g. "Headache" or "Eyestrain"; [3]), and prior experiences with robots and VR were measured.
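The reported reliabilities are Cronbach's alpha values. For reference, a minimal sketch of the computation in Python, assuming a response matrix with one row per respondent and one column per item (the paper does not state which software was used):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) response matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)
```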

RESULTS
No difference between the experimental groups was found with regard to simulator sickness (t(46) = 0.68, p = .500). The means of both groups indicate low feelings of simulator sickness (M_disc = 1.65, SD_disc = 0.67; M_not disc = 1.53, SD_not disc = 0.56). In order to check whether participants who had been discriminated against during the book sorting task perceived the situation as discrimination and felt ostracized (H1), two independent t-tests were calculated with the experimental condition (1 = discriminated, 2 = not discriminated) as independent variable and the perceived discrimination and ostracism as dependent variables. Results show that there was a significant difference for perceived discrimination (t(50) = 2.50, p = .016, d = 0.70), but not for perceived ostracism (t(50) = 1.07, p = .289). Participants who had been discriminated against by the robot stated a stronger feeling of being discriminated against (M = 2.65, SD = 2.51) than users without discrimination (M = 1.38, SD = 0.64). However, the average values for both felt discrimination and felt ostracism are rather low (below 3 on a scale ranging to 10). In accordance with these results, hypothesis H1 was only partly supported. Two independent t-tests were used to investigate the effect of the discrimination behavior on users' perception of the robot with regard to perceived warmth and competence (RQ1). Results indicate that users who experienced discrimination by the robot did not devalue the robot, as no significant differences regarding perceived warmth (t(50) = 0.73, p = .468) and competence (t(50) = 0.37, p = .912) occurred between the experimental groups. Similar analyses were used to examine the effect of robot discrimination on the perception of the human interaction partner (RQ2). Results of the t-tests indicate that the perceived warmth of the human interaction partner was not affected (t(50) = 1.17, p = .247), while there were significant differences in perceived competence (t(50) = 2.18, p = .034, d = 0.62). Participants who had not been discriminated against by the robot evaluated the competence of their discriminated interaction partner (M = 5.83, SD = 0.83) significantly lower than discriminated participants evaluated the competence of their interaction partner (M = 6.34, SD = 0.85). To investigate RQ3, t-tests were calculated for the attribution of blame for failures and credit for success to either the participants themselves, the robot, or the human interaction partner. Results indicate only one significant difference between the experimental groups, namely regarding the attribution of blame for failures to oneself (t(46.30) = 2.20, p = .033, d = 0.65). The means show that discriminated participants (M = 2.02, SD = 0.96) attributed more blame for failures to themselves than not discriminated participants (M = 1.50, SD = 0.72). No other significant differences regarding blame attribution occurred.
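All reported comparisons are independent-samples t-tests with Cohen's d as effect size. For reference, a minimal sketch of such an analysis in Python with SciPy (an assumption for illustration, as the analysis software is not reported; passing equal_var=False yields the Welch-corrected test with fractional degrees of freedom, as used for the self-blame comparison):

```python
import numpy as np
from scipy import stats

def independent_t_with_d(group1, group2, equal_var=True):
    """Independent-samples t-test plus Cohen's d (pooled-SD variant)."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    t, p = stats.ttest_ind(g1, g2, equal_var=equal_var)  # Welch's t if equal_var=False
    n1, n2 = len(g1), len(g2)
    pooled_sd = np.sqrt(((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1))
                        / (n1 + n2 - 2))
    d = (g1.mean() - g2.mean()) / pooled_sd
    return t, p, d
```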

DISCUSSION AND CONCLUSION
In a VR-based lab study, the effect of task-related robot discrimination was investigated. In a book sorting scenario, two users interacted with a robot to complete the sorting task. However, the robot systematically reported technical issues mainly during the interaction with one participant (20% success rate), while the other participant had no such trouble using the robot (90% success rate). Our overall research goal was to investigate the effects of this kind of discrimination on (a) the perception of the interaction partners (human and robot) and (b) the attribution of blame for these failures.
Our results first of all indicate that the discrimination by the robot was detected by the participants, as those who were not able to use the robot correctly stated that they felt more discriminated against. However, the means were rather low for both groups, indicating that participants overall would not describe their experience as discrimination (H1). Also, no effects and low means were found for the feeling of ostracism (contradicting the findings of [20]), demonstrating that participants somehow detected that there was unequal treatment, but would not consider it discrimination or ostracism. This might be due to the fact that discrimination in a social sense has to be linked to a specific user characteristic, which was not the case in our experiment. Nevertheless, in real-world scenarios of algorithmic biases (e.g., face detection that does not function for Black people), this form of discrimination might also not be noticeable. Important in this regard are our findings with regard to blame attribution and the evaluation of the discriminated user made by non-affected observers: Results showed that participants in the discriminated condition blame themselves for the technical problems more than the non-affected ones (RQ3). While the observers do not attribute more blame to the discriminated person, they still devalue their competencies, as they perceive the discriminated person as less competent (RQ2). Hence, the technical bias in using robots for task execution does not only affect the self-perception of the discriminated user but also affects the perception of observing people. Since the users did not describe the situation as discrimination or ostracism (H1) and the perception of the robot (RQ1) was not affected, both the affected and the non-affected users wrongly attribute the situation to the discriminated user. Especially in task- and work-related environments, this can cause massive problems (e.g., job loss, loss of self-esteem, or team conflicts). Therefore, measures are needed to sensitize people towards algorithmic biases. Also, the goal of eliminating any existing algorithmic bias in robotic and technical systems should be a priority. Prior research demonstrates that these issues happen unintentionally even if the researchers and developers are sensitized [18]. Accordingly, methods during and after the development process are needed. Future research therefore needs to investigate how users can be trained to detect algorithmic biases and how to prevent the unintended blame attribution and devaluation effects. Finally, limitations need to be discussed. One main limitation is the sample choice, as only male participants with a computer science background participated. We decided to use same-sex teams to prevent unintended gender effects caused by gender stereotypes [6] and because this group has a lower risk of being targeted by algorithmic bias, so that prior experiences are assumed to have a lower impact. However, we are aware that this limits the generalizability of our results and that this is a one-sided perspective with its own side effects due to the male-only viewpoint. Thus, we aim to investigate female dyads as well as mixed-gender teams in a larger sample in the future. Also, we did not directly measure the prior discrimination experiences of the users. Future research should address both problems by investigating differences between user groups in discrimination detection and attribution effects. Technically, the VR application occasionally faced a challenge with the peer-to-peer protocol, causing network disruptions and occasional disconnections. To address this, the network protocol has been updated and a follow-up study is planned.

ACKNOWLEDGMENTS
This work was supported by the RuhrBots Competence Center (16SV8693), funded by the Federal Ministry of Education and Research Germany. We thank Tim Aßmann for providing assistance during the study and all colleagues who gave comments and remarks to improve our work.

Figure 1: (a) The virtual environment where the shared task was conducted. (b) Pepper expressing the technical issue.
g. "I was responsible for most of the problems that occurred during the collaboration."),blaming the other person (2 items, = 0.650, e.g."The other person was responsible for all the mistakes that were made during the collaboration."), attribution of credit to the robot (2 items, = 0.604, e.g."The success of the collaboration was largely dependent on what the robot did."),attribution of credit to oneself (2 items, = 0.391, e.g."The success of the collaboration was largely dependent on what I did.") and attribution of credit to another person (2 items, = 0.642, e.g."The success of the collaboration was largely dependent on what the other person did."