Real-World Implicit Association Task for Studying Mind Perception: Insights for Social Robotics

In response to the growing demand for better integration of implicit measurements in Human-Robot Interaction (HRI) research, the need for studies involving physically present robots, and the calls for a transition from lab experiments to more naturalistic investigations, we introduce the Real-World Implicit Association Task (RW-IAT). This report outlines the versatile methodology of the RW-IAT, emphasizing its ability to present real-life stimuli and capture behavioral data, including response times and mouse-tracking metrics, in a controlled manner. Sample analyses focusing on communicative and noncommunicative actions performed by a human actor and the Pepper robot reveal significant effects on the Agency and Experience dimensions of mind perception. We believe the proposed methodology will contribute to conducting ecologically valid HRI research in real-world contexts.


INTRODUCTION
In Human-Robot Interaction (HRI) research, the widely used Wizard-of-Oz (WoZ) experimental methodology [23] involves participants interacting with a robot, unaware that a human operator controls it [41]. Beyond its functional inspiration, analyzing characters such as the Scarecrow and Tin Woodman from this iconic novel [2] reveals nuanced aspects of how our observations influence our perception of others, a core aspect of mind perception discussions in HRI. Despite the Scarecrow's "apparent" lack of a brain, their consistent displays of wisdom challenge conventional notions, prompting attributions of mental capacities, such as rationality. Similarly, the Tin Woodman, with a metallic exterior and no physical heart, embodies emotional intelligence, leading to attributions such as empathy. These characters exemplify our human capacity to attribute cognitive and emotional abilities, transcending anatomical structures and emphasizing the multifaceted nature of mind perception [7].
Associating the brain with cognitive functions and the heart with emotional capabilities are recurring themes in literature and research. Gray et al. [16] proposed two main dimensions of mind perception: Agency, the ability to do, and Experience, the ability to feel. While further studies exploring the possible dimensions of mind perception [29,48] offer valuable insights, their reliance on self-reports can make their results challenging to reconcile. Acknowledging the limitations of explicit measurements highlights the need for extending the use of implicit measurements [10,17,32,45], grounded in behavioral or neuroimaging data, for in-depth explorations of complex cognitive processes, such as mind perception.
Similarly, attributing mental states to entities, that is, adopting the intentional stance [4,5], is not limited to fiction; mind perception can arise from the need to understand and predict others' behavior [7]. Consistent with the causes and consequences outlined by Waytz et al. [47], prior research demonstrated that robots' expressions of social behavior [12,44] or gestures [42] influence mental state attributions. However, these studies primarily utilized texts, images, or a combination, as discussed in a recent comprehensive review [45]. Notably, humans tend to anthropomorphize a robot more strongly when it is physically present [24,27,44].
Among implicit measures, the original Implicit Association Test (IAT) by Greenwald et al. [19] stands out as the most widely used psychological tool for assessing implicit attitudes. Participants' reaction times to two conditions of target stimuli indicate their implicit attitudes toward the targets based on attribute stimuli. For example, in the gender-science IAT [33], participants categorize Male and Female attributes under Science and Liberal Arts. Faster and more accurate categorization in the initial condition (Male = Science) implies stronger associations between science and male attributes. In a recent study, Li et al. [28] introduced the Mind Perception IAT (MP-IAT) to examine mind perception in human-robot interaction. This test measures mental attributions on the Agency and Experience dimensions, utilizing attributes from the High and Low ends of these dimensions and using images of humans and humanoid robots as targets. While the MP-IAT is a valuable method with the potential to test concepts, images, or videos as target stimuli, it remains a computerized task, limiting its ability to present real-world stimuli.
The scarcity of studies on how individuals perceive physically present and active robots, coupled with the necessity for real-world investigations outside controlled labs to enhance ecological validity [21,45], underscores the need for conducting WoZ studies with actual robots [6]. In this report, we address these gaps, as well as the need for implicit measurements, by introducing the Real-World Implicit Association Task (RW-IAT) to implicitly study mind perception in real time with physically present robots and humans and live actions, while capturing participants' response times and mouse trajectories in a controlled manner. The RW-IAT is a WoZ experiment in a lab environment, yet it uses real-life social stimuli, corresponding to the last level before fully naturalistic studies [9] and serving as a further step towards "HRI in the real world."

METHODS AND MATERIALS

Participants
As part of a broader social robotics project [35], we collected in-person data from 160 participants from four generations (ages 18 to 73). Since this report aims to introduce the details of the RW-IAT, we present data from 55 participants (ages 18-35, M = 25.93, SD = 5.59, 33 females) to align with prior research [28] (N = 53) and to facilitate comparisons with studies representing conventional age ranges. This study was approved by the Human Research Ethics Committee of Bilkent University. All participants provided informed consent and received approximately 3 USD in compensation.

Materials
Target (Action) Stimuli: Following IAT best practices [18], we selected "easy stimuli" systematically via norming. Using Android Studio and Pepper SDK's Animation Editor IDE [8], we created 40 animations based on action clusters [34] and datasets of communicative and noncommunicative actions [30,50]. We shot standardized videos, consistent in length, aspect ratio, trajectories, and repetitions, with both the Pepper robot and the human actor. In two separate online studies [36], 438 participants identified the actions, rated their confidence levels, and categorized the actions as communicative or noncommunicative. We annotated the open-ended answers and calculated the H entropy [43]; the k-means clustering algorithm [20,46] then yielded the four most straightforward communicative (peek-a-boo, saluting, throwing a kiss, hand-waving) and noncommunicative (shooting an arrow, jogging, drinking, driving) actions across human and robot conditions.
Attribute (Lexical) Stimuli: While the target stimuli were live actions, the attribute stimuli were verbal concepts: High Agency, Low Agency, High Experience, and Low Experience. Following IAT best practices [18], participants underwent training to familiarize themselves with the Agency and Experience dimensions and their High and Low ends. Training concepts were validated in an online norming study with 274 participants, as documented in our previous work [37].
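To make the naming-agreement step of the norming procedure concrete, the following is a minimal sketch of an entropy computation over annotated responses for a single action video. It assumes that H is the Shannon entropy of the response-label distribution, H = sum_i p_i log2(1/p_i); the source cites [43] without stating the formula, and the function name and example labels are hypothetical rather than taken from the study.

```python
from collections import Counter
from math import log2

def naming_entropy(labels):
    """Shannon entropy (in bits) of the response labels given to one action video.

    0.0 means every participant gave the same label; larger values mean
    less agreement. Assumed formula: H = sum_i p_i * log2(1 / p_i).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values())

# Hypothetical annotated answers, not data from the study.
unanimous = ["waving"] * 8
split = ["waving"] * 4 + ["greeting"] * 4

print(naming_entropy(unanimous))
print(naming_entropy(split))
```

Under this reading, actions with entropy close to zero are the "easy stimuli": nearly all respondents converge on a single label.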

Procedure
To prevent any familiarity- or interaction-related biases, the participants provided demographic data and consent and underwent additional tests in a separate room. Only after these procedures did they enter the main experiment room, where the human and robot actors were positioned behind a curtain and an experimenter was waiting to supervise the session. The Lexical Training involved detailed descriptions of the Agency and Experience concepts. Participants evaluated twelve concepts for each dimension, receiving feedback until reaching 80% accuracy. Subsequently, participants learned the High and Low ends of the Agency and Experience dimensions, evaluating six concepts for each until achieving 80% accuracy. Following Lexical Training, participants moved to Action Identification, observing eight actions performed by human and robot actors, respectively. They verbally identified each action, which served as both familiarization and a manipulation check. After these steps, participants started the RW-IAT.

Real-World IAT (RW-IAT)
The RW-IAT utilizes a 55" OLED display, depicted in Figure 1, that functions as both a standard monitor and a transparent screen for presenting real-life stimuli. It transitions between an opaque state for instructions and prompts (Figure 1.A) or evaluations (Figure 1.C) and a transparent state during stimulus presentations (Figure 1.B). The RW-IAT uniquely features real-time, live stimuli, distinguishing it from existing computerized IATs. The use of the OLED screen maintains strict experimental control, setting it apart from fully naturalistic experiments that involve sacrificing control. The use of a single platform for stimulus presentation and evaluations in the RW-IAT aims for a seamless transition between tasks, minimizing potential disruptions in behavioral data crucial for precise response times and mouse trajectories [39].
Inspired by the Single Category IAT [22], we tested the human and robot actors in four separate blocks in the RW-IAT to address the non-complementary nature of these categories. The blocks were Robot-Agency, Robot-Experience, Human-Agency, and Human-Experience. Their order was counterbalanced across participants, and each actor performed the eight actions in a randomized order within the blocks. Participants were instructed to carefully observe each action and actor through the transparent screen and then evaluate what they saw in terms of the required capacity. For instance, in the Robot-Experience block, participants watched the live "peek-a-boo" action performed by the robot actor. After six seconds (the duration of all actions), the screen turned opaque, revealing "High Experience" and "Low Experience" choices in the upper left and right corners. Participants clicked on one, deciding whether what they saw required High or Low Experience capacity. Although the participants were encouraged to keep the definitions and examples from the training session in mind during their evaluations, the experimenter also emphasized the need for quick and instinctive responses. Both human and robot actors executed all actions identically, so that variations in participants' judgments could be attributed to the actor, the action, or both. After evaluating each of the eight actions in one block, participants waited for the actors to switch places, marking the start of the next block.
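The block structure described above can be sketched as a small trial-plan generator. The source states only that block order was counterbalanced across participants and that action order was randomized within blocks; the specific scheme below (cycling through all 24 block permutations by participant index, with a per-participant seeded shuffle) is an assumption for illustration, as are the names `session_plan` and `BLOCK_ORDERS`. The study itself was implemented in Psychtoolbox-3 on MATLAB, not in Python.

```python
import random
from itertools import permutations

BLOCKS = ["Robot-Agency", "Robot-Experience", "Human-Agency", "Human-Experience"]
ACTIONS = [
    "peek-a-boo", "saluting", "throwing a kiss", "hand-waving",  # communicative
    "shooting an arrow", "jogging", "drinking", "driving",       # noncommunicative
]
# All 4! = 24 possible block orders (assumed counterbalancing pool).
BLOCK_ORDERS = list(permutations(BLOCKS))

def session_plan(participant_index, seed=0):
    """Return [(block, [actions in presentation order]), ...] for one participant."""
    # A separate, reproducible random stream per participant.
    rng = random.Random(seed * 100003 + participant_index)
    order = BLOCK_ORDERS[participant_index % len(BLOCK_ORDERS)]
    # rng.sample with k == len(ACTIONS) returns a shuffled copy of the list.
    return [(block, rng.sample(ACTIONS, k=len(ACTIONS))) for block in order]

for block, actions in session_plan(participant_index=3):
    print(block, actions)
```

Each plan contains 4 blocks of 8 trials, that is, 32 evaluations per participant, matching the design in which every actor performs every action once per dimension.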
Figure 1.D illustrates the lab setup with a curtain system dividing participant and actor areas. The participant area includes tables and devices for the experimenter and participants, while the actor area features a cabinet for the actors to wait between blocks and a laptop, positioned on top of a table, displaying the next action to inform the human actor. Ceiling LEDs enhance the transparency or the opaqueness of the screen. Background music is used during actor replacements, and the experimenter observes backstage through a security camera. The task was implemented using Psychtoolbox-3 [3,25,40] on MATLAB R2022a. Full details of the laboratory setup, including hardware, software, and materials, are extensively documented in our prior work [38].
The RW-IAT involved two independent variables: Actor type (robot or human) and Action type (communicative or noncommunicative). Participants assessed each actor and action combination in both the Agency and Experience dimensions. Dependent variables included Response Time (RT), the elapsed time in seconds between the end of an action and the mouse click on one of the response alternatives, as well as the Maximum Deviation (MD) and Area Under the Curve (AUC) of the mouse trajectories, calculated following the original work [14] and interpreted as indicators of participant hesitation [13,49]. The fourth dependent variable, Response, was categorical (High or Low).
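As an illustration of the two trajectory measures, the sketch below computes MD and AUC relative to the idealized straight line from the first to the last sample of a trajectory. It is a simplified geometric version written for this report, not the implementation from [14]: it assumes raw (x, y) samples in a y-up coordinate system and omits the spatial rescaling and time normalization that mouse-tracking pipelines typically apply beforehand. The function name `md_and_auc` is hypothetical.

```python
from math import hypot

def md_and_auc(points):
    """Return (MD, AUC) for a trajectory given as [(x, y), ...].

    MD  : signed perpendicular deviation with the largest magnitude.
    AUC : signed area between the trajectory and the ideal straight line,
          integrated with the trapezoid rule along that line.
    Deviations to the right of the start-to-end direction are positive.
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = hypot(x1 - x0, y1 - y0)
    if length == 0:
        raise ValueError("start and end points coincide")
    ux, uy = (x1 - x0) / length, (y1 - y0) / length  # unit vector of the ideal line
    # t: position along the ideal line; d: signed perpendicular deviation.
    t = [(x - x0) * ux + (y - y0) * uy for x, y in points]
    d = [(x - x0) * uy - (y - y0) * ux for x, y in points]
    md = max(d, key=abs)
    auc = sum((t[i + 1] - t[i]) * (d[i] + d[i + 1]) / 2 for i in range(len(points) - 1))
    return md, auc

# Hypothetical trajectory: moves up while bulging one unit to the right.
print(md_and_auc([(0, 0), (1, 1), (0, 2)]))
```

A perfectly straight movement yields zero for both measures, whereas a trajectory that curves toward the other response option before settling yields larger values, which is why both are read as indicators of hesitation.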

RESULTS
We processed and visualized data with MATLAB R2023b and performed analyses using RStudio 2023.09.1. After identifying outliers (eight trials), we categorized the dataset (1750 trials) into Agency and Experience blocks and calculated participants' average scores for each dependent variable. Since the Shapiro-Wilk test revealed significant departures from normality (p < .001) for both actor and action types in both block dimensions, we used Friedman's ANOVA [15] as a nonparametric alternative [11] and conducted post hoc analyses using the Nemenyi test [31] to identify differences between data groups. We present the results separately for each dependent variable in the Agency and Experience blocks.
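For readers unfamiliar with Friedman's ANOVA, the following is a minimal, dependency-free sketch of the test statistic with the standard tie correction. It is not the authors' analysis script (the analyses were run in R); the function names and the example scores are hypothetical. Each row holds one participant's average score per condition, and the returned statistic is referred to a chi-square distribution with k - 1 degrees of freedom.

```python
def average_ranks(row):
    """Rank one participant's scores (1 = smallest), averaging ranks over ties.

    Also returns the tie term sum(t**3 - t) over groups of t equal values.
    """
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    tie_term = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term

def friedman_chi2(scores):
    """scores: one row per participant, one column per condition. Returns (chi2, df)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    ties = 0
    for row in scores:
        ranks, tie_term = average_ranks(row)
        ties += tie_term
        for j, r in enumerate(ranks):
            rank_sums[j] += r
    stat = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - ties / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("all scores are tied within every participant")
    return stat / correction, k - 1

# Hypothetical scores for four participants under two conditions.
print(friedman_chi2([[1.2, 1.9], [0.8, 1.1], [1.0, 1.4], [1.5, 1.3]]))
```

With two conditions, as for actor type or action type here, the test has one degree of freedom and depends only on how many participants scored higher in one condition than in the other.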

Action Type. The RTs (χ²(1) = 0.45, p = .50), MDs (χ²(1) = 0.02, p = .89), and AUC values (χ²(1) = 0.45, p = .50) did not significantly change across action types. However, there was a significant effect of action type on the ratio of participants' High responses, χ²(1) = 15.87, p < .001 (see Action Type in Figure 2.D). Pairwise comparisons revealed that the ratio of High Experience responses for the communicative actions was significantly higher than the ratio for the noncommunicative actions, p < .001.
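The reported p-values can be checked directly from the test statistics. For a chi-square distribution with one degree of freedom, the upper-tail probability has the closed form p = erfc(sqrt(x / 2)), so the short sketch below needs only the standard library; the function name `chi2_df1_p` is hypothetical.

```python
from math import erfc, sqrt

def chi2_df1_p(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))

# Statistics reported for the action-type comparisons.
for stat in (0.45, 0.02, 15.87):
    print(f"chi2(1) = {stat:5.2f} -> p = {chi2_df1_p(stat):.4f}")
```

The first two values round to .50 and .89, and the third falls below .001, in agreement with the values reported above.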

DISCUSSION
We introduced the RW-IAT, which we designed by following the best practices of creating an IAT [18], to explore mind perception implicitly, using real-world stimuli performed by both human and robot actors. With the RW-IAT, we investigated how attributions of mental capacity on the Agency and Experience dimensions of mind perception change across different actor and action types. In addition to traditional IAT metrics such as responses and response times, we also tracked participants' mouse trajectories. This additional data enables further investigations into the cognitive processes involved [13], aligning with discussions [49] and recommendations in previous work [45].
In the results, action type notably affected behavioral metrics in the Agency dimension, with communicative actions leading to longer response times and increased hesitations. The robot actor showed a significantly higher ratio of High responses in the Agency dimension, possibly because participants had lowered expectations of its capabilities. This was reflected in their surprised comments while they watched the actions, such as "The robot can do everything a human can do!" or "They perform almost the same!", which point to the influence of the robot's live-action performance. In contrast, a human performing similar actions might have stayed within or below participants' agency thresholds. Previous studies also suggest a reduced difference in mental state attributions between robots and humans when their behaviors are tested [1,26].
In contrast, in the Experience dimension, the effect of actor type prevails, with the robot actor eliciting longer decision times and more hesitations, yet leading to significantly lower Experience scores compared to the human actor. Notably, action type significantly affected response ratios, with communicative actions yielding higher rates of High responses than noncommunicative ones. The observed hesitation differences between human and robot actors may be attributed to the nature of communicative actions. Previous studies have suggested that when robots engage in social behaviors, people's judgments can change [12,42,44]. However, robots still received lower attributions, as indicated in most prior work [45].
In conclusion, the RW-IAT shows promise based on observed outcomes in the sample data. While investigations in applied domains of HRI often extend into real-world scenarios, investigations regarding the dynamics of cognitive processes, such as mind perception, may still require experimental control for behavioral data accuracy. Although initially designed for studying mind perception in HRI, the RW-IAT's methodology could be adapted to studies from various domains that deploy neuroimaging or eye-tracking methodology and use real-life stimuli with precise timing and predefined conditions. Future directions include further validation of this methodology, and we view this naturalistic yet controlled approach as a significant step towards increased ecological validity, embracing investigations into HRI in real-world scenarios.

Figure 1: Real-World Implicit Association Task (RW-IAT) setup overview.(A, B, C) The RW-IAT from the experimenter's view.(D) The naturalistic laboratory setup from top-down view.

Figure 2: Mean response times (RT) and High response ratios across conditions.

Figure 3: Mean maximum deviation (MD) and area under the curve (AUC) values across conditions.