Is There Really an Effect of Time Delays on Perceived Fluency and Social Attributes between Humans and Social Robots? A Pilot Study

Humans are expert perceivers of behavioural properties, including the timing of movements. Even short hesitations and delays can be salient depending on the context. This article presents results from a pilot study on time delays in a human-robot interaction setting using the Wizard of Oz paradigm. Participants (n=17) played Tic-Tac-Toe with the humanoid robot Epi. They were randomized into one of three groups, in which Epi executed its movements with no delay, a short delay (4s) or a long delay (10s). Questionnaire results measuring fluency, trust, anthropomorphism, animacy and likability were compared before and after the interaction and between the groups. Although there were indications of decreased perceived fluency after delays, the difference between the groups did not reach statistical significance. The same holds for the other measures. We conclude that greater statistical power is needed to determine whether time delays indeed affect the attribution of social features to robots. We suggest how the study design could be made more robust for a future, larger-scale study. In addition, we propose that future studies use measures that better account for participants' embodied experiences by taking emotional and bodily states into consideration.


INTRODUCTION
Imagine playing Tic-Tac-Toe against your classmate. All of a sudden, in the middle of marking her field, your classmate stops for a few seconds and you wait: is she unsure about where to place her next move, perhaps because she is not very good at this game? Does she prefer to play the game more slowly, so you should do the same? Or does she need your help? Now imagine the same situation, but playing against a robot. How would this affect how you perceive the robot as a player and the interaction in general? Time delays are inevitable in advanced technology such as robots, due to high processing demands, unexpected encounters, or simply erroneous data transmission. However, such delays can place a burden on the interaction with the robot. If robots fail to reciprocate or provide feedback according to human expectations, humans tend to deem the interaction unsuccessful [12,14,17].
Fluency has mainly been tested in industrial settings, where humans and robots have to collaborate to perform a certain task, i.e. they have a common goal [5]. In our scenario, the robot and the human are competitors, which makes the human less dependent on the robot's actions (the human can still win even if the robot takes a lot of time or seems to malfunction). At the same time, they are co-participants in the interaction: even though their goals of winning the game compete, they share the goal of successfully playing the same game, and the robot's behavior can influence how the process of reaching this goal is perceived. We present results from a pilot study conducted with the humanoid robot Epi, which executed its arm movements with no delay, a short delay or a long delay while playing eight matches of Tic-Tac-Toe. We hypothesized that delays, especially longer delays, would negatively impact how fluent, anthropomorphic, animate and likable the robot seems to participants. Moreover, we assumed that pre-existing attitudes towards robots in general would have an effect on how time delays are experienced.

RELATED WORK
2.1 Movements, Timing and Fluency in HRI
Humans have been shown to respond to temporal features during interactions with robots, for instance by adapting to the pace of the robot's movements [9]. Timing is a largely unnoticed companion of interactions, but it can constitute an important pathway of communication. Zhou et al. [20] found that differences in speed during robot motions can express various states, such as the heaviness of a cup or how confident the robot is. Pausing negatively influenced participants' perceptions of the robot's competence, confidence, disposition and naturalness. On the other hand, delays can also have beneficial effects. For instance, individuals talking to a route-directing robot were more fond of a pattern incorporating pauses, even if they seemed abnormally extended, since it left sufficient time for comprehending the response [11]. Hence, in most cases, robots should not respond without any hesitation. This is important both to adapt to a normal conversational rhythm and to make the robot appear to be processing, thinking or learning [20]. While task efficiency is a commonly measured metric in HRI scenarios, Hoffman and Breazeal [6] posit that fluency should receive consideration too, since it seems to be closely connected to user comfort. Fluency is usually perceived unconsciously by matching patterns of previous results with expected results and current results [13]. To achieve fluency, the robot's actions, which include movements, should be understandable and predictable [6,7]. Predictability and reliability have been connected to trust ratings during human-robot interaction [3]. Hence, if the robot behaves unpredictably, it can negatively affect human trust and performance [8,10].

2.2 Perception of Social Attributes and Robotic Movement
Expectations towards the robot also depend on the extent to which it is seen as a teammate and social agent as opposed to merely a tool [10]. The perception of robot behavior can be affected by pre-existing attitudes towards robots; e.g. Syrdal et al. [16] found that initial negative attitudes towards robots correlated with the properties ascribed to the robot after an interaction. Most people are not used to interacting with social robots and thus use scripts from human-human interaction (HHI) to navigate the situation, such as reading intentions, communicating, and displaying and comprehending social cues, which can lead to expectations towards humans being transferred to robots [2]. As a consequence, a majority of people favor anthropomorphic properties and behavior that is intuitively interpretable [4]. To avoid problems with the narrow notion of anthropomorphism, the concept of animacy can be used. Animacy is related to agency in the sense that animate entities are living beings capable of acting as agents, while an entity can also be inanimate but still be classified as an agent that executes movements and actions to reach specific goals [18]. All the features mentioned in 2.1 and 2.2 seem important for a robot to be viewed as a teammate and social agent, which is why, in the later parts of the paper, ratings of fluency, trust, anthropomorphism, animacy and likability are summarized as "attribution of social agentic properties".

2.3 Testing the Effects of Time Delays in Robotic Movements
Fluent interactions require robots to display movements, actions, and intentions that are comprehensible and foreseeable [1,6]. Accordingly, unexpected pauses in the robot's movements have been interpreted as uncertainty and lower competence on the robot's side [20]. The present study aims to establish how time delays during robotic movements affect perceptions of fluency, anthropomorphism, animacy and likability. In addition, the connection between fluency, anthropomorphism and animacy will be studied, since these social attributes seem to affect the extent to which errors and delays negatively influence the perception of the robot. Moreover, pre-existing attitudes have been found to have an impact on the perception of robot behavior. Hence, it seems important to test whether humans with more positive initial attitudes towards robots are more resistant to losing trust and perceived fluency in a delayed condition than humans with negative attitudes. On the other hand, it is possible that the mechanism works in reverse: persons with very positive attitudes may have expectations that are too high and thus give more strongly negative ratings in response to delayed robot behavior. A similar motivation holds for investigating the connection between the initial situated perception of the robot and the effect of time delays. While we are aware that 10s is a long break, we wanted to see what happens if the breaks are longer than would probably be seen as natural. With 4s, there is a noticeable delay, but there is a higher chance that it could be interpreted as 'normal', since it is closer to the usual rhythm of HHI; e.g. Shiwa et al. [15] found that user comfort declines after a pause length of two seconds, while short pauses make the interaction appear more natural. Delays in HHI and HRI seem to be beneficial when they match expectations or fit into a social script; if they are unexpected or unnatural (e.g. by being excessively long), the effects seem to be negative. Thus, negative effects in our two delay conditions (D1 and D2) should be expected, unless participants think that the delays in some way match their expectations of the robot's behavior (e.g. that it has to think about its turn).

METHODOLOGY
3.1 Research Questions
In order to investigate whether humans' perception of the robot's social agency is influenced by a) delays in the timing of the robot's actions and b) individual pre-existing attitudes towards social robots, we set out to answer the following research questions:
• Q1: Do changes in the timing of the robot's movements affect the perception of the robot's social agency (fluency, anthropomorphism, animacy, likability, intelligence, safety and trust)?
• Q2: Are pre-existing attitudes connected to how time delays are perceived?
• Q3: Can a more positive initial situated perception of the robot influence the effect of the time delays on fluency?

3.2 Participants
Seventeen participants (11 female, 6 male, aged 20 to 34) were recruited from students and staff of various disciplines at Lund University. All participants gave informed consent prior to the experiment and were compensated for their participation.

3.3 Experimental Design and Procedure
We used the humanoid robot Epi, which was developed by LU Cognitive Robotics. Epi has arms with five degrees of freedom each and fingers that are angled so that they move towards the same point, which imitates infants' grasping movements [11]. We used the Wizard of Oz paradigm. After being greeted by the experiment leader, the participants stated their demographic information and their attitude towards robots. In addition, their initial perception of Epi in terms of intelligence, animacy, anthropomorphism, likability and perceived safety was measured.
Then they played eight matches of Tic-Tac-Toe against Epi while being video recorded. Subsequently, they again reported their perception of the previously mentioned aspects of Epi, as well as how fluent they thought the interaction was. They were also asked about their trust towards Epi.

3.4 Task
We used the game Tic-Tac-Toe, which participants played against Epi. In the general setup, Epi was placed in front of a table (Fig. 1). On the table, the grid of the Tic-Tac-Toe game was marked with two horizontal and two vertical lines. The participant sat on the other side of the table, facing Epi. To play the game, the two players placed balls on the grid to mark the spots they wanted to occupy. If a player had three balls in a row (vertically, horizontally or diagonally), that player won the match.

3.5 Experimental Conditions
The study used a between-groups design with the following conditions: no time delay (ND), 4s delay (D1) and 10s delay (D2). The conditions were randomly assigned to the participants. All three groups started with one match without a time delay. Depending on the group, Epi played the following matches with no delays (ND), short delays (D1), or long delays (D2). A time delay meant that Epi paused its movement just before placing the ball on the grid, thus hovering above the table with its hand. In the two delay conditions, delays occurred approximately 30 percent of the time.
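For illustration only, the assignment and delay scheme described above could be sketched as follows. The balanced assignment of participants to groups, the interpretation of "30 percent of the time" as a per-move probability, the number of robot moves per match, and all function and variable names are assumptions made for this sketch; they are not taken from the study, in which the robot was operated via the Wizard of Oz paradigm.

```python
import random

# Pause length in seconds for each condition, as stated in the text.
CONDITIONS = {"ND": 0.0, "D1": 4.0, "D2": 10.0}


def assign_conditions(participant_ids, rng):
    """Randomly assign participants to conditions.

    Assumption: group sizes are kept as equal as possible by shuffling
    the participants and dealing them out in turn.
    """
    ids = list(participant_ids)
    rng.shuffle(ids)
    names = list(CONDITIONS)
    return {pid: names[i % len(names)] for i, pid in enumerate(ids)}


def delay_schedule(condition, n_matches=8, moves_per_match=4,
                   p_delay=0.3, rng=None):
    """Return, per match, the pause (in seconds) before each robot placement.

    The first match never contains a delay. In later matches of the delay
    conditions, each robot move is delayed with probability p_delay
    (assumption: delays are drawn independently per move).
    """
    if rng is None:
        rng = random.Random()
    pause = CONDITIONS[condition]
    schedule = []
    for match in range(n_matches):
        if match == 0 or pause == 0.0:
            schedule.append([0.0] * moves_per_match)
        else:
            schedule.append([pause if rng.random() < p_delay else 0.0
                             for _ in range(moves_per_match)])
    return schedule
```

With 17 participants, this assignment yields groups of 6, 6 and 5. A fixed seed (e.g. `random.Random(0)`) makes the assignment and the schedule reproducible.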

3.6 Measures
3.6.1 Pre-interaction measures. Participants filled out a set of questionnaires to record their subjective experience before the interaction. They started by stating their demographic information and filled in the Negative Attitudes towards Robots Scale (NARS). Perceived anthropomorphism, animacy, intelligence, likability and safety were measured using the Godspeed Questionnaire on a 5-point Likert scale.
3.6.2 Post-interaction measures. After the interaction, the participants filled in the Godspeed Questionnaire again. Fluency was measured using the Hoffman Fluency Scale (7-point Likert) and trust was measured using the Trust Perception Scale. As a manipulation check, we also asked the participants whether they thought that Epi had made a mistake during the interaction.

3.7 Data Analysis
All statistical analyses were conducted in R. Associations between initial ratings and outcome variables were explored using Spearman's rank correlation. Differences between conditions were analysed using the Kruskal-Wallis test. We chose non-parametric tests since we were dealing with ordinal data.
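To make the two tests concrete, the following is a minimal sketch of how Spearman's rank correlation and the Kruskal-Wallis H statistic are computed. The analyses in the study were run in R; this standalone Python version, its function names, and the closed-form p-value for exactly three groups are illustrative assumptions, not the code used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    tie_sum = sum(pooled.count(v) ** 3 - pooled.count(v) for v in set(pooled))
    correction = 1 - tie_sum / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h):
    """Chi-square approximation for exactly three groups (2 degrees of freedom).

    For 2 degrees of freedom the upper tail probability is exp(-h / 2).
    """
    return math.exp(-h / 2)
```

For three perfectly separated toy groups, `kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])` gives H = 7.2. Note that the chi-square approximation is rough for groups as small as those in a pilot study.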

RESULTS
4.1 Do changes in the timing of the robot's movements affect the perception of the robot's social agency (Q1)?
4.3 Can a more positive initial situated perception of the robot influence the effect of the time delays on fluency (Q3)?
4.3.1 Do people with higher ratings of the robot's anthropomorphism, animacy and likability report higher ratings of fluency in general? We found no significant correlation between fluency and initial ratings of the robot in terms of anthropomorphism (ρ = 0.36), animacy (ρ = -0.16) and likability (ρ = 0.1).
4.3.2 Do people with higher ratings of the robot's anthropomorphism, animacy and likability report higher ratings of fluency after experiencing time delays? Due to the small number of participants, the results in this section reflect a rather exploratory approach. We found strong variation in all conditions. The results indicate no specific pattern regarding the influence of the initial perception of anthropomorphism, animacy and likability on fluency under different time delays. For instance, in the ND condition, two of the three participants who had initially rated animacy higher than the mean (m = 2.42) rated fluency lower than the fluency mean in this condition. In the D1 condition, of the two participants with higher-than-average initial animacy ratings, one reported the highest fluency (5.23) and the other the lowest fluency (2.82) in this condition. In D2, of the three participants with higher-than-average initial animacy ratings, two rated fluency above average while one rated fluency below average.
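The participant-level comparison described in 4.3.2 can be sketched as a simple tabulation. This is an illustrative sketch only: the function name and data layout are hypothetical, and it assumes that both means are computed within a single condition, which the text does not state explicitly for the initial ratings.

```python
from statistics import mean


def split_by_initial_rating(records):
    """Tabulate fluency for participants with an above-mean initial rating.

    records: list of (initial_rating, fluency) tuples for ONE condition.
    Assumption: both means are taken over the participants in that condition.
    """
    initial_mean = mean(initial for initial, _ in records)
    fluency_mean = mean(fluency for _, fluency in records)
    # Fluency scores of participants whose initial rating exceeds the mean.
    high = [fluency for initial, fluency in records if initial > initial_mean]
    above = sum(fluency > fluency_mean for fluency in high)
    return {
        "n_high_initial": len(high),
        "fluency_above_mean": above,
        "fluency_not_above_mean": len(high) - above,
    }
```

Called once per condition and per initial measure (anthropomorphism, animacy, likability), this yields counts of the kind reported above, e.g. "two of the three participants".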

DISCUSSION
We did not find any significant results; however, there are indications that interesting differences exist but that the statistical power was too low to detect them. The first research question we aimed to answer was whether changes in the timing of robotic movements would affect the perception of the robot's social agentic qualities. We did not find any significant effects between the conditions (ND, D1, D2), in contrast to what was shown in previous studies [6]. One reason could be that most of the mentioned research involved HRI in industrial settings, while our task consisted of a game played against a social robot. There was a small decrease in perceived fluency from ND to D1 to D2, which would be the expected result. We also found an indication that trust was slightly higher in the delay conditions D1 and D2, which is in line with Hancock et al. [3] and which we will follow up on in a larger study.
In the second research question, we tested whether pre-existing attitudes would be connected to how time delays are perceived. We found a strong (non-significant) correlation (ρ = 0.68) between NARS ratings and fluency in the D2 condition, which was quite surprising. An explanation might be that, as previously mentioned, higher expectations could lead to disappointment when there are severe problems with the robot (such as very long time delays), while lower expectations might lead to being positively surprised. We found a moderate inverse (non-significant) correlation between NARS ratings and animacy in the ND condition. This indicates that participants with more positive attitudes towards robots perceived the robot as more animate after the interaction than participants with more negative attitudes. The absence of this effect in the two delay conditions could hint at time delays disturbing this positive connection.
In the third research question, we investigated whether a more positive initial perception of Epi's anthropomorphism, animacy and likability would lead to more positive fluency ratings, specifically in the two delay conditions. We did not have enough participants to perform any meaningful statistical tests; however, looking at participants individually, there seemed to be no specific trend.
In addition, the absence of significant results could imply that the measures used were not the most relevant ones in the context of time delays during an HRI task. While both the Godspeed Questionnaire and the Hoffman Fluency Scale are common measures within the field, it might be more important to look at emotional and bodily components of the participant's state. Winkle et al. [19] suggested that embodiment and emotions are essential parts of social interactions. These aspects are often overlooked in HRI research, which prioritises task efficiency and other more robot-related measures. It is possible that participants became uncomfortable or emotionally affected by the time delays during the Tic-Tac-Toe matches, but that our questionnaires did not account for these aspects. Some of the observed participant behaviors support this suspicion; for instance, one participant tried to 'help' Epi during a delay by pointing at a field that would make sense for the robot to mark.

FUTURE RESEARCH
This pilot study opens several avenues for future research. First, a full-scale study with enough participants and statistical power could verify whether delay times affect fluency ratings in an ordered way. Second, such a study should be able to provide evidence on whether longer delays are interpreted as displaying more animacy, particularly if it is backed up by qualitative measures. Results from [20] indicate that people rate the "naturalness" of a robot lower when it pauses, and it would be interesting to compare naturalness with animacy and see whether this measure also changes with the length of the delay. Furthermore, to be more open to participants' interpretations of the situation, we could use qualitative measures such as semi-structured interviews. Lastly, future studies could use measures that account for participants' emotional and bodily states in connection with time delays.

ACKNOWLEDGMENTS
This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program - Humanities and Society (WASP-HS), funded by the Marianne and Marcus Wallenberg Foundation and the Marcus and Amalia Wallenberg Foundation. We also want to thank Johannes Rosenfrost and Alexander Wåhlander for their help.