Influencing Human Performance: Investigating the Effect of Non-humanoid Robot Feedback on Task Performance

This paper reports the findings of an online between-subject study that investigated the effectiveness of a non-humanoid socially assistive robot in providing positive reinforcement feedback to improve performance on a cognitively demanding task. Four feedback conditions were used (verbal, expressive, neutral, and text-based) to identify which type of feedback could positively influence behaviour. Results showed no significant differences in task performance, perceived workload, or robot perception.


INTRODUCTION
The field of human-robot interaction is rapidly advancing with the introduction of socially assistive robots (SAR) in household settings. These robots are designed to be smaller and to integrate into our homes, and they have the potential to influence our daily behaviour through subtle persuasive mechanisms. This study investigates the effectiveness of non-humanoid SARs in improving performance on demanding cognitive tasks. Given their significance in fields like education [12], health [16], and rehabilitation [17], we focus on the impact of various feedback mechanisms (verbal, expressive, neutral, and text-based) on user motivation and task performance.
Our research centres on reinforcement, a core concept in behaviourism, the study of how environmental stimuli condition individuals to learn new behaviours. Based on our review of the existing literature, our study focused on positive reinforcement strategies. Specifically, we introduced feedback to encourage the desired behaviour. Overall, the mechanisms of behaviour change under investigation are social influence and reinforcement feedback.
We conducted an online between-subject study. Participants solved puzzles requiring them to count the number of boxes containing a specified shape (see Figure 1). A combination of objective and subjective measures was employed to evaluate the effects of each feedback type. Moreover, considering that many in-person studies struggle to recruit enough participants, we aimed to investigate the viability of assessing the impact of robot feedback in an online study, where a large population is more easily accessible.
Our research question was whether social feedback from a non-humanoid robot affects task performance and user perceptions. We hypothesised that positive reinforcement delivered by a non-humanoid social robot would improve task performance and positively influence user attitudes towards the robot.
The robot chosen to provide feedback for this study was the Digital Dream Labs Vector 2.0, a palm-sized non-humanoid companion robot (see Figure 2). Its small size, affordability, expressive capabilities, and publicly available software development kit made it an ideal candidate for our experimental setup.
Through this study, we aim to contribute to the understanding of how non-humanoid SARs can be effectively utilised to influence human behaviour, particularly in settings that demand high cognitive engagement. The insights gained could have significant implications for the development of future SARs and their application in various fields, including education, therapy, and workplace productivity.

BACKGROUND AND RELATED WORK
Our research extends previous studies on using Socially Assistive Robots (SAR) in persuasive contexts and focuses on how a SAR can alter human mental states (attitudes, beliefs, or behaviours) to get people to work harder on a cognitively demanding and repetitive task. Persuasion in this context is defined as the process of changing the mental state of the person being persuaded [11] through communication and interaction.
A robot's influence on humans depends on its being perceived as a social actor [7], a concept describing actors that engage in interactions and influence others. In the context of SAR, being perceived as a social actor enables the robot to provide effective social feedback, a term describing the influence of behaviour or speech on others. When technology presents itself as social, it elicits social responses from humans [13]. This phenomenon has been studied as human susceptibility to flattery from computers, which mirrors susceptibility to flattery from other humans [2].
Previous research by Ham and Midden [8] investigated the effect of robot-generated feedback on energy-saving behaviour in a simulated washing machine task. Their study contrasted the impact of positive or negative social feedback against factual feedback displayed on a gauge. They discovered that social feedback, particularly negative feedback, had stronger persuasive effects on energy saving than factual feedback. This aligns with our hypothesis about the influence of social feedback on task performance. We extend their findings by exploring whether social feedback from a non-humanoid robot can motivate individuals to exert more effort on a task.
However, contrasting findings by Akalin et al. [1] suggest that participants preferred the robot that provided positive or flattering feedback even when that preference did not correlate with objective performance measures. This highlights a mismatch between preferred feedback and effective feedback, a gap our research looks to explore further.
Similarly, Chidambaram et al. [5] investigated compliance with robot suggestions in a desert survival task, in which participants had to choose the best tools from a selection to survive in a desert. Their findings suggest that nonverbal immediacy cues are crucial in persuasive interactions, including bodily cues such as gaze, gesture, and facial expression, and verbal cues such as tone and expressions. We summarise this type of feedback as expressive feedback and compare whether there are any noticeable differences in performance between the expressive and verbal feedback conditions.
We used the framework of feedback dimensions proposed by Pritchard and Montagno [14] to identify the dimensions relevant to our study. These include positive versus negative feedback, timing of feedback, specificity, absolute versus comparative feedback, and the source's authority. Our approach incorporates positive reinforcement with positive feedback, which is associated with increased intrinsic motivation [3]. Additionally, Swift-Spong et al. [15] emphasised the impact of self-comparative, other-comparative, and no comparative feedback, suggesting that feedback should be biased towards self-comparative feedback.
In summary, our review of the existing literature highlights the important role of feedback dynamics and nonverbal immediacy in the context of Socially Assistive Robots (SAR). By examining how various forms of positive feedback influence human behaviour and decision-making, we aim to gain practical insights into the persuasive power of SARs. Our study builds on these findings by exploring the specific impact of non-humanoid robot feedback on motivation and persistence in repetitive tasks.

METHODOLOGY

Study setup
We employed a between-subject online comparative study using the Digital Dream Labs Vector 2.0 with four study conditions. Participants were recruited through an online recruitment portal (Prolific) and were reimbursed for their time.
Participants were asked to solve spatial reasoning puzzles, as shown in Fig. 1, inspired by the Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA). We used greyscale puzzles with combinations of triangles, circles, and squares, crafted to be universally accessible by avoiding language or mathematical barriers. The repetitive nature of the task aimed to evaluate how participants adapted to the robot's feedback.
The study was advertised to participants as a test of the effectiveness of a new type of CAPTCHA puzzle. Participants were told the robot would guide them through the survey, and it was framed as an invigilator. They were not told that their task performance would be correlated with the robot's feedback.

Experimental procedure
After consenting to the study terms, participants were shown tutorial videos of the robot introducing the study and explaining the rules for solving the puzzles. The introductory section involved practising with two to three trial puzzles, during which the robot provided tailored feedback based on the correctness of responses.
In the main phase of the study, participants were assigned to one of four distinct conditions (described in section 3.3). Feedback, when provided, was always positive and encouraging, regardless of whether the answer was correct. Additionally, all answers to the CAPTCHAs were presented in multiple-choice format.
The videos played automatically and could not be paused, rewound, fast-forwarded, or replayed. Additionally, there was a time delay before the next-page button was displayed, to ensure that participants watched most of each video before proceeding to the next page.
After completing the repetitive task, participants were asked to complete two questionnaires (Section 3.6). Finally, participants had the chance to provide their opinion of the robot and the study in general in a free-form text field.

Robot feedback types
Our study was designed with four distinct conditions to examine the effects of different types of feedback on participant performance using a SAR. Feedback videos, lasting between three and six seconds, were centrally displayed against a white background. In the verbal and text feedback conditions, we employed three types of positive feedback supported by the literature on their effectiveness: general feedback (e.g., "well done"), self-comparative feedback (e.g., "well done, that was all correct"), and other-comparative feedback (e.g., "you are doing better than average").
We employed the neutral and text conditions as control groups; their purpose was to isolate whether the presence of the robot alone, or feedback alone, affects performance.

Robot interaction model
In each robot feedback video, the robot starts the interaction and demonstrates engagement by looking directly at the camera, as if making eye contact with the participant, and by turning to look in the direction it wanted the participant to look.
Initially, a continuous reinforcement schedule was used for the trial puzzles; this schedule has been shown to produce the fastest learning of a new behaviour [10]. This was followed by a fixed-ratio reinforcement schedule for the main portion of the study: participants received feedback after solving every three puzzles. We chose this to avoid giving participants feedback too frequently, which could have made them aware of the study's purpose.
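The two schedules described above (continuous reinforcement during the trial puzzles, then feedback after every three puzzles) can be sketched as a simple trigger function. This is an illustrative reconstruction; the phase names and counter logic are assumptions, not the study's actual implementation:

```python
def give_feedback(phase: str, puzzles_solved: int, ratio: int = 3) -> bool:
    """Decide whether the robot should deliver feedback.

    Trial phase: continuous reinforcement (feedback after every puzzle).
    Main phase: feedback after every `ratio` solved puzzles (three in the study).
    """
    if phase == "trial":
        return True  # continuous reinforcement during the practice puzzles
    return puzzles_solved > 0 and puzzles_solved % ratio == 0
```

A schedule like this keeps feedback sparse enough in the main phase that participants are less likely to notice the contingency.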

Population, sampling, and participant recruitment
A power analysis was conducted to ensure statistical reliability for a one-way ANOVA, under the assumptions of normality and homogeneity of variance. To detect a medium effect size (Cohen's f = 0.25) with an alpha of 0.05 and a power of 80%, a sample of 180 participants was required (>45 participants in each study condition). 200 participants, consisting of 100 males and 100 females, with a mean age of 27 (SD = 7.1), were recruited through Prolific, with access granted only to those on desktop computers. The study took roughly 15-20 minutes, and participants were compensated £2.50 for their time. Participants were allocated to conditions using pseudorandomisation. This study was approved by the Ethics Committee of the Faculty of Engineering at the University of the West of England (approval number FET-2122-103).
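The sample-size calculation above can be reproduced with standard power-analysis tooling. The sketch below uses statsmodels; rounding the solved total up to a round per-group figure is our assumption about how the reported 180 was derived:

```python
import math

from statsmodels.stats.power import FTestAnovaPower

# One-way ANOVA power analysis: medium effect (Cohen's f = 0.25),
# alpha = 0.05, power = 0.80, four feedback conditions.
n_total = FTestAnovaPower().solve_power(
    effect_size=0.25, alpha=0.05, power=0.80, k_groups=4
)
n_per_group = math.ceil(n_total / 4)  # roughly 45 participants per condition
```

Solving for the total sample size and dividing by the number of groups gives approximately 45 participants per condition, matching the figure reported above.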

Experimental measures
In our study, we assessed the effect of robot feedback on task performance using a combination of objective and subjective measures. The objective metrics were the number of puzzles solved, the number of correct answers, and the time taken to solve them. Subjectively, we evaluated participants' workload using the NASA Task Load Index [9] and their perception of the robot using the Godspeed questionnaire [4], which measures perceived anthropomorphism, animacy, likability, and intelligence. Additionally, participants were asked for their open-ended opinions of the robot and the overall study design.

Attention Check
There were 50 participants per study condition. The puzzles were not considered difficult, and participants scored >75% on average; those who scored less than 75% were excluded from the study. Additionally, the average time spent on each puzzle across all conditions was 16.7 seconds (lower quartile = 7.71 s); we excluded any participant who spent less than eight seconds per puzzle for not paying attention. The remaining participants in each study condition were 40 for verbal, 39 for expressive, 40 for neutral, and 42 for text.
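The exclusion rules above amount to two simple thresholds. A minimal sketch, using hypothetical participant records since the raw data layout is not described in the paper:

```python
# Each record: (condition, mean_accuracy, mean_seconds_per_puzzle) -- hypothetical data.
participants = [
    ("verbal", 0.92, 14.2),
    ("verbal", 0.60, 15.0),  # would be excluded: accuracy below 75%
    ("text", 0.88, 5.3),     # would be excluded: under 8 s per puzzle
    ("neutral", 0.81, 16.7),
]

ACCURACY_CUTOFF = 0.75  # participants scoring below 75% were excluded
TIME_CUTOFF = 8.0       # participants under 8 s per puzzle were excluded

retained = [
    p for p in participants
    if p[1] >= ACCURACY_CUTOFF and p[2] >= TIME_CUTOFF
]
```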

Robot persuasiveness
Before conducting the main analysis, the dataset was tested against the assumptions of parametric testing. Levene's test indicated homogeneity of variances across conditions for all measures. However, the Shapiro-Wilk test revealed significant deviations from normality in several conditions for different measures. Due to these normality violations, non-parametric methods were chosen for the subsequent analysis.
Following the significant result in the animacy measure, a post-hoc Dunn's test with Bonferroni correction was conducted to explore pairwise group differences (Figure 3). The post-hoc analysis did not indicate any significant pairwise differences among the feedback conditions, suggesting that the observed variations in animacy scores were not distinct enough to be differentiated between specific pairs of groups.
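The assumption checks and omnibus test described in this section can be sketched with SciPy. The scores below are synthetic stand-ins, not the study's data; the Dunn post-hoc step would typically use a dedicated package such as scikit-posthocs and is omitted here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic stand-in animacy scores for the four feedback conditions.
groups = {
    cond: rng.normal(loc=3.0, scale=0.8, size=40)
    for cond in ("verbal", "expressive", "neutral", "text")
}

# Homogeneity of variances across conditions (Levene's test).
levene_stat, levene_p = stats.levene(*groups.values())

# Per-condition normality checks (Shapiro-Wilk).
shapiro_p = {cond: stats.shapiro(scores).pvalue for cond, scores in groups.items()}

# Non-parametric omnibus comparison (Kruskal-Wallis H-test).
kw_stat, kw_p = stats.kruskal(*groups.values())
```

When the Shapiro-Wilk checks reject normality, the Kruskal-Wallis result is the appropriate omnibus statistic to report, as was done here.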

DISCUSSION
This section analyses the key findings from the study, evaluating their implications and exploring potential reasons behind the observed outcomes.

Task performance
Our analysis did not reveal significant differences in task performance between the experimental conditions. The feedback frequency, the nature of the task, and participant engagement levels are potential influencing factors to consider.
Several participants commented that they felt the robot was trying to encourage them to solve more puzzles: "I could immediately tell the test was actually about how the robot affected my decisions rather than the captcha itself." Some of these participants stopped the study early once they realised the robot was trying to persuade them to solve as many puzzles as possible. It is unclear how many other participants may have figured out the purpose of the study and behaved in a way consistent with the null hypothesis.
The type of puzzles used in our study (visual puzzles) may have influenced participants' performance. Participants who enjoyed solving puzzles may have performed well regardless of the feedback they received: "I think I enjoyed it more because I like solving puzzles. It was fun." Visual puzzles were chosen over maths or literacy problems to prevent proficiency in the subject from affecting performance. It is difficult to know whether the recorded data were influenced by intrinsic motivation or by the robot's feedback. Future studies may consider using different types of tasks to avoid this potentially confounding variable.

Perception of the robot
We used the Godspeed questionnaire subscales for anthropomorphism, animacy, likability, and perceived intelligence to assess participants' perceptions of the robot during the study. A Kruskal-Wallis analysis revealed a statistically significant difference in animacy between study conditions. However, post hoc analysis showed that all pairwise p-values were above our significance threshold of 0.05. Although the evidence is insufficient to conclude a difference in animacy, it suggests that the type of feedback a robot provides may subtly influence the perception of a non-humanoid robot's liveliness.

Perceived workload
Participants' perceived workload across each study condition was relatively low, as observed from the group average scores, which all fell in the lower quartile of the NASA Task Load Index. Moreover, the average perceived workload was similar across conditions, suggesting that the type of feedback provided did not significantly affect participants' perceived workload.

Limitations
A combination of factors may have contributed to the lack of evidence supporting our hypothesis. The following are key areas to examine: (1) Generalisability: paid participants were incentivised to take part and may have optimised for receiving payment. Future studies could seek a more diverse participant pool, possibly through in-person recruitment. (2) Feedback schedule: the frequency of feedback may have influenced the results, as participants may have identified the purpose of the study. Future studies could implement a variable feedback schedule, in addition to having the robot always present on the screen, which may make receiving feedback feel more natural. (3) Task engagement: the enjoyment derived from solving puzzles may have overshadowed the feedback's impact. Progressively increasing puzzle difficulty, increasing the number of puzzles, and assessing participants' intrinsic motivation during the study could help identify and reduce this effect.

CONCLUSION
In conclusion, this online study aimed to evaluate the effectiveness of a non-humanoid SAR in providing positive reinforcement feedback to improve performance on a cognitively demanding task.
No significant differences in task performance or perceived workload were observed across the different feedback conditions. However, the type of feedback provided did slightly affect how participants perceived the robot's animacy, although the significance level exceeded our alpha level in the post hoc analysis.

Figure 1: Example of a CAPTCHA puzzle used during the study

Figure 3: Heatmap illustrating the pairwise post hoc Dunn test for animacy across all study conditions

• Verbal Utterance Condition: The robot provided positive verbal feedback aimed at reinforcing correct responses or encouraging effort.
• Expressive Condition: Feedback was delivered through nonverbal expressions, combining facial expressions, gestures, and sound effects, because these are a valuable addition to recognising emotion in robots [6].
• Neutral Condition: The robot only offered instructional guidance, without any feedback on the correctness of participants' responses.
• Text Condition: Feedback similar to that in the verbal utterance condition was presented as text on a screen, without the robot's presence.