Implementation and Evaluation of a Motivational Robotic Coach for Repetitive Stroke Rehabilitation
Martin K. Ross, Frank Broz, & Lynne Baillie

Repetitive, individual exercises can improve the functional ability of stroke survivors over the long term. With the aim of providing extra motivation to adhere to repetitive, individual rehabilitation, this paper presents a robotic coach for stroke rehabilitation. Our system uses the Pepper robot and executes one of twelve data-driven coaching policies. The policies were learned from human-human observations of professional stroke physiotherapists and provide high-level personalisation based on user information and training context. A within-subjects evaluation of the system was conducted in person and involved short interactions with 3 stroke survivors. The system was able to engage the target end users, and there were indications that workload could be reduced when using the system compared to exercising alone.


INTRODUCTION
In rehabilitation after stroke, adherence tends to decrease after discharge from hospital due to, among other reasons, a lack of motivation [1]. A personalised robotic coach could provide extra motivation to adhere to an exercise routine and therefore increase the potential for functional improvements over the long term.
Previous work has shown the potential of using a Socially Assistive Robot (SAR) to motivate users during physical exercise (e.g. [2]-[4]). However, past systems tend not to offer technical advice on a specific skill or are limited in their personalisation.
We have built on the work of [5], using a novel development and personalisation approach to implement and evaluate a robotic rehabilitation coach on the Pepper platform. We use the term 'personalised' to refer to high-level personalisation to groups of users, but acknowledge that continual, low-level adaptation to individuals would need to be added to meet the full personalisation requirement [6]. This is the first work to explore this personalisation approach in rehabilitation, and it complements the results of [7], who performed a similar study in sports training.

Stroke Rehabilitation
Stroke is a sudden and devastating medical condition that leaves more than half of survivors with permanent disabilities [9], and almost half of survivors feel abandoned after they leave hospital [10]. Individual rehabilitation after a stroke is often done alone, contains repetitive exercises and is performed frequently over a long period of time. Motivation for rehabilitation can be affected by the actions of a therapist [12], [14]. In this work, we explore whether a robotic coach used during individual exercise could have a similar effect.
Examples of past work in this area include a fully autonomous robotic coach (Baxter) leading stroke survivors through a wire puzzle task [4], and a gamified robotic system (Pepper) that was rated highly by participants in a series of stroke rehabilitation tasks over 15 interactions [8]. These studies point to the effectiveness of a SAR providing motivation during physical activity. However, the evaluation conducted by [4] focussed more on the efficacy of the wire puzzle task than on the robot, and the personalisation in [8] was set up ahead of time by the researchers.

Personalisation of SARs
Personalisation has been suggested as a requirement of a robotic coach [6]. One promising method of achieving personalisation to groups of users is to learn from human demonstrations. In a collaborative packing task, Nikolaidis et al. showed that by clustering human demonstrations into similar styles and applying inverse reinforcement learning over those clusters, it was possible to learn a reward function that was representative of each user type [9]. In this work, we evaluate the application of a similar strategy to the more interactive, open scenario of coaching.
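To make the idea of grouping demonstrations into styles more concrete, the sketch below summarises each demonstration as a vector of action frequencies and groups the vectors with plain k-means. This is a stand-in chosen for brevity, not Nikolaidis et al.'s clustering algorithm, and it omits the inverse reinforcement learning step entirely; the feature representation, the `kmeans` and `action_frequencies` functions and the action labels are all hypothetical.

```python
import random
from collections import Counter

def action_frequencies(demo, actions):
    """Summarise one demonstration (a list of action labels) as relative frequencies."""
    counts = Counter(demo)
    total = len(demo)
    return [counts[a] / total for a in actions]

def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means (a stand-in, not Nikolaidis et al.'s method).

    Returns one cluster index per point.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

With hypothetical demonstrations, instruction-heavy and question-heavy sessions fall into separate clusters, each of which could then be used to learn one style-specific model.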
By combining data collection methods adapted from sports coaching literature with computational techniques and mathematical modelling, Ross et al. defined a process to formalise human knowledge in the form of 'coaching policies' usable for robotic control [5]. A policy refers to a mapping from states to actions. Starting from observations of professional squash coaches and stroke physiotherapists (Ross et al.'s study was cross-domain), they used Nikolaidis et al.'s clustering algorithm to generate 12 unique coaching policies that can be viewed as 'behaviour graphs' on GitHub.¹ They then used qualitative data obtained through semi-structured interviews with the observed coaches and physiotherapists to make actionable suggestions as to which policies were likely to be more appropriate for which groups of users. The robotic coach evaluated in this work uses these suggestions to select from Ross et al.'s 12 policies and achieve personalisation based on user information and training context. This type of personalisation goes beyond that previously explored in the context of robotic coaching. Most past works have focussed on customising the utterances of a system with the name and performance history of its user (e.g. [10], [11]). While findings suggest that personalisation can increase adherence to [10], and enjoyment of [11], interactions with a robotic exercise coach, we went a step further: in this work, we attempted to predict the style of interaction (i.e. the behaviours used by the robot during a session) that each user would prefer.
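A policy of the form described in Section 3.2.1, a matrix T(a_t, a_{t+1}) of transition probabilities between consecutive coaching actions, could in principle be estimated from coded observation sequences by counting transitions and normalising each row. The sketch below assumes each observed session is a list of action labels and adds optional add-alpha (Laplace) smoothing via a hypothetical `alpha` parameter; it illustrates the idea and is not Ross et al.'s exact procedure.

```python
from collections import defaultdict

def estimate_transition_matrix(sequences, actions, alpha=0.0):
    """Estimate T[a_t][a_next] = P(a_next | a_t) from observed action sequences.

    Every label in `sequences` must appear in `actions`.
    `alpha` is an optional add-alpha smoothing constant (an illustrative choice).
    """
    counts = {a: defaultdict(float) for a in actions}
    for seq in sequences:
        # Count each pair of consecutive actions once.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    T = {}
    for a in actions:
        row = {b: counts[a][b] + alpha for b in actions}
        total = sum(row.values())
        # A row with no observations (and no smoothing) stays all zeros.
        T[a] = {b: (v / total if total > 0 else 0.0) for b, v in row.items()}
    return T
```

Smoothing keeps unobserved transitions possible, which may or may not be desirable for a coach whose behaviour should stay close to the observed professionals.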

Overview of the System
The robotic coaching system was implemented on a Pepper robot using the NAOqi Python API. It receives data from a human operator via Wizard of Oz (WoZ) techniques and uses these data to formulate appropriate behaviours to coach users during individual rehabilitation sessions. These sessions comprise sets of repetitions of a particular exercise, sandwiched between introduction and feedback sequences from the robot. By performing a range of coaching behaviours similar to those performed by a human physiotherapist, the robot leads its user through their rehabilitation session. Behaviours are primarily animated utterances spoken by the robot but also include demonstrations via the robot's movements. For example, the robot might perform a pre-instruction behaviour, praise, or ask a question while demonstrating the correct arm position for a certain exercise. Throughout the session, Pepper's tablet screen displays subtitles of Pepper's utterances and images of the current exercise.

System Implementation
Figure 1 shows the architecture of the robotic coach. Details of the implementation of each part of the architecture follow.
3.2.1 Processing Layer. The processing layer is composed of two main blocks: the controller and the coaching policy.
The controller coordinates everything in the system and communicates with the interface and tracking layers. It is implemented using a behaviour tree, a structure often used for robotic control [12]. The behaviour tree drives the format of each session, which mimics what was seen during observations of human coaches and physiotherapists in [5]. The format involves: an introduction by the robot; instruction by the robot to perform a set of an exercise; feedback during the set; and feedback on the set just performed. The leaf nodes of the tree, where coaching actions are executed, contain calls to the coaching policy module.
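A behaviour tree is executed by 'ticking' its nodes from the root; a Sequence node ticks its children in order and stops at the first child that does not succeed. The sketch below arranges the session format described above (introduction, then instruction, feedback during the set and feedback after the set for each set) under Sequence nodes, with each leaf standing in for a call to the coaching policy. It is a toy implementation with hypothetical class and function names, not the controller's actual tree; it collapses feedback during a set into a single leaf and omits the RUNNING status that practical behaviour tree libraries use.

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2

class Action:
    """Leaf node: records its name, then runs a callable that returns a Status."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def tick(self, log):
        log.append(self.name)
        return self.fn()

class Sequence:
    """Composite node: ticks children in order and fails as soon as one fails."""
    def __init__(self, children):
        self.children = children

    def tick(self, log):
        for child in self.children:
            if child.tick(log) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS

def coaching_leaf(stage):
    # Hypothetical: a real leaf would query the coaching policy for an action
    # and have the robot perform it before reporting success or failure.
    return Action(stage, lambda: Status.SUCCESS)

def build_session_tree(n_sets):
    """Introduction followed by one instruct/feedback sub-sequence per set."""
    sets = [Sequence([coaching_leaf(f"instruct_set_{i}"),
                      coaching_leaf(f"feedback_during_set_{i}"),
                      coaching_leaf(f"feedback_after_set_{i}")])
            for i in range(1, n_sets + 1)]
    return Sequence([coaching_leaf("introduction")] + sets)
```

Ticking `build_session_tree(2)` visits the introduction and then the three stages of each set in order; a leaf returning FAILURE would stop the session at that point.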
The coaching policy selects an action for the robot to perform in its current state. The policy is formulated as a matrix $T(a_t, a_{t+1})$, where $a_t$ denotes the action at timestep $t$ and $a_{t+1}$ the action at the next timestep. The available actions are 13 of the most frequently used behaviour categories in the observation instrument used by [5]. Example actions are given in Table 1. The state consists of the system's previous action combined with the stage of the interaction. At each timestep, the system samples the next action from the distribution over next actions defined by the T matrix.
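Sampling from such a policy amounts to looking up the row of T for the current state and drawing the next action from that distribution. In the sketch below, the policy is assumed to be stored as a dictionary keyed by (stage, previous action); the `POLICY` fragment, its action and stage names and its probabilities are invented for illustration and are not taken from Ross et al.'s policies.

```python
import random

# Hypothetical policy fragment: (stage, previous action) -> {next action: probability}.
POLICY = {
    ("during_set", "pre_instruction"): {"praise": 0.6, "concurrent_instruction": 0.4},
    ("during_set", "praise"): {"praise": 0.3, "concurrent_instruction": 0.5, "question": 0.2},
}

def select_action(policy, stage, previous_action, rng=random):
    """Draw the next action from the row of T for the current state."""
    row = policy[(stage, previous_action)]
    actions = list(row)
    weights = [row[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes a session reproducible, which can help when debugging or replaying an interaction.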
The choice of policy to execute is made at the start of each coaching session. It is based on the user information categories identified by coaches and physiotherapists in [5]: the user's level of impairment (self-rated), number of interactions with the robotic coach (i.e. length of the relationship), motivation for conducting rehabilitation (self-rated), and type of session. Each of these information categories was split into a 'high' and a 'low' value, and a policy received a higher score for each category in which the user's value matched the recommendations given by [8]. The policy with the highest score was used by the system during the coaching session. Thus, the behaviour of the robotic coach is based on the raw data obtained from the human-human interaction (HHI) observations conducted by Ross et al. [5], but the interaction style is personalised to each user.
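The selection rule can be read as a simple scoring scheme: each of the four user information categories is binarised, each policy carries a recommended value for some or all categories, and a policy's score is the number of matches. The sketch below assumes one point per match and breaks ties in favour of the earliest listed policy; the category keys and any policy names or recommendations used with it are hypothetical placeholders, since the actual recommendations come from the interview data.

```python
from typing import Dict

CATEGORIES = ("impairment", "relationship_length", "motivation", "session_type")

def score_policy(recommended: Dict[str, str], user: Dict[str, str]) -> int:
    """Count categories where the user's 'high'/'low' value matches the recommendation.

    Categories without a recommendation for this policy contribute nothing.
    """
    return sum(1 for c in CATEGORIES if c in recommended and recommended[c] == user[c])

def select_policy(policies: Dict[str, Dict[str, str]], user: Dict[str, str]) -> str:
    """Return the highest-scoring policy; max() keeps the first of any tied policies."""
    return max(policies, key=lambda name: score_policy(policies[name], user))
```

Because only four binary categories are used, many users map onto the same handful of policies, which is consistent with the paper's framing of this as high-level personalisation to groups of users.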
Once an action has been selected by the policy module, the controller formats it into a robotic behaviour (e.g. an utterance) that incorporates data from the tracking layer.
Table 1 excerpt (Action | Example robotic behaviour):
… | Demonstrates the correct way to perform shoulder openers.
Praise | "Good"
3.2.2 Tracking Layer. This layer consists of a human operator who indicates the completion of each exercise repetition during the session. It is the only part of the system that was run using WoZ techniques, and it was designed this way to remove the potential for failure of a vision system or sensor. If an appropriate sensing system were developed, it could easily be plugged into the system to replace the human operator and would allow the system to gather much more information for use in its feedback.
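Because only the repetition signal comes from the operator, the tracking layer can be hidden behind a small interface so that a sensor-based tracker could later replace the operator without changes to the controller. The sketch below uses a Python `typing.Protocol`; the `RepetitionTracker`, `WizardTracker` and `count_repetitions` names are hypothetical, and operator input is simulated with a queue rather than a real WoZ user interface.

```python
import queue
from typing import Protocol

class RepetitionTracker(Protocol):
    def wait_for_repetition(self, timeout: float) -> bool:
        """Block until a repetition is completed; return False on timeout."""
        ...

class WizardTracker:
    """Operator-driven tracker: each wizard key press is pushed onto a queue."""
    def __init__(self):
        self._events = queue.Queue()

    def operator_pressed(self):
        self._events.put(True)

    def wait_for_repetition(self, timeout: float) -> bool:
        try:
            return self._events.get(timeout=timeout)
        except queue.Empty:
            return False

def count_repetitions(tracker: RepetitionTracker, target: int, timeout: float) -> int:
    """Controller-side loop: depends only on the interface, not on the tracker type."""
    done = 0
    while done < target and tracker.wait_for_repetition(timeout):
        done += 1
    return done
```

A vision-based tracker would only need to implement `wait_for_repetition`; it could additionally expose quality measures for richer feedback, as the paragraph above anticipates.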
3.2.3 Interface Layer. The user interacts directly with this layer. Communication between the user and the system takes place through Pepper, using the robot's touch sensors and built-in text-to-speech technology. Actions selected by the processing layer are performed by the robot and lead the user through their session. The specific utterances and demonstrations that correspond to each action were selected at random from four available options for each combination of action, stage of interaction and user performance. This resulted in over 1,000 possible utterances.
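The utterance lookup can be sketched as a table keyed by (action, stage, performance), with four phrasings per key and one drawn uniformly at random. The total number of utterances is the number of keys multiplied by four, so over 1,000 utterances implies more than 250 distinct combinations. The exact key set is not given here, so the keys and phrases in the hypothetical `UTTERANCES` fragment below are illustrative only.

```python
import random

# Hypothetical fragment of the utterance bank: four phrasings per key.
UTTERANCES = {
    ("praise", "during_set", "good"): ["Good", "Well done", "Great work", "That's it"],
    ("praise", "after_set", "good"): ["Nice set", "Excellent set", "Lovely work", "Really good set"],
}

def choose_utterance(action, stage, performance, rng=random):
    """Pick one phrasing uniformly at random for this action/stage/performance key."""
    return rng.choice(UTTERANCES[(action, stage, performance)])

def bank_size(bank):
    """Total number of phrasings across all keys."""
    return sum(len(options) for options in bank.values())
```

Random choice among several phrasings is a lightweight way to reduce the repetitiveness of a robot that must comment on many similar repetitions.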
The differences between the coaching policies mean that the robot's behaviour is personalised to the user. For example, some policies contain many instruction actions (such policies are likely to be chosen for a user who is early in their rehabilitation journey), whereas others have the robot ask more questions so that the user works things out for themselves or recalls how an exercise was performed in a previous session.

Participants
Independently living stroke survivors (N=3) were recruited with the help of local charities using the following inclusion criteria:
1. Have a stroke-related arm impairment.
2. Have the required cognitive ability to provide informed consent for the study.
3. Are living independently (i.e. in their own home, not in hospital or a care home).
Each participant entered a raffle for a £50 Amazon voucher.

Conditions
Three conditions were evaluated during the study: two coaching conditions and one baseline condition. In the Data Selected Policy (DSP) condition, participants interacted with the robot executing the coaching policy chosen using the method described in Section 3.2.1. In the Non-Personalised Policy (NPP) condition, the robot executed a policy selected at random, at the beginning of the interaction, from the other 11 policies that were not the best match for the participant's information and training context. Comparing these two conditions allowed us to assess the effect of high-level personalisation. Both coaching conditions contrast with the No Coaching Policy (NCP) baseline condition, in which the robot told the user which exercise to perform and when to perform each set (see Section 4.4) but gave no coaching behaviours. This baseline was the closest of the three conditions to a regular individual rehabilitation session.
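The NPP draw can be expressed in a few lines: remove the data-selected policy from the pool of 12 and choose uniformly among the remaining 11. A minimal sketch, assuming policies are identified by name and the DSP choice has already been computed; the function name and policy labels are hypothetical.

```python
import random

def pick_npp_policy(all_policies, dsp_policy, rng=random):
    """Choose uniformly among the policies that are not the data-selected one."""
    candidates = [p for p in all_policies if p != dsp_policy]
    return rng.choice(candidates)
```

Note that a randomly drawn NPP policy may still share several recommended values with the DSP policy, so the contrast between the two conditions can be small for some users.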

Measures
The following measures were used in the study.The CBS-S and IMI used 7-point scales and the NASA TLX used a 21-point scale.
4.3.1 Coaching Behaviour Scale for Sport (CBS-S) [13]. The "technical skills" subscale of the CBS-S was used to measure participants' opinions on the coaching provided by the robot.
4.3.2 Intrinsic Motivation Inventory (IMI) [14]. The interest/enjoyment, perceived competence, perceived choice and value/usefulness subscales of the IMI were used to assess the effect of each condition on the participants' intrinsic motivation for conducting an individual rehabilitation session.
4.3.3 NASA Task Load Index (TLX) [15]. The NASA TLX was used as a measure of workload during a rehabilitation session with a robotic coach.
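For the NASA TLX, a common convention is to map the 21-point scale onto 0-100 in steps of 5 and, in the unweighted 'raw TLX' variant, to report each subscale or their mean. The paper does not state which convention was used, so the 1-21 coding of responses, the subscale keys and the mapping below are assumptions.

```python
TLX_SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")

def tlx_to_100(response: int) -> int:
    """Map a response on the 21-point scale (assumed coded 1-21) to 0-100 in steps of 5."""
    if not 1 <= response <= 21:
        raise ValueError("response must be between 1 and 21")
    return (response - 1) * 5

def raw_tlx(responses: dict) -> float:
    """Unweighted ('raw') TLX: mean of the six rescaled subscale scores."""
    return sum(tlx_to_100(responses[s]) for s in TLX_SUBSCALES) / len(TLX_SUBSCALES)
```

On every subscale, including performance (anchored from 'perfect' to 'failure'), a higher score indicates higher workload or worse perceived performance.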

Study Design
The sessions for 2 participants were conducted in the sports centre at the university, and those for the third participant at a local respite centre. The setup used in the study is shown in Figure 2. All necessary equipment was provided and sanitised between sessions. At least 1 week prior to the study, participants were sent an information sheet outlining the study procedure and a consent form. The study received full ethical approval from the university's ethics board. Table 2 shows an overview of the procedure. A within-subjects design was used, with the order of conditions counterbalanced and interactions split across 2 separate days. In each session, the robot asked participants to perform 2 sets of 4 exercises (chosen with input from physiotherapists: external rotations with cane, shoulder openers, towel slides, and table-top circles) in a random order. The first set consisted of 10 repetitions and the second of 5.
Table 2: Summary of the evaluation procedure used.
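With three conditions there are 3! = 6 possible orders, more than there were participants; a cyclic Latin square, in which each condition appears once in each position, is one way to balance position effects with only three rows. The paper does not say which counterbalancing scheme was used, so the sketch below is an assumption. Note that a cyclic square balances position but not first-order carry-over effects.

```python
from itertools import permutations

CONDITIONS = ("DSP", "NPP", "NCP")

def latin_square_orders(conditions):
    """Cyclic Latin square: row i is the condition list rotated left by i positions."""
    n = len(conditions)
    return [tuple(conditions[(i + j) % n] for j in range(n)) for i in range(n)]

def all_orders(conditions):
    """Every possible order, for comparison with full counterbalancing."""
    return list(permutations(conditions))
```

Each participant would be assigned one row of `latin_square_orders(CONDITIONS)`.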

Results
One of the participants opted not to complete their final session (NPP condition). As they did not experience any problems with the system and gave permission for their data to be disseminated, their data have been included in the analysis shown in Figure 3.
While no statistical significance can be shown due to the small number of participants, the NCP condition performed worse than both coaching conditions in 5 of the 6 NASA TLX measures.Participants perceived the same exercises as more mentally and physically demanding, while perceiving their performance on the exercises as lower when the robot did not offer any coaching.
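The '5 of the 6 measures' observation is a per-subscale comparison of condition means. A minimal sketch of that descriptive check, assuming scores are stored per condition and subscale and that higher TLX scores mean higher workload or worse perceived performance; the data layout and function names are hypothetical, and no study data are reproduced here.

```python
from statistics import mean

def condition_means(scores_by_condition):
    """{condition: {subscale: [participant scores]}} -> {condition: {subscale: mean}}.

    Conditions may have different numbers of participants (e.g. a missed session).
    """
    return {cond: {s: mean(v) for s, v in subs.items()}
            for cond, subs in scores_by_condition.items()}

def subscales_where_worse(means, baseline, others):
    """Subscales on which the baseline mean is strictly higher than every other condition."""
    return [s for s in means[baseline]
            if all(means[baseline][s] > means[o][s] for o in others)]
```

With N=3 this remains a descriptive comparison of means; it does not test for statistical significance.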
There were minimal differences between conditions in the IMI scores given by participants.Meanwhile, each condition scored well in different aspects of perceived coaching effectiveness, as measured by the CBS-S.Interestingly, the NCP condition was perceived to give more advice, but the NPP condition scored highest in reinforcement and technical feedback, whilst the DSP condition was perceived as the best at giving immediate feedback.

DISCUSSION AND CONCLUSION
This paper has presented the evaluation of a novel robotic coach for individual rehabilitation after stroke. We acknowledge the small number of participants as a limitation. However, the system was able to engage three stroke survivors in rehabilitation with limited WoZ input from a human operator. This supports the feasibility of the implementation approach proposed by Ross et al. [5] and adds to the results obtained in squash coaching by [7].
Interestingly, personalisation of the system's coaching policy did not result in improvements in motivation or perceived coaching effectiveness. It is important to note that the NPP condition also used a policy based on human coaching data. The rigid experimental setup and limited number of participants resulted in few differences in participants' user information and training context, so little personalisation was possible in the DSP condition. Future work will learn from studies such as [16] by further adapting the selected policy to individuals over time using reinforcement learning. Evaluating the system with more participants and over the long term is also a clear next step.

Figure 1: Architecture diagram of the robotic coach.

Figure 2: The experimental setup shown from the researcher's perspective at the back of the room.

Figure 3: Results from the 3 scales. Values are means.

HRI '24 Companion, March 11-14, 2024, Boulder, CO, USA
Table 1: Examples of actions and robotic behaviours.