A Systematic Approach to Modeling Structured Behavior in Social Robots

Social Robots (SRs) often require structured models of behavior to facilitate sophisticated interaction episodes in their capacity as coaches, teachers, assistants, and beyond. Techniques in human-centered design can support the translation of human-human to human-robot behavior, but can be challenging and often lead to weak interpretations. We introduce a four-step approach of data gathering, behavior model development, behavior model annotation, and robot implementation to promote a more systematic approach to the development of SR behaviors. The efficacy of this approach was demonstrated in a set of case studies involving 24 participants. We demonstrate how a structured behavior model for a SR was developed systematically by clustering observed human interaction episodes, and that the researchers’ original translations of human to robot behavior models captured task knowledge, but in each case, our model annotation step was necessary to validate the designs and further refine missing aspects.


INTRODUCTION
Social robots (SRs) have become increasingly prevalent across a wide range of settings such as homes, schools, hospitals and public areas. In instances beyond trivial encounters, where the robot is tasked with more elaborate and sustained interaction, such as providing guidance during multiple structured sessions of some kind, its control processes become significantly more complex to develop [20]. Moreover, there is a pronounced shift towards machine learning-based adaptability for SRs [2]. Such mechanisms require a precise and comprehensive understanding of the underlying robot behaviors and valid interaction paths for such personalization to translate effectively to the real world.
To ensure the robot's behaviors accurately reflect those of the human interactions they are attempting to simulate, researchers often rely on collaboration with stakeholders early in the design process [8], be it the end user (perhaps a customer in a restaurant or patient in a hospital) or some expert who directs the task (the server or clinician). These humans can hold valuable implicit and explicit task knowledge that can be leveraged to develop a formalized model of robot behavior and can be gathered through activities such as systematic task observations [30,33] or through the collection and analysis of data from participatory design workshops [11,35,38]. Yet, the translation of these high-level social interactions into structured behavior models can be challenging, and researchers typically rely on only a brief pass of stakeholder collaboration, which often leads to weak representations of the scenario [14].
To address this, we propose a four-step approach for structured behavior modeling in SRs. This involves the use of behavior model visualizations, realized through a systematic process of data gathering and behavior model development. These visualizations are then employed as a design probe during a second round of stakeholder collaboration to validate the researcher's original interpretations of the human interaction, before implementing the SR system. We argue that our novel approach of model refinement through visualizations is an essential step when creating the underlying behaviors that define SR interaction. The efficacy of our approach was demonstrated during three separate research activities. We leverage the structured interaction paradigm we have established in rehabilitation and sports coaching as a foundational starting point, demonstrating potential to apply our approach to a broader spectrum of SR domains. Importantly, we understand that researchers will have varying goals and technical needs for their SR design. We aim to provide high-level guidance at each step that could be applicable to a wide variety of applications, but wish to also provide a concrete example of how our approach can be used to craft behavior models for complex control paradigms. Thus, our first case study (Section 4.1) provides a technical discussion of how our approach can be applied to inform SR behavior from clustered observations of interaction episodes, with a goal of using reinforcement learning for ongoing personalization. Subsequently, we provide concise overviews of our second and third case studies (Sections 4.2 and 4.3), offering brief yet illustrative insights into the applicability of our approach in other domains.
Researchers considering the use of our approach should note that certain criteria are conducive to its application. Firstly, the task under consideration should be structured in nature, following a sequential flow of events or behaviors. Secondly, the task should involve embodied interactions that encompass a strong social component. Lastly, the humans involved in the task should rely on implicit knowledge to perform it effectively. These criteria serve as guideposts for identifying scenarios in which our approach is particularly valuable.

BACKGROUND

2.1 Behavior Models for Social Robots
In current research, behavior models for SRs are defined broadly as some resource a robot references to guide its behaviors as the interaction progresses. Nocentini et al.'s taxonomy of behavioral models for SRs, for instance, considers any and all technical components, social constructs, and theoretic methods that relate to how the robot interfaces with, learns from, and empathizes with its world, as its behavior model [26]. Our definition of behavior model in this paper is more strictly concerned with the robot's technical underlying representation of the interaction episodes. We denote an episode as a self-contained and repeatable start-to-end interaction encompassing a structured flow of possible behaviors that can be executed by the robot. We can imagine this as a session, class, appointment, lesson, or some other social engagement. In an episode, actors can be seen to exhibit various behaviors: distinct segments of the interaction, encompassing an action or a set of actions that contribute to achieving a specific objective within a broader task context. Behaviors are typically high-level task segments, which have domain-specific instantiations that can manifest as physical movements, gestures, gaze, speech, and so on.
The necessity for elaborate behavior models such as this becomes most apparent in scenarios characterized by structured and multifaceted interaction episodes, which can cover a wide range of SR roles.

Coaching.
Numerous studies have invested in SRs to coach humans towards skill improvement [19]. One such robot was developed by Fasola and Matarić to guide older adults through a 20-minute upper arm exercise session [10]. The system's behavior module managed the flow of interaction and orchestrated behaviors such as exercises, games, breaks, and introductions. The underlying model was represented as a finite-state machine. Similarly, Sussenbach et al. describe a SR cycling instructor [33] which employed macro-level behaviors such as warm-up, workout and cooldown, and micro-level behaviors with specific verbal utterances such as encouragement. In the design of their behavior model, the authors developed a behavior tree informed by observational data from four real-world cycling sessions. In the wellbeing space, Spitale et al. utilized the QT robot to deliver positive psychology exercises to employees in the workplace [32]. The robot coach used pre-scripted interaction flows, including an introduction, deployment of the exercise, and question-asking.

2.1.2 Teaching.
Park et al. describe a personalized companion for young learners aged 4-6 who interacted over a three-month period [27]. The SR's behavior model was designed to mirror real-world story-telling. High-level behaviors included narrating stories, displaying illustrations, asking diverse questions, and inviting the child's story retelling. Reinforcement Learning (RL) was used to adjust the complexity levels of the storybook. In work by Kennedy et al., a SR was employed to teach a lesson on prime numbers [16]. The behavior model was structured to flow through a sequence of the math lesson, as interpreted from human teacher experts. SR behavior segments included leading subsections of the larger lesson and prescribing tests.

Assistance.
In the domain of assistive living, Moro et al. show how Learning from Demonstration (LfD) and RL can be combined to teach robots complex assistive living tasks [24]. Specifically, the authors present a tea-making task, defined as a set of high-level, structured behaviors (e.g., instruct user to put teabag in cup). Once learned through LfD, the underlying behavior model was interpreted as a behavior tree designed by human experts directly demonstrating the task. Similarly, Robinson et al. present a SR which implements a MAXQ hierarchical RL algorithm to learn a user-dressing assistance task [29]. Thus, robot behaviors are designed as high-level task segments such as instruct, engage, fasten (straps), correct, etc., and the behavior model is visualized as a MAXQ task graph.

Harnessing Human Behavior
Human-centered design has long been recognized as an important process toward achieving more technically sound applications that are deeply attuned to human needs [37]. Numerous instances exist where systems are successfully trained on human data to construct behavior models, for example, in autonomous driving [3], conversational agents [15] and video games [5]. Yet, modeling efforts can be challenging when tasks, such as those frequently attempted by SRs, rely on high levels of abstract reasoning [36]. Additionally, these systems often require large datasets gathered through task observations to capture the implicit knowledge in human behavior, a luxury that is expensive to obtain. Therefore, the HRI community has explored various alternative techniques to address the challenges associated with modeling SR behavior.

2.2.1 Low-code/no-code approaches.
Substantial research has demonstrated how low-code or no-code systems can aid the development of behavior models through stakeholders directly crafting behavior [13,31,34]. Winkle et al. had a personal trainer develop a heuristic behavior model for a SR running coach (Pepper) and then manually correct the robot's behavior during interactions [39]. A different work created cards that each represented different robotic behaviors that neuropsychologists could use to construct personalized treatments using a SR for people with mild cognitive impairment [17]. Although these methods capture explicit knowledge of domain experts, they may fail to capture the implicit knowledge demonstrated by humans during actual interactions, which often differs from stakeholders' interpretations of their actions across multiple domains [12,23].

Figure 1: Our approach steps to modeling structured behavior in social robots

Bodystorming.
Physically acting out interactive scenarios through roleplay, known as bodystorming, has been considered as an effective technique towards SR behavior modeling [1,4]. Porfirio et al. demonstrated how their Synthe system can automatically capture human task behavior and generalize multiple user traces into one behavior model through program synthesis [28]. Such an approach demonstrates potential for rapid prototyping; however, the authors evaluate their system on package delivery and shop information tasks. When confronted with more specialized roles characterized by complex structures, researchers will likely lack the expert knowledge to perform accurate interactions. Furthermore, the synthesis of a single behavior model is unlikely to capture the diverse variations of behavior styles exhibited by coaches, teachers, and so on.

Participatory design.
Design activities such as stakeholder workshops have been heavily used in SR research [8], informing aspects of embodiment [6,11] and interaction design [11]. For behavior modeling, this is less common. Winkle et al. use a participatory design session to inform the action and input spaces that formulated an environment for an interactive machine learning approach to their couch-to-5k robot instructor [39]. Although such co-design activities can gather extrinsic task knowledge effectively, there may exist a gap between what stakeholders describe and what is actually performed in real life, and subjective domain-based knowledge can be challenging to translate smoothly into a behavior model for a SR.

While existing techniques towards harnessing human behavior for SRs offer valuable insights, they often fall short in capturing the intricate variations of behavior styles inherent in structured roles such as coaches and teachers. Furthermore, we know of no existing work that 1) combines the implicit knowledge captured in learned behavior models during task observations with the explicit knowledge gained during a second round of more direct stakeholder engagement, and 2) attempts to formalize a systematic approach such as this for use throughout the HRI community.

DESIGN APPROACH
We present an approach to modeling structured behavior in SRs. Shown in Figure 1, our approach follows: 1) an initial data gathering step to efficiently, but systematically, record human-human interactions; 2) a behavior model development step, translating and visualizing task knowledge into one or more structured behavior models for SR control; 3) a behavior model annotation step, bringing human experts back to the table to validate and refine the researcher's initial interpretation of the scenario; and 4) a final robot implementation step demonstrating how behavior models developed this way can be realized. We present the general design approach first, and then discuss in Section 4 our experience in applying these steps to exemplary domains in SRs for rehabilitation and sports coaching.

Data Gathering
The outcome of this first step is the acquisition of a well-organized dataset that captures observational data describing the human interactions that the proposed SR aims to facilitate. Specifically, capturing the sequential flow of behaviors, or action trajectories, during a human-human interaction episode is of key focus here. For this, systematic recordings of human observations are recommended, a method which has consistently proven to yield rich and structured datasets of human behavior across diverse domains [22].
It is worth noting that there likely exists a multitude of potential paths through the interaction of interest, much like the variation in teaching styles among educators. Here, the researcher has the discretion to determine the appropriate number of observation episodes to include in the dataset. For instance, in some cases, a single baseline flow may suffice, with the expectation that future learning mechanisms will adapt and personalize SR interactions. Importantly, our approach later introduces an iteration of stakeholder collaboration, directly refining the implicit behavior data gathered during this step, thus relieving the need for excessively large data collection. Moreover, human experts will perform tasks in different ways depending on their own characteristics and differences in environment or other users. We advise data gathering from a wide variety of human subjects to limit bias and uncover different interaction styles.
The choice and granularity of the behaviors captured depend on the specific application and the researcher's or stakeholder's judgment regarding significance. For example, consider a server in a restaurant. In this scenario, behaviors may manifest as high-level interaction phases such as approaching tables, providing an overview of the menu, taking orders, and so on. However, certain applications may necessitate a model that consists of more granular behaviors, such as recording drinks separately from food and then taking allergy requirements. Notably, for certain task contexts there may exist some instrument that can be harnessed to record human behaviors, as we demonstrate in our case studies. Researchers may utilize these where appropriate, or otherwise allow behaviors to naturally manifest and evolve as observations take place. The decision on what behaviors are captured could, in the case of the example, be determined by the restaurant staff, customers, the research team, or a combination of these.
One can imagine a finalized dataset consisting of, for example, a set of ten high-level behaviors relating to the task, the trajectories in which these occurred in the observed episodes, and which side of the interaction relates to which behavior (e.g., user and expert).
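As a rough illustration, a dataset of this kind could be held in a structure like the one below. This is a minimal Python sketch under stated assumptions: the class names `Event` and `Episode`, their fields, and the restaurant behavior labels are hypothetical and chosen only to mirror the server example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    behavior: str  # high-level behavior label (hypothetical names below)
    actor: str     # side of the interaction, e.g. "expert" or "user"


@dataclass
class Episode:
    episode_id: str
    events: List[Event] = field(default_factory=list)

    def trajectory(self, actor: str) -> List[str]:
        """Return the ordered behaviors performed by one side of the interaction."""
        return [e.behavior for e in self.events if e.actor == actor]


# One observed episode, recorded in the order the behaviors occurred.
episode = Episode("restaurant-01", [
    Event("approach_table", "expert"),
    Event("greet", "user"),
    Event("menu_overview", "expert"),
    Event("take_order", "expert"),
])
server_trajectory = episode.trajectory("expert")
```

Keeping the actor on each event, rather than storing one list per actor, preserves the interleaving of both sides, which may matter if user responses are later added to the model.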

Behavior Model Development
This step focuses on the processing of the dataset gathered in Step 1 to produce a behavior model or set of models representative of the researcher's initial understanding of the scenario. Researchers may wish to take different approaches to model development depending on the ultimate objectives of the system. Accordingly, the software representation of the model may take various forms. Behavior trees are a standard approach to organizing sequential behavior logic [7]; however, finite state machines and graph-based models are also evident in SR research. For researchers looking to generate a single model of SR behavior, averaging observations by aggregating action trajectories, or using an approach such as program synthesis [28], may suffice. However, for the SR roles discussed in this paper, there likely exist multiple styles, stages, or types of interaction that must be discovered. In such cases, the application of unsupervised learning techniques, specifically clustering, can prove highly useful in categorizing these nuanced patterns [30].
Machine learning approaches, particularly Reinforcement Learning (RL), are increasingly being utilized for SR behavior control, given their capacity for adaptation and personalization in complex task scenarios [2]. When implementing Step 2 of our approach with the aim of future RL personalization, researchers can use the developed behaviors to formulate appropriate environment dynamics. A more interesting consideration is how to begin interaction. Starting with randomized or default behavior and allowing the SR full agency to explore various interaction strategies and styles with the user creates challenges in real-world implementation. Specifically, the cost associated with human interaction creates a problem of sample efficiency that slows learning [40]. This makes such an approach impractical, particularly in scenarios involving potential risks (e.g., healthcare). Clustering the dataset acquired in Step 1, however, can reveal patterns of distinct styles, techniques, or methods of task delivery (e.g., an educational robot teaching a class). This allows for the matching of user and environment profiles (e.g., the student's ability), enabling the provision of more accurate starting policies for the robot's behavior [30].
We substantiate this in our case studies using a clustering technique to discern coaching styles for sports coaches and therapists. We focus on RL due to its relevance in current SR research, but we stress that the overarching process in Step 2 can be followed for other techniques outside of RL. Finally, the outcome of this step should be a set of one or more visualizations that represent the model as an interaction episode or set of episodes. An example can be seen in Figure 2.

Behavior Model Annotation
In Step 3, a collaborative session with stakeholders is held to facilitate annotation of the models generated and visualized in Step 2. During this step, stakeholders are shown the behavior model visualizations and asked to undertake a one-to-one interview or group workshop with researchers where they provide detailed criticisms of and enhancements to the researchers' original interpretations of the models. These stakeholders should have implicit knowledge of the scenario that has been modeled, much like those involved in Step 1. This activity gives rise to a host of new opportunities simply not possible during human observations alone. For instance, experts can validate episode flows, highlight specific aspects of importance to provide deeper detail, design specific interactive actions such as utterances or gestures, and point to other interaction features that may have been missed.
Since the initial model interpretations are developed for use by the research team, the visualizations in this step can be re-designed to improve clarity (e.g., by using clear behavior terminology, incorporating a key, and reducing clutter). In short, the behavior model annotation step involves a collaborative exchange between the stakeholder(s) and researchers in the format of a workshop or one-to-one interview, where the visualizations are used as a probe to facilitate discussion while acting as a data capturing medium for rich annotations.

Robot Implementation
Step 4 concerns the process of incorporating the behavior model(s) developed in Steps 1-3 into the final SR system to facilitate the planned HRI scenario. Much like Step 2, in which the structure and representation of the behavior model can take many forms, so too the implementation should reflect the underlying model appropriately. For example, tree-based models could utilize existing frameworks (e.g., the Behavior Tree Engine for Python) to capture the information described in the visualized tree, and RL libraries can assist environment crafting for robot control applications (e.g., OpenAI Gym). Evaluation of the system is also included in this step. At this stage the stakeholders in question will no longer be domain experts, but will be the target end users for the system. We recommend long-term evaluations in the target environment wherever possible, with close ongoing involvement from stakeholders. This is particularly important when considering ongoing personalization of the behavior model, as adaptations will take time to manifest.
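To make the tree-based option mentioned above more concrete, the following is a minimal, hand-rolled behavior tree sketch in Python. It does not use or imitate the API of any particular behavior tree framework; the node classes (`Action`, `Sequence`, `Selector`), the status strings, and the session behaviors are all hypothetical and serve only to show how a visualized episode flow might be encoded.

```python
from typing import Callable, List

SUCCESS, FAILURE = "success", "failure"


class Action:
    """Leaf node: runs one robot behavior and reports whether it succeeded."""

    def __init__(self, name: str, fn: Callable[[], bool] = lambda: True):
        self.name, self.fn = name, fn

    def tick(self, log: List[str]) -> str:
        log.append(self.name)  # record the attempted behavior
        return SUCCESS if self.fn() else FAILURE


class Sequence:
    """Runs children in order; fails as soon as one child fails."""

    def __init__(self, children):
        self.children = children

    def tick(self, log: List[str]) -> str:
        for child in self.children:
            if child.tick(log) == FAILURE:
                return FAILURE
        return SUCCESS


class Selector:
    """Tries children in order; succeeds as soon as one child succeeds."""

    def __init__(self, children):
        self.children = children

    def tick(self, log: List[str]) -> str:
        for child in self.children:
            if child.tick(log) == SUCCESS:
                return SUCCESS
        return FAILURE


# Hypothetical session: if the exercise cannot proceed, fall back to a break.
session = Sequence([
    Action("introduction"),
    Selector([Action("exercise", lambda: False), Action("break")]),
    Action("wrap_up"),
])
log: List[str] = []
status = session.tick(log)
```

In a real system each `Action` callback would trigger speech, gesture, or sensing on the robot; here the callbacks only return a fixed boolean so that the control flow can be inspected.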

CASE STUDIES
The effectiveness of our approach emerged through various research activities which we now go on to discuss in the following three case studies, showing how our approach is not only theoretical but translates effectively into real-world implementations of SRs.

A Social Robot for Squash Coaching
The first research activity investigated the design of an adaptive SR coach for individual squash training. Full results of the in-depth process of task observations, behavior clustering and stakeholder interviews have been published previously [30], so this section focuses on the use and effectiveness of our design approach to produce functional behavior models.

Data Gathering.
To produce a structured dataset of squash coaching episodes, systematic observations of eight professional squash coaches were undertaken, each conducting two one-to-one coaching sessions with players of varying abilities in different squash clubs across the country. To record behaviors, the Arizona State University Observation Instrument (ASUOI) [18], a popular observation instrument in sports coaching, was used. This instrument focuses on 16 high-level behaviors related to sports coaching, some of which can be seen in Figure 2, and was used to produce task trajectories from 16 sessions (episodes) which averaged 39 minutes. Behaviors were systematically recorded on paper by the observing researcher as they occurred during the coaching session.
The observed sessions comprised sequences of behaviors of varying length used by the coach (e.g., pre-instruction, questioning, praise, concurrent instruction (positive), positive modeling, post instruction (positive)). For the purposes of creating a dataset usable in Step 2, these behavior sequences were represented as transition matrices, which captured the transition probabilities between the 16 high-level behaviors.
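The conversion from a recorded behavior sequence to a transition matrix can be sketched as follows. This is an illustrative Python sketch rather than the code used in the study: the function name `transition_matrix` is hypothetical, only three of the 16 ASUOI behaviors are used, the toy session is invented, and leaving rows with no outgoing transitions as zeros is an assumption.

```python
from typing import List


def transition_matrix(sequence: List[str], behaviors: List[str]) -> List[List[float]]:
    """Estimate P(next behavior | current behavior) from one observed sequence."""
    index = {b: i for i, b in enumerate(behaviors)}
    n = len(behaviors)
    counts = [[0] * n for _ in range(n)]
    # Count each consecutive pair (current, next) in the sequence.
    for current, nxt in zip(sequence, sequence[1:]):
        counts[index[current]][index[nxt]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Assumption: a behavior with no observed successor keeps an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


behaviors = ["pre_instruction", "questioning", "praise"]
session = ["pre_instruction", "questioning", "pre_instruction",
           "praise", "pre_instruction", "questioning"]
P = transition_matrix(session, behaviors)
```

In this toy session, pre-instruction is followed by questioning twice and by praise once, so the first row of `P` is approximately [0, 0.67, 0.33].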

Behavior Model Development.
The researchers applied clustering to the recorded dataset, resulting in the identification of six distinct models of coaching styles. The clustering method used was an adaptation of the expectation-maximization-based algorithm [25], with the M-step altered slightly to account for the case where all sequences grouped into a given cluster had zero transitions between the nodes in question. The data points used in the current work were the behavior sequences observed in Step 1, represented by their transition matrices. The code used for clustering is available on GitHub. It should be noted that we did not go on to use inverse RL (IRL) to determine a reward function representative of each user type, as was done by Nikolaidis et al., because only the behaviors of the coach and not the player had been gathered. This meant there was no way to infer a new user's type online, so we instead used a personalization approach informed by the data collected from domain experts in Step 3, which is explained in Step 4 (Section 4.1.4).
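The general idea of clustering behavior sequences by their transition structure can be sketched with a small expectation-maximization loop over a mixture of Markov chains. This is not the authors' implementation (which is available on GitHub) and not a reproduction of the algorithm in [25]. The function names, the random soft initialization, and the additive `smoothing` constant, which stands in here for the altered M-step that handles zero-transition cases, are all assumptions made for illustration.

```python
import math
import random
from typing import List, Tuple


def count_matrix(seq: List[int], n: int) -> List[List[float]]:
    counts = [[0.0] * n for _ in range(n)]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1.0
    return counts


def em_cluster(sequences: List[List[int]], n_behaviors: int, n_clusters: int,
               n_iter: int = 50, smoothing: float = 1e-3,
               seed: int = 0) -> Tuple[List[int], List[List[List[float]]]]:
    rng = random.Random(seed)
    counts = [count_matrix(s, n_behaviors) for s in sequences]
    # Random soft initialization of cluster responsibilities.
    resp = []
    for _ in sequences:
        row = [rng.random() for _ in range(n_clusters)]
        resp.append([r / sum(row) for r in row])
    models = []
    for _ in range(n_iter):
        # M-step: responsibility-weighted transition counts per cluster.
        weights = [sum(r[k] for r in resp) / len(sequences) for k in range(n_clusters)]
        models = []
        for k in range(n_clusters):
            model = []
            for i in range(n_behaviors):
                # Smoothing keeps every probability non-zero (assumed workaround).
                row = [smoothing + sum(resp[s][k] * counts[s][i][j]
                                       for s in range(len(sequences)))
                       for j in range(n_behaviors)]
                total = sum(row)
                model.append([v / total for v in row])
            models.append(model)
        # E-step: posterior probability of each cluster for each sequence.
        for s, c in enumerate(counts):
            logp = []
            for k in range(n_clusters):
                ll = math.log(max(weights[k], 1e-12))
                ll += sum(c[i][j] * math.log(models[k][i][j])
                          for i in range(n_behaviors) for j in range(n_behaviors))
                logp.append(ll)
            top = max(logp)
            expd = [math.exp(v - top) for v in logp]
            resp[s] = [v / sum(expd) for v in expd]
    labels = [max(range(n_clusters), key=lambda k: r[k]) for r in resp]
    return labels, models


# Toy data: two sequences alternate behaviors 0 and 1, two alternate 0 and 2.
toy = [[0, 1, 0, 1, 0, 1, 0, 1], [0, 1, 0, 1, 0, 1],
       [0, 2, 0, 2, 0, 2, 0, 2], [0, 2, 0, 2, 0, 2]]
labels, models = em_cluster(toy, n_behaviors=3, n_clusters=2)
```

On the toy data, the first two sequences end up in one cluster and the last two in the other. Each returned model is itself a transition matrix, so it can be inspected or visualized in the same way as a single observed session.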
Clustering the data produced behavior models that could be implemented in a robotic system to choose which action to take at each timestep of a coaching session. One of these is visualized as a Markov Decision Process (MDP) in Figure 2, and all six can be seen on GitHub. In these visualizations, the boxes represent behaviors used by the coach, and a larger box indicates that the behavior happened more frequently. The arrows between the boxes represent transitions between behaviors, and a thicker arrow with a larger number represents a more frequent transition. The MDP representation was chosen to facilitate the planned RL approach to personalization. The code used to produce the visualizations of each model made use of the jgraphx library and is available on GitHub.
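The visual encoding described above (box size for behavior frequency, arrow thickness and label for transition probability) can be approximated with a few lines of code. The authors used the jgraphx library; the sketch below instead emits Graphviz DOT text from Python, which is a swapped-in technique chosen for brevity. The function name `to_dot`, the scaling constants, and the `min_prob` cut-off are assumptions.

```python
from typing import List


def to_dot(behaviors: List[str], matrix: List[List[float]],
           frequency: List[float], min_prob: float = 0.05) -> str:
    """Render a transition matrix as Graphviz DOT text.

    frequency[i] is the share of observed behaviors that were behaviors[i].
    """
    lines = ["digraph behavior_model {", "  node [shape=box];"]
    for i, name in enumerate(behaviors):
        size = 0.5 + 2.0 * frequency[i]  # more frequent behavior -> larger box
        lines.append(f'  "{name}" [width={size:.2f}, height={size / 2:.2f}];')
    for i, src in enumerate(behaviors):
        for j, dst in enumerate(behaviors):
            p = matrix[i][j]
            if p >= min_prob:  # hide rare transitions to reduce clutter
                lines.append(
                    f'  "{src}" -> "{dst}" [label="{p:.2f}", penwidth={1 + 4 * p:.2f}];'
                )
    lines.append("}")
    return "\n".join(lines)


dot = to_dot(["praise", "questioning"],
             [[0.0, 1.0], [0.5, 0.5]],
             [0.7, 0.3])
```

The resulting string can be saved to a file and rendered with the Graphviz `dot` command-line tool; hiding low-probability edges is one way to reduce clutter before showing the model to stakeholders.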

Behavior Model Annotation.
For this step, online, individual, semi-structured interviews were conducted with 9 squash coaches. The inclusion criteria were: 1) a) a minimum of level 2 coaching certification OR b) a minimum of 5 years' squash coaching experience; 2) have coached squash on at least a weekly basis for the last year; 3) have worked with both senior and junior players, and either international or developmental players, in the last year. The majority of participants had also taken part in Step 1 (Section 4.1.1), but two new participants were also recruited. Discussion and collaborative annotation of the 6 behavior graphs was conducted using the online whiteboard tool Mural (overview shown in Figure 3). Each interview was recorded and transcribed for later analysis, and the study received full ethical approval from the University's ethics board.
With regard to findings, the model annotation step gave rise to a considerable number of recommendations around the design and use of several unforeseen aspects of the initial models. For instance, eight of the nine squash coaches highlighted the need for positive reminders during play but stressed that the majority of explanation should be done between sets of shots. The behavior models were also able to facilitate the stakeholders' discussion of questioning at different stages of an episode. For example, near the start of a session, questions should focus on breaking down a particular skill to make it understandable for the player. But when used alongside post instruction behaviors, questions are more likely to be used to highlight the effect of a particular skill on the shots played.
As well as focusing on individual behaviors, participants were also invited to compare and contrast the behavior models as a whole. These discussions helped to identify situations that each model would be most suited to, thus giving strong indications of how personalization of robotic behavior could occur during HRI.
4.1.4 Robot Implementation.
The system was implemented on the Pepper SR platform, which is used to verbally guide a user through an individual squash training session. The system receives performance data in the form of a numerical score provided by a squash racket-mounted Inertial Measurement Unit (IMU) sensor that represents how closely the player's swing matched that of a professional player on the last shot they played. Feedback and coaching are provided to the player using the developed behavior models.
The underlying model of each coaching style is represented in the SR system as a $T(b_t, b_{t+1})$ transition matrix, where $b$ denotes a behavior and $t+1$ denotes the next timestep. An appropriate choice (based on user information and training context) is made as to which of the clustered models to deploy at the beginning of an interaction episode. At each timestep, the system selects a behavior from the next-behavior distribution defined by the $T$ matrix. Therefore, at nodes in which a choice point is encountered (i.e., there is more than one possible action), the system selects an action using weighted randomness defined in the $T$ matrix. Specific utterances are formulated to incorporate user performance data and the current stage of the interaction session to coach a player as they conduct their training.
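The weighted-random selection at choice points can be sketched in a few lines of Python. This is an illustrative sketch, not the deployed system: the function names `next_behavior` and `rollout` are hypothetical, behaviors are referred to by integer index, and treating an all-zero row as the end of an episode is an assumption.

```python
import random
from typing import List, Optional


def next_behavior(current: int, matrix: List[List[float]],
                  rng: random.Random) -> Optional[int]:
    """Sample the next behavior index from the row of the current behavior."""
    row = matrix[current]
    if sum(row) == 0:
        return None  # assumption: no outgoing transitions means the episode ends
    # Weighted randomness: each candidate is drawn in proportion to its probability.
    return rng.choices(range(len(row)), weights=row, k=1)[0]


def rollout(start: int, matrix: List[List[float]], steps: int,
            seed: int = 0) -> List[int]:
    """Generate a behavior trajectory of at most `steps` transitions."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        nxt = next_behavior(path[-1], matrix, rng)
        if nxt is None:
            break
        path.append(nxt)
    return path


# Deterministic toy model: 0 -> 1 -> 2, and behavior 2 has no successors.
toy_matrix = [[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]]
path = rollout(0, toy_matrix, steps=5)
```

With a clustered model in place of `toy_matrix`, rows containing several non-zero entries correspond to the choice points described above, and repeated rollouts will produce different but style-consistent trajectories.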
Although long-term evaluation results have not yet been published, the system has also been implemented and evaluated to include RL for further personalization. A SARSA(λ) RL algorithm, with a reward function crafted around technical swing performance detected by the IMU sensor and an engagement metric (specifically the user's responses to the system's questions), adapts the online policy to learn different coaching styles suited to each individual user. OpenAI's Gym library was utilized to facilitate the creation of the learning environment. The code for the full implementation and RL process is available on GitHub. This implementation was created for use on a specific robot, and the full system is not a general-purpose application that runs without the hardware (robot and IMU) used in testing. Please refer to the README file on GitHub for details of the parts of the system that can be run in simulation.
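For readers unfamiliar with SARSA(λ), the core update can be sketched as below. This is a generic tabular version with accumulating eligibility traces, not the authors' implementation and not tied to the Gym environment they describe. The class name `SarsaLambda`, the hyperparameter values, and the string state labels are assumptions, and the reward is simply passed in as a number rather than computed from swing or engagement data.

```python
from collections import defaultdict


class SarsaLambda:
    """Tabular SARSA(lambda) with accumulating eligibility traces."""

    def __init__(self, alpha: float = 0.1, gamma: float = 0.9, lam: float = 0.8):
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = defaultdict(float)      # Q-values keyed by (state, action)
        self.trace = defaultdict(float)  # eligibility of each visited pair

    def update(self, s, a, reward, s_next, a_next, done: bool = False) -> None:
        # On-policy target: uses the action actually chosen in the next state.
        target = reward if done else reward + self.gamma * self.q[(s_next, a_next)]
        delta = target - self.q[(s, a)]
        self.trace[(s, a)] += 1.0
        # Every recently visited pair shares in the error, scaled by its trace.
        for key in list(self.trace):
            self.q[key] += self.alpha * delta * self.trace[key]
            self.trace[key] *= self.gamma * self.lam
        if done:
            self.trace.clear()


# Two-step toy episode: only the final step is rewarded.
agent = SarsaLambda(alpha=0.5, gamma=1.0, lam=1.0)
agent.update("s0", 0, 0.0, "s1", 1)
agent.update("s1", 1, 1.0, None, None, done=True)
```

In the toy episode, the reward received at the second step also raises the value of the first state-action pair, which is the effect of the eligibility trace. In an SR setting, states might encode the current behavior and session stage and actions the candidate next behaviors, but that mapping is not specified here.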

A Social Robot for Stroke Rehabilitation
The second case study describes the process of developing a SR for guidance during one-to-one physical recovery sessions after stroke. We used the same process for data gathering and behavior model development as described in the previous case study (Section 4.1.1). We observed 10 practicing stroke physiotherapists using the ASUOI and found the behaviors of this tool to be transferable from sports coaching to the physiotherapy context [30]. An example model produced from clustering the observed behavior sequences can be seen in Figure 4.
Eight professional stroke physiotherapists participated in online, semi-structured interviews. The inclusion criteria used to select appropriately adept physiotherapists were: 1) a) a minimum of a BSc in physiotherapy or a related subject OR b) a minimum of 5 years' physiotherapy experience; 2) have administered stroke rehabilitation on at least a weekly basis for the last year.
Concerning results, annotating the behavior graphs through Mural during discussions with stakeholders again helped validate the use of a robot in assisting stroke patients between sessions with a physiotherapist, while building more refined behaviors on top of the initial representations of the behavior models. For instance, upon seeing the large concurrent instruction boxes in many of the models, three of the physiotherapists had suggestions on when, how, and with which users to utilize this behavior. In contrast to the squash coaches, questioning by a stroke physiotherapist is mainly used to check understanding or to ask how a particular exercise felt to the patient in terms of pain level or difficulty, regardless of when it is used within an episode. Such aspects were not caught during initial observations, but would go on to be included in the implemented HRI.
The resulting behavior models were included in a final SR system that was used to interact with 3 stroke survivors during 15-minute rehabilitation sessions. The implementation itself was very similar to that used in the SR squash coach, with the underlying models represented as transition matrices and personalization achieved by selecting the best-fit model for each user. Various aspects uncovered in our model annotation phase as informed by the physiotherapists were included, such as nuances in the verbal utterances and physical movements.

A Social Robot for Upper-Limb Fracture Recovery
Our final research activity aimed to develop a SR to provide patient support and engagement during 20-30 minute physical recovery sessions following shoulder fractures. We leveraged the dataset created in Case Study 2 (Section 4.2) due to the notable similarities in [...] We developed two distinct, composite models to reflect polarizing coaching styles that we wanted to explore. One centered around a positive coaching style consisting of behaviors such as positive modeling (demonstrating how to perform exercises correctly) and praise (Figure 5), and another on negative coaching behaviors like negative modeling and consolation (Figure 6). Furthermore, we included times during the interaction where user responses to each physiotherapist behavior would occur, highlighted using yellow markers in Figure 5, to improve interpretability.
We recruited 7 physiotherapists from a mix of the UK's National Health Service and academia to undertake an in-person group workshop. To ensure reliable insights, we specified that each participant: 1) be a fully qualified physiotherapist; 2) have a minimum of 2 years of experience; 3) currently work with upper-limb fractures or have done so within the past 3 years.
Findings revealed that positive behavior strategies were pivotal, emphasizing positive reinforcement, motivational language, trust-building through non-verbal interactions, adaptation to the patient's emotional state, and the use of humor. Interestingly, progress feedback was not captured in the initial dataset and therefore not included as a behavior in the models, but emerged as a recurrent theme in participants' annotations. In summary, the behavior model annotations played a crucial role in refining the interaction episodes for stakeholders. They actively engaged with the model visualizations, identifying specific behaviors of interest and annotating where necessary. Furthermore, participants found the models highly interpretable, commending their ability to capture the flow of a session. As depicted in Figure 6, participants also began crafting their own refined versions of model connections, capturing granular insights from their personal experiences and quoting possible verbal utterances for the proposed SR.
With regard to Step 4, robot implementation is currently underway. Future work aims to evaluate this system in long-term HRI deployment using the underlying models learned during the previous steps in our approach to act as a basis for action-based learning.

DISCUSSION
Our three case studies show how our theoretical approach to modeling structured behaviors in SRs can not only be practically applied to real-world research, but also yield more informed designs of underlying behavior. Importantly, in each case study, the initial human task observations were kept purposefully short to avoid resource-intensive generation of large datasets. From this implicit knowledge, we demonstrate how the researchers then developed their first representations of SR behavior models that captured their current understanding of the structured task. Next, our model annotation step featured a second round of stakeholder collaboration to further refine the models from explicit expert knowledge. Notably, in each use case, this step uncovered new, important information on multiple behavior aspects that were not recorded in the researchers' initial representations.
Each of our case studies describes SRs with a goal of achieving future SR personalization through machine learning mechanisms, specifically using reinforcement learning (RL) to adapt coaching and therapy styles to different users. Machine learning techniques have become a prominent theme across modern HRI applications [21], yet typically suffer from the sample efficiency problem [9] due to the inability to accurately model and simulate human behavior pre-deployment. Our approach by no means attempts to solve this issue, but as demonstrated in our case studies, can offer researchers a structured process to follow when creating initial behavior models on which further adaptation can build (i.e., informing environment dynamics in RL). Notably, from our human observations and behavior model annotations, we were able to craft basic 'starting' policies represented by each of our models, which were then used as the basis for RL in the SR squash coach.
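One way a behavior model could serve as a 'starting' policy for RL is sketched below. This is a minimal illustration under stated assumptions, not the method used in the SR squash coach: the state is taken to be the previous behavior, the action is the next behavior, Q-values are seeded directly from the model's transition probabilities, and learning uses a standard tabular Q-learning update. The example matrix, the function names, and the reward value are hypothetical.

```python
import random

# Hypothetical behavior model: rows are the previous behavior (state),
# columns are the next behavior (action), values are transition probabilities.
MODEL = {
    "instruction": {"instruction": 0.1, "praise": 0.7, "questioning": 0.2},
    "praise": {"instruction": 0.6, "praise": 0.2, "questioning": 0.2},
    "questioning": {"instruction": 0.5, "praise": 0.3, "questioning": 0.2},
}


def init_q_from_model(matrix, scale=1.0):
    """Seed Q-values with the model's transition probabilities (assumed heuristic)."""
    return {s: {a: scale * p for a, p in row.items()} for s, row in matrix.items()}


def choose_behavior(q, state, epsilon=0.1, rng=random):
    """Epsilon-greedy selection: mostly follow the model, occasionally explore."""
    if rng.random() < epsilon:
        return rng.choice(list(q[state]))
    return max(q[state], key=q[state].get)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update applied in place."""
    target = reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])


q = init_q_from_model(MODEL)
first = choose_behavior(q, "instruction", epsilon=0.0)  # greedy: follows the model
# Suppose the user reacted poorly to this behavior (hypothetical reward of -1.0);
# the chosen behavior becomes the next state.
q_update(q, "instruction", first, reward=-1.0, next_state=first)
```

Seeding the table this way means the robot initially reproduces the expert-derived style and deviates only as user feedback accumulates, which is the sense in which a model can act as a basis for further adaptation.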

Limitations
It is worth noting that the datasets procured in Step 1 may include interaction data from one or more stakeholders involved in the task (e.g., clinician and patient, teacher and student). In our presented case studies, the data of only one stakeholder role (that of the coach) was recorded. This may seem surprising in comparison to other data-driven approaches, which would also represent the distribution of clients' responses to coaches' behaviors. However, there were practical, methodological, and ethical issues that led the researchers to choose to omit this data. Step 3 of our approach enabled us to compensate for the absence of this personal client data through directly querying experts. But we stress that our approach could also be employed in cases where all interaction stakeholders are modeled.
Additionally, although in different domains, our case studies have similarities in that the SR takes the role of a coach and makes use of the same behaviors.Yet, our approach has potential to be generalizable and researchers should test our approach in wider SR domains, with behaviors that fit their specific use case, with a suitable level of granularity to capture the correct depth of interaction, as advised in Section 3.1.

CONCLUSION
The ability to harness and model expert human behavior efficiently is a fundamental component of SR development but remains a challenging task for the HRI community. Through three case studies involving 24 participants in the domains of sports coaching, stroke rehabilitation and upper limb recovery, we have demonstrated how structured behavior models for SRs can be created through a novel 4-step approach, yielding refined results that captured the implicit (Step 1) and explicit (Step 3) knowledge of human experts. We have shown how this approach can be practically applied end-to-end, and how a behavior model annotation step revealed additional insights about the interaction beyond the findings from initial human observations, helping to inform a reinforcement learning based environment for control.
Scenarios which necessitate social robots in more sophisticated and prolonged roles are becoming increasingly desirable. Thus, the HRI community must be equipped with updated approaches for designing and developing structured behavior models which can facilitate such applications. Our approach offers a robust framework which encourages more accurate modeling of SR behavior and we invite the HRI community to utilize and adapt this beyond our examples in the coaching and rehabilitation space into a wide range of SR applications.

Figure 2: A social robot's structured behavior model developed for use in squash coaching (Case Study 1). This represents one of many coaching styles discovered during our Behavior Model Development step using a clustering technique on the dataset from Step 1.

Figure 3: The online collaborative whiteboard tool Mural was used to annotate the behavior graphs in Case Studies 1 and 2.

Figure 4: An example behavior model developed for use in Case Study 2

Figure 5: One of the two behavior graphs used during our model annotation step for designing a SR for upper limb fracture rehabilitation

Figure 6: An example behavior graph sheet from the workshop with the physiotherapist annotations