Developing Autonomous Robot-Mediated Behavior Coaching Sessions with Haru

This study presents an empirical investigation into the design and impact of autonomous dialogues in human-robot interaction for behavior change coaching. We focus on the use of Haru, a tabletop social robot, and explore the implementation of the Tiny Habits method [10] for fostering positive behavior change. The core of our study lies in developing a fully autonomous dialogue system that maximizes Haru's emotional expressiveness and unique personality. Our methodology involved iterative design and extensive testing of the dialogue system, ensuring it effectively embodied the principles of the Tiny Habits method while also incorporating strategies for trust-raising and trust-dampening. The effectiveness of the final version of the dialogue was evaluated in an experimental study with human participants (N=12). The results indicated a significant improvement in perceptions of Haru's liveliness, interactivity, and naturalness. Additionally, our study contributes to the broader understanding of dialogue design in social robotics, offering practical insights for future developments in the field.


INTRODUCTION
The use of social robots in behavior change coaching presents a unique intersection of human-robot interaction (HRI), psychology, and AI. Social robots have previously been used in behavior change coaching, with various studies underscoring their potential in this field [3,5,19]. Social robots offer new ways of interaction, learning, and behavior adaptability that can complement traditional human-led coaching methods.
Trust is a fundamental element of HRI, particularly within the scope of behavior change coaching. The concept of trust calibration, which involves aligning a human's trust level with a robot's actual capabilities [18,24], plays a pivotal role. Proper calibration of trust is crucial for accurately reflecting a robot's competence and reliability, thereby preventing over-reliance on or under-utilization of robotic systems. This calibration is integral to effective and safe interactions [8].
Previous work suggests that one way of raising and maintaining the level of trust in HRI is through the robot's emotional expressions, encompassing aspects such as emotional routines, voice modulation, and timing. These elements significantly influence the robot's perceived trustworthiness and level of engagement [6,26,34].
In behavior change coaching, where personal and emotional interactions are paramount [13], the rich expressivity of a robot can profoundly affect coaching effectiveness. The importance of emotional expressions in enhancing the coaching experience was demonstrated, for example, in [20], which used the Tiny Habits method and the social robot Haru.
The study revealed that emotional expressions in Haru significantly enhanced the perceived quality of the lesson and the retention of the habit practiced. Additionally, Haru's presence, particularly when using emotional behaviors, led to higher confidence in participants regarding their behavior change and increased the likelihood of habit retention compared to control groups presented with the same information via web content [19].
However, that study had several limitations: it relied on a wizard operator and used only a virtual simulation of the robot. These constraints potentially limited the depth of interaction and the naturalness of the robot's behavior.
Our study addresses these limitations by taking a novel approach. We developed a new, fully autonomous dialogue for Haru that aims to harness the full emotional potential of the physical robot, allowing for more organic and realistic interactions, which are critical in the context of behavior change coaching. By integrating emotional routines and a new voice design [29] into Haru's autonomous dialogue, we aim to deepen the level of engagement and trust in HRI.
Moreover, our research extends the understanding of how emotionally intelligent interactions influence users' perception of the robot. In doing so, we seek to bridge the gap between the theoretical potential of social robots in behavior change coaching and their practical application in real-world scenarios.

METHOD
In developing the new dialogue for behavior change coaching, we followed the Design Thinking method [15], which is known for its hands-on, user-centric approach to problem-solving.
Our primary goal was to craft a dialogue that leverages the Tiny Habits method and is explicitly tailored for Haru, a social robot. We focused on using Haru's unique capabilities to enhance the effectiveness of the dialogue in behavior change coaching. This section describes the development of the dialogue and highlights the key design decisions made throughout this process.

Behavior change coaching
We selected the Tiny Habits method [13] as the framework for behavior change coaching. The method is grounded in several theoretical principles from behavior change science (e.g., [2,32,33]). It is based on Fogg's Behavior Model [14], which states that behavior is a product of three elements: motivation, ability, and prompt. For a behavior to occur, a person must i) be sufficiently motivated, ii) have the ability to perform the behavior, and iii) encounter a suitable prompt (trigger). The Tiny Habits method addresses all three aspects. It involves anchoring new, small habits to existing routines, simplifying aspirations into manageable actions, and rewarding completion with positive emotions to integrate these habits into daily life.
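The three ingredients described above (an anchor that serves as the prompt, a scaled-down behavior, and a celebration) can be pictured as a small data structure. The sketch below is only illustrative: the class name, field names, and the celebration text are assumptions made for this example and do not come from the Haru implementation.

```python
from dataclasses import dataclass


@dataclass
class TinyHabitRecipe:
    """Hypothetical container for the three pieces a coaching session collects."""
    anchor: str       # existing routine that acts as the prompt
    tiny_habit: str   # scaled-down behavior that is easy to perform
    celebration: str  # positive emotion that rewards completion (made-up example below)

    def as_sentence(self) -> str:
        # Phrasing follows the "After I ..., I will ..." pattern used in the dialogue.
        return (f"After I {self.anchor}, I will {self.tiny_habit}, "
                f"and then I will {self.celebration}.")


recipe = TinyHabitRecipe(
    anchor="brush my teeth",
    tiny_habit="take three mindful breaths",
    celebration="smile and say 'Nice job!'",
)
print(recipe.as_sentence())
```

Keeping the anchor, habit, and celebration as separate fields mirrors the way the method treats prompt, ability, and reward as distinct elements.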

Social Robot
We used Haru, an experimental tabletop social robot, as our target robot. Haru's design, inspired by animated characters [16], features two eyes with LCD screens and moving rims equipped with addressable LED strips, alongside an LED matrix mouth. Haru has seven degrees of freedom, including eye tilt, rotation, inner eye movement, base rotation, and body leaning, allowing it to create dynamic emotional signals [16,17]. Haru's design emphasizes the upper and middle body for expressivity, based on studies showing that these areas are crucial in conveying emotions [17]. Lacking hands, Haru conveys emotions through eye and body movements, which offer emotional affordances similar to those of hands and arms.
Haru's expressivity is enhanced by its custom Text-to-Speech (TTS) voice, which includes seven vocal genres for emotive communication: default, cheeky, high-energy, question, sad, serious, and whiny [29]. Haru's library of over 100 routines also supports multimodal expressions, including body and eye motions, animations, and sounds, covering a full spectrum of emotions [17].
The robot uses the Google Speech-to-Text API for automatic speech recognition (ASR), together with intent classification and entity recognition models developed by Honda Research Institute. The dialogue structure is then loaded into behavior trees, with all components interacting with the robot hardware through ROS [27]. This system, which combines off-the-shelf components with custom robot elements, was previously tested in other applications [28], such as small talk scenarios and hospital dialogues.
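To illustrate what loading a dialogue structure into a behavior tree can look like, the following minimal sketch implements generic sequence and selector nodes in plain Python. It is not the Haru dialogue engine: the node classes, the intent label "skip_story", and the action names are all hypothetical, and a real deployment would connect the leaves to ASR, TTS, and ROS rather than to stub functions.

```python
from typing import Callable, List

SUCCESS, FAILURE = "success", "failure"


class Action:
    """Leaf node wrapping a callable that returns SUCCESS or FAILURE."""
    def __init__(self, name: str, fn: Callable[[dict], str]):
        self.name, self.fn = name, fn

    def tick(self, ctx: dict) -> str:
        ctx.setdefault("trace", []).append(self.name)  # record visited leaves
        return self.fn(ctx)


class Sequence:
    """Runs children in order; fails as soon as one child fails."""
    def __init__(self, children: List):
        self.children = children

    def tick(self, ctx: dict) -> str:
        for child in self.children:
            if child.tick(ctx) == FAILURE:
                return FAILURE
        return SUCCESS


class Selector:
    """Tries children in order; succeeds as soon as one child succeeds."""
    def __init__(self, children: List):
        self.children = children

    def tick(self, ctx: dict) -> str:
        for child in self.children:
            if child.tick(ctx) == SUCCESS:
                return SUCCESS
        return FAILURE


def wants_skip(ctx: dict) -> str:
    # Hypothetical intent check standing in for the intent classifier.
    return SUCCESS if ctx.get("intent") == "skip_story" else FAILURE


def ok(ctx: dict) -> str:
    # Stub for an utterance or routine that always completes.
    return SUCCESS


# One dialogue section: either acknowledge a skip request or tell the story,
# then ask the participant about their aspiration.
section = Sequence([
    Selector([
        Sequence([Action("check_skip", wants_skip), Action("acknowledge_skip", ok)]),
        Action("tell_story", ok),
    ]),
    Action("ask_aspiration", ok),
])

ctx = {"intent": "skip_story"}
result = section.tick(ctx)
print(result, ctx["trace"])
```

The selector expresses the optional story part as a fallback, so the same tree serves users who skip the story and users who listen to it.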

Implementation Process
Our development, guided by Design Thinking, involved iterative refinement of the dialogue. Initially, we assessed the existing dialogue [19] and identified key issues to address, such as its verbosity, its reliance on yes/no questions, and the need for a human operator to make all decisions in the dialogue. Our primary goals were to develop an autonomous dialogue aligned with Haru's personality [29], to incorporate trust cues, and to enhance emotional expressions. We then ideated on these goals and converted the ideas into a first dialogue prototype. Pilot testing, conducted over six distinct iterations with volunteers (primarily engineers without previous knowledge of the dialogue), was instrumental in evolving the prototype into the final dialogue. The process focused on enhancing natural emotional responses and trust-building strategies in the dialogue flow.
We first describe the newly developed dialogue. We then detail the specific refinements implemented, highlighting their potential applicability to other social robotics dialogues.

New Dialogue Development
The final version of the dialogue, in which Haru coaches participants to adopt a Tiny Habit, is divided into four main sections. In the first section (S1), Haru explains the importance of finding the right aspiration for behavior change and then provides suggestions on how to transform this aspiration into a small daily routine, the Tiny Habit. In the second section (S2), the participant and the robot collaboratively identify a routine that can serve as a reminder to perform the new habit, referred to as the anchor moment. The third section (S3) focuses on reinforcing the newly developed Tiny Habit by celebrating the participant's successful completion of the new habit. In the last section (S4), Haru summarizes the theory and offers additional help before concluding the session.
In the first three sections (S1-S3), the dialogue follows a consistent structure. In the first part (P1), which users can skip, Haru presents an example of how he addressed the material covered in the section in his own life. In the second part (P2), participants are asked how they want to address the section's topic, that is, defining the Tiny Habit, finding the right anchor moment, or choosing the celebration (e.g., "What's your aspiration you'd like to work on?"). Based on their response, the robot selects the most appropriate answer from one of the pre-defined categories (e.g., User reply: "I want to start running"; Haru's response: "Awesome! A dash of exercise can spark a fitness firework! One way to get into shape is by scaling down to doing two pushups or putting on your running shoes to get ready.").
In the third part (P3), the robot confirms whether the participants have understood the concept and offers additional coaching tips. At the same time, the robot begins capturing potential user responses. The last part (P4) focuses on capturing the user's progress (e.g., their new Tiny Habit). Haru tries to capture the entity multiple times, offers additional guidance and examples of good practice, and provides a default entity for adoption if participants are unsure (e.g., "Here's one last idea for your possible change. You seem a tad stressed. How about focusing on relaxation? Your new Tiny Habit could be, 'I will take three mindful breaths.' Does that sound like something you might use in the future?"). If they do not confirm, the dialogue ends early.
If the entity is captured during the third or fourth part of the dialogue, Haru confirms with the user whether it was captured correctly. If the answer is positive, the dialogue proceeds to the next section without the possibility of returning.
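The capture logic of parts P3 and P4 (repeated attempts, confirmation of a captured entity, and a default offered as a last resort) can be sketched as a small loop. This is a simplified illustration under stated assumptions: the function name, the callback signatures, and the limit of three attempts are hypothetical, since the number of attempts is not reported here.

```python
from typing import Callable, Optional

DEFAULT_HABIT = "I will take three mindful breaths"  # default taken from the dialogue example
MAX_ATTEMPTS = 3  # assumed value; the actual number of attempts is not reported


def capture_entity(
    listen: Callable[[], str],
    extract: Callable[[str], Optional[str]],
    confirm: Callable[[str], bool],
    max_attempts: int = MAX_ATTEMPTS,
    default: str = DEFAULT_HABIT,
) -> Optional[str]:
    """Return a confirmed entity, or None if the section ends early."""
    for _ in range(max_attempts):
        entity = extract(listen())
        # An extracted entity is accepted only after the user confirms it.
        if entity is not None and confirm(entity):
            return entity
    # Last resort: offer a default entity for adoption.
    if confirm(default):
        return default
    return None


# Scripted stand-ins for ASR, entity recognition, and the confirmation question.
replies = iter(["um, not sure", "I will do two pushups"])
habit = capture_entity(
    listen=lambda: next(replies),
    extract=lambda text: text if text.startswith("I will") else None,
    confirm=lambda entity: True,
)
print(habit)
```

Passing the listening, extraction, and confirmation steps as callbacks keeps the control flow testable without a robot or a speech recognizer.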

Dialogue Refinement
Emotional and Vocal Expressivity.
We leveraged Haru's extensive library of emotional routines and a diverse range of voice genres to create a more engaging and human-like interaction. Our decision to implement these routines was grounded in theoretical principles and insights from pilot testing, which revealed a preference for routines that included sound and were shorter than 3.5 seconds, particularly crucial in extended dialogue segments.
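The pilot-test preference for routines that include sound and run for less than 3.5 seconds amounts to a simple filter over the routine library. The sketch below shows one way to express it; the `Routine` fields and the routine names are invented for illustration and do not reflect Haru's actual routine library.

```python
from dataclasses import dataclass
from typing import List

MAX_DURATION_S = 3.5  # threshold reported from pilot testing


@dataclass
class Routine:
    name: str          # hypothetical routine identifier
    emotion: str
    duration_s: float
    has_sound: bool


def preferred_routines(library: List[Routine], emotion: str) -> List[Routine]:
    """Keep routines for an emotion that include sound and run under 3.5 s."""
    return [
        r for r in library
        if r.emotion == emotion and r.has_sound and r.duration_s < MAX_DURATION_S
    ]


# Made-up library entries, for illustration only.
library = [
    Routine("happy_chuckle", "happiness", 2.1, True),
    Routine("happy_dance", "happiness", 6.0, True),
    Routine("happy_blink", "happiness", 1.2, False),
    Routine("sad_sigh", "sadness", 2.8, True),
]
print([r.name for r in preferred_routines(library, "happiness")])
```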
Haru's emotional routines encompass the full spectrum of Ekman's basic emotions [9], with multiple variants for each emotional state. We strategically assigned these routines to specific parts of the dialogue, ensuring they did not disrupt the flow, particularly at the end of utterances or after questions. This approach aligns with affective grounding theory, emphasizing shared emotional understanding as vital in human-robot interaction [22].
Other key enhancements in our dialogue design included the integration of nuanced emotional expressions, such as back-channels like "Oh!" and nods, to indicate understanding and engagement. Furthermore, we sought to use emotional mimicry, adapting to the emotional context of interactions, informed by research on human emotional response patterns [23].
These features are rooted in the concept that robots can express emotions as signals to reveal internal states, based on basic emotion theories. Our methodology also incorporated affective storytelling, where Haru's dialogue reflected appropriate emotional intonations corresponding to the narrative, further enhancing the interaction's intuitiveness and effectiveness [7], and trustworthiness [12].

Empathy.
Our study emphasized empathy in the dialogue design to foster a more sympathetic and relatable interaction. Key to this was Haru's ability to respond to detected problems in the interaction with comforting statements such as "Don't worry, I am here to help you," "Take your time!" and "No problem. Let's move forward!" This approach is rooted in theories of empathetic design [25,30,31], highlighting the importance of empathy in effective interaction design. By incorporating these empathetic responses, Haru could better connect with users, enhancing the overall trust and effectiveness of the behavior change coaching process.

Situation Awareness.
The concept of situation awareness in human-computer interaction underscores the importance of context for creating meaningful and trustworthy interactions [1,24]. In social human-robot interaction, a robot's situation awareness can significantly affect the robot's persuasive abilities [11]. Recognizing this, our pilot testing revealed the potential to enhance interaction with improved situation awareness, thus refining the dialogue flow. Haru was programmed to tailor its responses based on the user's current environment and expressed preferences. For instance, if the user expressed a desire to skip a story, Haru responded with, "Let's get right to the point then." Tailored suggestions were also provided in various parts of the dialogues, depending on the user's replies. For example, when the user was searching for the right moment for their new routine, Haru would provide options such as "After I pour my cereal" for mornings, "After I brush my teeth" for evenings, and "After I log into my computer" for work settings.
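The context-dependent anchor suggestions can be pictured as a lookup from a detected context to a suggestion. In the sketch below, the three suggestions are the ones quoted above, while the keyword cues and the fallback to the morning suggestion are assumptions made for illustration; the deployed system relied on trained intent and entity models rather than keyword matching.

```python
ANCHOR_SUGGESTIONS = {
    "morning": "After I pour my cereal",
    "evening": "After I brush my teeth",
    "work": "After I log into my computer",
}

# Hypothetical keyword cues; a stand-in for the intent and entity models.
CONTEXT_KEYWORDS = {
    "morning": ("morning", "breakfast", "wake"),
    "evening": ("evening", "night", "bed"),
    "work": ("work", "office", "desk"),
}


def suggest_anchor(user_reply: str, fallback: str = "morning") -> str:
    """Pick an anchor suggestion that matches the context the user mentions."""
    text = user_reply.lower()
    for context, keywords in CONTEXT_KEYWORDS.items():
        if any(word in text for word in keywords):
            return ANCHOR_SUGGESTIONS[context]
    # Assumed behavior: fall back to a fixed suggestion when no context is detected.
    return ANCHOR_SUGGESTIONS[fallback]


print(suggest_anchor("Probably in the evening"))
```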

Branching Based on User Input.
The dialogue was scripted with branches to react more effectively to user inputs. For example, when Haru asks, "Do you like science?" and the user replies, "Yes, it is cool," Haru responds enthusiastically. If the reply is "I think it is boring," Haru adapts with a more engaging response, "Oh, so you think it is boring? Let me tell you something."

Personalization and Entity Capture Override.
To provide personalized coaching, Haru used captured entities throughout the interaction. This included using participants' names in the dialogue (e.g., "Oh, {name}, what a ride we had") and incorporating captured entities into the context of the dialogue (e.g., "Your anchor, along with the tiny habit, might sound something like: 'After I finish my sandwich, {Tiny Habit of the person}'").
Recognizing the need to capture entities while also evaluating whether a question required a yes/no response, we implemented an override system. This allowed Haru to capture entities effectively while still enabling users to ask for help or clarification.
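The exact rules of the override system are not spelled out here, so the following is only one plausible reading, offered as a sketch: help requests take priority, short yes/no replies are honored when the question expects them, and any other reply is treated as an entity candidate. The function name, the phrase lists, and the priority order are all assumptions.

```python
from typing import Tuple

HELP_PHRASES = ("help", "i don't know")   # assumed cues for a help request
YES_WORDS = {"yes", "yeah", "sure"}        # assumed affirmative replies
NO_WORDS = {"no", "nope"}                  # assumed negative replies


def interpret(reply: str, expects_yes_no: bool) -> Tuple[str, str]:
    """Return (kind, payload) for a user reply.

    Help requests are checked first so they are never stored as an entity.
    """
    text = reply.lower().strip().rstrip(".!?")
    if any(phrase in text for phrase in HELP_PHRASES):
        return ("help", "")
    if expects_yes_no and text in YES_WORDS:
        return ("yes", "")
    if expects_yes_no and text in NO_WORDS:
        return ("no", "")
    # Override: a fuller answer is kept as an entity even after a yes/no question.
    return ("entity", reply.strip())


print(interpret("I will do two pushups", expects_yes_no=True))
```

Under this reading, a participant who answers a confirmation question with a complete habit statement does not have to repeat it, and a participant who asks for help is never recorded as having named a habit.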

Error Mitigation
During pilot testing, we identified the need for a robust help system. This led to the development of a repeat function, enabling Haru to reiterate the last dialogue section upon request (e.g., "Can you repeat that?"). Additionally, users were given the option to seek help (e.g., "I don't know, can you please help?"), utilizing the dialogue system's intent recognition feature. Haru also acknowledges and corrects misunderstandings or errors, making statements like "I'm sorry. I didn't catch that." or "It was tricky. I couldn't record your anchor moment.", using apology as a possible trust repair strategy [10,21]. Such features, including a repeat system, enhance the user experience by providing support and fostering a sense of understanding, which is crucial in effective human-robot interactions.
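A repeat function of this kind needs little more than a memory of the last delivered section. The sketch below shows the idea with a hypothetical helper class; the intent labels "repeat" and "unrecognized" are assumptions, and restating the last section after an apology is an illustrative choice rather than documented behavior.

```python
class CoachingTurnManager:
    """Hypothetical helper that remembers the last section so it can be repeated."""

    APOLOGY = "I'm sorry. I didn't catch that."

    def __init__(self) -> None:
        self.last_section = None
        self.spoken = []  # stands in for the TTS output channel

    def say(self, text: str, remember: bool = True) -> None:
        self.spoken.append(text)
        if remember:
            self.last_section = text

    def handle(self, intent: str) -> None:
        if intent == "repeat" and self.last_section is not None:
            # Re-deliver the last section without overwriting it.
            self.say(self.last_section, remember=False)
        elif intent == "unrecognized":
            # Assumed recovery: apologize, then restate the last section.
            self.say(self.APOLOGY, remember=False)
            if self.last_section is not None:
                self.say(self.last_section, remember=False)


manager = CoachingTurnManager()
question = "What's your aspiration you'd like to work on?"
manager.say(question)
manager.handle("repeat")
manager.handle("unrecognized")
print(manager.spoken)
```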

Procedure
To assess the effectiveness of our interventions, we conducted an experiment at Honda Research Institute in Japan. Participants were employees and interns proficient in English, recruited through convenience sampling. This approach allowed us to compare results with a prior study [19]. Participants were first provided with a written consent form, ensuring voluntary participation and agreement with data handling. They were then familiarized with the social robot and the behavior change session, with an opportunity to ask questions before starting.
The sessions with a physically embodied Haru, lasting approximately 12 minutes, were recorded using dual cameras: one capturing participant reactions and the other focusing on the robot. Post-session, participants completed a questionnaire to evaluate the session quality. This included selected items from the Godspeed questionnaire [4] focusing on anthropomorphism, animacy, and likeability, alongside the MDMT trust scale [35], and a set of open questions. We also gathered basic demographic data and inquired about prior experiences with robots.

RESULTS
Twelve individuals with an average age of 29 years (SD = 5.72 years) participated in the experiment. Most participants were male (83%), with 17% identifying as female. Regarding previous robot experience, 25% of participants had only seen robots a few times in reality, while the remaining 75% had more direct experience, either playing with or regularly working with robots. The participants' origins were predominantly Asian (58%), followed by North American (25%) and European (17%).
The data were analyzed in SPSS. We compared our results with the data collected in Condition 1 of the previous experiment [19] using an independent-samples t-test. The findings supported the hypothesis that the changes made in our experiment had a significant effect on the session outcomes.
In the newly developed dialogue, Haru was perceived more positively compared to earlier versions. The statistical analysis revealed that Haru was perceived as significantly less fake (p = 0.042), significantly more lively (p = 0.002), and significantly more interactive (p = 0.005). These results indicate a notable shift in participants' perceptions, aligning with the objectives of our dialogue enhancements.
The qualitative data from the experiment with Haru revealed a mix of positive and negative reactions. Participants generally appreciated Haru's expressive characteristics, such as its positive demeanor, movements, and specific routines like chuckling, which added to its human-like appeal. The flow of conversation with Haru was perceived as natural, with an ability to maintain context, and its responsiveness and timing were praised. Visual elements like eye animations and movements also enhanced the experience. Overall, the interaction was described as enjoyable. However, issues with the ASR system were noted multiple times (including cases where the robot 'misunderstood' users and followed incorrect dialogue branches), which likely impacted the overall session evaluation.
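For readers unfamiliar with the test, the statistic behind an independent-samples t-test can be reproduced in a few lines. The sketch below computes Student's t with a pooled variance using only the Python standard library; the ratings are made-up numbers for illustration and are not the study data, and the study's own analysis was run in SPSS. Converting the statistic into a p-value additionally requires the t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def independent_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err, df


# Made-up ratings, for illustration only (not the study data).
previous = [1, 2, 3, 4, 5]
current = [2, 3, 4, 5, 6]
t_stat, df = independent_t(current, previous)
print(round(t_stat, 3), df)
```

The pooled-variance form assumes equal variances in the two groups; when that assumption is doubtful, Welch's variant is the usual alternative.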
See the experiment video for implementation examples: link.

CONCLUSION
Our research presents an advancement in the field of human-robot dialogue design, especially in the context of autonomous behavior change coaching with Haru. Through rigorous pilot testing and evaluation, we developed a refined approach to dialogue design, enhancing situation awareness and trust-building strategies. The implementation of a fully autonomous dialogue system, enriched with nuanced situation awareness, robot emotiveness, and a variety of trust-building strategies, marked a novel approach in this domain. The effectiveness of these interventions was reflected in our statistical analysis, which showed the positive impact of these innovations. However, the study also revealed potential flaws, such as issues with the Automatic Speech Recognition (ASR) system, which provide valuable insights for future improvements.
Our work not only contributes to the general knowledge of Haru dialogue design but also sets a precedent for future research and applications in robot-mediated behavior coaching, highlighting the potential of empathetic, context-aware, and trustworthy interactions in this evolving field. Furthermore, the insights gained from our Design Thinking (DT) approach provide a foundation for designing future conversations for Haru and for other robots, showing how nuanced dialogue design can enhance user experience in diverse contexts.

FUTURE WORK
For future work, we aim to use the developed dialogue as a dynamic platform for continued research, investigating the specific effects of the isolated dialogue improvements on coaching efficacy and trust. Additionally, we might focus on long-term evaluations to understand Haru's sustained impact in behavior change coaching. Addressing technical challenges, such as the ASR issues, and further refining the dialogue system are additional directions for future research.

ACKNOWLEDGMENT
This project was funded by Honda Research Institute Japan and the Independent Research Fund Denmark, grant number 1032-00311B.

Figure 1: Overview of the Dialogue Structure

Figure 2: Illustration of the Tiny Habit Dialogue

Table 1: Summary of mean differences and statistical significance for the measured items.