Assessing the Effectiveness of Using Chatbots for Positive Psychological Intervention: A Randomized Control Study

Abstract: Positive psychological intervention (PPI) is an effective way to increase happiness and make life worth living. To make PPIs publicly available and accessible, researchers have turned to information technology to provide low-cost and automatic human-computer interaction solutions. Although several studies have confirmed the effectiveness of chatbot-based PPIs, the understanding of their mechanisms is still limited. Therefore, this study aimed to investigate how adaptiveness and interaction, two important chatbot features, contribute to the effectiveness of chatbot-based PPIs. Two randomized controlled experiments were conducted with 154 participants. To test the importance of adaptiveness, we divided the participants into three groups (robot recommendation, autonomy, and control groups) and provided PPIs based on different rules. To investigate whether interaction with a chatbot could increase the effectiveness of PPIs, we divided participants into two groups (robot interaction and control groups). Subjective well-being increased significantly for participants in the robot interaction group [t(50)=2.748, p=.008<.01, Cohen's d=.78]. These participants also showed fewer signs of depression, anxiety, loneliness, and negative emotions than the control group. Participants in the robot recommendation group showed increased psychological resilience, subjective well-being, and positive emotions, and decreased symptoms of mental disorders. However, some of these changes were not statistically significant. Moreover, this study did not find evidence linking engagement to effectiveness; thus, this connection warrants further investigation.


INTRODUCTION
Leading a happy life is the ultimate goal for many people. Happier people have higher incomes, more stable marriages, and better health [1]. In addition, maintaining a positive mental state enables one to build resilience against mood disorders [2,3]. Studies have also linked mental health to physical health [4], and positive psychological constructs, including positive affect [5], well-being [6], optimism [7], and vitality [8], have been associated with superior cardiovascular outcomes [9,10] and better immune functioning [11,12]. Additionally, happy people have a lower risk of physical illnesses or death [11][13][14][15][16]. However, ensuring happiness has become even more challenging in the post-COVID-19 era [17,18]. The pandemic has threatened not only our lives but also our mental health. The worries of being infected and the inconvenience caused by endless precautions have made it challenging to feel relaxed and happy [19].
Over the past two decades, positive psychology researchers have proposed various methods to help people become happier and find meaning in their lives. Studies have found empirical evidence to support the use of simple cognitive and behavioral exercises, such as the Three Good Things (TGT), Best Possible Self (BPS), and gratitude exercises, to increase well-being and reduce negative symptoms [20][21][22][23]. TGT encourages participants to recall and describe three good things that they experience in a day. Researchers have demonstrated that this method significantly increases happiness and alleviates depressive symptoms [23,24]. BPS is an exercise that requires participants to set goals and plans by imagining a better version of themselves. In a four-week experiment, researchers showed that BPS significantly promoted positive emotions [1]. The gratitude exercise encourages participants to choose someone who has had a positive influence on them and express their gratitude to that person. Researchers have found that gratitude interventions also improve happiness [25].
Although their effectiveness has been well-established, delivering these positive psychological interventions (PPIs or positive psychological exercises) is challenging. As the target audience is large and widespread, positive psychology researchers require tools that can offer these exercises anywhere and at any time in the users' daily lives at a low cost [26]. To resolve these difficulties, researchers have focused on information technology [27,28]. Over the last few decades, information technology (such as text messaging, smartphone apps, online interventions, virtual reality, augmented reality, and serious games) has been successfully utilized to deliver PPIs [29][30][31]. Among these, chatbots have recently garnered significant attention.
A chatbot is a human-computer interaction system that simulates human-to-human conversation to interact with users [32][33][34][35]. Advances in artificial intelligence and big data technology have made it possible to develop chatbots that can assist psychotherapy remotely and automatically [36]. Chatbots can communicate with users in natural language and adaptively provide real-time feedback when guiding them through PPIs [37,38].
Although studies have shown that chatbot-based PPIs are effective and useful self-help tools that can increase the well-being of users, the mechanism behind these results remains unclear. Researchers have argued that the personalization and adaptiveness of chatbots and the interaction between users and chatbots are two important features that make PPIs effective and useful [37][39][40][41][42][43][44][45]. However, the existing literature contains little evidence to support the importance of these features [46][47][48].
Therefore, the purpose of this study was to address this gap in the literature using randomized controlled experiments. This study aimed to verify the importance of adaptiveness and user-chatbot interaction by dividing users into groups representing different PPI selection strategies and user interaction levels. In addition to measuring the changes in well-being and mental disorder symptoms, we also used the usage time and length of text input to verify whether chatbots can improve users' engagement.

METHODS

2.1 Participants and process
A total of 207 participants were recruited online in China, and 154 participants completed the experiment (68.2% female), aged 18-55 years. The study protocol was approved by the ethics board of the Department of Psychology at Tsinghua University, and informed consent was obtained from all the participants. Each participant received 25 Renminbi (approximately 3.7 USD) as compensation. After receiving responses to the questionnaires, we investigated the importance of adaptability and personalization by dividing the participants into three groups [Figure 1(1-a)]. In the robot recommendation group, the chatbot recommended the most appropriate PPI based on information learned from interacting with the participant. Autonomy group participants were allowed to choose the PPIs themselves, as a sense of autonomy can be vital for reducing depressive symptoms [49]. Finally, in the control group, PPIs were randomly assigned to participants.
We then validated the importance of the interaction between chatbots and users by dividing participants from each of the three groups into two groups, yielding a 3×2 experimental design [Figure 1(1-b)].
In the robot interaction group, we used a chatbot to guide participants through the PPIs, while participants in the control group received only a paragraph of instructions similar to traditional online PPIs [23]. All participants were asked to interact with the chatbot between 6 pm and 3 am for six consecutive days, following Auyeung and colleagues' study [49], and complete a ten-minute positive psychological exercise six times. At the end of the sixth day, we asked the participants to complete the same questionnaires again to assess any changes in their mental state.

Chatbot design
Our team developed a chatbot, Blinkinbot, based on the Baidu intelligent dialogue platform (UNIT) [50]. The application programming interface and graphical user interface provided by UNIT were used to develop the chatbot by drawing task flows and providing training samples. After each interaction with a user, UNIT determined the most appropriate route at each node using natural language processing technology.
We designed a two-stage flow to guide the participants during the PPIs [see Figure 1 (2)]. During the first stage, Blinkinbot selected a PPI based on the different experimental settings. For example, if the participant was in the robot recommendation group [Figure 1 (3)], Blinkinbot attempted to find the most appropriate PPI for them. After greeting the participant, Blinkinbot tried to identify their emotional state (positive, negative, or neutral) based on their responses and provided the most appropriate PPI accordingly. For example, if a participant was in a positive emotional state, based on the mood-dependent memory theories (which suggest that people would retrieve positive memories easily) [87], Blinkinbot asked the participant to undertake the TGT exercise. In contrast, if a participant was in a neutral or negative emotional state, both BPS and the gratitude exercise could help them focus on the positive aspects of life and feel better. In this study, Blinkinbot assigned BPS to the participants if it determined that they were in a negative emotional state, and the gratitude exercise when participants were in a neutral emotional state. For participants in the autonomy group, Blinkinbot provided a complete introduction to the three exercises and then asked them to choose the exercise for the day themselves. If a participant was assigned to the control group, Blinkinbot asked them to participate in one of the three exercises randomly.
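The first-stage selection rules described above can be sketched in a few lines. This is only an illustrative sketch, not Blinkinbot's actual implementation (which was built on UNIT task flows): the names `Emotion`, `RECOMMENDATION_RULES`, and `select_ppi` are hypothetical, and emotion detection is assumed to have happened upstream.

```python
import random
from enum import Enum


class Emotion(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


# The three exercises used in the study.
PPIS = ("TGT", "BPS", "GRATITUDE")

# Emotion-to-exercise mapping as stated in the text.
RECOMMENDATION_RULES = {
    Emotion.POSITIVE: "TGT",
    Emotion.NEGATIVE: "BPS",
    Emotion.NEUTRAL: "GRATITUDE",
}


def select_ppi(group, emotion=None, user_choice=None, rng=None):
    """Pick the exercise of the day for one participant (hypothetical helper)."""
    if group == "recommendation":
        if emotion is None:
            raise ValueError("the recommendation group needs a detected emotion")
        return RECOMMENDATION_RULES[emotion]
    if group == "autonomy":
        if user_choice not in PPIS:
            raise ValueError("the participant must choose one of the three PPIs")
        return user_choice
    if group == "control":
        # Random assignment; an explicit rng can be passed for reproducibility.
        return (rng or random).choice(PPIS)
    raise ValueError(f"unknown group: {group}")
```

For instance, `select_ppi("recommendation", emotion=Emotion.NEGATIVE)` returns `"BPS"`, matching the rule for participants in a negative emotional state.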
In the second stage, participants were randomly assigned to two groups: the robot interaction or no-interaction groups [Figure 1 (2-d)]. For the robot interaction group, Blinkinbot guided participants step-by-step through the PPI [Figure 1 (4)]. We also used the self-disclosure strategy, which can increase interaction between chatbots and users [76]. For example, in the BPS exercise, the chatbot initiated interaction by saying, "Lately, I have been wondering if I can be upgraded to a better version, I might be able to help more people." [Figure 1 (4-f)]. After receiving participants' responses, the chatbot asked them to imagine a better version of themselves. During the interaction, if the chatbot determined that the participant was confused, it provided them with some examples for reference [Figure 1 (4-g)]. Before concluding the exercise, Blinkinbot asked the participants to share their feelings and provide feedback for the chatbot and experiment design [Figure 1 (4-h)]. By contrast, if a participant was in the control group, the chatbot provided

Measures
2.3.1 Questionnaires. The participants completed the Patient Health Questionnaire-9 [51], which is a self-report questionnaire used to assess the frequency and severity of depressive symptoms in the past two weeks. The Generalized Anxiety Disorder Scale-7 [52] was used to evaluate the frequency and severity of anxious thoughts and behaviors in the past two weeks. We also used the third edition of the University of California at Los Angeles Loneliness Scale [53] to measure loneliness caused by participants' desire for social interaction versus their perceived level of social support. The Satisfaction with Life Scale was used to measure the overall cognitive judgment of the participants regarding life satisfaction [54]. The Positive and Negative Affect Schedule [55] was used to measure the participants' positive and negative emotions. Half of the scale items represent positive emotions (e.g., interested, excited, determined), while the other half represent negative emotions (e.g., hostile, scared, ashamed). The positive and negative scores were summed separately. The Subjective Vitality Scale was used to measure the participants' perception of their own vitality, which is also an indicator of individual well-being [56,57]. The Connor-Davidson Resilience Scale was used to assess the participants' psychological resilience, which is the ability to thrive in adversity [58]. Last, the System Usability Scale [59], a set of 10 self-report indicators for quantifying user experience, was used to measure the overall perceived usability of the system after completing a series of task scenarios.

Automated Text Analysis.
To further understand the effect of positive psychological practice, we analyzed the text generated by the participants during their interactions with the chatbot using TextMind software, which is the Chinese version of the Linguistic Inquiry and Word Count program [60]. During the analysis, the software performed automatic word segmentation on the text and then compared it with the built-in dictionary one word at a time to calculate the percentage of the occurrence of each category in the total number of words [61]. Among the 102 features generated by the TextMind software, we focused on the analysis of pronouns, prepositions (e.g., to, with, above), and positive emotional words that were often used as references to happiness in previous studies [62].
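The dictionary-based counting step described above can be illustrated with a minimal sketch. It does not reproduce TextMind or its dictionary: `TOY_DICTIONARY` and `category_percentages` are hypothetical names, the word lists are invented English placeholders, and word segmentation is assumed to have been done already.

```python
from collections import Counter

# Hypothetical toy dictionary; the real TextMind dictionary is much larger
# and covers Chinese vocabulary.
TOY_DICTIONARY = {
    "pronoun": {"i", "we", "you", "my"},
    "preposition": {"to", "with", "above"},
    "positive_emotion": {"happy", "grateful", "good"},
}


def category_percentages(tokens, dictionary):
    """Return, for each category, its share of all tokens in percent."""
    total = len(tokens)
    if total == 0:
        return {category: 0.0 for category in dictionary}
    counts = Counter()
    for token in tokens:
        word = token.lower()
        # A word may belong to more than one category, so check them all.
        for category, words in dictionary.items():
            if word in words:
                counts[category] += 1
    return {
        category: 100.0 * counts[category] / total for category in dictionary
    }
```

Each percentage uses the total number of tokens as its denominator, which mirrors the "percentage of the occurrence of each category in the total number of words" described in the text.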

Data description
We used the interquartile range method with a multiplier of 1.5 to remove outliers. Cronbach's alpha was calculated for all questionnaires, with the lowest being 0.821 and the highest being 0.931, indicating that the questionnaires we used had sufficient reliability. Overall, the PPIs benefited the participants. In our experiment, after six days of practice, the participants' psychological resilience (t=2.74), happiness (t=1.58), and positive emotions (t=1.32) improved significantly. The scores for depression (t=-1.33), anxiety (t=-1.72), loneliness (t=-2.41), and negative emotions (t=-1.97) also decreased significantly (see Table 1).

Table 1 notes: (1) For interaction mode, an independent-samples t-test was conducted to compare the interaction and no-interaction groups. For recommendation mode, a one-way ANOVA was applied to compare the robot recommendation, random recommendation (i.e., control), and autonomy groups. (2) The interaction mode analysis was conducted in the "random recommendation" condition. (3) Δ indicates changes before and after the exercises.
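The outlier rule and the reliability check mentioned in this section can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis script: the function names are hypothetical, the quartile convention (`method="inclusive"`) is an assumption because the paper does not specify one, and Cronbach's alpha uses the standard formula with sample variances.

```python
from statistics import quantiles, variance


def remove_outliers_iqr(values, multiplier=1.5):
    """Keep values within [Q1 - m*IQR, Q3 + m*IQR] (hypothetical helper)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [v for v in values if low <= v <= high]


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*item_scores)]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

Different quartile conventions can move the fences slightly, so borderline values may be kept or dropped depending on the statistics package used.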

PPI selection: adaptiveness
In the robot recommendation group, the scores for psychological resilience, well-being, and positive emotions showed an increase, and the scores for depression, anxiety, loneliness, and negative emotions decreased more than they did in the other two groups, indicating that the robot recommendation group had the best intervention effect. However, only the differences in subjective vitality and negative emotions were statistically significant. The daily practice duration decreased and the daily text length increased in the robot recommendation and autonomy groups, but no significant differences were found between the two groups. In the text analysis, the use of pronouns and of positive and negative emotional words did not change in the robot recommendation group, whereas it changed significantly in the control and autonomy groups.
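The comparison across the three selection groups relies on a one-way ANOVA. As a reminder of what that test computes, here is a minimal sketch of the F statistic; the function name `one_way_anova_f` is hypothetical and the sketch does not compute a p-value, which in practice would come from a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within
```

In this study, each inner list would hold the pre-to-post change scores (Δ) of one group: robot recommendation, autonomy, or control.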

Interactions
To examine how interaction influences the effect of PPIs, we focused on the robot recommendation groups [G1 and G2 in Figure 1(1-b)].
Overall, the PPIs were significantly effective in the robot interaction group (G1), whereas participants in the control group benefited less from them. The chatbot-guided PPIs successfully increased participants' subjective well-being and psychological resilience while decreasing negative affect, depressive symptoms, anxiety, and loneliness (p<0.05). However, the psychological measures for the control group (G2) did not change significantly after the six-day session, indicating that it is not enough for the chatbot only to assign PPIs; the combination of PPI recommendation and chatbot interaction makes the PPIs more effective. Although the within-group changes were significant in the robot interaction group but not in the control group, most differences between the two groups were not significant. The exception was subjective well-being, which was significantly higher in the robot interaction group [t(50) = 2.75, p = .008 < .01, Cohen's d = .78]. The scores for depression, anxiety, loneliness, and negative emotions in the robot interaction group were also lower than they were in the control group, but the difference was not statistically significant. There was no significant difference between the two groups regarding psychological resilience, positive emotions, and subjective vitality.
The daily practice duration decreased, and the daily text length increased in both groups, but there was no significant difference between the groups. The text analysis also showed no difference between the two groups.
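The between-group statistics reported in this section (an independent-samples t value with its degrees of freedom, and Cohen's d) can be computed as in the sketch below. It assumes equal variances (the pooled, Student form), which is consistent with the reported df of 50 but is not stated explicitly in the paper; the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(a), len(b)
    return sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2))


def independent_t(a, b):
    """Student's t statistic and degrees of freedom (equal variances assumed)."""
    t_value = (mean(a) - mean(b)) / (pooled_sd(a, b) * sqrt(1 / len(a) + 1 / len(b)))
    return t_value, len(a) + len(b) - 2


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    return (mean(a) - mean(b)) / pooled_sd(a, b)
```

Here `a` and `b` would be the outcome scores of the robot interaction and control groups. By common convention, d values around 0.8 are described as large effects.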

DISCUSSION

4.1 Principal findings
The study results confirmed that adaptability and interaction are essential factors for the effectiveness of chatbot-based PPIs. Our data showed that participants in the chatbot-based PPI groups (both the interaction and recommendation groups) had a higher level of happiness (subjective well-being and positive emotion) and a lower level of mental problems (depression, anxiety, loneliness, and negative affect) in both experiments. Researchers have found that chatbot-based PPI, cognitive therapy, and mindfulness can effectively increase well-being [63] and reduce anxiety [64], stress [65], and depression symptoms [66][67][68][69][70]. Our data echoed the results of previous studies and provided empirical evidence that chatbots can be used to boost happiness and reduce mental disorders. However, this study did not find evidence linking engagement to effectiveness; thus, this connection warrants further investigation.

Limitations
Although most findings were as expected, some were not statistically significant. Moreover, we expected engagement and the words used in the text to be linked to the effectiveness of PPIs. However, our data did not support these relationships. Here, we discuss possible explanations and propose directions for future studies.
4.2.1 Chatbot Design. First, dialogue and interaction design are key in determining a chatbot's effectiveness and usability [70,71].
The diversity and personalization of the dialogue between the chatbot and user are essential for improving positive emotions [72]. In hindsight, we found that our chatbot consisted of only a few possible routes for interaction; consequently, participants may have felt bored after using the chatbot for a few days. Future studies should add more routes for interaction or allow users to use emoticons to make the interaction more interesting [73]. Second, the type of questions asked by the chatbot is an important consideration. While it is harder to design a chatbot that can handle open questions, such questions can encourage users to spend more time responding [67] and increase their engagement levels. Future studies should consider user preference and provide more personalized questions for better effectiveness. Third, the ability to provide appropriate answers is important in chatbots. Users often become frustrated when they learn that a chatbot can only use limited vocabulary and provide limited possible answers [66]. In our experiment, we also found that our chatbot could only provide limited responses. As enhancing these capabilities can improve user engagement [74], we encourage future studies to invest more time and effort in improving dialogue capability based on deep learning and natural language processing technology [73].
Fourth, in the robot recommendation group, our rule-based recommendation tended to assign TGT (78%) as the exercise for the day. Although we have shown that this recommendation was effective and there is no reason to limit the use of TGT if it is the most appropriate choice, the biased recommendation results are still a potential threat to the validity of our results.
Finally, compared with traditional psychological interventions, chatbot-based PPIs consist of more steps. Additional steps may introduce environmental noise to the experiment. Some studies have found that chatbot self-disclosure can trigger disclosure reciprocity and encourage users to engage more in the process [75,76]. While a certain level of self-disclosure is inevitable during the interaction between the chatbot and users, the impacts decrease over time if the chatbot interaction process remains unchanged [77,78]. Accurately identifying the true cause of the effectiveness of chatbot-based psychological interventions has become a challenge that warrants further analysis.

Experiment Design. First, we did not carefully consider the differences between PPIs in our chatbot design and used them interchangeably during our experiment. However, PPIs are not all the same, and different PPIs may produce different results. A cross-cultural study in South Korea and the United States showed that gratitude intervention triggered a sense of regret [79]; that is, gratitude exercises can sometimes trigger feelings of guilt [80][81][82][83]. BPS promotes the ideal state of an individual but has little impact when the individual is in an undesirable situation [84]. The effect of TGT seems to last for a longer period of time [23], BPS has a positive effect only for a short time, and the gratitude exercise has the best positive effect within one month. Sheldon & Lyubomirsky [85] showed that the positive effects of BPS on the participants' mood were better than those of the gratitude exercise, and the effect of the gratitude exercise did not last long. In summary, all of these differences might have influenced our results, and future studies should take them into account.
Second, preparing the participants well before starting the experiment is essential. We hypothesized that chatbot-based PPIs could effectively improve user engagement and reduce usage time attrition during practice. However, the data did not support this assumption. During the experiment, we found that participants spent much time getting familiar with the chatbot in the beginning, and the reduction of usage time in the experiment was, therefore, caused not only by attrition but also by the participants' lack of familiarity. Another motivation for preparing the participants is effectiveness. Studies have shown that experiences with a chatbot can increase users' acceptance of the chatbot and make the psychological intervention more efficacious [86].
Finally, we did not consider incentive schemes for engagement. Studies have shown that engagement or attractiveness may disappear or significantly decrease after two weeks of interaction [69]. Therefore, we suggest that future studies include incentives for users who finish the experiment.

CONCLUSION
This study contributes to the literature by demonstrating that both chatbot adaptiveness and interaction can increase the effectiveness of chatbot-based PPIs. Our results also support the use of chatbots in psychological intervention. However, we did not find evidence linking engagement to effectiveness; this connection warrants further investigation.

Figure 1: Grouping and exercise recommendation process