Caregiver's Evaluation of LLM-Generated Treatment Goals for Patients with Severe Mental Illnesses

The potential of LLMs to generate context-specific content for psychiatric patients could be used to support treatment. Patients diagnosed with Severe Mental Illnesses (SMI) face a significant challenge in goal setting. Caregivers work closely with patients with SMI to establish treatment goals. However, most goals are immeasurable, hindering their integration into mHealth. This is a missed opportunity, since mHealth has the potential to aid healthcare professionals in tracking a patient’s progress and to help motivate patients to work on their goals using behavior change concepts like personalization and gamification. Recognizing the time-consuming nature of creating measurable goals, this study evaluates, in collaboration with caregivers, the quality of the goals generated by an LLM-powered goal system that aims to provide a time-efficient way of creating goals for patients with SMI.


INTRODUCTION
Large Language Models (LLMs) have advanced substantially in recent years, and recent breakthroughs such as BERT and GPT-4 have been disruptive. Many industries are currently preparing for the impact LLMs will have on their workforce and services [6].
Mental healthcare, in contrast to non-psychiatric healthcare, has been slow to adopt AI technologies because mental health practitioners emphasize soft skills and strong relationships with psychiatric patients [6] [5]. However, modern LLMs can generate relevant content when given a context; it may therefore be possible for LLMs to generate relevant content for psychiatric patients [1].
In several countries, psychiatric patients diagnosed with Severe Mental Illnesses (SMI), such as schizophrenia, PTSD, severe depression, and bipolar disorder, are prescribed Flexible Assertive Community Treatment (FACT) [16]. During FACT treatment, patients are assigned a healthcare professional, called a case manager, who is the direct contact between the patient and their FACT team. Case managers continuously evaluate the patients' functioning and risk of relapse. Together with their case manager, patients create a treatment plan that documents the treatment goals they will work on over a period of a year.
Currently, to evaluate a patient's progress in treatment, case managers can only contact the patient or the patient's direct environment (e.g., family members, GP) to assess the patient's current mental state and progress. As a result, case managers may be unable to assess when to scale patient care efficiently [8]. Research shows that mHealth applications can potentially assist mental healthcare professionals in assessing their patients' functioning and monitoring the progress that a patient is making on their treatment goals [7]. In previous research, we found that patients with SMI preferred goals in mHealth applications to be personalized based on their treatment plan [9]. However, the majority of treatment plan goals are unsuitable for direct use in mHealth applications, because 75% of the treatment plan goals are not measurable [9]. Only measurable goals can be tracked and completed in mHealth applications [2].
Registering and creating measurable goals can be a time-consuming process, which case managers tend to avoid given their high workload. To address this challenge, we collaborated with case managers and students to create an LLM-powered goal augmentation system for the streamlined and scalable creation of structured and measurable treatment plan goals for patients with SMI, based on a protocol we created with case managers [8].
In this paper, we introduce the GOALS system, an LLM-powered goal augmentation system that generates measurable goals. We also introduce the GOALS rubric, a rubric for measuring the quality of a goal, based on behavior change elements and the protocol created in collaboration with case managers. In this study, we aim to evaluate the quality of the goals generated by our GOALS system with case managers of patients with SMI using the GOALS rubric.

THEORETICAL BACKGROUND
To achieve a goal, an individual needs to change their behavior to the target behavior required by that goal. According to the COM-B behavioral model, behavior is a product of three fundamental factors: 1. capability, 2. opportunity, and 3. motivation [11]. Similarly, the three fundamental factors according to the Fogg behavioral model are: 1. motivation, 2. ability, and 3. triggers [4]. Motivation is an especially crucial factor in behavior change for individuals with SMI [3]. Motivation can be categorized into intrinsic and extrinsic motivation. Extrinsic motivation stems from external factors, whereas intrinsic motivation arises from a personal interest [13]. Intrinsic motivation leads to enhanced performance [13].
Defining what makes a goal well defined is complex. However, across multiple theories, it is generally believed that a good goal is a measurable goal [12] [2]. A well-defined measurable goal is a goal that can be tracked to monitor progress [2]. One way of defining measurable goals is by making the goal Specific, Measurable, Achievable, Realistic, and Time-specific (SMART) [14]. Another way of creating well-defined measurable goals, used in mental healthcare for rehabilitative treatments such as FACT, is the Goal Attainment Scaling (GAS) technique [2]. Both SMART and GAS are based on the Goal Setting Theory (GST). The GST states that specific and challenging goals combined with appropriate feedback contribute to better results. According to GST, goals should follow these principles: clarity, challenge, commitment, feedback, and task complexity [10]. Goal setting is an effective way of achieving behavioral change in individuals [10].

METHOD

Recruitment
Case managers were recruited through convenience sampling. The researcher visited two FACT teams, one area team (i.e., a team that also treats patients with mental illnesses who are not diagnosed with SMI), and one early intervention team (i.e., a team that treats mental health patients who are at risk of developing SMI), encompassing a diverse representation of mental healthcare settings in the Netherlands.

GOALS Rubric.
To evaluate the quality of treatment goals, we created the GOALS rubric, which assesses how intrinsically motivating a goal is using the 3 elements of the Self-Determination Theory (SDT) (i.e., autonomy, competence, and relatedness), and how measurable a goal is using the 5 elements of the GST (i.e., clarity, challenge, commitment, feedback, and task complexity). Each element is evaluated on a five-point Likert scale: 1 = strongly disagree; 2 = disagree; 3 = neutral; 4 = agree; and 5 = strongly agree.
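As an illustration of how a single rubric rating could be represented and summarized, the following is a minimal sketch. The element names come from the rubric description; the class name `RubricRating`, the method `subscale_mean`, and the example scores are hypothetical and are not part of the GOALS rubric itself.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Tuple

# Element names follow the rubric description; identifiers are illustrative.
SDT_ELEMENTS: Tuple[str, ...] = ("autonomy", "competence", "relatedness")
GST_ELEMENTS: Tuple[str, ...] = (
    "clarity", "challenge", "commitment", "feedback", "task_complexity",
)
LIKERT_MIN, LIKERT_MAX = 1, 5  # 1 = strongly disagree ... 5 = strongly agree


@dataclass(frozen=True)
class RubricRating:
    """One rater's Likert scores for one goal, keyed by rubric element."""
    scores: Dict[str, int]

    def __post_init__(self) -> None:
        expected = set(SDT_ELEMENTS) | set(GST_ELEMENTS)
        if set(self.scores) != expected:
            raise ValueError("a rating must score exactly the 8 rubric elements")
        for name, value in self.scores.items():
            if not isinstance(value, int) or not LIKERT_MIN <= value <= LIKERT_MAX:
                raise ValueError(f"{name}: score must be an integer from 1 to 5")

    def subscale_mean(self, elements: Tuple[str, ...]) -> float:
        """Mean score over a subset of elements (e.g., the SDT elements)."""
        return mean(self.scores[name] for name in elements)


# Hypothetical rating, invented purely for illustration.
example = RubricRating({
    "autonomy": 5, "competence": 4, "relatedness": 4,
    "clarity": 4, "challenge": 3, "commitment": 4,
    "feedback": 2, "task_complexity": 3,
})
print(example.subscale_mean(SDT_ELEMENTS), example.subscale_mean(GST_ELEMENTS))
```

Keeping the SDT and GST elements in separate tuples makes it straightforward to report the motivation and measurability subscales separately.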

GOALS system.
To generate measurable goals for treatment plans using LLMs, we created the GOALS system, a lightweight web application that allows users to submit their own goals in a plain text field. The system does not receive the treatment plan of patients, but solely the goals users want to work on, and adds these goals to an additional prompt sent to the GPT-3.5 API. This prompt, which is hidden from the user, contains additional instructions to add healthy lifestyle choices to the goals. Once the GOALS system receives a response from GPT-3.5, it parses and maps the response into measurable goals and displays them to the user. Users can further modify the measurable goals or regenerate goals through the website. Users of the GOALS system can also submit their generated goals to an mHealth application as personalized goals. The mHealth application supported by the GOALS system is GameBus, a gamification platform that promotes healthy activities through fun challenges and competitions [15]. To view example prompts, see the GOALS system GitHub page. The data management guidelines and the study were approved by the Technical University of Eindhoven.
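The flow described above (wrapping the user's goals in a hidden prompt, sending it to the model, and parsing the response into measurable goals) might be sketched as follows. This is not the GOALS system's actual implementation: the prompt wording, the pipe-separated response format, and all names (`HIDDEN_INSTRUCTIONS`, `MeasurableGoal`, `build_prompt`, `parse_response`, `generate_goals`) are assumptions made for illustration. The model call is passed in as a plain function so that no particular API client is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical hidden instructions; the real prompts are on the project's GitHub page.
HIDDEN_INSTRUCTIONS = (
    "Rewrite each goal below as a measurable goal that encourages healthy "
    "lifestyle choices. Answer with one goal per line in the form: "
    "description | unit | target | period"
)


@dataclass
class MeasurableGoal:
    description: str
    unit: str
    target: float
    period: str


def build_prompt(user_goals: List[str]) -> str:
    """Append the user's non-empty goals to the hidden instructions."""
    listed = "\n".join(f"- {goal.strip()}" for goal in user_goals if goal.strip())
    return f"{HIDDEN_INSTRUCTIONS}\n\nGoals:\n{listed}"


def parse_response(text: str) -> List[MeasurableGoal]:
    """Map each well-formed response line to a MeasurableGoal; skip the rest."""
    goals: List[MeasurableGoal] = []
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 4:
            continue  # not in the assumed four-field format
        description, unit, target, period = parts
        try:
            goals.append(MeasurableGoal(description, unit, float(target), period))
        except ValueError:
            continue  # target was not numeric
    return goals


def generate_goals(user_goals: List[str],
                   complete: Callable[[str], str]) -> List[MeasurableGoal]:
    """`complete` stands in for the LLM call: prompt text in, response text out."""
    return parse_response(complete(build_prompt(user_goals)))
```

Skipping malformed lines rather than failing reflects one possible design choice: the user can still edit or regenerate goals afterwards, so a partial result is more useful than an error.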

Study Design
The study uses a within-subject experimental design. Case managers are exposed to two conditions throughout the study. In the first condition, case managers rate a treatment goal using the GOALS rubric. In the second condition, case managers submit the same goal used in the first condition through the GOALS system and rate the measurable goal generated by the GOALS system using the GOALS rubric.
Before the intervention, the case manager is asked to provide a treatment plan goal that they have collaboratively created with one of their patients diagnosed with SMI. The case manager is then asked to evaluate the goal using the GOALS rubric. Once the goal has been evaluated, the intervention phase of the study begins.
During the intervention, the case manager is asked to use the goal chosen in the previous phase as input for the GOALS system. Once the measurable goal has been generated, the case manager can modify or regenerate it. When the case manager is satisfied with the generated goal, they are asked to rate it using the GOALS rubric. Once the measurable goal has been evaluated, the post-intervention phase begins.
After the intervention, the case manager is asked to complete a survey regarding their opinions on the new goal-setting workflow and overall system improvements. During the intervention and post-intervention phases, the researcher takes notes of any extra comments and suggestions from the case managers.

Measurements
The mean scores of the elements of both groups were compared to determine the difference in quality. First, a test of normality was performed to determine whether the data were normally distributed. Second, a two-tailed statistical test was conducted on the data; a p-value below 0.05 was considered statistically significant. Lastly, the effect size of each element was calculated; a Cohen's d greater than 0.8 was considered large.
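To make the paired comparison concrete, the following minimal sketch computes a paired t statistic and a paired-samples effect size using only the Python standard library. It rests on several assumptions: the paper does not state which variant of Cohen's d was used, so the sketch uses the mean difference divided by the standard deviation of the differences; the function name `paired_t_and_effect_size` is hypothetical; and the ratings are invented for illustration and are not the study data.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_and_effect_size(before: Sequence[float],
                             after: Sequence[float]) -> Tuple[float, int, float]:
    """Return (t statistic, degrees of freedom, Cohen's d for paired samples).

    Assumes each position in `before` and `after` belongs to the same rater.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    t_stat = mean(diffs) / (sd / sqrt(n))
    cohen_d = mean(diffs) / sd  # assumed variant: mean difference / SD of differences
    return t_stat, n - 1, cohen_d


# Invented Likert ratings for nine raters; NOT the study's data.
original_goal = [3, 2, 4, 3, 2, 3, 3, 2, 4]
generated_goal = [4, 4, 5, 4, 3, 5, 4, 4, 4]
t_stat, df, d = paired_t_and_effect_size(original_goal, generated_goal)
print(f"t({df}) = {t_stat:.2f}, d = {d:.2f}")
```

The two-tailed p-value then follows from the t distribution with `df` degrees of freedom. In practice a statistics library would typically be used instead, for example `scipy.stats.shapiro` for the normality check and `scipy.stats.ttest_rel` for the paired test.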
The survey result data and researcher notes were used as input for the thematic analysis. The researchers first familiarized themselves with the data by coding similar answers and comments from the case managers. Second, codes were reviewed and revised. Third, codes were grouped and coded into themes. Fourth, codes were reviewed and revised. Finally, the results of the thematic analysis were written.

Intervention results
In total, 13 participants contributed to the study, of which 9 were case managers. The additional 4 health professionals contributed to the study by answering the survey questions. Only case managers were asked to take part in the assessment of goals.

Overall goals.
To compare the means of both evaluation groups, we first tested whether the results of each group followed a normal distribution using a Shapiro-Wilk test. The results indicated that both groups follow a normal distribution. Additionally, an Anderson-Darling test confirmed the fit of both groups to their respective distributions. On visual inspection of the means displayed in the columns of Table 1, the goals generated by the GOALS system are evaluated higher than the goals created by the case managers. A paired-samples t-test reveals a statistically significant difference, indicating that the observed differences in ratings between the two groups are unlikely to be due to chance.

Per element results.
During the initial rating of the goals, case managers rated all the elements of the SDT higher than the elements of the GST, except the elements of effort and challenge. When comparing the case managers' initial evaluation of the SDT elements with their evaluation of the goals that the GOALS system generated, the evaluation is the same for the elements of competence and relatedness; however, the element of autonomy is evaluated higher. A paired-samples t-test reveals no significant difference for the elements of competence and relatedness. For the element of autonomy, however, not only is the difference statistically significant, but Cohen's d reveals that the effect size is also large.
For the elements of the GST, during the initial rating the case managers rated the elements of challenge and effort higher than the other elements. When comparing the case managers' initial evaluation of the GST elements with their evaluation of the goals that the GOALS system generated, the elements of challenge and effort are still the highest-rated elements. A paired-samples t-test reveals a statistically significant difference for every element of the GST, except for the elements of challenge and effort.

Thematic Analysis
The thematic analysis identified two main themes: 1) Literacy. Although participants unanimously agreed that the new goal-setting approach was beneficial, they emphasized the critical role of language. The language used in the goals generated by the GOALS system was often considered overly complex for patients diagnosed with SMI. Participant 1 highlighted, "The language used in the generated goals is pretty difficult," and Participant 2 pointed out, "The clarity can be more difficult if the language is too complex." Participant 1 recommended making the language concise and suitable for a 12-year-old to understand.
2) Working process. Participants unanimously agreed that the new goal-setting workflow was easy to understand and use. Participant 3 stated, "It doesn't look too complex. With a bit of practice, I think it should be possible to get the hang of it." Participant 1 further endorsed the workflow as "A lot more professional than how it currently goes." Most case managers emphasized that the tool would save time. Participant 4 observed, "It takes quite a bit of administrative time to write goals in detail, as the 'machine' does it," while Participant 8 stated, "It's fast and much more specific than when I do it myself." The importance of client opinions for successful implementation was stressed; Participant 6 expressed that if "clients also find it a pleasant way [of working], then I am certainly willing to do this." Despite overall enthusiasm, practical barriers surfaced: Participant 7 acknowledged "some resistance to using AI in the execution of my work," and Participant 3 noted a possible "reluctance from patients regarding privacy-sensitive information." Participant 2 also identified a potential administrative burden, stating, "[Generated] goals ask too much from the patient or caregiver in terms of searching for or printing additional information."

Findings
Case managers evaluated the goals generated by the GOALS system significantly higher than the goals they created themselves. Interestingly, case managers rated the elements of intrinsic motivation (i.e., competence and relatedness) the same, except for the element of autonomy. This could be due to the goals being created in collaboration with the patients. Patients may have an equal amount of intrinsic motivation toward their treatment goals whether they are made with or without the use of the GOALS system. The significant difference in autonomy between the GOALS system goals and the case manager-created goals could potentially arise because the generated goals are more specific and, as a result, the user of the GOALS system has the autonomy to change more specific elements of the goals. During the evaluation of the generated goals, some case managers did claim that the goals generated by the system will require patients to put in more effort, because the clearer and more measurable the goal is, the more commitment a patient needs to make. This is also supported by the fact that challenge and effort, despite showing no significant difference between the groups, are always the highest-evaluated GST elements. The effect size of the difference between the goals is large, further highlighting that the goals generated by the GOALS system are more measurable.

Limitations
This study was subject to several limitations. Although the results of the goal evaluation are statistically significant, the evaluation has low power due to the small sample of case managers. Recruiting case managers for scientific research poses challenges due to their high workload and limited availability for dedicated research contributions. Additionally, the perspectives of the patients diagnosed with SMI on the GOALS system and on the generated goals were not included in the study.

Future work
In future research, patients diagnosed with SMI should be included in the evaluation of the goals generated by the GOALS system. Currently, conclusions can only be drawn from the opinions of case managers, and not from those of the patients. Patients may rate the goals generated by the GOALS system differently. Case managers have also highlighted the importance of the opinions of the patients: case managers are more likely to use the web application if the patients want to.
Although prompt engineering techniques were implemented in the GOALS system, no in-depth experimentation has been done with prompt engineering. The scope of this study was limited to creating quality measurable goals for patients. Prompt engineering should be expanded to possibly create more personalized goals for patients diagnosed with SMI. Furthermore, experimenting with personalized goal generation, utilizing the GOALS system in combination with prompt engineering, may aid in addressing diverse health issues in various populations.

CONCLUSION
In this study, we aimed to evaluate the quality of treatment goals generated by LLMs for use in the treatment of patients with SMI, using the GOALS system. The GOALS system is an LLM-powered goal augmentation web application that can be used to augment any type of goal into measurable goals. Case managers rated the goals generated by the GOALS system significantly better than the goals that they created without the GOALS system. Additionally, case managers found that the GOALS system and its new goal-setting workflow could potentially be an improvement over the current goal-setting workflow, because the new workflow could save time, generate more specific goals, and lead to more SMART goals being registered for patients with SMI. However, patients may resist working with AI, and LLMs may generate contradictory goals.

Table 1: Comparison of CMs' (i.e., case managers') Evaluation and Statistical Analysis Results