ChatGPT in Healthcare: Exploring AI Chatbot for Spontaneous Word Retrieval in Aphasia

Having a word on the tip of one’s tongue can be frustrating. Individuals with a language disorder like aphasia, however, face this experience regularly, making it both stressful and debilitating. Large language models, such as ChatGPT, have been gaining traction in healthcare recently. They could enable digital voice assistants to help people find what they want to say during a conversation. However, research on the topic is still not mature. Our study aims at providing a first exploration of the potential use of LLMs for aphasia. Specifically, we aim to examine whether ChatGPT can aid in word retrieval in aphasia. In our study, ChatGPT is tested on real-life speech samples of people with aphasia using the AphasiaBank corpus. Additionally, we investigate whether ChatGPT can utilize politeness strategies. We found ChatGPT to be accurate in identifying the intended word 91.67% of the time as well as successful in incorporating politeness strategies.


INTRODUCTION AND RELATED WORK
One of the most common complaints associated with aphasia is difficulty finding the right words [8].As Davis puts it: "When unable to find a word, he or she might talk around it saying, "I wear it right here, and I tell time with it.Mine goes tick, tick."After I shower, I dry myself with it.This symptom of commission is called circumlocution" [8].People with aphasia (PWA) commonly exhibit circumlocution, which implies that they know exactly what they wish to convey but are unable to find the proper words to say it. 1 This can exacerbate their feelings of social isolation and hasten the progression of degenerative conditions. 2 Furthermore, PWA are at a higher risk of developing depression [23].Speech and language therapy is significantly beneficial for an individual with aphasia as it can help to improve the individual's ability to communicate, by helping them to develop new strategies for understanding and expressing language including word retrieval.However, it is reported that therapies at a high intensity, high dose, or long duration improves functional communication in PWA compared to those at a lower intensity, dose, or duration [3] and "even well-studied impairmentbased approaches show limited effects on language recovery" [30].Speech and language therapy is, moreover, a resource-intensive procedure and is difficult to obtain [24].
In situations where a speech and language therapist may not be available on request or when the people wishing to help struggle to understand, there is a question about who should assist PWA.Furthermore, what are the best options for PWA who are several years post stroke and have been discharged from speech and language therapy but are still experiencing circumlocution problems while communicating with family members, support staff or participating in social networking or online forums?In response to the limitations of clinical practice, a substantial body of experimental research has underscored the benefits of using computer-assisted therapies [1,5,28] to aid individuals with aphasia.
Studies have shown that computer-assisted therapies may be helpful in the improvement of word retrieval skills [24,27] and that they can be used without the direct involvement of a speech and language therapist [18].Computer software such as StepByStep offers individuals with daily word-finding exercises based on their needs and chosen words and therapy software such as Linguagraphica provides a wide range of exercises that may be beneficial for the improvement of trained words [24].
While computer-assisted therapies may include exercises aimed at aiding in word retrieval for PWA, it is worth noting that no computer program has yet been specifically designed to assist these individuals in instances of circumlocution, where the patient struggles to recall a specific untrained word on the spot.
Conversational agent interfaces, where a standard web interface is replaced by a conversation between a human and a chatbot has been presented in the news as the next revolution. 3Such interactions have been envisioned through textual interaction or through voice interaction with the rise of connected chatbots such as Amazon's Alexa or Apple's Siri [13].These voice assistants have been found to offer promising opportunities for supporting speech and language impairment [19], but there is a lack of professional studies exploring their professional adoption [15].Previous research has indicated that the "AI system for standard speech recognition (Alexa, Siri, and thousands of voice-activated apps) do not have non-standard speech recognition capabilities and, for this reason, they are not able to 'understand' speech-impaired people" [20].
Large language models (LLMs) like GPT-3, BERT, ChatGPT are artificial intelligence conversational agents that have drawn significant attention across the globe due to their remarkable capability to hold human-like conversations, comprehend professional language, and generate specialized text, to name a few [26].
The very recent LLM, ChatGPT created by OpenAI, holds promise in the field of public health due to its capacity to produce text that closely resembles human language [2].The envisioned case study for the use of an LLM for aphasia in this paper is as follows.A system based on an LLM-enabled conversational agent could be capable to support live conversations between two or more people with aphasia.The system would recognize cases of circumlocution and intervene during the discussion to suggest a precise word or concept that was sought.As such the system should be able to (1) understand from a conversation that circumlocution is occurring (2) find the intended word from an identified circumlocution occurrence and (3) deliver the intended word adequately.The aim of this paper is to get first insights into features (2) and (3), i.e., intended word identification and intended word delivery.With respect to concept identification from circumlocution, the study aims to investigate the following research question: RQ1: Can ChatGPT accurately retrieve the intended word an individual with aphasia is attempting to express through circumlocution?
In terms of intended word delivery, one important aspect is politeness.Brown and Levinson's theory of politeness states that human beings as members of society have a negative and positive "face".Negative face represents their freedom from imposition and freedom of action while positive face refers to their desire to be liked, respected, and appreciated by those involved in communication [4].Sometimes verbal or nonverbal communicative acts can be perceived as a threat to a person's face or self-image, also known as face-threatening acts (FTAs).FTAs can include actions such as criticizing, correcting, or contradicting someone, making requests that infringe on their autonomy, challenging their beliefs or values, or expressing disagreement or disapproval.Through interest and agreement seeking, hedges, indirect statements, and other politeness strategies, interlocutors lessen positive and negative face threats in a communication [4].It is particularly important to minimize the risk of FTAs when communicating with PWA as they are at a greater risk of experiencing FTAs due to their communication difficulty, and the resulting frustration can lead to strained relationships. 4 Studies have suggested that PWA, just like healthy individuals, are sensitive to politeness breaches as a result of their relatively intact pragmatic skills [12,29].
Brown and Levinson propose several politeness strategies to mitigate FTAs [4] to save face and show respect.Moreover, in several studies, it has been identified that politeness strategies are crucial 4 https://www.aphasia.org/aphasia-resources/communication-tips/to promoting positive interactions and outcomes for non-human agents such as robots, chatbots, and voice agents, particularly for individuals with mental health conditions and the elderly [11,21,22].Based on these observations, we aim to address the following research question related to intended word delivery: RQ2: What specific attributes of politeness can be identified within the responses generated by ChatGPT when communicating with PWA?

METHODOLOGY
This preliminary study employs American English speech samples of PWA who underwent the AphasiaBank protocol from the Aphasi-aBank corpus [17].The standard protocol follows the same format, which includes a clinical interview, the story narration task, the picture description task, and the procedural discourse task.People with different types of aphasia are recorded in audio and/or video format in the corpus.Every recording is transcribed using CLAN (Computerized Language ANalysis ) software.In addition, the software enables corpus analysis tasks such as keyword and regular expression searching, frequency counting, etc. [16].The corpus also includes the extent of severity of aphasia that is measured quantitatively by the Western Aphasia Battery-Revised (WAB-R) score [9].The WAB set a cut-off score of 93.8 on the Aphasia Quotient (AQ) out of 100, with higher scores indicating better language function [14,25].Figure 1 presents the demographic information of the individuals.One participant (Case 6) with the WAB-R score above 93.8 is also included in the study as such individuals still exhibit impairments in their ability to communicate that noticeably differ from non-aphasic/healthy controls and require special attention and continued treatment [9].Additionally, such individuals have been recorded and documented in the corpus under the aphasia group.
It is common for PWA to experience word truncation and distortion, known as paraphasia [6], which results in disturbed speech.Disfluencies such as revisions, repetitions, filled and unfilled pauses further impede their speech.The following example from a person with moderate conduction aphasia describing how to make a peanut butter and jelly sandwich serves a vivid illustration of the challenges encountered in communication: "you put a piece of bread down here and you put your your uh black your uh some um see the word I can't with trouble with that red thi that color brown that brown color thing.you put the brown thing there and on this one here you put on the red the orange the uh apple green" (AphasiaBank).As seen above, people with aphasia suffer from severe repetitive self-corrections resulting in deterioration of verbalization precipitously [8].
To evaluate the identification of the intended word, we used the original speech samples (fig 2) for ChatGPT evaluation.
It should be noted that we provided the sentences as is to the LLM, however, with prompt engineering beforehand.That is, we started the chat conversation by indicating to the LLM that it was supposed to be an aid for someone with aphasia who struggled with finding words.
Based on the protocol data, a total of 12 instances of circumlocution from 8 PWA were provided to ChatGPT (GPT-3.5 5 ).The instances were retrieved by executing the command freq +s"<+ cir>"  Besides, the accuracy of the word retrieved by ChatGPT was confirmed by analyzing the protocol data from the corpus, which showed that the individual intended to convey this specific word.We then tagged the output as correct if it contained the intended word and incorrect if it did not.
To evaluate the delivery of the intended word, we coded the outputs of the ChatGPT prompts based on the four politeness strategies of Brown and Levinson's politeness model [4]: • Bald on record without redressive action: it relies on directness and straightforwardness without any mitigation.ChatGPT would, for example, provide a straightforward answer without being more polite.• Positive politeness: the individual is made to feel appreciated and respected.For instance, ChatGPT can display heightened interest and ask clarifying questions when responding to circumlocution problems.• Negative politeness: the strategy involves using language that highlights the individual's negative face needs, such as autonomy or privacy.This can be achieved through hedging, indirectness, apologies.• Off record: includes indirect statements, understatement that infer the need for a certain action to be taken.

PRELIMINARY RESULTS
In terms of accuracy, as shown in figure 2, based on the evaluation, ChatGPT successfully retrieved the accurate word and information in eleven out of twelve instances (91.67%).During the testing process, ChatGPT retrieved the correct word on its first attempt in ten cases (90.9%), however in one instance, ChatGPT needed two attempts to retrieve the accurate word.
In terms of politeness, figure 2 shows that ChatGPT employed different face-saving strategies when dealing with circumlocutions.For instance, ChatGPT used different negative politeness markers to mitigate the imposition on PWA.These markers included hedging, such as using phrases like "it seems like" (1), "I think" (2) and "it sounds like" (3), indirect suggestions like "I would recommend seeking shelter" and indirect requests such as "Can you please clarify" (5).The phrase "if possible" (5), for instance, avoids imposing on autonomy by allowing the person to do what they can while still providing a recommendation to avoid getting wet.This strategy seems to align with a study that shows that elderly people find indirect suggestions and requests from robots highly polite [11].Similarly, positive politeness can be characterized by gratitude and deference as they incur a social cost which helps balance the burden on the recipient [7].In line with this, ChatGPT utilized positive politeness markers such as "Ah I see", "Is there anything else you would like to know" (1), "I am so sorry to hear that", "I'd be happy to help" (5), which shows empathy and concern for the patient's interest.The use of these markers establishes a friendly and approachable tone, which is important for establishing rapport with the addressee [4].Furthermore, it was observed that ChatGPT used a bald-on-record strategy when uncertain about the accuracy of the retrieved word in its second attempt (4).This involves stating the situation without mitigating face threat to the addressee.ChatGPT did not use politeness markers to soften its words, which can be useful in time-critical situations where clear and direct communication is necessary [4].The lack of off-record strategy is understandable in this context.Off-record communication implies that the addresser has multiple intentions [4] and as an AI language model, ChatGPT generated coherent and contextual responses ensuring transparent approach.

CONCLUSION AND FUTURE RESEARCH
In this paper, we performed a preliminary study that showed Chat-GPT can identify intended words in a circumlocution context by PWA, and that it incorporates politeness strategies, which are crucial to effective communication with PWA.As such this paper offers a first insight into the use of LLMs to give support for PWA in a potential live conversation setting.Nevertheless, these results need to be confirmed in a larger setting, refined and extended in future work.
One limitation of this preliminary study is the limited dataset on which the results were applied.In future work we plan to use a bigger portion of the AphasiaBank corpus.The results should be refined with additional prompt engineering interventions to improve the output in terms of politeness and also in terms of text length.As we examined the responses received from ChatGPT, we found that they were often lengthy.We believe that this could be

Word retrieval
I think you might be referring to a castle instead of the White House.In the story of Cinderella, the carriage is used to take her to the ball at a castle where she meets the prince.overwhelming for PWA as almost every type of aphasia results in some difficulty with oral reading [10].Therefore, a conversational agent should provide balanced, not excessively long or short responses as reading is slow and effortful for PWA.For instance, in the response to the circumlocution instance in 3, ChatGPT goes on to explain the meaning of parasailing which is not required as the patient understands the concept of parasailing but they are not able to recollect the word.
Another limitation of this preliminary study is the lack of end user feedback on the potential adoption of such a live dialogue support system.Future research should address this shortcoming and include PWA to evaluate not only the adequacy of responses, but also the interaction with such a third party in a face-to-face conversation.Finally, the results should be extended to include the first step of the envisioned system, i.e., the identification of the occurrence of a circumlocution that triggers the conversational agent to intervene.This aspect will also involve privacy concerns mainly because ChatGPT records and stores transcripts of conversations.Besides privacy concerns, interaction alternatives should be discussed with PWA.

Figure 1 :
Figure 1: Demographic data of the PWA in the study

Figure 2 :
Figure 2: ChatGPT's responses and politeness strategies to circumlocution.The words highlighted in bold indicate the correct words retrieved by ChatGPT.* indicates that ChatGPT provided additional information related to the retrieved word/topic