Talk2Care: An LLM-based Voice Assistant for Communication between Healthcare Providers and Older Adults

Despite the plethora of telehealth applications to assist home-based older adults and healthcare providers, basic messaging and phone calls are still the most common communication methods, which suffer from limited availability, information loss, and process inefficiencies. One promising solution to facilitate patient-provider communication is to leverage large language models (LLMs) with their powerful natural conversation and summarization capability. However, there is a limited understanding of LLMs' role during the communication. We first conducted two interview studies with both older adults (N=10) and healthcare providers (N=9) to understand their needs and opportunities for LLMs in patient-provider asynchronous communication. Based on the insights, we built an LLM-powered communication system, Talk2Care, and designed interactive components for both groups: (1) For older adults, we leveraged the convenience and accessibility of voice assistants (VAs) and built an LLM-powered conversational interface for effective information collection. (2) For health providers, we built an LLM-based dashboard to summarize and present important health information based on older adults' conversations with the VA. We further conducted two user studies with older adults and providers to evaluate the usability of the system. The results showed that Talk2Care could facilitate the communication process, enrich the health information collected from older adults, and considerably save providers' efforts and time. We envision our work as an initial exploration of LLMs' capability in the intersection of healthcare and interpersonal communication.


INTRODUCTION
The healthcare needs of older adults are uniquely critical, with a significant proportion living with chronic diseases that necessitate long-term monitoring and regular interaction with healthcare providers [5]. According to the National Council on Aging (NCOA), nearly 95% of older adults in the US have at least one chronic health condition, and about 80% have multiple conditions [5], amounting to over 50 million seniors needing healthcare support. On the one hand, older people often experience a gradual decline in cognitive and physical abilities, which can make regular outpatient visits increasingly challenging. On the other hand, healthcare providers are already overloaded and have limited bandwidth [112,115]. Asking them to visit patients' homes to check their health is not a scalable solution due to the significant workload [107]. As such, accessible home-based healthcare, which bridges the gap between older adults and their healthcare providers, emerges as one of the promising solutions. It can offer a familiar and convenient environment to older adults while saving healthcare providers' efforts.
Research has focused on various home-based telehealth applications to support asynchronous communication between older adults and their healthcare providers, such as remote health monitoring [15,34,47,59,77], healthcare plan management [11], and patient-provider collaborative review [30,31,98]. However, these platforms only support specific data formats and contexts that are tailored to certain common diseases. For everyday health communication scenarios (e.g., unexpected health concerns), older adults often have to fall back on traditional asynchronous communication methods such as basic text or voice messaging, as healthcare providers are often not available in real-time for a call (e.g., due to a busy schedule or outside working hours). However, there are critical gaps in these methods. First, getting feedback via asynchronous messages is time-consuming, and digital messages can be lost amid providers' hectic schedules [69]. Also, older adults may fail to identify crucial symptoms in self-reported messages due to limited health literacy. Although questionnaires can be used for more informative communication, they are mostly fixed and lack adaptability to different individual conditions [65,89]. Meanwhile, older adults generally have lower technology proficiency than younger generations, which is another common obstacle to accessing and using some of these tools [40,45,65,104]. For healthcare providers with certain patient follow-up protocols (e.g., post-surgery follow-up), making manual phone calls is one of the most common practices, yet it imposes a significant workload, and sometimes older adults may not be reachable. These issues in efficiency, adaptability, and accessibility within current communication methods create communication challenges for older adults and their healthcare providers.
A novel conversational interface between older adults at home and healthcare providers may fill these gaps in asynchronous communication by providing an accessible, interactive, and intelligent communication system. The recent technological boost of large language models (LLMs) brings new opportunities to achieve this goal. The HCI community has explored the application of LLMs for various healthcare practices, such as mental health support [66,68], health information seeking [81,117,128], and public health interventions [58], revealing LLMs' intelligence in processing health-related information. However, there is little previous work leveraging LLMs for patient-provider communication. In our case, we can leverage their outstanding power of natural language conversation generation and information summarization. For older adults, an LLM-based conversational agent (CA) may provide convenience and easy accessibility. Given older adults' limited health literacy and complicated health conditions, the LLM-powered CA may help older adults better convey their information asynchronously to providers when needed, such as in cases with unexpected health concerns. Meanwhile, providers can also use LLMs to automate the information-gathering and summarization process with high-quality data collected from older adults.

Fig. 1. Overview of the Talk2Care system. The system consists of two modules. (1) The patient module: an LLM-powered VA interface (in purple) that generates natural conversation with home-based older adults to collect health information and forward it to healthcare providers. (2) The provider module: a dashboard interface (in green) that summarizes the key information from the older patient's conversation to assist providers who are responsible for communication (e.g., nurses and patient navigators). Note that Talk2Care does not provide specific healthcare advice. Our current implementation does not involve an actual electronic health record (EHR) system, which can be a promising future direction.
However, the role of LLMs in the asynchronous communication between older adults and healthcare providers is still unclear. To address this gap, we first conducted two semi-structured interview studies to understand communication challenges and older adults' and providers' perceptions of LLM-based communication systems. Our studies involved both older adults with various health conditions (N=10) and health providers (N=9). Our results revealed a set of asynchronous communication challenges between older adults and healthcare providers. We found that it is hard for patients and providers to reach each other, and much information often requires follow-up, explanation, and analysis. These results illustrate the high potential of an LLM-based communication tool.
Leveraging the insights from our interview study, we designed and implemented a pilot LLM-based system, Talk2Care, to support diverse home-based healthcare communication scenarios. The system consists of two modules. (1) A patient-facing module: a conversational interface presented as a voice assistant (VA) that provides a natural conversation experience with older patients to collect health information. (2) A provider-facing module: a dashboard interface for health providers that summarizes and highlights key information from patient-VA conversations. It is worth noting that our system does not aim to provide any specific health advice due to the early-stage exploratory nature of this work and ethical concerns. As discussed later, the ethical concerns must eventually be addressed, but in this work, we explore the value of LLM-driven communication as a first step. We evaluated the pilot Talk2Care in two cases close to real-world scenarios, one usually initiated by older adults (unexpected health concerns) and one usually initiated by providers (post-surgery follow-up). We conducted two user studies with both sides to evaluate the usability of Talk2Care. Our results suggest that our LLM-powered communication tool can facilitate the asynchronous communication process, enrich the health information collected from older adults, and save providers' efforts and time considerably.

... support [103,105,106]. Yet studies have also pointed out the limited use of these VAs by older adults due to usability barriers, cultural gaps, and lack of explainability and flexibility [18,44,49,108,131], and none of them have explored CAs as communication interfaces. This motivated us to explore novel CAs to fill the gaps in older adults' home-based healthcare and communication, taking VAs as an example.
Some recent studies have explored the combination of conversational user interfaces with more intelligent techniques. For example, Natural Language Processing (NLP) methods have been used to build CAs in travel apps and browsers to simplify user interaction [25,90], and LLMs have been used in CAs to interact with mobile UIs and programming tasks [94,113]. Recently, Zhang et al. proposed SpeechGPT, an LLM with intrinsic cross-modal conversational abilities [129]. Rubenstein et al. presented AudioPaLM, a unified speech-text LLM [95]. However, little is known about whether these could be applied to real-world contexts, especially for home-based older adults. Our work seeks to leverage this interaction paradigm to build an interactive and easy-to-use LLM-powered CA that advances personalized information gathering and asynchronous communication with healthcare providers.

Large Language Models for Healthcare
Recently, the technology boost of Large Language Models (LLMs), such as ChatGPT [87], has shown great potential in engaging in, scaffolding, and processing natural conversations [54,72,102,113,120,121]. The latest studies have discussed and explored their application for individual patients in health information seeking [119,128], mental health support [66,68,124], personal health coaching and management [81,117], and health education [67,81]. For healthcare professionals, researchers have also leveraged LLMs to support public health interventions [58], clinical pre-screening [48,115], risk prediction [16,42,63,86], and information processing [9,10,26,64,84,85]. However, multiple studies have raised concerns about using LLMs for healthcare scenarios due to their inconsistency, potential errors, and bias in clinical work [17,55,64]. Due to these concerns, our work focuses on facilitating communication, not on providing any specific healthcare advice.
Despite the great potential of LLMs for promoting healthcare work and experiences, little work aims to address the challenges of patient-provider communication mentioned in Section 2.1, and little is known about how LLMs might help patient-provider communication and benefit both stakeholders. Close to our research, Jo et al. conducted interviews with stakeholders in an LLM-powered system that supports public mental health interventions for people living alone [58]. The study revealed that the system could reduce the workload for mental health practitioners while providing mental support for end users in large-scale healthcare services. However, the existing solution still failed to meet the personalized requirements of users at home because it followed a uniform, "one-size-fits-all" setup that lacked the necessary flexibility for customization [58]. With our pilot system, we aim to facilitate patient-provider communication by providing older adults with a more interactive, responsive, and adaptive experience, and providing healthcare providers with more efficient information analysis assistance.

NEED-FINDING STUDY
To better understand the communication gap between older adults and their healthcare providers and to identify user needs, we conducted two semi-structured interview studies, one with older adults (Section 3.1) and the other with healthcare providers (Section 3.2). Through our interview studies, we identify (1) communication challenges such as limited provider availability and disparities in literacy, (2) opportunities for future LLM-powered systems to offload providers and provide mental support, as well as (3) reliability and ethical concerns. Understanding these challenges and user needs suggests the potential of LLM-powered systems to facilitate communication. Moreover, our results guide the design of our system in Section 4.

Interview Study 1: Older Adults
3.1.1 Methods: Participants and Procedure. After obtaining IRB approval, we recruited older adults (N=10) from a local university-maintained participant pool of older adults with snowball sampling. Participants lived in an area in the northeastern United States. All the older adults resided in single-family homes and were 65 years of age or older. Participants reported one or more chronic conditions and urgent care histories; they had rich experience communicating with their healthcare providers in recent years. Of the ten older adult participants, two had experience using AI (e.g., using ChatGPT on its web interface), six knew about the concept but had not used AI, and two were unfamiliar with the concept. Table 1 summarizes their detailed demographic information. Our interview sessions followed a semi-structured format. Our key interview questions covered three perspectives: (1) What were older adults' needs and challenges in communicating with their healthcare providers at home? (2) How might an LLM-based CA system facilitate communication and address these needs and pain points? (3) What concerns and risks would they have when applying such an AI-powered system? For the participants who had no experience interacting with LLMs, we used slides to introduce the concepts of AI and LLMs in descriptive words and visual cues and presented an example conversation with ChatGPT on the web interface. Based on these starting points, the interviewer followed up with interviewees based on their responses. The specific questions can be found in Appendix A.1 and Appendix A.2. The study lasted 25-30 minutes. Participants were compensated $25 for their time. The interviews were recorded via Zoom and transcribed afterward. Thematic analysis was employed to analyze the data. Two authors coded interview transcripts iteratively until they reached a consensus. We summarize our findings as follows.
3.1.2 Findings: Communication Needs and Challenges. Participants reported a variety of examples where they needed to communicate directly with their healthcare providers at home, such as daily questions about new symptoms and urgent care (e.g., COVID or flu symptoms), confusion or changes about an existing condition (e.g., occasional pain or blood pressure changes, prescription instructions), and post-surgery-related questions. Many cases were consistent with, and supported by, prior research [37,43,80,112]. Most of these communication needs were case-specific and personalized. Participants' current solutions included visiting/calling healthcare services, communicating with providers through the patient portal, discussing with friends or family, and seeking online resources as alternatives. Our interview revealed a set of challenges to these existing solutions.
Communication Inefficiency. Despite the wish to get quick and timely responses from the healthcare system and providers, five participants mentioned that it was hard to reach healthcare providers because they were often too busy during working hours. For example, "...then you're on hold for maybe 30 minutes, which is crazy." (P10) The in-person visiting option posed more challenges. Besides the traveling difficulties, the waiting time for an appointment with providers often took several weeks or even months.
Even in cases where participants got to talk to providers, some information was often ambiguous or even lost during communication. For example, P1 mentioned that nurse practitioners "never put down exactly what I say... They may not realize that it is very significant". P8 gave another example where the messages from providers and lab results often lacked explanations and were hard to understand. Such information loss forced participants to spend extra effort describing things or requesting explanations again when they talked to doctors in subsequent interactions, which reduced communication efficiency and caused confusion for older adults.
Some participants were more familiar with technology and learned to use digital patient communication portals to leave messages for providers, which could address these challenges to some extent. However, lack of digital literacy is a widely known issue [40,91,104]. Only four of the ten participants knew how to use such digital technology.
Reliability Concerns of Alternative Resources. Since they could not get timely communication with health providers, four participants reported resorting to alternative resources by searching online or talking to family. Consistent with previous work [17,21], they expressed concerns about the reliability of these resources. "I'd rather just go to the doctor. That's their area of expertise." (P1) This became another strong motivation for us to facilitate communication between older adults and providers.

3.1.3 Findings: Opportunities for an LLM-facilitated CA System. To further understand how LLMs may help older adults with their communication, we introduced the concept of an LLM-based CA with the possibility of implementing a VA as the user interface. We also presented examples of how such a system could have natural conversations, collect information about their conditions, and forward the information to healthcare providers for further decisions and contact. We explicitly highlighted that our system would not provide any specific health advice, but that it could provide general information about clinics, procedures, or term explanations. Overall, older participants expressed strong interest in such an AI-powered system to help them address the communication challenges.
Offloading for Healthcare Providers. As introduced in Section 3.1.2, one of the key communication challenges was the unavailability of healthcare providers. Older adults hoped that the system could help save providers' efforts: "... Whereas with a computer, and assuming it can respond to 100 people at the same time ..." (P8) This could potentially help providers with tedious procedural work and give them the bandwidth to respond to patients' actual inquiries.
Emotional and Accessibility Support. Although we made it clear that the system would not provide specific health advice, three participants still thought it could give immediate responses and provide emotional support. For example, P4 thought the system could accompany her at home since she lives alone. P10 also agreed that the audio modality of the VA would be comforting: "Sometimes you just wanted to ... hear a voice" (P10). Moreover, the support could also be extended to other family members. For example, P1 shared her post-surgery experience and thought that if such a system existed, it would "give my daughter enough comfort to let me be in our apartment alone after" (P1).
In addition, participants commented that the natural VA system could address the technology barrier of navigating digital communication portals by improving cognitive accessibility with the intelligence from LLMs. For example, P1 had a potential dementia risk and said, "there's going to come a time when I'm not going to be able to remember how to use it."

Function Expectations. In addition to the natural conversation, information summary, and forwarding features, participants also suggested other expectations, including having pre-set personalized instructions from doctors (P8), providing high-level suggestions and references (P1, P6), and delivering administrative information (P2, P10).
3.1.4 Findings: Concerns and Risks. We also investigated the potential concerns of older adults when using the intelligent system. Three participants raised the common ethical concerns around AI systems [130] and privacy concerns around VAs [91]. For example, P5 was concerned that they would accidentally reveal sensitive personal information: "because it's a little bit [listening to] everything". However, many participants did not have such concerns as long as the system "follows the criteria that the federal regulations" (P2). Besides, some participants were also concerned about the learning curve of the system, which we evaluate through a user study in Section 5.2.
3.2 Interview Study 2: Healthcare Providers

3.2.1 Methods: Participants and Procedure. Through emails and snowball sampling, we recruited nine healthcare providers: nurse practitioners, doctors who are primary care physicians or specialists, and other professionals who have close contact with patients. Participants have frequent communication with patients at home, and most of them have close communication with older adult patients. Six of the nine participants had experience with AI (e.g., using ChatGPT on its web interface), and the other three knew about the concept. Table 2 summarizes their demographics and expertise. We mainly recruited physicians because we focus on the communication needs concerning clinical decisions or actions in this study. In these scenarios crucial to older adults' health, physicians are the ones who are mainly responsible, while nurse practitioners or other staff largely handle administrative work. In addition, the older adult participants expressed a strong wish to communicate closely with physicians, as discussed in Section 3.1.2. We included providers with different roles to gain a holistic view of the providers' perspectives. We designed a semi-structured interview process with a focus on three perspectives: (1) What were healthcare providers' needs and challenges in communicating with their patients at home, especially with older adults? (2) How might an LLM-based system facilitate their communication with patients and address these needs and challenges? (3) What concerns and risks would they have when applying such a system? The specific questions can be found in Appendix A.2. The interviewer followed up with the providers based on their responses. The study lasted 25-30 minutes. Providers were compensated $25 for their time. Similar to our procedure in Section 3.1, we recorded the Zoom interviews, and two researchers used thematic analysis on the transcribed data. We summarize our findings in the following sections from the three perspectives listed above.
3.2.2 Findings: Communication Needs and Challenges. Providers mentioned a variety of scenarios that required patient-provider communication. The majority were about older patients' daily care inquiries and the follow-up of previous treatments or surgeries. Sometimes long-term patient education also required communication. Providers did use digital tools to manage information and reminders, yet for older adults, phone calls were still the most common method. Our interview results suggest several prominent challenges.
Ineffective Communication with Existing Asynchronous Methods. In Section 3.1.2, participants mentioned that providers were hard to reach. Interestingly, providers felt the same. Four providers found patients, especially older adults, hard to reach via phone calls. They often needed to call multiple times to finally reach patients, and sometimes they simply could not succeed. Meanwhile, providers mentioned that they are "understaffed and overworked" since they usually have too many patients to handle (PP8). Their bandwidth was limited, so they could not spend all their time reaching out to older patients or going through a standardized in-person visit. As a result, providers had to prioritize other tasks, and communication often experienced inevitable delays.
Existing asynchronous solutions have been trying to address this limited-availability challenge. Tools such as patient portals or EHR systems are set up as alternatives to manual phone calls. However, providers commented that the messages from patients were usually unclear: either too limited or overly long for providers to make the right decision. Meanwhile, current EHR systems present too much redundant information for providers to process: "There's so much information that you need to disregard." (PP5) This may explain the information loss issue mentioned by older adults in Section 3.1.2.
Extra Effort Communicating with Older Adults. Seven of the nine providers also mentioned challenges during real-time communication with older adults: older adults often had limited health literacy and higher accessibility needs. Four of the providers felt that they needed to pay extra attention to language and spend more time explaining concepts to make sure that the information was delivered to and understood by older adults. "... like talking [in a] slower pace or using non-medical terms that any patient can understand." (PP2)

3.2.3 Findings: Opportunities for an LLM-facilitated Communication System. To investigate how LLM-based AI can help providers address communication challenges, we introduced the concept of LLMs and our system. We briefly introduced both the patient-facing VA system for older adults and the LLM-powered information dashboard that summarizes the health information of older adults. Similar to Study 1, we also explicitly mentioned that the AI system would not provide any specific health advice. Overall, providers were excited about the potential of the system after learning the concept. We summarize the opportunities below.
Promoting Efficiency. In Section 3.1.3, older adults expressed the hope that such an LLM-powered system could help offload providers. This was confirmed in Study 2. Providers agreed that using a VA system and an information dashboard could significantly save their effort. "[The system] could be very helpful for those local community clinics that may cover tens or even hundreds of patients." (PP1) Providers could leverage such an intelligent communication system to reduce procedural processes and focus on more critical tasks.
Moreover, providers also mentioned that such a system could help address the information loss problem (Sections 3.1.2 and 3.2.2) by recording conversations and providing analysis results. "Sometimes we weren't able to capture the full experience... AI will come in if it's being recorded... and capture that data." (PP7) Providers would like to see a concise summary of the conversation, as they saw during our study (PP1, PP2, PP3), which could help them with triage and prioritization of the right next step. These comments guided the design of our dashboard system in Section 4.2.
Experience Improvement for Older Adults. Consistent with the comments from older adults, providers, from their own perspective, also agreed that the VA system could improve the conversation experience for older adults. "[The VA system] might eliminate that feeling of judgment, and the patient will probably answer more accurately, as opposed to sitting across from another person who could potentially, you know, unintentionally making them feel uncomfortable because of the certain questions and answers that are required." (PP7) Providers also agreed with the accessibility advantage of a VA system over text-based solutions, which strengthens our motivation for the VA system for older adults.
Function Expectations. Providers also brought up a few features that would be good for the communication system to have. For the VA system, several providers mentioned the necessity of providing "closed loop communication": "That would be helpful and increase the accuracy of the information." (PP2) Meanwhile, providers also expected such an AI system to be integrated with EHR systems (PP1, PP4). Some providers also brought up the importance of recording conversation audio to assist with certain diagnoses.
3.2.4 Findings: Concerns and Risks. Providers strongly agreed with our design decision not to provide health advice due to concerns about AI reliability. Moreover, they also expressed concerns about the risks of misinterpretation (PP2, PP5), AI-embedded bias (PP3, PP7), and sensitive/triggering topics, such as cancers or sex crimes (PP4). Meanwhile, some were also worried about how AI might influence patient-provider interaction in the long run (PP4, PP9). Some of these ethical concerns are beyond the scope of this paper, and we discuss potential solutions in Section 6.

Findings Summary
We summarize our key findings of this section as follows: (1) Problems in current communication methods, including limited provider availability, limited information communicated, and accessibility needs, have led to communication challenges for older adults and their providers. (2) Participants are highly interested in an LLM-powered system to offload healthcare providers, provide mental support, and gather more information. (3) Reliability and privacy concerns were raised by both patients and providers.
These insights can guide the design of our LLM-powered system in the next section to facilitate patient-provider communication for older adults.

TALK2CARE SYSTEM
The insights from older patients and health providers shed light on the design and implementation of our pilot Talk2Care system to facilitate patient-provider communication. Figure 1 presents an overview of our pilot system. Talk2Care consists of two parts. First, the patient component: an LLM-powered VA interface that asks and answers older patients' questions to collect their health information, which is then forwarded to healthcare providers for future decisions and contact (Section 4.1). Second, the provider component: a dashboard interface that employs an LLM to summarize patient-VA conversations and present important information while keeping the original details, to save effort and time for healthcare providers (Section 4.2).
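The provider component's summarization step can be thought of as one LLM call over the stored conversation log. The following is a minimal sketch only, not the paper's implementation: the `call_llm` parameter, the field names, and the instruction wording are all assumptions introduced for illustration.

```python
# Hypothetical sketch of the provider-dashboard summarization step.
# `call_llm` stands in for any chat-completion API call (e.g., GPT-3.5-Turbo);
# all names here are illustrative, not from the Talk2Care codebase.

SUMMARY_INSTRUCTIONS = (
    "You are assisting a healthcare provider. Summarize the following "
    "patient-assistant conversation into concise bullet points covering: "
    "chief concern, reported symptoms and their values (e.g., pain level), "
    "relevant history mentioned, and open questions for the provider. "
    "Do not add medical advice."
)


def build_summary_prompt(conversation_log):
    """Flatten the stored conversation log into a single summarization prompt."""
    transcript = "\n".join(
        f"{turn['speaker']}: {turn['text']}" for turn in conversation_log
    )
    return f"{SUMMARY_INSTRUCTIONS}\n\nConversation:\n{transcript}"


def summarize_for_dashboard(conversation_log, call_llm):
    """Return the LLM summary plus the original details, as the dashboard
    keeps both (Section 4.2)."""
    prompt = build_summary_prompt(conversation_log)
    return {
        "summary": call_llm(prompt),          # concise view for triage
        "original_log": conversation_log,     # full details preserved
    }
```

Keeping the original log alongside the summary mirrors the design goal above: the summary supports quick triage, while the raw conversation remains available when the provider needs the patient's exact words.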

Patient Component: LLM-powered Voice Assistant
The results in Section 3.1.3 confirmed the advantage of an LLM-powered VA system for its convenience and accessibility for home-based older patients. We built a VA interface to collect health information (e.g., detailed symptom descriptions). The interface has multi-turn conversation ability so that it can have continuous and natural back-and-forth conversations with older patients, generating appropriate and personalized follow-up questions to collect important information. Figure 2 provides an overview of this component of the Talk2Care system, including the two modules for conversation generation (Sections 4.1.1 and 4.1.2), the VA hardware (Section 4.1.3), and the database (Section 4.1.4).

Fig. 2. The VA interface has multi-turn personalized conversations with the older adult to collect related health information. The LLM-powered Question Generation Module is responsible for taking the older adult's words and generating questions for effective information collection. The prompt design of this module is detailed in Figure 3. Another LLM-powered Content Loopback Module ensures that key information from the older adult (e.g., pain level) is accurate by double-checking the content, a common healthcare communication practice. The older adult's information, conversation protocol, and conversation log are stored in the information database.
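To make the interplay between the Question Generation Module and the Content Loopback Module concrete, here is a minimal sketch of one conversation turn. It is an illustration under stated assumptions, not the actual Talk2Care code: `call_llm` stands in for the GPT-3.5-Turbo call, and the numeric-value check is a deliberately simplified stand-in for the loopback trigger.

```python
import re


def handle_turn(user_utterance, history, call_llm):
    """One VA turn: either loop a key value back for confirmation, or ask
    the next LLM-generated question.

    `history` is a list of "Speaker: text" strings; `call_llm` is a
    placeholder for the chat-completion call. The trigger below (a number
    answering a "scale" question) is a simplified stand-in for the paper's
    Content Loopback Module.
    """
    history.append(f"Patient: {user_utterance}")
    value = re.search(r"\b(\d+)\b", user_utterance)
    asked_for_value = len(history) >= 2 and history[-2].endswith("?")
    if value and asked_for_value and "scale" in history[-2].lower():
        # Content Loopback Module: repeat the value back before moving on.
        reply = f"Just to confirm, you said {value.group(1)} -- is that right?"
    else:
        # Question Generation Module: LLM produces the next follow-up question.
        reply = call_llm("\n".join(history))
    history.append(f"VA: {reply}")
    return reply
```

A production version would run this loop behind speech recognition and synthesis on the VA hardware, and persist `history` to the information database after each turn.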
4.1.1 Question Generation Module. The core part of the VA system is to generate high-quality questions to collect key health information from older adults, which can then be forwarded to healthcare providers for decision-making. To achieve this goal, we leverage GPT-3.5-Turbo over GPT-4 due to GPT-3.5's faster response speed. LLMs are known to generate offensive or biased texts, sometimes even harmful content [51,130]. To address these ethical concerns, we conducted a series of prompt designs to improve question generation, referring to prompt optimization practices from previous works [8,117]. Specifically, one researcher conducted a set of preliminary prompt designs and tested them with GPT-3.5. Then, another two researchers, one with technical expertise and the other with healthcare expertise, acted as quality checkers for the content and provided feedback. This process was iterated through multiple rounds until all researchers were satisfied with the generated content. Finally, we propose a set of five important content factors to be integrated as a complete prompt, as illustrated in Figure 3. (1) Patient Information. This part includes the older adult's basic information, such as their name, gender, age, and living situation. It also includes their health situation, such as chronic conditions and medical history. This information was added to the database during the initial setup of the system. (2) Conversation Protocol. To ensure the validity of the information collection questions, the VA should follow a standard protocol employed by providers for a specific task (e.g., daily healthcare needs in Figure 3). Different scenarios may require different protocols (e.g., post-surgery follow-up vs. daily care needs). Therefore, the information database stores a set of protocols collected from online resources, researchers, and providers. We separate the task summary and question protocol into two chunks to provide a flexible mapping between tasks and protocols.
(3) System Setting. This part instructs the role of the VA (as a communication facilitator), its responsibilities (i.e., collecting health information but not giving specific health advice), and its communication style (i.e., concise, health-focused, and easy-to-understand, as suggested in Section 3.2.2). The specific wording of our prompts can be found in Figure 3. (4) Conversation History. This part includes the conversation history between the older adult and the VA, which can augment the VA's memory and personalize the conversation experience. (5) Response Optimization. To optimize the VA's response to the older adult, this part is designed to improve the quality of each response and question generated by GPT-3.5. For each conversation session, (1)-(4) are only used at the beginning of the generation, while (5) is added in the prompt across all conversation rounds.

4.1.2 Content Loopback Module. In addition to the question generation, our interviews with providers (Section 3.2) suggest the importance of content confirmation, especially for important health-related information. Therefore, we design a simple content loopback module. When a question asks about specific values (e.g., the pain level), the module loops the value back to users and asks them a second time to confirm.
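To make the prompt structure concrete, the five factors above could be assembled into a chat-style request roughly as follows. This is a minimal sketch under our own assumptions — the function name `build_messages`, the field wording, and the message layout are all illustrative, not the paper's actual implementation:

```python
# Hypothetical sketch of assembling the five prompt factors for GPT-3.5.
# Factors (1)-(4) are set once per session; factor (5) rides along with
# every round's user turn, mirroring the structure described in the text.

def build_messages(patient_info: str, protocol: str, system_setting: str,
                   history: list, user_utterance: str,
                   response_optimization: str) -> list:
    """Combine (1) patient info, (2) protocol, (3) system setting into the
    system message; append (4) history; attach (5) to the latest turn."""
    messages = [{
        "role": "system",
        "content": f"{system_setting}\n\nPatient information:\n{patient_info}"
                   f"\n\nConversation protocol:\n{protocol}",
    }]
    messages.extend(history)  # factor (4): prior turns of this session
    messages.append({
        "role": "user",
        "content": f"{user_utterance}\n\n{response_optimization}",  # factor (5)
    })
    return messages

msgs = build_messages(
    patient_info="Name: Mary, 75, lives alone, hypertension.",
    protocol="Daily care needs: ask about symptoms, duration, severity.",
    system_setting="You are a communication facilitator. Collect health "
                   "information; do not give specific medical advice.",
    history=[],
    user_utterance="I've been coughing and feeling tired since this morning.",
    response_optimization="Ask one concise, easy-to-understand follow-up question.",
)
```

The resulting `msgs` list is the shape that chat-completion APIs such as GPT-3.5-Turbo accept; the exact wording of each part in the real system is shown in Figure 3.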

4.1.3 Voice Assistant Hardware. We experimented with multiple audio devices, including Raspberry Pi and Arduino boards. We ultimately deployed our system on the Amazon Alexa platform by wrapping it as an Alexa Skill, so that we can leverage the well-integrated ecosystem. We use the Alexa Echo Dot as the hardware. The voice-to-text and text-to-voice services provided by Amazon Alexa are employed to transcribe the older adult's input into text and convert LLM-generated text into audio, bridging the text content and natural speech interaction.
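As an illustration of the Alexa Skill glue, a skill handler receives the transcribed utterance, forwards it to the LLM backend, and wraps the generated text in an Alexa Skills Kit response envelope so the Echo's text-to-speech reads it aloud. The sketch below builds that standard envelope; `query_llm` is a hypothetical stand-in for the question generation module, not the authors' actual code:

```python
# Minimal sketch of wrapping an LLM reply in an Alexa Skills Kit response.
# Keeping shouldEndSession False leaves the session open for multi-turn
# conversation, as described in Section 4.1.1.

def query_llm(utterance: str) -> str:
    # Placeholder for the GPT-3.5 call in the real system.
    return "I'm sorry to hear that. How long have you had the cough?"

def handle_intent(utterance: str, end_session: bool = False) -> dict:
    """Build an Alexa Skills Kit response envelope around the LLM's reply."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": query_llm(utterance)},
            "shouldEndSession": end_session,
        },
    }

resp = handle_intent("I have been coughing since this morning.")
```

Alexa then converts the `PlainText` output speech to audio, closing the loop between the LLM text and the spoken conversation.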
4.1.4 Information Database. We employ a cloud-based database to store the older adult's information, conversation protocols, and conversation logs. These contents are used to construct the prompt for both the question generation module (parts 1, 2, and 4 in Section 4.1.1) and the content loopback module. The database is encrypted to protect each individual's privacy.

4.1.5 Conversation Flow. Depending on the scenario, the conversation can be initiated either by older patients (e.g., with specific health concerns and care needs) or by the VA (e.g., health condition follow-up requests). In each session, the VA leads the conversation to ask the patient about symptoms and iteratively gathers detailed information to pass to healthcare providers, following the prompt instructions. For example, in the post-surgery scenario, the VA first asks the patient about overall health conditions and any pain or discomfort. If the patient mentions any pain or discomfort, the VA asks about details (with loopback confirmation), such as the pain level, positions of discomfort, and any actions that may influence the symptoms. If the patient has any confusion about the information, the VA explains the concepts accordingly but does not provide direct medical advice.
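The loopback confirmation step in this flow could look roughly like the sketch below: when a patient's answer carries a specific value (e.g., a pain rating), the VA repeats it back for confirmation before storing it. The regex and confirmation wording here are our own assumptions, not the system's actual implementation:

```python
import re

# Hedged sketch of the Content Loopback Module (Section 4.1.2): echo a
# numeric value from the patient's answer back for confirmation.

def loopback_question(answer: str):
    """Return a confirmation prompt if the answer carries a numeric value,
    otherwise None so the normal question flow continues."""
    match = re.search(r"\b(\d+(?:\.\d+)?)\b", answer)
    if match is None:
        return None  # nothing to confirm
    return f"Just to double-check: you said {match.group(1)}. Is that correct?"

q = loopback_question("I'd say it's about a 7.")
```

If the patient answers "no" to the confirmation, the VA would re-ask the original question; only confirmed values are written to the conversation log.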
4.1.6 Exception Handling. We designed the system and prompts to handle common exceptions in voice recognition and user input to minimize unsatisfactory user experiences from abrupt failures or terminations. We use Alexa Skill re-prompts [1] and conversation history to handle unexpected pauses, so that users can continue conversations with the VA after pauses. If voice recognition has errors, users can interrupt the VA and start again.
Some users might seek immediate medical advice from the system, and according to the prompts, the system generates responses such as "I'm not a doctor, but it would be best to consult with your healthcare provider about which painkiller to take. I can pass this information along to the nurse so they can assist you further." (VA) Since we aim to take advantage of LLMs, we allow the LLM to provide reasonable explanations for general information assistance, such as "What is hospice and palliative care?" The strategies appeared to be successful in our user study in Section 5.2.

Fig. 3. Prompt Design of High-Quality Question Generation for Health Information Collection. The input prompt consists of five parts: 1) patient information, 2) conversation protocol, 3) system setting, 4) conversation history, and 5) response optimization. For multi-turn conversation, 5) will be repeated for each round of conversation. The colored texts are parameters that can be extracted from the information database (see Figure 2). Note that the conversation protocol needs to be set by researchers or healthcare providers to ensure question validity. This figure shows an example of the daily-care protocol.

Provider Component: LLM-powered Information Dashboard
In addition to the LLM-powered VA system for older adults, we further developed a dashboard interface for healthcare providers to review the information in the patient-VA conversation and take further actions if necessary. Figure 4 illustrates an overview of this component of Talk2Care, including three modules for information summary (Section 4.2.1), context highlight (Section 4.2.2), and risk prediction (Section 4.2.3), together with the dashboard (Section 4.2.4) and the database (Section 4.2.5).

4.2.1 Conversation Summary Module.
Our interview results with experts suggest that providing a summary of the conversation between the older adult and the VA system is very helpful (Section 3.2.3). This module aims to extract the key messages from the conversation log and format them into a clinical note structure. To achieve this goal, we also leverage GPT-3.5 with a series of prompt designs.
Leveraging provider training tutorials and literature [2,7,8], our prompt iteration process closely follows the one introduced in Section 4.1.1. The final prompt design structure, as shown in Figure 5, also resembles the question generation for older adults (Figure 3) and includes five factors. (1) Patient Information. This is the same older adult information from the information database. (2) Conversation Protocol. In addition to providing the task summary and question protocol, the prompt further emphasizes the key information to help the LLM narrow its focus and improve the quality of the summary. Beyond the key information, the protocol also instructs the LLM to highlight any extra communication needs with healthcare providers if requested by users. For example, if the patient shared a particular concern when asked "Is there anything else that I could help with?", the system would present the response as an additional note. This piece usually comes together with each question protocol. (3) System Setting. This part instructs the role of the AI assistant (as a text summary tool), its responsibilities (i.e., extracting health information and patient questions), and the output format (i.e., a concise clinical note structure, Section 3.2.3). The specific wordings are shown in Figure 5. (4) Conversation Log. The complete log of the conversation session between the older adult and the VA. (5) Summary Optimization. Through our pilot tests, we noticed the need to adopt an in-context learning paradigm to help the LLM generate high-quality summaries. Therefore, this part shows a summary example to ensure the output is well-structured.
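The five-part summary prompt, including the one-shot in-context example under Summary Optimization, could be assembled along the following lines. All wording here is an illustrative assumption; the paper's exact prompts are shown in Figure 5:

```python
# Hedged sketch of the conversation summary prompt (Figure 5). The factor
# numbers in comments map to the five parts described in the text.

EXAMPLE_SUMMARY = (
    "Chief concern: cough and fatigue since this morning.\n"
    "Relevant history: hypertension.\n"
    "Patient questions: urgent care vs. staying home."
)

def build_summary_prompt(patient_info: str, protocol: str,
                         conversation_log: str) -> list:
    system = (
        "You are a text summary tool. Extract health information and "       # (3)
        "patient questions into a concise clinical note structure.\n\n"
        f"Patient information:\n{patient_info}\n\n"                           # (1)
        f"Protocol and key information to focus on:\n{protocol}\n\n"          # (2)
        f"Follow this example format:\n{EXAMPLE_SUMMARY}"                     # (5)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user",
         "content": f"Conversation log:\n{conversation_log}"},               # (4)
    ]

prompt = build_summary_prompt(
    "Mary, 75, hypertension",
    "Daily care: symptoms, duration, severity",
    "VA: How can I help? Patient: I've been coughing since this morning...",
)
```

Supplying a worked example in the system message is the in-context learning step the text mentions: it anchors the LLM's output to the clinical note structure providers expect.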

4.2.2 Information Highlight Module.
The healthcare providers in our interviews also mentioned the importance of having the raw conversation log (Section 3.2.3), enabling them to check the raw data when necessary. We design a highlighting feature to save their effort. The prompt setup is similar to the conversation summary module introduced in Section 4.2.1, with the main differences being the system setting (the responsibility is to find and return the important quotes of the older adult) and the output optimization (no longer needed).
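One way to surface the LLM-returned quotes in the dashboard is a post-processing pass that marks their spans in the raw log. The sketch below is our own assumption about that step (including the `**…**` markup), not the system's actual code:

```python
# Hedged sketch of post-processing for the Information Highlight Module:
# wrap each quote the LLM identified where it appears verbatim in the log.

def highlight_quotes(log: str, quotes: list) -> str:
    """Mark LLM-identified quotes in the raw conversation log."""
    for quote in quotes:
        if quote in log:  # only exact matches, to avoid corrupting the log
            log = log.replace(quote, f"**{quote}**")
    return log

raw = "Patient: My knees are killing me. VA: On a scale of 1 to 10?"
marked = highlight_quotes(raw, ["My knees are killing me."])
```

Restricting the module to verbatim quotes keeps the raw log intact, so providers can always fall back to the unmodified conversation when a highlight looks off.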

4.2.3 Risk Prediction Module.
There has been a wide range of research on risk prediction in healthcare (e.g., [42,63,86]). We envision that our dashboard should also include such a module to help providers allocate limited resources appropriately. For the completeness of our Talk2Care system, we design an LLM-powered risk prediction module. The prompt design is also similar to the conversation summary module in Section 4.2.1, with the main differences being the system setting (the responsibility is to predict the risk level of the older patient: low, moderate, or high) and the output optimization (asking for the prediction reasoning). However, we recognize the ethical risks of mis-predictions. Evaluating the prediction performance of the LLM goes beyond the scope of our paper. Additionally, health providers would need to implement safety protocols to standardize risk management. Therefore, our evaluation of Talk2Care in Section 5 mainly focuses on the usability of the system design.
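Given the ethical stakes of mis-predictions, the dashboard should only accept one of the three valid labels from the LLM and route anything unexpected to human review. The sketch below assumes a simple "Level: reasoning" output format, which is our own illustration rather than the paper's specification:

```python
# Hedged sketch of output validation for the risk prediction module:
# accept only low/moderate/high, and flag anything else for human review.

RISK_LEVELS = ("low", "moderate", "high")

def parse_risk(llm_output: str):
    """Extract (level, reasoning) from an assumed 'Level: reasoning' reply;
    fall back to 'moderate' so unexpected replies get a human look."""
    level, _, reasoning = llm_output.partition(":")
    level = level.strip().lower()
    if level not in RISK_LEVELS:
        return "moderate", llm_output.strip()  # flag for review
    return level, reasoning.strip()

level, why = parse_risk("High: post-surgery pain combined with hypertension.")
```

Defaulting ambiguous outputs to a middle level that still reaches a provider is one conservative choice; a deployed system would pair this with the safety protocols the text calls for.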

4.2.4 Dashboard Interface.
We develop our dashboard interface with the React framework. Our design simplifies the interface of existing EHR systems [3,4] and mainly focuses on the new features we propose. Figure 6 presents a continued example of the older adult's information, following Figure 3 and Figure 5. Providers can easily navigate to read the conversation summary, the raw conversation logs and highlights, and the communication session history. For each session, they can take follow-up actions or write notes under the summary.

4.2.5 Information Database. The information database of the dashboard is shared with the database of the VA system. It provides the older patient's information, conversation log, and health protocol for prompt input. It also saves providers' action logs and notes for future visualization.

4.2.6 Interaction Flow.
Depending on the specific scenarios, the providers first input instructions and protocols into the system, which are used for both the older adult's and the provider's components.When new conversation sessions are stored in the database, the providers are notified.Using the LLM-powered dashboard, they can check the conversation content, take notes, and make decisions on the appropriate follow-up actions.
Combining the two components, we aim to build Talk2Care as a complete system to facilitate the communication between home-based older adults and healthcare providers.

Fig. 5. Similar to Figure 3, the input prompt consists of five parts: 1) patient information, 2) conversation protocol, 3) system setting, 4) conversation log, and 5) summary optimization. For multi-turn conversation, 5) will be repeated for each round of conversation. This figure continues the example of the daily-care protocol.

EVALUATION
To evaluate the design of our Talk2Care pilot system for stakeholders, we proposed two specific healthcare scenarios to put Talk2Care in realistic settings (Section 5.1). Then, we conducted two user studies using the two scenarios with both older adults (Section 5.2) and healthcare providers (Section 5.3) to measure the usability of the Talk2Care design.

Fig. 6. LLM-powered Dashboard Interface Layout for Healthcare Providers. The dashboard presents the conversation summary (center), raw log, and its highlights (right). The colored dot represents the risk prediction (left and center). Providers can take specific follow-up actions or write down notes under each conversation summary, after which they can mark a session as done.

Scenario Setup
Based on our interview feedback in Section 3, we designed two scenarios corresponding to the communication needs of older adults at home and healthcare providers, one with older adults as initiators (Section 5.1.1) and the other with providers as initiators (Section 5.1.2).

5.1.1 Scenario A: Daily Care Needs. One of the common communication needs of older adults at home arises when they have unexpected health concerns and want to communicate with providers. We designed a character, Mary, a 75-year-old female patient living alone. Mary has been coughing and feeling tired since this morning. She suspects it may be related to COVID-19, but the home-test result turns out to be negative. Mary does not want to travel to the clinic unless necessary, because it takes too much time and energy, and she wants to avoid spreading the potential virus. So she wants to ask for advice on going to urgent care or staying at home. Meanwhile, Mary has high blood pressure, so she is worried about the complicated situation and wants to check if there may be any problems.
On the provider side, we designed another character, Tom, a medical office assistant working at a clinic, whose primary responsibility is to answer patient phone calls every day. He gets a number of calls from patients asking about various things, such as lab results, prescriptions, random health-related questions, and appointment requests. After Tom has all the information from the phone calls, he needs to prioritize it and take the next steps.

5.1.2 Scenario B: Post-Surgery Follow-up. For providers, one common communication need is to follow up with older patients about their symptoms after hospital discharge. Following the character design above, we created John, a 72-year-old male who went through minor joint surgery on his knee two days ago. He feels good overall, but there is still some pain in the knee. Meanwhile, John has two different kinds of painkillers, aspirin and ibuprofen, and he is not sure which one to take.
On the provider side, Emily is a postoperative care nurse working at a busy hospital.Part of her primary responsibility is to follow up with patients after their surgeries, ensuring a smooth recovery process.Emily has to call a number of patients to ask about their current situation and questions.John is one of Emily's patients, so Emily needs to follow up with him about the knee surgery.After Emily has all the information, she will prioritize them and take the next steps.

User Study 1: Conversational Interface with Older Adults
This user study aims to evaluate the older adult's component of Talk2Care.Our evaluation results indicate that the pilot system can provide good usability (Section 5.2.2), help older adults provide more health information (Section 5.2.2), and offer better mental support (Section 5.2.2).

5.2.1 Methods: Participants and Procedure. With IRB approval, we recruited the same set of 10 participants from our interview study (Table 1). We confirmed that no participants had any hearing or speaking disabilities that might impact their interaction with the VA.
We compared Talk2Care against the most common asynchronous communication method of leaving a message for providers. Specifically, we introduced the two scenarios one by one. After introducing each scenario, older adults were asked to play the role of the character (Mary or John) by writing down (or speaking aloud) the message to leave for providers. Then, they had a conversation with the LLM-powered VA system we implemented, with the same goal of conveying the information to providers. Researchers first gave a brief tutorial on the LLM-powered VA and invited participants to test simple interactions such as invocation. Then the older adults were presented with a sheet to remind them of key invocation phrases and the scenario setup. During the interactions with the VA, they were encouraged to engage in natural conversations according to their own language style and to add details. After going through the two scenarios, they completed a short evaluation questionnaire and a semi-structured exit interview. Note that the order of the two scenarios was counterbalanced (i.e., participants with odd numbers started with Scenario A and those with even numbers with Scenario B). Two examples of the conversation logs between the older adult and the VA can be found in Appendices B.1 and B.2. Figure 7 shows an example of the user study. The study lasted 25-30 minutes. Participants were compensated $25 for their time.
Our questionnaire includes the System Usability Scale (SUS) [22], a comparison against the basic method of leaving a message, the health support experience, and a few VA-specific experience questions, using a 5-point Likert scale. The SUS is an industry-standard scale to evaluate system usability: users rate from "strongly disagree" to "strongly agree" on 10 statements, such as whether the system is easy to understand, their need for technical assistance, and learning difficulty [22]. In the interview session, we collected user feedback on what they liked and disliked regarding their interaction experience, as well as their future expectations of the system. The interviews were recorded, transcribed, and processed with thematic analysis by two researchers. We summarize our findings as follows.
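For reference, a SUS score is computed from the ten 1-5 item ratings with the conventional formula (Brooke, 1996); the sketch below shows that standard calculation, not code from the Talk2Care system:

```python
# Standard SUS scoring: odd items contribute (rating - 1), even items
# (5 - rating); the sum is scaled by 2.5 onto a 0-100 range.

def sus_score(ratings: list) -> float:
    assert len(ratings) == 10 and all(1 <= r <= 5 for r in ratings)
    total = sum(r - 1 if i % 2 == 0 else 5 - r   # items are 1-indexed, so
                for i, r in enumerate(ratings))  # index 0 is item 1 (odd)
    return total * 2.5

# A respondent who strongly agrees with all positive (odd) items and strongly
# disagrees with all negative (even) items scores the maximum of 100.
best = sus_score([5, 1] * 5)
```

This is why SUS results such as the 75.5 reported below fall on a 0-100 scale even though each item is rated 1-5.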
5.2.2 Findings: Good Usability, Description Enhancement, and Mental Support. During the study, all participants were able to successfully complete both scenarios following the instructions. They agreed that the two designed scenarios were close to their daily care needs (4.2±1.0 out of 5). Meanwhile, participants found the VA system easy to use. The average SUS score was 75.5±17.1 (out of 100), indicating good usability. "[It's] almost like talking to anybody. I mean, talk to people on the telephone. It's not that different." (P10) They also found that the VA system's generated questions were simple to understand (4.6±0.5), and that the system could understand their words accurately (4.3±0.8). Participants agreed that the system was reliable (4.2±1.0), and the overall privacy concern was not high (2.4±1.4). Through our analysis of the conversation logs, we found that the LLM could successfully identify various ambiguous statements from older adults. For example, in Scenario B, participants described the pain as "my knees are killing me" (P5) or "I have body aches" (P6), and the LLM asked proper follow-up questions about the pain level (P5) or position (P6).
During the study, participants tried the traditional method of leaving a message, so that they could compare it against the method of talking with the VA. Our results showed that older adults thought the new method could better support their communication with providers (4.3±0.7) and enhance the descriptions of their health information (3.4±1.3). "It reminded me of additional symptoms that I went through" (P9). "I wouldn't necessarily have thought to include that in my message. [But that is] significant to the doctor." (P1) This was a strong advantage of Talk2Care: with the help of natural and interactive conversation, older adults were able to better communicate and give more health information to providers.
As mentioned in Section 3.1.3, older adults thought our system might provide better mental support, even though it was intentionally designed not to give specific health advice. This comment was validated after they used our system. Participants gave high ratings on mental support (4.0±1.3) and on the support for their home-based healthcare needs (4.2±1.0). In particular, P2, who previously worked as a nurse in a local clinic, provided some insights: "A lot of people are sometimes afraid to ask a doctor. They think it's foolish, so I think that gives them the opportunity to ask without being too threatening by picking up a phone and having to talk to a secretary and go through all of that." This provided an interesting effect of AI-mediated communication, which we further discuss in Section 6.

5.2.3 Findings: Future Expectations and Concerns. Participants had more specific expectations after using the system. A dominant direction was to have the system provide more administrative information, integrate with the healthcare system, and help manage their personal health information. For example, P7 and P9 suggested the VA could provide an estimated time for the healthcare provider to respond. Participants also mentioned that the system could play the role of "an entry point of the different specialists" within the EHR system to minimize the communication effort.
During the study, some participants did experience voice assistant failures, such as being interrupted or voice recognition errors. "I think it's a little awkward, the need to have her complete all her sentences." (P6) However, most participants felt confident about adjusting to the interaction mode as long as they had time to practice. "Once you get into the rhythm, I think it's okay." (P6) Still, some participants thought older adults' reluctance to adapt might be a barrier, "because it's a big change". Therefore, a smooth onboarding experience and great usability would be crucial to such an LLM-powered interface for older adults. As mentioned previously, the overall privacy concern was not high (2.4±1.4) among participants. Interestingly, some participants felt that their privacy was better protected when interacting with the VA. "... as opposed to talking to a nurse that could be a next-door neighbor ... there's a little bit more of a barrier between that and humans seeing the information." (P9)

User Study 2: System Interface Study with Healthcare Professionals
Our first user study (Section 5.2) evaluated the older adults' component of the Talk2Care system. This study focused on evaluating the usability of our dashboard design with providers. Our results demonstrate that the pilot system provides good usability (Section 5.3.2) and that it can effectively support providers' communication and information processing (Section 5.3.2).

5.3.1 Methods: Participants and Procedure. Our study was also IRB-approved. We recruited the same set of 9 healthcare providers from our interview study (Table 2). All providers have experience using one or multiple EHR systems.
Similar to User Study 1, we compared Talk2Care against the most common asynchronous communication method of receiving a message from older patients. We briefly introduced the dashboard interface to providers. After they were familiarized with the system, we introduced the two scenarios to them one by one. Since providers had different responsibilities and specialties, they were not asked to play the roles but to suggest appropriate actions to the characters (Tom and Emily). In each scenario, they were first asked to give suggestions on the next steps with the traditional communication method, i.e., after receiving patient messages corresponding to the scenario. Then, they were invited to share their screen and navigate through different pages of the dashboard interface by clicking, scrolling, or reviewing information to complete the same goal of processing patient information. In the meantime, they were encouraged to share their thoughts about the interface design. Note that the dashboard design of the two scenarios was the same except for the content (Scenario A: Figure 6, Scenario B: Figure 9), where we constructed patient information for each scenario with the critical cases as stated in Section 5.1. After going through the two scenarios (with a counterbalanced order across participants), they completed a short evaluation questionnaire and a semi-structured exit interview. The study lasted 25-30 minutes. Providers were compensated $25 for their time.
Our questionnaire includes the NASA Task Load Index (TLX) [50], the System Usability Scale (SUS) [22], the comparison against the basic method of processing messages, the work support experience, and a few dashboard user experience questions.The TLX questions had a 7-point Likert scale to evaluate aspects of the task load.In questions about demand, effort, or frustration, a lower score indicated a lighter workload and was desirable for the system; in those about performance, a higher score indicated a better outcome.All other questions had a 5-point scale, where higher scores indicated better support.In the interview session, we collected providers' feedback on their experience regarding using the dashboard to facilitate their communication with older adults.
We also asked about their future expectations and concerns about the system. As in the other studies, two researchers processed the interview transcriptions with thematic analysis. We summarize our findings below.

5.3.2 Findings: Good Usability, More Information, and AI-Mediated Communication. After navigating through the interface for the two scenarios, providers commented that the system was easy to understand and convenient to use. They also agreed that the scenarios and questions were realistic compared to their actual work, and that the system established a good workflow. The system had an overall SUS score of 85.8±9.8, indicating very good usability. The NASA TLX results were also consistent with their comments (see Figure 10), with low ratings on mental demand (2.2±1.0 out of 7), physical demand (1.4±0.5), temporal demand (1.7±0.9), effort (2.0±1.1), and frustration (1.1±0.3), as well as a high rating on performance (6.1±0.6).
Providers also thought the system was helpful in their communication with older adults (4.6±0.5 out of 5) and offloading their work (4.4±0.9).Six participants mentioned cases where the system could save time for them.For example, PP2, PP3 and PP9 were convinced that the VA system could save their time spent on the phone calling.PP1 also mentioned that the whole workflow might also reduce unnecessary outpatient visits.
Moreover, consistent with our provider interviews, more than half of the providers (5 out of 9) mentioned that the LLM-powered dashboard could assist in triage and help prioritize their tasks, improving efficiency in the process. Meanwhile, the summary and highlights could help them organize notes for documentation and speed up the review process (PP2, PP3, PP4, PP7, PP9). This was also backed up by the high rating on system reliability (4.1±0.6). Providers also favored leveraging the conversation history for their work in the long term, which could be used for clarification, confirmation, and reminders before a patient's visit (PP1, PP3, PP7).
In User Study 1, older adults commented that talking to the intelligent VA system could help them reveal more health information (Section 5.2.2). This was also confirmed by providers in this study. Providers agreed that the summary and log contained more information compared to the commonly seen short messages in the traditional patient portal. "When they're writing, they're probably going to put less detail than when they're talking." (PP9) Compared with some questionnaire practices, they also thought that the information could help narrow down the problem to identify the key issue and get providers more prepared for future visits. With sufficient information provided, providers also agreed that the communication procedures would be optimized to reduce the back and forth (PP4, PP8).
Interestingly, providers mentioned that the LLM-powered CA could reduce their mental load of talking with patients by playing an intermediate communication role. "[Talking with patients is] pretty soul-sucking, ... and [the nurse] has to be good at keeping their temper and managing people's expectations..." (PP5)

5.3.3 Findings: Future Expectations and Concerns. In addition to the positive comments, providers also brought up more expectations. Four providers mentioned the need to integrate the system with EHR systems, which was also brought up by older adults in User Study 1. Also consistent with older adults' expectations, providers mentioned that the system could offer pre-set valid suggestions and explanations from providers. For instance, examples for rating pain levels should come together with the question querying the pain level as pre-set explanations (PP7). Meanwhile, we only covered two protocols for the two scenarios, and providers expected that a future intelligent system could incorporate more detailed protocols customized to medical history and key symptoms, and thus collect more detailed health information (PP2, PP3, PP7). For example, PP3 suggested having different protocols for different surgery types and following up with bleeding-related questions, such as recent bleeding time and any effective interventions. Given concerns about AI making mistakes, some providers also suggested that the system should enable them to edit content to correct errors, if any. When "the human component comes in" (PP8), they feel they would not "miss anything".

Finding Summary
In summary, we present the following key findings from our user studies, guiding our discussion in Section 6.
(1) Talk2Care has great accessibility and usability for older adults and their healthcare providers.
(2) The LLM-powered CA could enrich the information communicated and provide mental support for older adults.
(3) The LLM-powered dashboard could promote provider efficiency by offloading work and supporting prioritization.
(4) Users identified ethical and privacy concerns as well as desirable features in such LLM-powered systems.

DISCUSSION
The two user studies demonstrate the effectiveness of the LLM-powered Talk2Care system in facilitating communication between home-based older adults and healthcare providers. In this section, we discuss the effects of AI-mediated communication (Section 6.1), the potential of deploying Talk2Care in real-world scenarios (Section 6.2), and the important ethical concerns and safety considerations of Talk2Care (Section 6.3). We also discuss the limitations of our current work (Section 6.4).

AI-Empowered Healthcare Work
Existing work has explored how AI may empower healthcare work for providers, focusing on Clinical Decision Support Systems (CDSS) and clinical practices, highlighting how AI may assist human experts throughout medical diagnosis and treatment [41,73,76,115,127]. Given the limited work in multi-stakeholder contexts [58], our work uncovers AI's capability to address challenges in patient-provider communication. Our interviews and user studies revealed that the LLM-powered system has strong potential to fill the gap between older adults' expectations of their healthcare providers and the reality. First, with pre-set protocols and explanations, the LLM could satisfy some of the communication needs of patients when their providers are not immediately available, improving responsiveness and supporting triage. Furthermore, LLMs could also address the limitations of existing communication technologies. For example, previous HCI research has explored strategies and interactive tools for healthcare communication, such as embodied agents, voice assistants, and mobile apps, which do not necessarily involve AI and thus follow more fixed interaction flows [13,31,32,99,106]. Building on these foundations, our study presents the benefits of AI in facilitating communication: with given protocols and interactive conversations, LLM-powered agents provide higher adaptability and efficiency for asynchronous communication. Thus, LLM-powered CAs could promote the level of detail in asynchronous communication. For healthcare providers, such AI could also lighten the workload of patient-provider communication by taking over basic or repetitive tasks, such as answering phone calls according to a protocol and finding key information in a conversation. Thus, AI might be able to help patients and providers prioritize their communication tasks and maximize communication outcomes within limited availability. Future designers and system builders could utilize AI for tasks in healthcare communication that are time-consuming but can be well structured with instructions.
In multiple stages of our user studies, older adults and providers mentioned how the LLM-powered system can support their communication needs and reduce mental load. Even when they do not receive specific health advice, older adults still feel mentally supported when talking to our VA system (Section 5.2.2), and some might feel less judged than when talking to a real healthcare provider (Section 3.2.3). For providers, beyond saving effort and time, such an intelligent system can also reduce the mental load of patient communication (Section 5.3.2). These results suggest the potential of AI as a mediator in human-human communication, taking over some of the less favorable parts of the exchange and improving both patients' and providers' experience. At the same time, as the AI-powered system promotes providers' professionalism during communication, such an AI mediator also minimizes harm to patient-provider relationships. This work could serve as an example supporting the promising future of AI-mediated healthcare communication [53,79]. For more personal communication scenarios, such as sharing health information across a family, how such an AI mediator may affect human relationships remains an open question.
Despite our limited number of participants, our participant groups cover a rather wide range of medical conditions (patients) and areas of expertise (providers), and thus different communication needs and challenges. Meanwhile, our patient participants' medical conditions are quite common among the general patient population (e.g., migraine, hypertension). Therefore, we envision that our findings could generalize to patient-provider communication for older adults and potentially other patient populations. We discuss this further in Section 6.4.

Necessary Functions before Real-World Deployment
Our studies suggest that Talk2Care has strong potential for real-world deployment. However, there are a number of features that Talk2Care needs to support, and to be evaluated on, before deployment (Sections 5.2.3 and 5.3.3).

6.2.1 Integration with EHR Systems. First, both older adults and providers suggested integrating our system with EHR systems, which could enable the system to hold more personalized conversations with older adults using more validated and detailed question protocols, update their health information in the system more seamlessly, support providers with a smoother workflow, and further facilitate communication between older adults and multiple providers. Admittedly, integration with EHR may encounter several obstacles, including demands on budget and resources, handling of sensitive information, and interoperability issues [33,52,118]. We envision that such an integration could start at a specialized clinic and include minimal sensitive data.

6.2.2 Incorporating Long-term Conversation History. Another opportunity to develop a personalized experience is to leverage conversational history in the interaction between patients and LLMs. Since our study used a scenario-based setting, we mainly leveraged conversation history on a per-session scale. Our results revealed that short-term history (within the same round of conversation) helps the system ask key questions, probe for more health details, and keep the conversation brief. Beyond the interactions during the study, participants also expressed strong interest in using long-term history (across different conversations) to personalize their experiences, suggesting another direction for our future LLM system. For example, if a user mentions a symptom of pain one day, the LLM CA could follow up over the next few days and ask whether the pain persists. Our current implementation enables a basic long-term memory mechanism by keeping all conversation records. However, we also recognize the technical difficulties of integrating long-term memory: as pointed out by existing work in LLMs and natural language processing, there are restrictions on the number of tokens input to LLMs [6], limiting the length of conversation history that can be included in LLM prompting [12,70,116]. Potential solutions include using conversation summaries instead of original transcripts for long-term use. Still, future work could further explore technical solutions that enhance long-term system memory and evaluate their effectiveness.
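As a rough sketch of the summary-based workaround discussed above, the following illustrates one way to keep recent turns verbatim while compressing older turns into a running summary. The class and function names, and the word-count proxy for a token budget, are our own illustrative assumptions rather than part of Talk2Care's implementation; `summarize` stands in for an LLM summarization call.

```python
def summarize(turns):
    # Placeholder: a real system would call an LLM here and also fold in
    # the previous summary so no information is silently dropped.
    return "Summary of %d earlier turns." % len(turns)

class ConversationMemory:
    def __init__(self, max_words=50, keep_recent=2):
        self.max_words = max_words      # crude proxy for a token limit
        self.keep_recent = keep_recent  # turns always kept verbatim
        self.summary = ""               # compressed long-term history
        self.turns = []                 # (speaker, text) pairs

    def add(self, speaker, text):
        self.turns.append((speaker, text))
        self._compress()

    def _word_count(self):
        return (sum(len(t.split()) for _, t in self.turns)
                + len(self.summary.split()))

    def _compress(self):
        # Replace the oldest turns with a summary once over budget.
        while (self._word_count() > self.max_words
               and len(self.turns) > self.keep_recent):
            old = self.turns[: -self.keep_recent]
            self.summary = summarize(old)
            self.turns = self.turns[-self.keep_recent:]

    def prompt_context(self):
        # Context to prepend to the next LLM prompt: summary, then
        # the recent verbatim turns.
        lines = ([self.summary] if self.summary else []) + \
                ["%s: %s" % (s, t) for s, t in self.turns]
        return "\n".join(lines)
```

A real token budget would come from the model's context window and tokenizer rather than word counts, but the compression logic would be analogous.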

6.2.3 Establishing the Boundary of System Capability. Given the complexity of medical conditions, such systems should have the flexibility to handle various health concerns. Some functions of the current system need to be automated, such as selecting the question protocol based on the scenario at hand. The protocols could also be made more specific to patients' health conditions or adopt more complicated logic than in our scenario-based studies. In addition, CA failures such as voice recognition issues (Section 5.2.3) [27,83] should also be addressed to ensure information accuracy.
However, even with close integration of medical and conversation history, such LLM-powered systems may still encounter complicated scenarios that challenge their capabilities in real-world deployment. For example, patients may respond with a mixture of symptoms and questions that do not fit a pre-set protocol, or ask about a novel or rare term about which the LLM lacks knowledge. This calls for higher flexibility of the LLM in handling various health concerns, which is an active research topic. When reaching these capability boundaries, the LLM should respond with an acknowledgment of its limitations, such as "Sorry, I don't know that one, so I will record your question and have your provider get back to you."
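One minimal way to operationalize such a boundary check is a keyword-based router that falls back to recording the question for the provider when an utterance matches no topic in the active protocol. The topic keywords and fallback wording below are illustrative assumptions, not Talk2Care's actual logic; a deployed system would likely use an LLM-based classifier instead of keyword matching.

```python
FALLBACK = ("Sorry, I don't know that one, so I will record your question "
            "and have your provider get back to you.")

def route_utterance(utterance, protocol_topics):
    """Return ("protocol", topic) if the utterance fits the active
    protocol, else ("fallback", message) to escalate to the provider."""
    text = utterance.lower()
    matched = [t for t in protocol_topics
               if any(k in text for k in protocol_topics[t])]
    if matched:
        return ("protocol", matched[0])   # continue the scripted flow
    return ("fallback", FALLBACK)         # record and escalate

# Hypothetical protocol topics for a post-surgery follow-up scenario.
topics = {
    "pain":       ["pain", "hurt", "ache"],
    "medication": ["painkiller", "aspirin", "medication", "pill"],
}
```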

Ethical, Privacy and Safety Concerns of LLM-powered Healthcare Communication
Although Talk2Care shows a number of promising advantages, ethical, privacy, and safety concerns must be addressed adequately and thoroughly before any real-world deployment. Our participants mentioned a range of ethical concerns that need to be considered in future work (Sections 3.1.4 and 3.2.4).

6.3.1 Ethical Concerns: Ensuring Reliability and Human Involvement. Talk2Care is intentionally designed not to provide health advice due to concerns about AI reliability in healthcare [24,55,82]. Nevertheless, it can still make mistakes at multiple stages of communication, such as misunderstanding the meaning of an older adult's sentence, misleading older adults while collecting their health information, or mis-predicting the severity level so that providers' priorities are biased. Potential solutions include adding a separate sanity-check AI agent (thus forming a group of LLM-based agents), requesting human attention when the system is uncertain, or a combination of both.
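A minimal sketch of combining both solutions, in which a second-pass agent re-checks the primary output and any disagreement or low confidence routes the item to a human. The agent functions below are stubs and the confidence threshold is our own illustrative assumption; a real checker would independently call a second LLM.

```python
def primary_agent(conversation):
    # Stub for the primary LLM call that predicts a severity level.
    return {"severity": "low", "confidence": 0.9}

def check_agent(conversation, primary):
    # Stub for a sanity-check agent that re-derives the answer
    # independently; here it agrees but with lower confidence.
    return {"severity": "low", "confidence": 0.6}

def review(conversation, threshold=0.7):
    """Run both agents; flag for human review on disagreement or
    when either agent's confidence falls below the threshold."""
    first = primary_agent(conversation)
    check = check_agent(conversation, first)
    needs_human = (first["severity"] != check["severity"]
                   or min(first["confidence"], check["confidence"]) < threshold)
    return {"severity": first["severity"], "needs_human": needs_human}
```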
Similarly, participants also reported reliability concerns regarding the LLM's risk predictions, summaries, and highlights. Thus, we emphasize that the human counter-check in our dashboard interface is a crucial step in the proposed workflow, and providers should be informed of the potential risks of LLM results. Meanwhile, patients could also have access to the transcriptions and LLM outputs for double-checking. Furthermore, we recommend that future work systematically evaluate the reliability of such predictions and interpretations to guide real-world deployment.
Meanwhile, the impact of such a remote communication system on in-person outpatient visits and the patient-provider relationship, as mentioned in participants' concerns, is also unclear. Although providers can ask patients to come in for more tests after the system mediates the communication, the system may reduce older adults' desire to leave home and thereby increase health risks. Similarly, given earlier work on deception and social CAs [100], it is possible that this system could lead to social isolation for some older adults if they communicate more with the system and less with their family members or providers directly. Future LLM-powered systems should further explore in which situations direct synchronous patient-provider interactions remain essential.
6.3.2 Privacy Concerns: Informed Consent and Interventions. In addition, privacy risk is another important concern for the LLM-powered system in future deployment, as the system may collect patients' personal information.
As mentioned by participants in interviews (Sections 3.1.4 and 5.2.3), future designers and developers should refer to regulations such as HIPAA and to existing secure telehealth systems to enable secure access to the database [46,93], and could potentially utilize natural language processing (NLP) methods to filter critical identifiable information. Meanwhile, in deployment, patients should be fully informed of the system's data collection. We suggest that a system cover at least the following two steps. First, the system should explicitly explain data usage and privacy implications, especially when patients have lower technical literacy; this can be done through traditional written consent forms, explanations by health providers, or explicit reminders from the system. Second, during regular human-CA interactions, the system should remind patients of the risks when sensitive data is collected and suggest practical alternatives.
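As a simple illustration of the rule-based end of such filtering, the following masks a few common identifier formats before text is stored or passed to an LLM. The patterns are illustrative only; a real HIPAA-relevant deployment would pair rules like these with a trained de-identification model and compliant storage.

```python
import re

# Each pattern maps an identifier format to a replacement label.
PATTERNS = [
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def redact(text):
    """Mask phone numbers, email addresses, and SSN-formatted strings."""
    for pattern, label in PATTERNS:
        text = pattern.sub(label, text)
    return text
```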
Much future work is needed to carefully address these concerns to ensure a reliable, safe, and robust healthcare communication system.

Limitations and Future Work
There are a few limitations in our work. First, our sample sizes for the user studies are limited (10 older adults and 9 healthcare providers). Also, since our participants were recruited from the university participation pool, some of them may have higher education levels, digital literacy, and a better understanding of LLMs than other older adult groups. Therefore, the results obtained from our study may not generalize to other populations. We also acknowledge the limited number and occupations of our provider participants: although we selected provider profiles to focus on clinical communication needs and patient concerns, the participants are mainly physicians rather than nurse practitioners or support staff, and including other healthcare providers with more frequent communication and patient-management experience may enrich the study insights. Thus, we suggest that future work include a wider range of both patient and provider populations to further study the generalizability of such systems in real-world deployment.
Second, our current system is a proof-of-concept prototype. The protocols we used in the study require manual effort and could be more detailed. As we discussed in Section 6.2, more future work is needed to achieve a smooth deployment. For example, although our findings suggest that LLMs could enable the system to be customized for each patient, we did not integrate patient information and provider instructions at a large scale in our user study, and resource limitations, privacy, and ethical concerns would arise when scaling up the system. Third, the evaluation of the system mainly focused on system usability and LLM opportunities; we did not evaluate the performance of individual modules in Talk2Care, such as the quality of the generated questions and summaries or the accuracy of the risk prediction. Before scaling up the system, there should be a comprehensive and systematic evaluation of LLM performance in the different modules to ensure reliability and minimize risk to patients.

CONCLUSION
In this paper, we designed and implemented Talk2Care, a novel LLM-powered pilot system to facilitate communication between older adults at home and healthcare providers. To better understand the communication challenges and the opportunities for LLM-based systems, we first conducted two semi-structured interview studies, one with older adults (N=10) and the other with healthcare providers (N=9). The interview results reveal the need for an AI-based asynchronous communication system and informed our design of Talk2Care. Our pilot system consists of two components. The patient component is an LLM-powered VA system that generates high-quality questions to collect health information from older adults and sends it to providers for further decision-making. The provider component is a dashboard that summarizes and highlights key information from the patient-VA conversation to save providers' time and assist their analysis process. We then conducted two more user studies to evaluate Talk2Care. Our results showed that Talk2Care may address the challenges mentioned by patients and providers, facilitate their communication, enrich the health information shared by older adults, and reduce providers' efforts. Our work serves as an example of LLMs' application at the intersection of healthcare and interpersonal communication.
(2) In your case, if there were such an AI voice assistant to help you gather information from your patients [referring to their difficult cases mentioned above], or: in your case, if there were such an AI that summarizes the conversation and highlights the key information, do you think it would be helpful?

Fig. 2. The Component of Talk2Care System for Home-based Older Adults. The VA interface has multi-turn personalized conversations with the older adult to collect related health information. The LLM-powered Question Generation Module is responsible for taking the older adult's words and generating questions for effective information collection. The prompt design of this module is detailed in Figure 3. Another LLM-powered Content Loopback Module makes sure that key information from the older adult (e.g., pain level) is accurate by double-checking the content, a common healthcare communication practice. The older adult's information, conversation protocol, and conversation log are stored in the information database.

Fig. 4. The Component of Talk2Care System for Healthcare Providers. The information dashboard summarizes and highlights older adults' key information. The main content on the dashboard is generated by three LLM-powered modules: (1) The Content Summary Module formats the conversation log and user information into a clinical note structure. (2) The Information Highlight Module color-codes the parts of the conversation log that require attention. (3) The Risk Prediction Module suggests the health risk (low, moderate, or high) based on the current conversation log. Providers can take notes or further actions on the dashboard, which are then stored in the information database.

Fig. 5. Prompt Design of Patient-VA Conversation Summary for Healthcare Providers. Similar to Figure 3, the input prompt consists of five parts: 1) patient information, 2) conversation protocol, 3) system setting, 4) conversation log, and 5) summary optimization. For multi-turn conversations, 5) is repeated for each round of conversation. This figure continues the example of the daily care protocol.

Fig. 8. Subjective Ratings on Different Perspectives of The Usability of The VA System. Error bars show the standard deviation. The average SUS score is 75.5±17.1, indicating good usability.

Fig. 9. Dashboard Interface Example for Scenario B: Post-Surgery Follow-up. The example for Scenario A can be found in Figure 6, with the same interface and different scenario-specific content.

Fig. 10. Subjective Ratings on Different Perspectives of The Usability of The Dashboard System Design. The average SUS score is 85.8±9.8, indicating very good usability.
(a) What do you think about it? (b) What do you like/dislike about it? (c) What other features would you like or dislike?
Closing Question
(1) Do you have any questions for us about the study?

B APPENDIX: CONVERSATION HISTORY
B.1 Post-Surgery Example of Conversation Log Between The Older Adult and Our Voice Assistant

Table 1. Demographics of Older Adult Participants

Table 2. Demographics of Healthcare Provider Participants

Voice Assistant: Hello! I'm Talk2Care. How are you feeling after your surgery? Any discomfort or concerns?
Patient: I'm feeling good overall, thanks for asking. But I have a little pain.
Voice Assistant: I'm glad to hear that you're feeling good overall. I'm sorry to hear about the pain. On a scale of 1 to 10, how would you rate your pain?
Patient: I'm not sure. Can you give some examples of how to rate my pain?
Voice Assistant: On a scale of 1 to 10, with 1 being no pain and 10 being the worst pain imaginable, how would you rate your current pain level?
Patient: I would probably rate 2. I think it's not too severe.
Voice Assistant: On a scale of 1 to 10, with 1 being mild and 10 being severe, you rate your discomfort as a 2. Is that correct?
Voice Assistant: Thank you for letting me know. Is the pain more towards the mild side or is it more severe?
Patient: I would say it's more mild, and actually I'm not sure about my painkiller. I have 2 different kinds, but I kind of want to ask my doctor which one I should use.
Voice Assistant: I understand that you have two different painkillers. Would you like me to pass this information to the doctor and have them advise you on which one to use?
Patient: Yes, please. And also I think one of them is aspirin and the other one is like a paste that I can put on my skin. Maybe you can ask my doctor about that?
Voice Assistant: Thank you for letting me know about the medications. I'll make sure to pass that information along to your doctor. Is there anything else I can help you with?

B.2 Daily Care Example of Conversation Log Between The Older Adult and Our Voice Assistant
Patient: I have like cold or flu symptoms, I took a covid home test it was negative, but I, I'm not, I'm worried that I might have a false negative and then I might be having covid. I, I just don't know should I go to the clinic and get their test or should I stay at home.
Voice Assistant: I understand that you're worried about the possibility of a false negative. Can you describe your symptoms in detail?
Patient: I have a fever, I'm coughing a lot, very tired, and I have a headache.
Voice Assistant: I understand that you're experiencing a fever, coughing, fatigue, and a headache. It's important to consult a healthcare provider for further evaluation. Would you like me to notify your doctor about your symptoms?
Patient: Yes, please notify my doctor and ask should I be extra worried because of my high blood pressure.
Voice Assistant: I understand your concern. It's important to monitor your symptoms closely. Would you like me to inform your healthcare provider about your symptoms?
Patient: Inform my doctor, thank you.
Voice Assistant: I have informed your doctor about your symptoms. They will contact you directly for further guidance. Is there anything else I can assist you with?