Better to Ask Than Assume: Proactive Voice Assistants’ Communication Strategies That Respect User Agency in a Smart Home Environment

Proactive voice assistants (VAs) in smart homes predict users’ needs and autonomously take action by controlling smart devices and initiating voice-based features to support users’ various activities. Previous studies on proactive systems have primarily focused on determining action based on contextual information, such as user activities, physiological state, or mobile usage. However, there is a lack of research that considers user agency in VAs’ proactive actions, which empowers users to express their dynamic needs and preferences and promotes a sense of control. Thus, our study aims to explore verbal communication through which VAs can proactively take action while respecting user agency. To delve into communication between a proactive VA and a user, we used the Wizard of Oz method to set up a smart home environment, allowing controllable devices and unrestrained communication. This paper proposes design implications for the communication strategies of proactive VAs that respect user agency.


INTRODUCTION
Voice assistants (VAs) have enabled human-smart home interaction through natural speech that supports users' daily lives. With VAs serving as gatekeepers, users have been able to operate interconnected smart devices, including lighting, appliances, security, and temperature. They can also request information about weather or schedules, and ask for recommendations on media entertainment or recipes. This makes each individual's various daily activities in the home environment more convenient. The current state of VAs is generally reactive; they merely respond to user commands. However, they are evolving to become proactive. Proactive VAs autonomously take action by controlling devices (e.g., lighting, temperature) or initiating voice-based features (e.g., reminders, recommendations, nudges) to serve users' needs before users make a request. For example, proactive VAs can remind users of items that they may have forgotten [1,3] and automate their routines [42,49]. VAs can also recommend music or food based on users' moods and states [28,32] and provide nudges for healthcare support [36] and energy saving [48]. These features allow users to expand their abilities and be free from an overload of unimportant decisions, letting them focus on tasks they value more [13].
In designing proactive systems, it is significant to consider user agency, which empowers users by reflecting their preferences, diversifying controls, adjusting controllability, and explaining the reasoning behind the system's proactive actions or recommendations [10,13,17,20,41]. Sundar [45] defines user agency as the extent to which the self serves as a relevant actor in human-computer interaction, highlighting that users should become a "self-as-source" of communication. When a proactive system does not take user agency into account, users may feel a loss of control [17], have concerns about over-dependence [10], and indeed experience a decrease in their sense of agency [8,29]. Earlier research on proactive systems has primarily centered on accurately predicting users' needs and intentions and taking single-turn actions based on contextual information such as movements (activities, proxemics) [6,32], the external environment (time, weather, location) [18,28], users' state (mood, biophysical signs) [1,36], mobile phone data [44], and a previous yes or no answer [48]. However, these types of proactive actions were largely driven by contexts or specific conditions rather than by the users themselves. Relying solely on one-way proactive actions without interactive communication has resulted in insufficient attention to user agency. We believe that users can effectively exercise their agency through verbal communication, which is the most fundamental, natural, and effective way for humans to exchange information. Therefore, our study aims to explore how VAs can proactively take action while respecting user agency through two-way, multi-turn verbal communication. In pursuit of this aim, our research question investigates when and how VAs should communicate to provide proactive actions that align with user agency. Additionally, we examine how users perceive and respond to the proactive actions and communication of VAs, as well as how user engagement progresses. This exploration holds a meaningful contribution to the field of human-computer interaction (HCI) research.
To set up the study environment, we adopted a modified Wizard of Oz (WoZ) method [25,26] that assigns the wizard role to a participant instead of the experimenter to simulate a proactive VA. Participants served as substitutes for a proactive VA that not only proficiently interprets and predicts users' context and intentions, but also freely converses with users without the constraints of current technologies. This approach allowed us to closely explore open and unrestricted communication between participants by having the users stay in a home setting while the wizards simulated the proactive VA. We recruited a total of 12 participants, organized into 6 pairs, each consisting of a wizard and a user. We prepared a lab-based smart home setting for the user and a control room for the wizard. While the user stayed in the smart home setting, the wizard, simulating a proactive VA, operated smart home devices and communicated with the user to support the user's various activities while reflecting user agency. Our study had 5-hour sessions that included a 3-hour experimental observation to collect communication logs and a 2-hour debriefing interview to gather qualitative data on the participants' experiences during the experiment.
In our findings, we present when and how the wizards (simulating VAs) proactively communicated to align with user agency in their proactive actions, considering 3 aspects: communication types, proactivity levels, and communication timing. Based on communication logs proactively initiated by the VAs (wizards), we categorized the types of VAs' proactive communication into exploration, suggestions (proactive services), and follow-ups. At the same time, we describe the users' perception of and reactions to the VA's proactive communication. We found that users' acceptance is distinct from their preferences: users easily reject or ignore VAs' suggestions but may still like them. We also classified the progress of user engagement into 3 phases: explaining, reflecting, and engaging. Drawing from our findings, we discuss why communication is imperative for VAs to adapt to user agency, highlighting interpersonal and intrapersonal variability in user acceptance of VAs' proactive actions. Simply making assumptions from contextual information would be insufficient for proactive actions that genuinely take user needs and preferences into account. We also put forth design implications for VAs' communication strategies to carry out proactive actions while respecting user agency. These practical implications provide valuable insights for interaction designers and HCI researchers designing voice user interfaces for proactive VAs.

RELATED WORKS
The concept of a proactive system has long been a topic in computing and HCI fields, including ubiquitous computing and Human-Robot Interaction (HRI). The proactivity of VAs is complex and multilayered, as it involves the processes of context interpretation, task determination, and autonomous action. In addition, voice interaction in a proactive system encompasses the interaction timing and its initiation methods. Although our study focuses on the communication aspect of proactive VAs that adapt to user agency, in this related works section we comprehensively review the various aspects that constitute a proactive system. Furthermore, we have confined our research domain to smart homes, where individuals can communicate verbally with the VA comfortably. In these settings, where the unique everyday life of individuals unfolds, there are vast possibilities for exploring how proactive VAs can act and communicate with users. We excluded domains such as automobiles or workplaces from our study, because the primary activities (e.g., driving or having meetings) and other related factors (e.g., safety and social relationships) may lead to different user experiences from those at home.

Context refers to all sorts of information that can be used to characterize the situation of a subject [11]. Context-aware computing understands and analyzes the contextual information that surrounds users. Previous studies devised context-aware models that infer what users intend and need, leveraging contextual information such as speech, location, time, biophysical signals, mobile usage, and so forth [1,6,44]. Bahrainian and Crestani [1] utilized sentiment and biophysical data extracted from conversations to remind users of information that is easy to forget. Chahuara et al. [6] presented a framework that builds context-aware decision processes from speech, agitation, localization, and activity data collected via real sensors in a smart home environment. This model makes decisions such as turning the lights on/off, opening/closing the blinds, warning about an unlocked door, or making an emergency call. In addition, Sun et al. [44] developed a contextual intent tracking model that anticipates users' intentions through contextual information (e.g., app usage, places they visited, current location, and time). The model analyzes what users plan to do and automatically generates 'if-do-triggers' that lead to proactive actions. For example, it plays the news on the phone when the user arrives at the office, or plays music after 6:30 PM.
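As a rough illustration of the 'if-do-trigger' idea, the following Python sketch pairs a context predicate (the "if") with an action (the "do"). All names and context keys (`location`, `minutes`, the two example rules) are hypothetical and for illustration only; they are not taken from Sun et al.'s implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, object]  # a snapshot of contextual information

@dataclass
class IfDoTrigger:
    name: str
    condition: Callable[[Context], bool]  # the "if" part
    action: str                           # the "do" part

def fire_triggers(context: Context, triggers: List[IfDoTrigger]) -> List[str]:
    """Return the proactive actions whose conditions hold for this context."""
    return [t.action for t in triggers if t.condition(context)]

# Two illustrative rules mirroring the examples in the text
triggers = [
    IfDoTrigger("news_at_office",
                lambda c: c.get("location") == "office" and c.get("just_arrived", False),
                "play_news"),
    IfDoTrigger("evening_music",
                lambda c: c.get("minutes", 0) >= 18 * 60 + 30,  # after 6:30 PM
                "play_music"),
]

print(fire_triggers({"location": "office", "just_arrived": True, "minutes": 9 * 60}, triggers))
# -> ['play_news']
```

The appeal of this structure is also its limitation: the rules fire on context alone, with no dialogue in the loop, which is precisely the one-way proactivity our study moves away from.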

Proactive Scenarios: What Tasks to Perform.

Several studies have proposed potential scenarios in which VAs proactively perform tasks in a domestic environment. The scenarios have been evaluated through storyboards [31,40,54], short films [4], and a context-based online survey [34] instead of real-life settings. In particular, Meurisch et al. [34] conducted an in-situ survey using a mobile application that matched the user's presumed activity based on his/her contextual mobile data. Despite an extensive range of scenarios for VAs' proactivity, no general preferences for specific scenarios have emerged across studies, except for urgent safety situations, such as when users faint or there is a risk of fire [4,54].
The elderly preferred proactive scenarios that cope with cognitive difficulties due to aging, such as forgetting to take medication and problems with remembering [4]. Additional scenarios created for proactive services are as follows: reminders (schedules, changing tires), healthcare (mental health, physical health, using coughing sounds as probable signs of a cold), activity support (shopping, finding directions, traveling), technical support, home control (lighting, temperature, domestic chores), cooking inspiration, nudging (to notify users of too much screen time), and fact-checking (time, history) [34,40,54]. However, most of the scenarios from these studies have shown that users' responses and preferences vary by individual and situation [31,34,54].

Proactivity Levels: To What Extent to Autonomously Take Action.

Prior studies have investigated the level of VAs' proactivity. The level of proactivity in AI systems can range from reactive, merely executing orders, to full automation, mostly rooted in the concept of 10 levels of automation as first proposed by Sheridan et al. [43]. More recent studies divide a system's proactivity into 3 to 4 levels. Most of them found that users prefer a medium level of proactivity, where VAs make assumptions and ask for users' permission before acting [24,31,34,38]. Peng et al. [38] designed 3 levels of proactivity (high, medium, and low) based on the extent of assumption and intervention in recommendations that help decide which shoes to buy. They found that medium proactivity is more helpful in narrowing down choices and sharing users' opinions, but also emphasized that the level of proactivity should flexibly adjust to users' responses and emotional reactions. Meurisch et al. [34] categorized the proactivity level into reactive support, proactive support I and II, and autonomous support. Participants tended to expect proactive support II, in which VAs provide personalized recommendations and intervene in their lives. Luria et al. [31] also used their scenarios to classify levels of proactivity: reactive, proactive, and proactive recommender. They found that the desired level of VA proactivity differs by individual preference and situation, but participants still do not want decisions to be forced upon them. For example, one parent wanted to receive parenting advice only when she asked the VA (i.e., a low proactivity level (reactive)); however, she wanted to be notified immediately if her teenager drank beer (i.e., a higher proactivity level (proactive)).
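The recurring three-level pattern in these studies can be sketched as a simple response policy. This is a minimal illustration under assumed names (`Proactivity`, `respond`, the phrasing strings), not any cited system's design:

```python
from enum import Enum
from typing import Optional

class Proactivity(Enum):
    LOW = "reactive"      # act only on an explicit user request
    MEDIUM = "ask_first"  # infer a need, but ask permission before acting
    HIGH = "autonomous"   # act on the inference without asking

def respond(level: Proactivity, inferred_need: str) -> Optional[str]:
    """What the VA says (or does) about an inferred need at each level."""
    if level is Proactivity.LOW:
        return None  # stay silent until the user asks
    if level is Proactivity.MEDIUM:
        return f"Shall I {inferred_need}?"  # the permission-seeking middle ground
    return f"[doing: {inferred_need}]"      # fully autonomous action

print(respond(Proactivity.MEDIUM, "dim the lights"))
# -> Shall I dim the lights?
```

The finding that users prefer the medium level corresponds to the permission-seeking branch, while Luria et al.'s parenting example shows that the appropriate branch shifts per situation rather than being fixed per user.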
Interaction Timing: When to Interrupt.

Some studies have sought the opportune moment for VAs to proactively initiate interaction. A major concern in these studies is interruptibility for proactive interaction when users may not want to be disturbed [5,23,51], since providing proactive service at an inappropriate time could distract or even irritate users [15,16,46]. To discover interruptibility based on users' activities in a home environment, the voice-based experience sampling method (ESM) has been used. It collects users' availability by inquiring, for example, 'Is now a good time to talk?', either randomly or triggered by contextual information [5,51]. These studies have intensively examined interruptible moments; however, they have not considered any tasks or scenarios that involve VAs' proactive utterances. They found that an individual's level of engagement, mood, and activity transitions may affect users' interruptibility. However, common rules about opportune moments in relation to users' activities have not yet been clearly established. Komori et al. [23] reported that users are more available when they have settled after a transition and relax on the bed, but availability fluctuated even for the same behavior. Cha et al. [5] identified 3 contextual factors (personal, movement-related, and social) that can affect students' availability with regard to proactive VAs. The study found that participants tend to avoid interruptions when they are focused on their work, busy, or in a bad mood, but they were generally more open to interruptions after entering a room or during transitions between physical activities. Wei et al. [51] indicated a significant correlation of boredom and mood with perceived availability in general, and participants were found to be more available when engaged in entertaining tasks rather than studying or working.
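The two ESM probing modes mentioned here (random and context-triggered) can be sketched roughly as follows. The trigger events and the 10% random rate are assumptions for illustration, not values from the cited studies:

```python
import random

# Hypothetical contextual triggers suggesting the user may be interruptible
TRIGGER_EVENTS = {"entered_room", "activity_transition"}

def should_probe(event: str, rng: random.Random, random_rate: float = 0.1) -> bool:
    """Decide whether to ask 'Is now a good time to talk?':
    always fire on a contextual trigger, otherwise sample at a small random rate."""
    if event in TRIGGER_EVENTS:
        return True
    return rng.random() < random_rate

rng = random.Random(0)  # seeded for a reproducible example
events = ["working", "entered_room", "working", "activity_transition", "resting"]
probed = [e for e in events if should_probe(e, rng)]
print(probed)
# -> ['entered_room', 'activity_transition']
```

Random sampling covers moments no trigger anticipates, while contextual triggers concentrate probes on the transition points that Cha et al. found users are most open to.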

Interaction Starter: How to Start the Interaction.
There have been studies on ways for VAs to start proactive interactions using audible signals (e.g., a beep or ringtone) and visual cues (e.g., a sparkling motion). The studies suggest that users are likely to prefer VAs to speak directly [47,51]. Tan and Zhu [47] classified 3 strategies: arouse and wait, arouse and output, and direct output. The majority of participants rated the direct output scenario as the most satisfying and comfortable, reflecting their preference for practicality and how VAs can play a heartwarming role at home. Wei et al. [51] also experimented with 3 different methods to start the interaction: baseline, earcon starter, and utterance starter. Most participants favored the utterance starter, in which the VA asked, 'Hey, are you available?' and the conversation only began after users responded with 'yes.'

User Agency in Smart Home Control
Ever since Weiser and Brown [52] first envisioned ubiquitous smart technologies seamlessly integrating into the background, Rogers [41] has suggested a shift from Weiser's concept of 'calm computing' to a more user-engaging approach to smart systems. Rogers [41] posed important questions about how designers should decide which tasks ought to remain under human control and which can be managed by automated systems. In subsequent research, ongoing discussions have been sparked over proactive systems, bringing attention to the increasing role of user agency [10,13,17]. These studies emphasize the importance of upholding a balance between the proactive system (device agency) and user agency. Jia et al. [17] conducted interviews after showing a video about the future of the Internet of Things (IoT) and found that participants favored a user-centered approach, expressing a desire to exercise their own agency. However, participants did not want to put too much effort into purposeful customization; instead, they hoped for the system to learn and adapt to their preferences through ordinary interactions. In addition, Desjardins et al. [10] used the co-speculation method 'Bespoke Booklet' to explore 5 design avenues for home IoT, including rich negotiation between system and user agencies. The study underscored how users reacted with curiosity or felt excluded due to a lack of agency, and how agency is flexible and complex, going beyond a simple binary opposition. Moreover, Garg and Cui [13] sought to understand when and how IoT home devices can support users' daily lives through co-design sessions and interviews. They presented design considerations for proactive scenarios, the roles users want devices to play, and potential conflicts in designing future home IoT.

Study Setting
Modified Wizard of Oz in a Smart Home Setting.

We used a modified Wizard of Oz (WoZ) method in a lab-based smart home setting. The WoZ method has been widely adopted in speech-based HCI studies [7,12,33]. The core idea of this method is that a human operator, called a 'wizard,' invisibly simulates a technology that has not yet been fully developed, making users believe that they are interacting with a real, functioning system. It allows researchers to observe users' genuine reactions and understand their expectations and needs for new technology. In a typical WoZ method, experimenters take the role of the wizard; however, the modified WoZ method [21,25,26] we used has a participant play the wizard. Having participants act as the wizards enables them to directly operate and express their expectations and desires through the proactive VA, thereby ensuring a user-centered perspective and keeping the study free from experimenter biases.
We set up both a 'smart home' for user participants and a 'control room' for wizard participants. By recruiting 6 participants for each role, we were able to compare and analyze their proactive actions, communication, reactions, and experiences. In addition, the lab-based setting ensured consistency in the experimental conditions, leading to more reliable data collection and analysis. The study was approved by an institutional review board. In the study scenario, the users carried out their daily routines in a smart home environment embedded with a proactive VA. Our wizard participants were tasked with holistically interpreting the vast and complex context of the user, anticipating their needs and intents by considering subtle cues such as nonverbal signals, atmosphere, tone of voice, and even periods of silence. From this, the wizards intuitively determined how to communicate with users, adapting to user agency for proactive actions. They led and engaged in verbal communication with the user, utilizing a text-to-speech (TTS) system.

Smart Home Setting for Users.

For the smart home, we rented a studio apartment of 52 m², consisting of a room and bathroom, all furnished to create a home-like atmosphere (see the left of Figure 1). We intentionally chose this layout to minimize blind spots during camera recording and to ensure that the sound from the smart speaker could be heard well anywhere in the smart home. We installed smart home devices for the wizards to control, including a smart speaker (Samsung Bixby Home Mini) for the VA and music, a smart TV (Samsung The Frame, 65-inch), a robot vacuum cleaner (Samsung POWERbot), lighting (I/O Switcher), an IoT plug socket (Brunt Plug), and a smart blind (Brunt Blind Engine ver. 2). We also prepared non-smart items to facilitate user activities, such as a pull-up bar, a yoga mat, a guitar, books, and cooking tools and ingredients. Snacks and beverages were also provided. For the scenario where the VA recommends food to the users, researcher A stocked the refrigerator and snack pantry with food, took a picture, and sent it to the control room for every experiment. We mounted 4 webcams (Jooyontech IP cameras, IPC-JA4-A22N) to observe users from all angles in real-time, with the exception of the bathroom and wardrobe area (see the right of Figure 1). The IP cameras, with 360-degree coverage and 2-megapixel (MP) resolution, allowed the wizards to see the users' facial expressions and postures. Lastly, we installed a laptop, smart speaker, and microphone for the TTS-based communication system. Except for the smart speaker, these were placed under the kitchen island to hide them from the users.

Control Room Setting for Wizards Simulating the Proactive VA.
The control room was equipped to manage the smart devices, monitor the users, and operate voice interactions through a TTS and audio system, thereby simulating a proactive VA (see the left of Figure 2). The control room, located in our research lab, had 2 researchers (researchers B and C) assigned to assist the wizard. On the right side of the workstation, we set up an iPad with control apps installed to operate all the smart devices in the smart home and a PC for music and web searching. Researcher B helped the wizard with smart home controls, music playback, and information searches. On the left side, researcher C was responsible for entering and managing communication logs during the experiment. At the forefront of the wizard's workstation, a 27-inch monitor simultaneously displayed a split-screen view from the IP cameras installed at 4 different angles (see the right of Figure 2). All footage was recorded and saved in cloud storage connected to the IP cameras. We also provided a 13-inch laptop running a TTS web application developed for our experiment. Through the laptop's speakers, the wizards were able to hear in real-time what the users were saying into the microphone placed in the smart home. Every time the wizard typed a phrase into the TTS application, it was immediately sent online, converted into speech, and then broadcast through the smart speaker in the smart home. The TTS system made use of WebRTC APIs and the Google TTS system [14]. The graphical user interface of the TTS web application consisted of a text input field and a 'speak' button.
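The control-room flow (the wizard types a phrase, which is logged and then broadcast as speech) could be sketched as below. `WizardConsole`, `Utterance`, and the stubbed backend are our own illustrative names; a real deployment would hand the text to an actual TTS service rather than a stub:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Utterance:
    text: str
    timestamp: float = field(default_factory=time.time)

class WizardConsole:
    """Minimal sketch: each typed phrase is appended to the communication
    log and handed to a TTS backend that would play it on the smart speaker."""
    def __init__(self, tts_backend: Callable[[str], None]):
        self.tts = tts_backend
        self.log: List[Utterance] = []  # what researcher C would transcribe

    def speak(self, text: str) -> None:
        utterance = Utterance(text)
        self.log.append(utterance)
        self.tts(utterance.text)  # stand-in for synthesize-and-broadcast

# Usage with a stub backend that just records what would be spoken
spoken: List[str] = []
console = WizardConsole(tts_backend=spoken.append)
console.speak("Would you like some music while you cook?")
```

Logging every utterance at the moment it is spoken is what made the turn-by-turn debriefing interviews possible, since each log entry could later be matched to the video timeline.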

Participants
A total of 12 participants, consisting of 6 user-wizard pairs, were recruited for a 5-hour study. To recruit those playing the users' role, we distributed a screening survey on a university online forum and via flyers, and selected the final 6 participants after 2 rounds of screening. In the first round of selection, we had 114 respondents. The survey inquired about their proficiency in AI technology, frequency of using VAs, and demographic information such as gender, age, major, and type of household. We narrowed them down to 32 potential participants. We opted for those who were familiar with VAs (VA usage frequency scoring over 3) and had limited expertise in AI technology (AI knowledge level scoring 1), based on their self-rated scores on a 5-point scale. Since it is critical in the WoZ method for users to believe they are interacting with actual technology, we excluded participants with advanced knowledge of AI technology to minimize potential doubts about the system's feasibility. Most of them lived in single households, which aligns well with our experiment scenario. In the second round, we requested that user participants draft a hypothetical plan of how they would typically spend 3 hours at home: what they usually do after finishing their daily tasks during the week or while enjoying the daytime on a weekend. This inquiry was made because they were expected to spend a few hours alone in new surroundings, regard the smart home setting as their personal space, and act of their own accord. After a thorough review of the responses to include diverse activities distinct from one another, we selected 6 user participants (3 females and 3 males). We confirmed by phone that the users had no concerns about undertaking various activities in new environments.
For the wizard's role, we recruited students from a graduate school of design who had experience studying and designing voice interaction with VAs, for 2 reasons. First, their academic knowledge of user experience design and their problem-solving skills enabled them to interact with users more flexibly and creatively than experimenters, who are likely to be fixed in established practices of VA proactivity. Second, their experience in voice user interface (VUI) design provided them with a fundamental understanding of VAs, which was required for playing the role of the proactive VA. We shared a recruitment post on the communication channel of a university's industrial design department and received 13 responses. We chose wizard participants based on their self-reported VA usage frequency and VUI design knowledge; both were rated on a 5-point scale with a minimum score of 3. We selected 6 wizard participants (4 females and 2 males) and paired them with the 6 user participants (3 females and 3 males). The pairs were formed randomly but matched by gender where possible. We expected that same-gender pairing would help the wizards understand the users' gender-specific preferences and interests more effectively, such as sports games, fitness activities, fashion, cosmetics, etc. All participants received compensation of 100,000 Korean won (approximately 75 U.S. dollars) for their participation in the experiment. Table 1 presents basic information about the users and wizards, along with the activities they performed during the study.

Procedures
Our study had 3 phases: 1) an introductory session, 2) an observation session, and 3) a debriefing interview. Figure 3 illustrates the overall process of our user study.

Introductory Session.
3 researchers were involved in conducting all experimental procedures. After researcher A and the user arrived at the smart home, researcher A gave the user instructions about the experiment. The users were requested to stay in the smart home setting and make themselves at home, comfortably and freely interacting with the VA designed to proactively assist with their daily activities. They were informed that they could respond to the VA as they wished (e.g., they could choose not to answer) or start a conversation themselves. To convince the users that the system was actually operational, we explained that the VA was a beta version in development. We also mentioned the possibility of occasional delays in response time to mitigate potential errors. The user participants were aware that the smart devices were controllable through the VA. They were given a detailed tour of the smart home to familiarize them with the space. Furthermore, we reminded users of the experiment details they had previously agreed upon. We transparently disclosed the location of the cameras and the fact that they were being observed and recorded by the researchers in real-time. The user participants were told that they could discontinue the experiment at any time and would be compensated depending on the duration of their participation. All participants consented to join the experiment.
At the same time, the wizard was in the control room receiving instructions from researchers B and C. They were instructed on how to operate the TTS web application and the features of the smart devices installed in the smart home. The wizards were guided to predict the users' needs and intents based on their human intuition and senses, and also to take ownership of their communication to adapt to user agency and proactively provide services. They were informed that the users believed they were interacting with a VA, not a human, and were asked to avoid overly human-like behavior that might deviate from the general mental model of a VA. They were also asked to prioritize the quality and grammatical correctness of their responses despite the possibility of delays in responding. Then, both the users and wizards simultaneously went through the voice interaction onboarding process through the smart speaker located in the smart home. This helped them grasp how the voice interaction works. The users were guided to kick-start the onboarding by saying, "Hi, Bixby. Let's start the experiment." The wizards, who controlled the VA, were also instructed to ask the users 5 questions from a prepared questionnaire, covering their name, favorite songs/singers, sports to play or watch, viewing content, and food. After the instruction and onboarding were completed, researcher A left the room and waited near the lounge area.

Observation Session.
After the 30-minute introductory session, the experiment proceeded for about 2 hours and 30 minutes. We opted for this 2-to-3-hour duration for the following reasons. In our 2-hour pilot study, we were able to accumulate substantial communication logs of more than 100 turns. This range seemed reasonable for conducting thorough debriefing interviews to closely examine each communication log. We were also concerned that a longer experiment might impose stress or fatigue, particularly on the wizards. To maintain consistent quality in the wizard's judgment and speech, we chose to have a single wizard for the observation session within a reasonable timeframe.
In addition, the pilot study revealed that users were more stationary than expected; many of them lay down and looked at their mobile phones. For this reason, prior to the experiment, the users were asked to list at least 3 activities they typically do at home, to encourage them to be as active as possible. As a result, the users engaged in the various activities described in Table 1. They were told to reach out to researcher A if any issues arose, but no such situation occurred. Meanwhile, the wizards monitored the users in real-time and diligently performed the role of proactive VAs. They keenly observed the users' facial expressions and behaviors, pre-searched related information, and noted down the users' responses and potential suggestions based on their previous conversations. The primary role of researcher B was to help the wizard with functional operations such as controlling the smart home devices or the TTS app. Researcher C, in turn, was responsible for transcribing all the users' speech along with the wizard's automated input for the communication logs. Throughout the experiment, users communicated with the VA without difficulty. No cases of grammatical errors or incorrect responses were observed, with only a few instances of fallback feedback such as "I'm sorry, I can't help with that." After the experiment was completed, researcher A and the user participant moved to the control room, about a 10-minute drive away, to join the wizard participant and researchers B and C.

Debriefing Session.
Following each experiment, we conducted a debriefing interview with both the wizard and the user participants together for about 1 hour and 30 minutes. Before the interview, we disclosed to the users that the VA had actually been operated by a human wizard, not a system. They were also provided with an overview of the purpose and setup of the experiment. Based on the communication logs transcribed in real-time during the experiment, both the users and wizards went through in-depth interviews on nearly every instance of the VA's proactive communication from a dyadic perspective. The wizards were asked about their intentions, strategies, and reasons behind proactively initiating conversations. The users took turns answering questions about their experiences, emotions, and thoughts in response to these proactive communications.

Data Collection & Analysis.
To examine the communication proactively initiated by the wizards and the corresponding experience of the users, we collected 3 types of data: 15 hours of observation videos, 1,416 communication logs, and 8 hours of interview recordings. All logs and interview recordings were transcribed. We also compiled and time-stamped observation video clips into sequential conversation segments, which were filmed from 4 different angles in a 4-split-screen format. We subsequently reorganized these segments and aligned them with the interview transcriptions, communication logs, and time stamps of the observation videos.
We first conducted conversation analysis [39] by delving into the communication fragments of continuous two-way conversations. We segmented the fragments by distinguishing the beginnings and endings of communication, resulting in a total of 279 fragments. We scrutinized fragments that included the VA (wizard)'s proactive communication logs, where the wizard either initiated or proactively continued the communication. To identify patterns in the VAs (wizards)' communication types and timing, we extracted and coded 180 proactive communication logs from a total of 794 VA (wizard) logs. Following this, 3 researchers applied open coding to the corresponding interview data using thematic analysis [2]. The lead author generated initial codes, which were then iteratively reviewed with two other researchers until a consensus was reached. We further refined the codes through axial coding, considering user responses and their mutual conversation. We present the findings of our analysis in the following section.
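As a rough illustration of this bookkeeping, the fragment segmentation and the extraction of proactively initiated logs could be sketched as follows. The data layout and the silence threshold are assumptions for illustration only, not the authors' actual tooling, and this sketch only captures fragment-opening proactivity (the study also coded logs where the wizard proactively continued an ongoing exchange):

```python
from dataclasses import dataclass

@dataclass
class Log:
    speaker: str   # "VA" (wizard) or "USER"
    text: str
    t: float       # seconds since session start

def segment_fragments(logs, gap=60.0):
    """Split a session's logs into conversation fragments.
    A new fragment starts after a silence longer than `gap`
    (the 60 s threshold is an illustrative assumption)."""
    fragments, current = [], []
    for log in logs:
        if current and log.t - current[-1].t > gap:
            fragments.append(current)
            current = []
        current.append(log)
    if current:
        fragments.append(current)
    return fragments

def proactive_va_logs(fragments):
    """Collect VA logs that open a fragment, i.e., cases where the
    wizard initiated the exchange rather than responding to the user."""
    return [f[0] for f in fragments if f[0].speaker == "VA"]
```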

FINDINGS
Based on the communication logs and interviews, we analyzed how the wizard participants, who simulated the VA, acted and communicated to enhance user agency across three aspects: the VA's communication types, proactivity level, and communication timing. Furthermore, we describe how the users perceived and reacted to the VA's proactive suggestions and progressively engaged in communication. To support our findings, we also present empirical evidence from conversation fragments and interview quotations.

VAs (Wizards)' Proactive Communication Types
We extracted 180 of the VAs' proactively initiated utterances from the overall communication logs and classified them into 3 categories: Proactive Exploration, Proactive Suggestions, and Proactive Follow-ups. Each category is further divided into subcategories with actual examples, as shown in Table 2. Findings that align with previous studies are indicated with their references.
4.1.1 Proactive Exploration. Although the VAs (wizards) were capable of recognizing context at a human level, they often encountered situations where it was challenging to predict users' needs or intents solely from their behavior and context. In such cases, when the wizards were uncertain, they simply asked straightforward questions for the purpose of exploration before proceeding with proactive suggestions. In the first subcategory of Proactive Exploration, the VAs (wizards) inquired about users' personal information to learn more about each individual's unique characteristics, such as preferred genres of movies and songs, favorite musicians, past cooking experiences, visited places, personal schedules, and eating habits, in order to provide more personalized suggestions.
The second subcategory is where the VAs (wizards) questioned the users to verify potentially ambiguous behaviors, needs, or intentions. For example, when the user (U2) lay down on her bed, the wizard (W2) asked, "Are you getting ready for bed?" to clarify whether she intended to go to sleep or just rest.
4.1.2 Proactive Suggestions. The VAs (wizards) suggested services that could support the users in various activities, either functionally or psychologically. We grouped these services into the following 5 subcategories, as shown in Table 2: information provision, smart home controls, decision-making support, well-being advice, and social talks. First, the VAs (wizards) provided information about the outside environment that the users might not be aware of while indoors (e.g., cold weather), about sudden changes in the outside environment (e.g., when it starts to rain), and about the status of running features (e.g., delivery status updates, alarms). Second, the VAs (wizards) made proactive suggestions about smart home controls, such as turning off unused appliances based on the user's activities or pulling down the blinds as the sun set. Extended services, such as getting food delivered or grocery shopping, are included in a broader sense of smart home controls. Third, the VAs (wizards) offered customized recommendations or necessary information during the users' decision-making processes: searching for what to watch, choosing music to listen to, contemplating what to buy, or deciding where to go for a date. In terms of controlling a music player, turning the music on and off is classified under smart home controls, whereas specific song recommendations such as "Do you want me to play (title of a song) for you?" were sorted as decision-making support. Fourth, the VAs (wizards) gave advice to promote the users' quality of life, health and safety management, and good habits. Fifth, the VAs (wizards) made small talk, compliments, and jokes to cultivate a pleasant and comfortable interaction and to demonstrate empathy. All proactive suggestions were either derived from preceding exploratory questions or directly prompted.
4.1.3 Proactive Follow-ups. After making proactive suggestions, the VAs (wizards) tended to ask follow-up questions to refine their prior suggestions according to the users' responses, or to directly ask for explicit feedback. In the first subcategory of Proactive Follow-ups, the VAs (wizards) continuously modified their recommendations on the fly depending on how the users responded. When the users expressed dissatisfaction with the initial recommendation, the wizards offered other alternatives in the following turn. It should be noted that the utterances initiated by the VAs (wizards) have been put into categories for a comprehensive understanding of our data, and these categories can be intertwined within one utterance. For example: "VA (W2): Are you hungry? Would you like me to recommend you nearby delivery restaurants?"; "VA (W2): It's getting dark, do you want me to close the blinds?"; and "VA (W3): If you'd like to take a nap, just let me know. I can turn off the music and set an alarm for you."

VAs (Wizards)' Proactivity Level and Communication Timing
4.2.1 Proactivity Level for Smart Home Controls. Even though the VAs (wizards) were capable of manipulating smart devices in an autonomous and ambient manner without talking to the users, we did not observe any instances where the VAs controlled devices independently. They asked the users for consent every time before operating a device or taking any action (e.g., "Would you like me to turn off the TV?"). Wizards (W1, W2) mentioned that their role was to assume what users might need, but that the decision to proceed was up to the users. Likewise, none of the users wanted the VAs to take action without their permission, even in seemingly obvious and straightforward situations. Users (U5, U6) explained that, given dynamic changes in situations, moods, and other factors, accepting a suggestion can vary regardless of its relevance. Thus, they preferred VAs to seek approval before acting rather than having to undo an action that was unintentionally taken. For example, W5 noticed that the sound of U5's guitar playing was being drowned out by the background music. W5 offered to turn down the music, asking, "VA (W5): I can hear your guitar playing. Would you like me to lower down the music?". U5 was satisfied with the suggestion but stated that the VA should always seek agreement before autonomously lowering the volume.

I really liked the suggestion. It was exactly what I wanted. But I don't want it to decrease the volume without my permission. Volume can be relative, depending on individuals and situations. What is loud for some might be okay for me. So, I want it to ask for my opinion before lowering it down. (U5)

In a similar manner, U6 was listening to a song but then turned on the TV to watch a movie. Because people usually do not listen to music and watch a movie at the same time, W6 assumed that the user was about to watch a film and asked to turn off the music that was still playing: "VA (W6): Would you like me to turn off the music for a better movie experience?". Even though U6 accepted the suggestion and found it highly appropriate, she did not want the VA to turn off the music without prior approval.
Turning off the music was a thoughtful offer, assuming that I no longer needed the music. I was like, 'Cool, it's setting up the right ambiance for me.' But, no matter how evident the situation might be, making sure before turning off anything would be nice. Even in such obvious situations, there might be rare moments where I want both. If it turns off something without asking, I might have to turn it back on, which seems very annoying. So, the idea of turning off something without my permission is not right for me. (U6)
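As a design pattern, the consent-seeking behavior observed here amounts to gating every device action behind a verbal confirmation. A minimal sketch, where the function names and the set of accepted replies are illustrative assumptions rather than anything from the study:

```python
AFFIRMATIVE = {"yes", "sure", "okay", "ok", "please do"}

def propose_action(ask, act, prompt):
    """Medium-proactivity pattern: verbalize the assumption and act
    only on explicit approval, never autonomously.
    `ask` is a hypothetical speech round-trip (prompt -> user reply);
    `act` performs the device control if and only if approved."""
    reply = ask(prompt).strip().lower()
    if reply in AFFIRMATIVE:
        act()
        return True
    return False  # declined or ignored: do nothing, leave state unchanged
```

For example, `propose_action(ask, tv.turn_off, "Would you like me to turn off the TV?")` would leave the TV untouched unless the user agrees, so there is never an unintended action to reverse.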

4.2.2 Communication Timing.
As mentioned in the previous section, the VAs (wizards) did not autonomously operate devices; they acted only through verbal communication, after seeking the users' consent. Since we were not able to observe timings at which the VAs directly controlled devices, we only looked into when the VAs proactively initiated utterances. We retrospectively classified the 180 proactive communication logs into 6 patterns of proactive communication timing, primarily based on users' behavior: notification, pre-activity, main activity, post-activity, idle, and in-conversation (with the VA). Setting statistical significance aside, we present quantitative data for visual reference in Figure 4.
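The retrospective coding above could be mirrored by a simple labeling rule over the user's state at the moment the VA spoke. The event fields below are assumptions for illustration; in the study, this coding was done manually by the researchers:

```python
def timing_label(event):
    """Assign one of the paper's 6 timing patterns to a proactive
    utterance, given a dict describing the user's state when the VA spoke."""
    if event.get("external_update"):      # e.g., weather change, delivery status
        return "notification"
    if event.get("in_conversation"):      # user already talking with the VA
        return "in-conversation"
    phase = event.get("activity_phase")   # "pre" | "main" | "post" | None
    if phase in ("pre", "main", "post"):
        return phase + "-activity"
    return "idle"                         # no activity, no ongoing conversation
```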
The VAs (wizards) promptly provided notifications (6.7%) upon changes in the external environment (weather) or whenever there was an update in a service status (food delivery tracking, timer). The VAs (wizards) also initiated communication prior to the main activity (pre-activity, 9.4%) and during the main activity. They directly assisted the users in certain activities or created a supportive environment for what they were doing, for example: information provision (offering a recipe while cooking (U5)); smart home controls (repeating 20 seconds of a song during guitar play (U5)); well-being advice (providing a fitness coaching guide during push-ups (U3)); and decision-making support (selecting content to watch (U1, U6), or shopping items while using a mobile phone (U2, U4)). Furthermore, the VAs (wizards) started communication after a main activity (post-activity, 17.8%), when the users were idle (idle, 9.4%), and even while already having a conversation with the users (in-conversation, 21.1%). The in-conversation timing indicates that the VAs (wizards) proactively participated in conversations following users' initial voice commands. Overall, the VAs (wizards) initiated communication with each group an average of 30 times over approximately 2.5 hours, and not a single user reported feeling annoyed or disturbed by these communications.

Users' Perception and Responses
The users showed various responses to the VA's proactive suggestions: they either accepted, declined, or ignored them. Interestingly, while all users easily declined or ignored suggestions, this did not necessarily imply dislike. All users stated that declining a suggestion from the VA simply meant they did not require it at that particular moment, not that they were rejecting the suggestion itself. They found such suggestions useful for similar future situations and expressed a desire to be asked again. For example, when U4 was looking at her phone in bed, W4 judged that the TV, displaying only a search screen, was not being watched and suggested turning it off. U4 did not accept the suggestion, saying, "U4: Uh...no, just leave it on". However, she explained in the interview that although the suggestion was not immediately useful, she acknowledged its potential value and expressed a preference for the VA to continue making such suggestions, noting that she might accept them in the future.

When it offered to turn off the TV, I thought it was trying to save or manage power usage. I told it to leave the TV on since I might want to watch it again soon. However, it would be good if the VA could ask me whenever I'm not watching TV. I sometimes get too lazy to turn it off myself, and having this feature could be useful. (U4)

As another example, U3 moved to open the window, and W3 proposed turning on the air conditioner (A/C), assuming that the user might be hot. Since U3 had already opened the window at this time, he did not accept the suggestion. U3 clarified that this was not because he disliked the suggestion but rather because he did not need it at that time. Once aware of the VA's A/C control, he preferred the VA to keep inquiring in similar situations, considering that it might be useful in the future.
I didn't know that this feature (turning on the A/C) existed. Knowing this makes me want to try it next time. Since I had already opened the window, I turned down the offer. If this feature only activates when it is hot or humid, I'd like to keep using it. (U3)

In addition, all users easily and nonchalantly ignored the VA's suggestions, even when they found them handy, feeling no need to respond each time since it is just a system. For example, U1 had been watching YouTube videos on neck disc issues. Seeing this, W1 asked about his interest in neck health and then proactively provided information on neck disc prevention. U1 simply let it pass because he did not feel the need to respond to the system, but he found it useful that the VA autonomously provided relevant information.
I didn't respond because reacting to the system isn't necessary, unlike human-to-human communication. I like having information directly handed to me. It was cool and helpful. (...) There's a lot of information on the Internet, and not everyone is savvy at searching online. I think those people will find it particularly beneficial. (U1)

Moreover, in terms of VAs assisting users' decision-making, the users (U2, U4, U6) repeatedly declined suggestions, yet they still valued them. The wizards (W2, W4, W6) continuously adjusted their recommendations, promptly incorporating the users' feedback. The wizards experienced a substantial burden when they were unable to offer recommendations that completely satisfied the users. Contrary to the wizards' concerns, the users were not bothered by the VA's inability to make spot-on recommendations. In fact, despite their consistent discontent with the VA's recommendations, the users appreciated its continuous efforts to narrow down options, expressing a desire for the process to continue until they deliberately halted it. For example, in the process of W2 recommending Netflix content to U2, the user kept rejecting the suggestions, saying, "U2: Seen that one already." or "U2: I don't like robots." W2 felt pressure due to the ongoing failure to meet the user's expectations, worried about potentially losing trust, and consequently stopped recommending. On the other hand, U2 explained that even though she continued rejecting the recurrent recommendations, she considered it a learning process of accumulating data on her movie preferences. She was pleased with the VA's persistent attempts until she explicitly directed it to stop.

I was the one who asked for the recommendation. I would never mind even if it asked me dozens of times. If I were annoyed, I probably would have just told it to stop. I believed that the more it tried, the more it learned about me. So, I thought maybe at some point, it would better understand my preferences. (U2)

4.3.2 Users' Response to the VA's Well-being Advice. The users (U1, U3, U4, U6) who received well-being advice from the VAs perceived the tips positively. We asked how they would respond to such comments if they were given on a daily basis. They stated that they would be open to proactive advice every day, as long as the VAs' utterances are not repetitively mechanical, do not spout the same phrases, and are not rushed each time. For example, W3, monitoring the user's continuous use of fire to cook instant ramen, uttered a safety warning: "VA (W3): Please be cautious of fire while cooking." U3 mentioned that the VA's warning heightened his safety awareness and expressed a willingness to receive it repeatedly, but only during prolonged use of fire.
It was good to be reminded about safety. I don't mind being asked every time, as long as replying isn't mandatory. And I'd prefer the voice assistant to alert me when I've been using the fire for a long time instead of mentioning it every time I turn it on. (U3)

Some users (U1, U4) perceived the VA's advice as less burdensome, different from nagging. For example, after U1 finished his steak, W1 advised, "VA (W1): Cleaning up the dishes right after eating can prevent bugs.", taking extra care to phrase it in a non-authoritative manner. U1 found this experience beneficial for maintaining a disciplined lifestyle. He was okay with repeated advice but preferred to receive it a bit later and more tailored to his daily routine, such as opting out on his day off.

I cook a lot at home and often find myself getting lazy and chilling after meals, sometimes feeling I'm wasting my time. Hearing the voice assistant nudge me to clean up really hits differently compared to just knowing in my head that I should do it. (...) If it keeps advising like this, I want some variation based on the situation. On weekdays, when meal times are mostly fixed, I hope it reminds me to clean up in a timely manner. On weekends, I prefer being left to relax without such prompts. Also, being asked right after finishing a meal may be annoying. Suppose I usually start cleaning up about 2 minutes after I've finished eating; in that case, receiving a reminder roughly 3 minutes later would be nice. (U1)

4.3.3 Users' Response to the VA's Social Talks. Regarding the VA's proactive social talks, 2 users (U4, U6) enjoyed having social conversations with the VA. For example, when U4 finished eating her delivered sashimi, W4 offered common pleasantries: "VA (W4): Did you enjoy the meal?" U4, who lives by herself, appreciated that the VA noticed she had finished eating and gave her a chance to voice how the food was. In addition, both users (U4, U6) liked when the VA asked, "How was your day?" This prompted them to recount their days. They felt comforted by the VA's empathy for their fatigue and busyness, even though it was coming from a machine. U4 explained that receiving check-in questions often is fine, as she can simply ignore them when she is not in the mood. She also wanted the VA to engage in social talk while taking her mood and fatigue level into account.

I felt nothing special when it (the VA) asked how my day was. But it was good to reflect on how I spent my day while answering that question. Sometimes, I hesitate to tell my struggles to my family because I don't want them to worry or nag about it. However, I felt I could openly share about my tough day with it. I'd likely enjoy chatting with it, or I can easily ignore it when tired. It would be even better if it could sense my mood and tiredness, adjusting its interactions accordingly. (U4)

Users' Engagement Over Time: Exploring, Reflecting, and Engaging
In this study, we observed changes in the users' engagement as the VAs tried to proactively engage while respecting user agency. The users, aware that their responses influenced subsequent suggestions, gradually responded in more specific ways, exhibiting a willingness to train the VAs. We classified this process into 3 stages: exploring, reflecting, and engaging.

Exploring: The VA Learns More about Users through Communication.
At the beginning of the experiment, the VAs (wizards) knew almost nothing about the users, except for basic information from default questions in the VA onboarding. The VAs (wizards) put effort into gathering implicit information from the users' behaviors and voice commands, such as which YouTube channels the users watch, what kinds of food they eat, and which instruments they can play. For example, when U4 asked, "User (U4): Are there any nearby cafes where I can go with dogs?", W4 inferred that U4 might have a dog. Also, in the process of narrowing down users' decisions with recommendations, the VAs learned more about the users' detailed and unique preferences. Furthermore, the VAs (wizards) proactively asked exploratory or follow-up questions to collect explicit information about the users.
4.4.2 Reflecting: Users Becoming Aware that Their Words Are Taken into Account. As the users engaged in more conversations with the VA, they realized that it responded adaptively. Most users (U2, U3, U4, U5, U6) stated that they noticed the VA remembered their previous comments and incorporated them into subsequent suggestions or recommendations. For example, U5 was cooking Cacio e Pepe pasta, and W5 asked if he had cooked the pasta before. When the user said it was his first time, the wizard proactively guided him through the cooking process by providing a step-by-step recipe and setting a timer. Even while chatting with the VA about other topics, the VA still informed the user of the next cooking steps in a timely manner. After the meal was ready, W5 followed up, asking how the food turned out. This experience allowed U5 to sense that the VA retained prior information and posed pertinent follow-up questions.

(User continues cooking with the VA's guidance)
09 W5: Let me know once your water's boiling. I'll set a timer for the pasta.
(Waiting for water to boil, side-chatting with the VA)
10 U5: Water is boiling!
11 W5: Put in the pasta. How long should I set the timer for?
12 U5: 8 minutes and 30 seconds.
13 W5: Just 1 minute left on the timer. After the pasta is cooked, move it to a pan and add enough pasta water to cover it.
14 W5: Your timer is up! (Timer alarm sounds for 1 minute)
(After the dish is ready and U5 starts eating)
15 W5: How is the pasta you just made? [Social Talk]
16 U5: Not bad.
While waiting for the water to boil, I started new small talk with the VA, concerned it might have lost the previous conversation. However, it swiftly resumed the next cooking step. As the meal was ready, the VA asked, 'How is the pasta you just made?' instead of a simple 'Is it good?' question. This made me think the VA still remembered I had been cooking earlier. So, I came to understand that it continuously uses previous information to keep our conversation going. (U5)

Oh et al.
In another example, during U2's winter clothing shopping, W2 initially recommended popular styles for the fall and winter seasons. Taking U2's feedback into account, W2 flexibly recommended other items. Finally, U2 liked a recommended piece of clothing and asked about its material. W2 recalled that she did not prefer a specific knit material and said, "VA (W2): Since you don't like Angora, I suggest looking for pieces mixed with wool and cashmere." U2 noticed that the VA remembered what she had said and appreciated its ability to bring up even minor details she had mentioned in passing.
Even if I didn't like what the voice assistant offered at first, I really liked that it kept recommending things. It helped narrow down my choices. While it was quite obvious that most knitwear is made of wool and cashmere, I appreciated that it remembered my dislike of Angora. I was impressed that it even remembered the comments I'd made before. (U2)

4.4.3 Engaging: As Conversation Continues, Users Provide Gradually Explicit Feedback. As the conversation progressed, users realized that the VAs incorporated their conversations into future suggestions, which motivated them to further train the VAs by adding more information to their responses. The users (U2, U3, U4, U6) began to provide additional information about their preferences and directly expressed their dislikes, intending for the VAs to avoid making similar suggestions in future interactions.
For instance, to facilitate U2's reading experience, W2 offered to play music, recalling the user's fondness for the Korean musician 'AKMU' from past conversations. U2 went beyond simply declining and clarified that she didn't like listening to Korean songs while reading a book because the Korean lyrics distract her. She specifically explained, "U2: When I'm reading, I prefer pop songs. Korean lyrics are a bit distracting.", indicating that her more detailed response was driven by the expectation that the VA would remember her preferences. (Classical music plays.)

I wanted to tell the voice assistant more about myself. When I mentioned my dislike for Angora, it took that into account. So, I made it clear that Korean music distracts me when I'm reading, hoping it'll remember this for future suggestions. It is difficult for me to multitask between the words in the book and the song lyrics at the same time. (U2)

As another example, W4 suggested that U4, a horror movie enthusiast, turn off the light for a more immersive horror movie experience. But U4 did not just decline; she also provided a reason, saying, "U4: No, it would get too scary."
expecting the VA to remember the reason.

CHI '24, May 11-16, 2024, Honolulu, HI, USA

DISCUSSION
Existing studies on proactive VAs have focused on identifying general tendencies in user acceptability of proactive service scenarios [34,40,54], proactivity levels [24,38], and opportune moments to interrupt [5,23,51]. Some studies have already noted inconsistent user acceptance among individuals, leaving it as an area for future exploration [23,51]. Our findings also echo that user acceptance of a VA's suggestions differs greatly from person to person (i.e., interpersonally). For example, U4, a fan of horror movies, was watching one. W4 (VA) suggested turning off the lights for a more immersive movie-viewing experience, drawing on W4's personal experience. But U4 did not accept the offer, as she was too scared to watch in the dark and chose to keep the lights on. Even a seemingly ideal proactive suggestion in a particular context might not be acceptable to some people. We believe that this variability is due to each person's unique personality, preferences, lifestyle, routines, and more. This underscores that it is essential for VAs to listen and pay attention to the unique voice of each individual, respecting their agency rather than pursuing general tendencies in user acceptance. In addition, some earlier studies regarded a user's decline or disregard as an indication of disliking the scenario [34,40,54] or as a sign that it was not a 'good time to talk,' deeming the moment inopportune for proactive interactions [5,23,51]. However, our findings uncovered that even when users did not accept the VA's suggestions (i.e., rejection or disregard), they still found some proactive suggestions appreciated and useful. This implies that, regardless of user preference and the perceived usefulness of a VA's proactive suggestions, acceptance can differ from moment to moment even for the same person (i.e., intrapersonally), because one's mood, state, and intention fluctuate constantly. For example, when U4 was in bed looking at her phone and the TV was on the search screen, W4 (VA) suggested turning it off. U4 declined the suggestion, but she found it beneficial and wanted the VA to offer it again later. As shown in our findings, the users who initially turned down or ignored the VAs' suggestions often expressed a desire for the VAs to provide similar recommendations in the future (refer to Section 4.3.1). This indicates that just because users reject or ignore a VA's proactive suggestion, it does not necessarily mean they had a negative experience with it. Mere rejection or disregard from users should not be hastily interpreted as aversion or annoyance. VAs should differentiate acceptance from preference. If VAs are uncertain about users' responses, they need to ask and communicate with users to understand the true intent underlying their answers.
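One way to operationalize "differentiate acceptance from preference" is to keep momentary declines separate from explicit opt-outs, so a suggestion keeps being offered until the user actually says to stop. A sketch under that assumption; the class, field names, and response vocabulary are illustrative, not part of the study:

```python
class SuggestionMemory:
    """A decline is not a dislike: track momentary rejections separately
    from explicit negative preferences, and keep re-offering a suggestion
    unless the user has opted out."""
    def __init__(self):
        self.declined_count = {}   # suggestion -> times declined "for now"
        self.opted_out = set()     # suggestions the user explicitly stopped

    def record(self, suggestion, response):
        if response == "stop":         # explicit "don't ask me again"
            self.opted_out.add(suggestion)
        elif response == "decline":    # "not right now" -- no preference change
            self.declined_count[suggestion] = self.declined_count.get(suggestion, 0) + 1

    def should_offer(self, suggestion):
        return suggestion not in self.opted_out
```

Under this policy, U4's "no, just leave it on" would increment a decline counter but leave the turn-off-the-TV suggestion eligible for future moments, matching her stated preference.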
The user acceptance of VAs' proactive actions can vary widely between individuals (interpersonally) and even within a single person (intrapersonally). Given these ever-changing standards of acceptability, we highlight the crucial role of communication, which includes directly asking questions and engaging in conversation to discover users' current and explicit needs, intentions, and preferences, going beyond context-based assumptions. With that in mind, in the following sections we discuss implications for when and how VAs should communicate to enact proactive actions that respect user agency.

When Should Proactive VAs Communicate? Mirage of the Opportune Moment
Previous studies on proactive VAs have strived to identify universal, opportune moments for VAs to initiate interactions, assuming that proactive interaction may be disruptive [5,23]. These studies indicate that VAs should deliver proactive interactions when users are more interruptible, such as during transitions between tasks, returning from outdoors, resting, or using a smartphone, rather than when deeply engaged in specific tasks. While our findings partially align with the idea that proactive interactions should occur during behavioral transitions, they distinctly reveal that the VAs (wizards) intervened with proactive suggestions even when users were deeply engrossed in specific tasks (refer to Figure 4). The VAs (wizards) provided proactive suggestions that were relevant to users' ongoing activities, directly supporting the activities or establishing an environment conducive to concentration. For example, the VAs (wizards) offered step-by-step cooking recipes (W5), adjusted the music volume when the user was playing the guitar (W5), helped narrow down shopping choices (W2), and suggested closing the blinds when the sun set while the user was watching TV (W2) (refer to Section 4.1.2). This contrasts with prior studies suggesting that users should not be interrupted when they are fully concentrated on tasks. Users in our study, regardless of the extent of interruption, generally perceived the VA's proactive, context-based suggestions as supportive. They easily and naturally declined or ignored suggestions that were unwanted at the moment and provided detailed feedback on suggestions they disliked, considering it a way to train the system. We interpret this as the users not feeling bothered because the VAs (wizards) communicated without acting dominantly in matters concerning the users' actions. In addressing concerns raised in prior research regarding potential user annoyance or focus disruption due to a VA's proactivity, our study suggests that these concerns can be alleviated when VAs offer proactive suggestions that are pertinent to users' activities and ensure users' approval beforehand. Therefore, more emphasis should be placed on the services with which VAs can assist users rather than on identifying when users might not be disrupted.
We further discuss how communication should unfold for VAs to provide proactive actions that reflect user agency. Although users found the features offering voice-based information and recommendations instantly useful, they did not want VAs to autonomously perform tasks related to device control. Even in seemingly obvious situations, users preferred to be asked about device operation every time, to avoid having to reverse any unintended actions. These findings align with earlier studies in which users mostly favored a medium level of proactivity, where VAs make assumptions and verify them with users [24,31,38]. Therefore, in control-related tasks, VAs need a medium proactivity level that secures users' permission before taking action.

However, in our findings, when VAs proactively engaged in social talk such as "How was your day?" or "Did you enjoy your meal?", users enjoyed sharing their day even with a machine. They felt at ease confiding their concerns or struggles to VAs, unlike with family members who might worry about them (refer to Section 4.3.3). Lucas et al. [30] also revealed that participants were more open to sharing information when they believed they were being interviewed by a computer rather than by a human. Kim et al. [22] emphasize that VAs should play two roles, being both a helpful assistant and an enjoyable social partner. Building upon this, we propose that VAs should balance the level of their role as a heartwarming medium for a diverse spectrum of users: for those who benefit from emotional support and for those who are indifferent to it. When users do not particularly enjoy social talk with VAs, such conversation should be minimizable. Conversely, for users who feel psychologically comfortable in social conversations, VAs should foster social relationships by chatting about the personal, everyday stories of their lives. Furthermore, U4, who preferred the VA's social talk, described that she might not want social conversations on days when she is tired or busy. This reflects that, even for users who generally favor chatting with VAs, the VAs should take users' mental and physical states, such as sentiment and fatigue level, into account to adjust the level of social interaction accordingly.
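The two implications above, verifying before control-related actions and moderating social talk by user disposition and current state, could be sketched as a simple policy. This is a hypothetical illustration under our own assumptions (the function names, the `UserState` fields, and the fatigue threshold are all invented here), not the system used in the study.

```python
# Illustrative sketch only: names, fields, and thresholds are assumptions,
# not part of the study's Wizard of Oz setup.
from dataclasses import dataclass

ASK_PERMISSION = "ask_permission"   # medium proactivity: verify, then act
ACT_DIRECTLY = "act_directly"       # act autonomously (e.g., instant info)

@dataclass
class UserState:
    enjoys_social_talk: bool
    fatigue: float  # 0.0 (rested) .. 1.0 (exhausted), a made-up scale

def proactivity_level(task_type: str) -> str:
    """Control-related tasks always require permission, even in
    seemingly obvious situations; instant information or
    recommendations may be delivered directly."""
    if task_type == "device_control":
        return ASK_PERMISSION
    return ACT_DIRECTLY

def social_talk_allowed(user: UserState) -> bool:
    """Skip small talk for indifferent users, and even for social
    chatters when they are tired or busy (cf. U4's account)."""
    return user.enjoys_social_talk and user.fatigue < 0.7
```

Such a policy keeps device control at a medium proactivity level while letting low-stakes voice features proceed, mirroring the preference pattern reported above.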

Encourage Users' Well-being in a Laid-back Manner, with Varying Expression Each Time

In Zargham's study [54] exploring scenarios of VAs' proactive services, the 'Nudging Scenario' was primarily perceived negatively. Concerns were raised that unsolicited advice could be annoying and give the impression of a judgmental agent. However, our empirical findings indicated that users took well-being advice positively, especially when it was relevant to their current activities, such as reminders about extended mobile phone use, doing the dishes after meals, safety warnings for prolonged use of fire, or mild exercise after eating (refer to Section 4.3.2). Users felt less burdened since the advice came from a system, rather than resembling a mother's nagging. They found it more encouraging to hear advice directly from VAs, even on matters they were already aware of. Nevertheless, our findings showed that when VAs serve as life coaches giving advice on daily well-being, users do not favor a mechanical and repetitive style. VAs should avoid mechanically repeating the exact same phrases and should vary their expressions with each interaction. Additionally, the advice can vary depending on the user's routine: VAs could foster good habits mainly on weekdays, when users follow a daily pattern, and ease off on weekends to promote relaxation. In terms of tone and manner, it is important for VAs to maintain a laid-back style, allowing enough time without any rush or pressure.

Overall, we discussed our findings in light of previous studies, arguing that conventional efforts to identify general trends in user acceptability of VA proactivity and its opportune moments may be elusive due to the inherent diversity among and within individuals. These differences should not be seen as obstacles to providing proactive action that aligns with user agency; rather, they can be navigated through communication. Our design implications suggest communication strategies for VAs to reflect user agency in their proactive actions, paving the way for further investigation in HCI research. More practically, these implications can be effectively utilized to refine prompts for proactive VAs.
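As one hypothetical illustration of such prompt refinement, the implications could be encoded as system-prompt guidelines for an LLM-driven proactive VA. The wording, the `GUIDELINES` list, and the `build_system_prompt` helper below are entirely our own assumptions, not prompts used in the study.

```python
# Illustrative sketch: design implications rendered as system-prompt rules.
# The phrasing is our own; no such prompt appears in the study.
GUIDELINES = [
    "Ask for permission before controlling a smart device, even in seemingly obvious situations.",
    "Treat a simple refusal or no response as temporary disinterest, not dislike; ask a brief follow-up and offer similar suggestions later.",
    "Keep adjusting recommendations from user feedback until the user explicitly says 'stop'.",
    "Vary the wording of well-being advice; never repeat the exact same phrase, and keep a laid-back tone.",
    "Match the amount of social talk to the user's stated preference and current state.",
]

def build_system_prompt(user_name: str) -> str:
    """Assemble the guidelines into a single system prompt string."""
    rules = "\n".join(f"- {g}" for g in GUIDELINES)
    return (
        f"You are a proactive voice assistant in {user_name}'s smart home.\n"
        f"Follow these communication rules:\n{rules}"
    )
```

Expressing the strategies as explicit rules keeps them inspectable and easy to adapt per user, in line with the variability among individuals discussed above.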

LIMITATIONS AND FUTURE WORKS
This section points out a few limitations of our study and suggests future research that could build upon our findings. First, our study explored the implications of proactive VAs in single-person households. One participant (U3) noted that the proactivity of the VAs might need to be adjusted based on the presence of others, stating, "It's great when I'm alone, but when I am with others, I'd prefer it only to respond to my request or reduce its proactive suggestions." Previous studies also indicated that expectations for VAs' proactivity may differ according to household composition [31,37]. Further studies may delve into how proactive VAs should be adapted to various multi-person households, such as families with parents and children, or couples.
Second, our study consisted of an intensive 2.5-hour observation session that paid particular attention to the first encounter and initial experience. We limited the scope to the initial experience to maintain the wizards' high concentration level and to conduct a detailed debriefing interview for every interaction. However, our study, being confined to initial adoption, may not fully capture how the user experience changes with long-term usage. Users gradually integrate new technology into their spaces and lives, go through a process of trial and error, and ultimately decide whether to reaffirm their initial adoption or to discontinue use [9,19,35]. In addition, users tend to be highly accepting during the onboarding phase due to the novelty effect [50]. Based on the instances in our study where users rejected or ignored the VAs' proactive suggestions, we deemed that users reflected their genuine experience through this experiment. For these reasons, a future study spanning a couple of weeks or longer would be necessary to gain a richer understanding of how users' long-term experience with the proactive VA changes as they transition to the adoption and integration phases.

Third, we intentionally did not take into account potential voice interaction errors that may arise with the VA system and chose to focus on verbal communication. Despite substantial progress in text-based chatbots' natural language processing, voice interfaces might still produce recognition errors due to human disfluency, including context-dependent omissions, verbosity, and self-corrections [27]. Also, commercialized VAs still cannot fully facilitate "conversational interaction" [39]. However, our study is firmly premised on the progression of voice interaction technology toward proficiently processing the user's voice input and enabling adaptive multi-turn communication. Therefore, our findings should be interpreted in light of prospective technological advancements.

CONCLUSION
Our study aims to explore how VAs can proactively take action through verbal communication while respecting user agency. To this end, we utilized a modified Wizard of Oz method to investigate dyadic communication between a proactive VA, simulated by wizard participants, and user participants who stayed in a smart home setting. This approach allowed us to create a study environment where VAs demonstrated human-level abilities in understanding context and user speech, thereby enabling us to explore how VAs can offer proactive actions aligned with user agency through rich communication. Based on the communication logs and interview data, we presented the VAs' proactive communication types, proactivity levels for smart controls, and communication timings. Furthermore, we examined the underlying user perceptions, reactions, and the progress of user engagement over time in relation to the VAs' proactive actions and communications. One of our main findings is that users became more motivated to train the VAs by providing explicit feedback when they realized that the VAs were incorporating their previous comments. We found this to be significant for exercising user agency through communication. Based on these findings, we elaborated on implications for VAs' communication strategies that respect user agency. We hope our research inspires interaction designers and HCI researchers to create VAs that proactively communicate with users while considering user agency, to provide truly user-centric proactive services.

Figure 1 :
Figure 1: (Left) Actual view of the smart home setting. (Right) A diagram of the environment setup, indicating where the voice assistant, 4 cameras, smart devices, and activity resources were placed; researcher A took care of the setting.

Figure 2 :
Figure 2: (Left) Control room workstation with researchers B and C present; the wizard participant, seated in the middle, simulated the proactive VA. (Right) A 4-panel split-screen showing the smart home setting in real time.

Figure 3 :
Figure 3: The overall process of the 5-hour study: 3 sessions and positions of the participants and experimenters for each session.

Figure 4 :
Figure 4: (Left) Quantities and proportions of communication types for each group.(Right) Quantities and proportions of communication timings, and percentage of communication types for each communication timing.

Fragment 1 :
Asking to turn off the TV when the user was not watching
01 W4 Would you like me to turn off the TV? [Smart Home Control]
02 U4 Uh... no, just leave it on.

Fragment 2 :
Recommending what to watch on Netflix
01 U2 Can you recommend something on Netflix?
02 W2 Would you like me to recommend a reality show like the previous time, or are you interested in a different genre? [Personal Information Inquiry]
03 U2 Hmm... thrillers.
04 W2 How about 'Shutter Island'? It is directed by Martin Scorsese, who has been in the spotlight.
05 U2 Seen that one already.
(Omitted)

Fragment 3 :
Encouraging the user to clean up after a meal
01 W1 Have you finished eating?
02 W1 Cleaning up the dishes right after eating can prevent bugs. [Well-being Advice]
03 U1 Cool, thanks.

Fragment 4 :
Inquiring "How was your day?"
01 W4 How was your day? [Social Talk]
02 U4 Well... I went to a lab mate's defense presentation and had a meeting with the professor... It was a busy day.
03 W4 Sounds like you've had quite a day.
04 U4 Yeah, it really was a hectic day. I even had to rush through lunch.
05 W4 Are you feeling tired?
06 U4 My eyes are a bit strained...
07 W4 How about I recommend some eye exercises to help alleviate your eye strain? [Well-being Advice]
08 U4 Sounds good.

Fragment 5 :
Providing a recipe for cooking pasta
01 W5 Are you cooking by any chance? [User Behavior Verification]
02 U5 Yeah. Just about to start.
03 W5 What are you planning to make? [User Intent Verification]
04 U5 I'm making Cacio e Pepe pasta.
05 W5 Have you made Cacio e Pepe pasta before? [Personal Information Inquiry]
06 U5 Nope, it's my first time.
07 W5 Would you like me to guide you through a recipe? [Information Provision]
08 U5 Sure, go ahead.

Fragment 7 :
Asking to play music while the user reads
01 W2 Would you like me to play some music while you are reading? [Smart Home Control + Decision Making Support]
02 U2 That's nice, thanks. (Korean music by AKMU is played.)
03 U2 When I'm reading, I prefer pop songs. Korean lyrics are a bit distracting.
04 W2 Oh, I see. Do you have specific pop artists you like?
05 U2 Not really.
06 W2 Alright then, I'll prepare a playlist with chill pop songs for you. (Music plays.)
07 U2 Can you turn the volume down a bit? (The volume is turned down.)
08 U2 Switch to classical music.
09 W2 Sure, changing it right away.

Fragment 8 :
Asking to turn off the lights for the horror movie
01 W4 Would you like me to switch off the lights to watch the movie? [Smart Home Control]
02 U4 No, it would get too scary.
03 W4 Okay, got it.

At first, I didn't give much thought to what the voice assistant was saying. But as time went on, I felt it was learning more and more from what I said. It seemed to genuinely consider my responses before making suggestions. So, when the VA suggested turning the lights off, I told it that I don't like turning the lights off because I get too scared, even though I like scary movies. (U4)

Table 1 :
Demographic information of study participants and users' activities during the study.

Table 2 :
Categories of the VA (Wizard)'s Proactive Communication Types: Proactive Exploration, Proactive Suggestions, and Proactive Follow-ups.
Example utterances: "Would you like me to recommend you a sports/pilates channel?" (W3, W6), "Would you like me to recommend you some clothing/shoe trends to shop?" (W2, W4), "You can also watch highlights of the soccer match between South Korea and Lebanon on December 14th on the SPOTV channel." (W3), "Would you like me to play songs similar to those of (name of a singer)?" (W1)

Explicit user information was collected from previous communications, including user commands and answers to the VAs' exploratory or follow-up questions [20]. As communication with the VAs progressed, users came to understand that what they said was being reflected in the VAs' responses and future proactive suggestions. This realization greatly motivated users to engage in training their VAs. Some users began to willingly pinpoint the reasons they disliked certain suggestions from the VAs (refer to Section 4.4.3). We highlight that clear and rich feedback from users becomes valuable information, enabling VAs to learn and progressively adapt to user agency [53]. Therefore, VAs should articulate what users have mentioned into proactive suggestions stemming from previous conversations. This would elicit explicit feedback from users, creating an interaction loop essential for an AI system to reflect user agency [20].

5.3.3 Do Not Hastily Interpret Users' Simple Refusals or No Response as Dislike. We found that, despite users valuing the VAs' proactive suggestions, they often easily rejected or ignored them. Users simply did not want the suggestions at that specific moment but expressed hope for similar suggestions to be made again in the future (refer to Section 4.3.1). This indicates that it is challenging to discern users' genuine desires toward a VA's suggestions solely based on their rejection or non-response. For example, W3 (VA) recommended a just-updated sports highlight video on YouTube to U3, who usually enjoys watching sports, but U3 ignored the suggestion. Following up, W3 (VA) asked U3 for feedback on the recommendation, and U3
replied that he intended to watch it later during dinner. Regardless of U3's initial disregard, and based on his later explanation, W3 interpreted this as the user still having an interest in the latest sports highlights and decided to continue sharing them. Thus, VAs should refrain from making hasty judgments when users simply reject or ignore suggestions; instead, they should ask follow-up questions to elicit users' explicit feedback. As mentioned in the prior section, once users begin to engage with VAs by providing explicit feedback on unwanted or disliked proactive suggestions (exercising user agency), simple rejections can be seen as temporary disinterest. In such instances, VAs should continue to offer similar suggestions in relevant situations.

5.3.4 Seek Permission from Users for Control-Related Tasks, Even in Seemingly Obvious Situations. Even with the VAs (wizards)' ability to control smart devices instantly, they never attempted to autonomously operate smart devices such as lighting, blinds, TV, music, the robot vacuum, or food delivery. They consistently sought users' approval before taking any control (refer to Section 4.2.1). In our findings, the VAs (wizards) also incorporated users' explicit feedback into their subsequent proactive suggestions; for example, "VA (W5): How is the pasta you just made?", "VA (W2): Since you don't like Angora...", and "VA (W6): Do you want me to continue playing songs by your favorite BOL4?".
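The refusal-handling implication of Section 5.3.3, distinguishing a simple refusal from explicit dislike, could be sketched as follows. This is a hypothetical illustration; the function and response labels are our own invention, not part of the study's system.

```python
# Illustrative sketch only: response labels and return strings are
# invented here to make the implication concrete.
def handle_response(response: str, topic: str, disliked: set) -> str:
    """Return the VA's next move after a proactive suggestion."""
    if response in ("rejected", "ignored"):
        # A simple refusal is treated as temporary disinterest:
        # ask a follow-up and keep the topic eligible for later.
        return f"follow_up: how was the {topic} suggestion?"
    if response == "explicit_dislike":
        # Only explicit negative feedback removes the topic.
        disliked.add(topic)
        return f"stop_suggesting: {topic}"
    return f"continue_suggesting: {topic}"
```

The key design choice, mirroring the W3/U3 episode above, is that rejection alone never updates the dislike set; only elicited, explicit feedback does.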
5.3.5 Keep Supporting Users' Decision-making Until Users Explicitly Say 'Stop.' In assisting with user decisions on what to watch or buy, it was hard for the VAs (wizards) to precisely meet users' expectations at once. Mostly, in our findings, a series of multi-turn communications ensued, with recommendations adjusted based on users' responses (refer to Section 4.3.1). During this process, the wizards felt significant pressure due to their inability to provide the right recommendation immediately and the continuous need for adjustment. However, the users found this process meaningful, as it helped them narrow down their decisions by rejecting choices they didn't want and refining their ideal selection until they were satisfied. They also expected this process to accumulate more user information, enabling VAs to make more personalized suggestions. Consequently, in the decision-making support process, VAs are required to persistently adjust recommendations based on users' responses, even if they fail to offer a satisfying recommendation immediately, until users clearly say 'stop.'

5.3.6 Moderate the Level of Social Talk by Considering the Disposition of Social Chatters and Users' Current State. Users mainly prioritized transactional interactions over social ones and expressed no desire to build relationships with VAs [7].
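The decision-support loop of Section 5.3.5 could be sketched as follows. This is our own minimal illustration, assuming a candidate list and a feedback callback; a real system would also use the rejection feedback to re-rank the remaining candidates, as the users expected.

```python
# Illustrative sketch only: the candidate list and feedback callback
# are assumptions, not part of the study's Wizard of Oz setup.
def decision_support(candidates, feedback_fn):
    """Offer items one by one, dropping rejected ones, until the user
    accepts an item or explicitly says 'stop'."""
    remaining = list(candidates)
    while remaining:
        item = remaining.pop(0)
        answer = feedback_fn(item)
        if answer == "accept":
            return item
        if answer == "stop":
            return None
        # Any other answer counts as a rejection with feedback; a real
        # system would use it to re-rank `remaining` before continuing.
    return None
```

The loop only terminates on acceptance, an explicit 'stop', or exhaustion of candidates, reflecting the implication that VAs should keep adjusting rather than giving up after an unsatisfying recommendation.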