Comparing How a Chatbot References User Utterances from Previous Chatting Sessions: An Investigation of Users' Privacy Concerns and Perceptions

Chatbots are capable of remembering and referencing previous conversations, but does this enhance user engagement or infringe on privacy? To explore this trade-off, we investigated how the format a chatbot uses to reference previous conversations with a user affects the user's perceptions and privacy concerns. In a three-week longitudinal between-subjects study, 169 participants talked about their dental flossing habits to a chatbot that either (1-None) did not explicitly reference previous user utterances, (2-Verbatim) referenced previous utterances verbatim, or (3-Paraphrase) used paraphrases to reference previous utterances. Participants perceived the Verbatim and Paraphrase chatbots as more intelligent and engaging. However, the Verbatim chatbot also raised privacy concerns among participants. To gain insight into why people preferred certain conditions or had privacy concerns, we conducted semi-structured interviews with 15 participants. We discuss implications of our findings that can help designers choose an appropriate format for referencing previous user utterances and inform the design of longitudinal dialogue scripting.


INTRODUCTION
Advances in language models are leading to chatbot interactions that can persist across multiple sessions and refer back to previous user utterances [4,52,75,76]. This use of long-term memory can help maintain relationships and build rapport [9,10,24], and can improve user experience in chatbot interactions such as open-domain conversations [75,76] or discussions of personal health and wellness [4,31,34]. Additionally, by giving more relevant responses [16,61] or explicitly referencing past user utterances [34,38,39,57], a chatbot could increase its social presence: the feeling that it is present in the conversation [8,50,57].
While this could prove beneficial to users and improve their perceptions of the chatbot, it could also lead to feelings of privacy violation. This phenomenon is known as the Personalisation Privacy Paradox [3]: a tension between collecting more user data to provide personalised services and the user's feeling of intrusiveness, which leads to unwillingness to share personal information. This trade-off could be a particular issue when people discuss sensitive information [26] [23,Art.9]. For example, people may be less willing to disclose socially undesirable behaviours due to embarrassment [68], and users of mHealth services have reported that concerns over the use of their personal data can negatively impact service adoption and satisfaction [28].
The Personalisation Privacy Paradox may hold additional uncertainty when chatbot designers need to choose an appropriate referencing format from the range of styles available to them [13,22,71,79]. To explore this paradox, we investigated the level of social presence a chatbot conveys when referencing a user's utterances, and its effect on how privacy-violating and how positively (e.g., intelligent, engaging) users perceived the chatbot. Specifically, we compared 3 referencing formats ranging from low social presence (not explicitly referencing user utterances) to higher social presence (referencing user utterances either verbatim or via paraphrases). We conducted a between-subjects longitudinal study (N = 169) in which participants talked to a chatbot about their dental flossing once a week for three weeks. Participants talked to a chatbot that either (1-None) did not explicitly reference their previous week's utterances, (2-Verbatim) referenced their previous week's utterances verbatim (e.g., "Last week you said 'My teeth sometimes hurt when I floss'"), or (3-Paraphrase) referenced their previous week's utterances using paraphrases (e.g., "Last week you said that your teeth hurt"). Users found chatbots that explicitly referenced their past utterances more intelligent and engaging. However, explicitly referencing a user's past utterances also led to increased feelings of privacy violation. To gain further insight into why users might have perceived the chatbot referencing formats differently, we conducted semi-structured interviews (N = 15). Finally, we discuss implications and provide recommendations for chatbot designers when scripting interactions that reference user utterances.

RELATED WORK
2.1 Increasing Social Presence and User Perceptions of Chatbots
As a chatbot uses more conversational referencing formats, it could be said to possess higher levels of social presence [34,38,39,57] (i.e., the feeling of being with an interlocutor [8,50,57]). Previous work has investigated the user's feeling of "being with" a more present, engaging and human-like chatbot [16,80], and increasing these feelings has been found to have benefits such as improved trust [29,81] and a greater desire to engage with chatbots [6,7,42,53]. Furthermore, chatbots have been found to benefit from various human-like qualities such as empathy [43,44], listening [73], and differing conversational styles [15,18,77], personas [58,69] or politeness strategies [11,47,51]. Previous studies have also found benefits in interviewers that have higher levels of social presence. For example, Xiao et al. found that people give higher quality responses to chatbots that use a battery of AI-driven techniques, such as giving more relevant responses to users [73]; Tsai et al. found that users were more likely to disclose embarrassing behaviours related to their sexual health to a human than to a chatbot [68]; and multiple studies have found that chatbots that self-disclose information elicit reciprocal disclosure from users and improve feelings of trust [1,41,48,59].
More specifically to our study of chatbot referencing format, previous work has found benefits in chatbots that remember and reference details from previous interactions [17,33,46,55,78]. For example, Jain et al. found that chatbots that reference details from previous conversations led to increased feelings of empathy [33], and Portela and Granell-Canut reported that participants perceived a chatbot to have higher levels of affection when it remembered previous user utterances or the user's name [55]. Beyond this, we are interested in how the format a chatbot uses when referencing a user's previous utterances affects positive user perceptions. This gives us our first research question:
• RQ1: How does chatbot referencing format (None, Verbatim, Paraphrase) impact: (a) desire to continue using the chatbot? (b) perceived chatbot engagement? (c) perceived chatbot intelligence?

Privacy Concerns Among Chatbot Users
However, while Section 2.1 outlines the benefits of increased social presence, it could also lead to increased privacy concerns [74] among chatbot users [17,32]. For example, Schuetzler et al. [60] found that people were less likely to disclose to chatbots that gave more relevant responses to user utterances during a small-talk session before asking (non-differentiated) health questions. Ng et al. showed participants two hypothetical financial chatbots (one human-like and one factual) and found that, while the human-like chatbot scored higher on social presence, participants were more likely to share information with the factual chatbot [49]. Bae et al. [5] found that people trusted a robot-like chatbot more than a human-like chatbot when discussing positive experiences. More analogous to our study's aim, Chen et al. investigated the perceived invasiveness of a chatbot that referenced participants' personal information (name, presence of heart disease and hand-washing frequency) [17]. While some findings indicated that people found chatbots more invasive when they referenced their information, this was contrasted with a null finding once the user's perceived identity of the chatbot (human or chatbot) was taken into account. Building on these previous findings and their conflicting results gives us our second research question:
• RQ2: How does the chatbot referencing format (None, Verbatim, Paraphrase) impact the user's feelings of privacy violation?
We were interested in exploring RQ2 because previous work points to contradictory and uncertain outcomes. That is to say, by referencing user utterances in different formats, it could become more apparent to the user that the chatbot is storing or manipulating their personal information, thereby heightening privacy concerns. Alternatively, users could appreciate the increased levels of social presence and personalisation. Additionally, by referencing user utterances verbatim, the chatbot could either make data storage more apparent and therefore seem privacy-violating to users, or it could be seen as more transparent about storing the user's data without manipulation (and, by displaying less advanced AI capabilities, it could lead users to form a metaphor of a less capable chatbot and thereby perceive it more favourably [35]). Similarly, by paraphrasing user utterances the chatbot could be seen as invasive (by storing and manipulating user data), or it could create greater feelings of engagement with the user. Finally, by not explicitly referencing user utterances, the chatbot could be seen as less privacy-violating, but also potentially less engaging.

USER STUDY
This study investigates the effect of a chatbot remembering user utterances from a previous chatting session (and incorporating them into conversation dialogue). For this, we conducted a longitudinal between-subjects experiment in which participants talked to a chatbot about their dental flossing once a week for three consecutive weeks (ethics approval was received from our institutional IRB prior to study commencement). Our chatbot had an independent variable of Chatbot Referencing Format (3 levels), which determined whether the chatbot explicitly referenced the previous week's user utterances, and the format used when referencing them (see Figure 1 for examples of referencing formats). The literature may refer to referencing formats using various terminology: in our case, verbatim is analogous to extractive summarisation [22,79] or direct quotation [13,71], and paraphrase is analogous to abstractive summarisation [22,79] or indirect quotation [13,71]. The levels of Chatbot Referencing Format are:
• None (control group): The chatbot did not explicitly incorporate previous user utterances into subsequent conversations, and instead referenced previous discussions at a high level.
• Verbatim: The chatbot incorporated previous user utterances verbatim into subsequent chatbot utterances.
• Paraphrase: The chatbot incorporated paraphrased versions of user utterances into subsequent chatbot utterances.

Chatbot Script
The chatbot led a conversation with the user about their dental flossing habits and beliefs. We chose dental flossing because we wanted users to discuss something personal to them, and (as flossing can benefit from both diary keeping [67] and brief interventions [25]) it is appropriate for short weekly personal conversations. Additionally, dental flossing is an activity that health experts recommend daily adherence to [63], and people can face barriers to dental flossing [2,12]; both of these aspects are incorporated into our chatbot's script.
The conversations for each of the 3 weeks were as follows (responses elicited by the chatbot were open-ended unless specified otherwise):
• Week 1: All participants saw the same script, as the chatbot could not yet reference the previous week's utterances. Participants shared their dental flossing beliefs [12] (7-point Likert), flossing frequency, and perceived benefits of flossing.
• Week 2: The chatbot referenced flossing frequency and perceived benefits from Week 1. Participants shared their flossing frequency, barriers to flossing, and strategies to overcome barriers.
• Week 3: The chatbot referenced flossing frequency, barriers and strategies from Week 2. Participants shared their flossing frequency, reflected on their barriers and strategies from the previous week, and shared their perceived susceptibility and perceived risks, before sharing their dental flossing beliefs [12] (7-point Likert).
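The weekly plan above can be written down as a small data structure that records what is elicited each week and what is referenced back the following week. The sketch below is a minimal, hypothetical encoding: the item labels (`frequency`, `benefits`, `barriers`, `strategies`, ...) and the helpers `check_plan` and `items_to_reference` are our own illustrative names, not identifiers from the study's implementation.

```python
# Hypothetical item labels for what the chatbot elicits each week,
# following the week-by-week description above.
ELICITED = {
    1: ["beliefs", "frequency", "benefits"],
    2: ["frequency", "barriers", "strategies"],
    3: ["frequency", "reflection", "susceptibility", "risks", "beliefs"],
}

# For each week, the (source_week, item) pairs the chatbot refers back to.
REFERENCED = {
    1: [],  # nothing to reference yet: same script for all conditions
    2: [(1, "frequency"), (1, "benefits")],
    3: [(2, "frequency"), (2, "barriers"), (2, "strategies")],
}


def check_plan(elicited, referenced):
    """Raise ValueError if a week references an item not elicited earlier."""
    for week, refs in referenced.items():
        for src, item in refs:
            if src >= week or item not in elicited.get(src, []):
                raise ValueError(f"week {week} cannot reference {item!r} from week {src}")


def items_to_reference(week, transcript):
    """Return {item: stored answer} for the items referenced in `week`.

    `transcript` maps (week, item) -> the user's stored answer; items the
    user never answered are skipped rather than raising an error.
    """
    return {
        item: transcript[(src, item)]
        for src, item in REFERENCED.get(week, [])
        if (src, item) in transcript
    }


check_plan(ELICITED, REFERENCED)  # the plan above is internally consistent
```

For example, with a Week 1 transcript of `{(1, "frequency"): "Maybe twice a week", (1, "benefits"): "It stops my gums bleeding"}`, calling `items_to_reference(2, transcript)` returns both answers keyed by item, ready to be rendered in whichever referencing format the participant's condition uses.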

Implementation Details
The chatbot was hosted on Qualtrics, and used JavaScript and HTML to emulate the look and feel of a chatbot. Microsoft LUIS was used both for intent recognition (in real time) and for selecting the most appropriate paraphrase for a given week. Intent recognition for users' barriers to flossing and strategies to overcome barriers was trained using utterances from [18]. Training data for other prompts was generated by the research team and by piloting the chatbot until a range of responses could be recognised. Data augmentation (e.g., synonym replacement) was then used to generate additional training data.
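Synonym replacement, as mentioned above, expands each training utterance into several variants by swapping known words for alternatives. The sketch below illustrates the general technique only: the study does not report which synonyms or tooling it used, so the `SYNONYMS` table and the `augment` helper are hypothetical.

```python
import itertools

# Hypothetical synonym table: the study reports using synonym replacement
# but not which synonyms or tooling, so these entries are illustrative.
SYNONYMS = {
    "forget": ["tend to forget"],
    "lazy": ["too tired"],
    "floss": ["clean between my teeth"],
}


def augment(utterance, synonyms=SYNONYMS):
    """Return the utterance plus variants with known words swapped out.

    Tokenisation is naive whitespace splitting and matching is
    case-insensitive on the bare token, so "floss." (with punctuation)
    is not replaced. The original utterance is always the first result.
    """
    # Each position offers the original token followed by its synonyms.
    options = [[tok] + synonyms.get(tok.lower(), []) for tok in utterance.split()]
    variants = []
    for combo in itertools.product(*options):
        text = " ".join(combo)
        if text not in variants:  # keep order, drop duplicates
            variants.append(text)
    return variants
```

Calling `augment("I forget to floss")` yields four training utterances, from the original through "I tend to forget to clean between my teeth". Because the number of variants grows multiplicatively with the number of replaceable words, a larger pipeline would likely sample variants rather than enumerate all of them.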

Intent Recognition:
We used intent recognition (in all 3 conditions) to recognise the intent of user utterances within a week's session.An appropriate response would then be appended to the start of the subsequent chatbot utterance.For example, the chatbot could deliver "Well done on flossing five days a week" in response to a user's flossing frequency.
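The acknowledgement step above (recognise what the user said, then prepend a response to the next scripted prompt) can be sketched as follows. This is a minimal sketch under stated assumptions: the keyword matcher `recognise_frequency` is a hypothetical stand-in for the trained LUIS model, not its API, it only handles flossing frequency, and the wording for zero days is our own invention; only the "Well done on flossing five days a week" phrasing comes from the text.

```python
import re

# Hypothetical vocabulary standing in for a trained intent/entity model;
# it is NOT the LUIS model used in the study.
WORD_TO_DAYS = {
    "never": 0, "zero": 0, "once": 1, "one": 1, "twice": 2, "two": 2,
    "three": 3, "four": 4, "five": 5, "six": 6, "seven": 7, "daily": 7,
}
NUMBER_NAMES = ["zero", "one", "two", "three", "four", "five", "six", "seven"]


def recognise_frequency(utterance):
    """Return days flossed per week (0-7), or None if nothing is recognised.

    The first matching token wins; this is deliberately simplistic.
    """
    text = utterance.lower()
    if "every day" in text or "everyday" in text:
        return 7
    for token in re.findall(r"[a-z]+|[0-9]+", text):
        if token.isdigit():
            if int(token) <= 7:  # ignore numbers that cannot be days per week
                return int(token)
        elif token in WORD_TO_DAYS:
            return WORD_TO_DAYS[token]
    return None


def acknowledgement(days):
    """Build the response that is prepended to the next chatbot utterance."""
    if days is None:
        return ""  # unrecognised: continue with the scripted prompt only
    if days == 0:
        return "Thanks for being honest. "  # hypothetical wording
    unit = "day" if days == 1 else "days"
    return f"Well done on flossing {NUMBER_NAMES[days]} {unit} a week. "


def next_chatbot_turn(user_utterance, next_prompt):
    """Prepend the acknowledgement (if any) to the next scripted prompt."""
    return acknowledgement(recognise_frequency(user_utterance)) + next_prompt
```

For instance, `next_chatbot_turn("I floss 5 days a week", "What benefits do you see?")` produces "Well done on flossing five days a week. What benefits do you see?". Falling back to the bare prompt when nothing is recognised keeps the script moving rather than risking a wrong acknowledgement.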

Delivering Paraphrases:
To deliver paraphrases of user utterances, the user's intent was first recognised via LUIS. Each user intent had a corresponding paraphrase written by the research team, which was then used as the paraphrase in the next chatting session (e.g., for flossing benefits, the intent "prevent gum disease" was given the paraphrase "flossing helps prevent gum disease"). While this approach is limited to a discrete number of paraphrases and does not account for multiple intents, it ensured that consistent and coherent paraphrases could be delivered to users. An example script and paraphrases can be seen in Figures 1 and 2, and a full list of paraphrases and the script can be found in the supplementary material.
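Rendering one stored answer in each of the three referencing formats can be sketched as follows, assuming the intent has already been recognised. The Verbatim and Paraphrase frames follow the "Last week you said ..." examples given in the introduction, and the single paraphrase entry is the one quoted above. The high-level wording used for None, the `TOPICS` table, and the fallback for intents without a paraphrase are hypothetical choices of ours, not details from the study.

```python
from enum import Enum


class Format(Enum):
    NONE = "None"
    VERBATIM = "Verbatim"
    PARAPHRASE = "Paraphrase"


# Intent -> researcher-written paraphrase. Only this entry is given in the
# text; the full table is in the study's supplementary material.
PARAPHRASES = {"prevent gum disease": "flossing helps prevent gum disease"}

# Hypothetical high-level topic descriptions used by the None condition.
TOPICS = {"benefits": "the benefits of flossing"}


def reference(fmt, topic, utterance, intent):
    """Render last week's answer about `topic` in the given format."""
    high_level = f"Last week we talked about {TOPICS[topic]}."
    if fmt is Format.NONE:
        return high_level  # no explicit reference to the user's words
    if fmt is Format.VERBATIM:
        return f'Last week you said "{utterance}".'
    paraphrase = PARAPHRASES.get(intent)
    if paraphrase is None:
        # Assumption: fall back to a high-level reference when the
        # recognised intent has no paraphrase (not specified in the text).
        return high_level
    return f"Last week you said that {paraphrase}."
```

With the stored answer "It stops my gums bleeding" and intent "prevent gum disease", the three conditions would produce "Last week we talked about the benefits of flossing.", 'Last week you said "It stops my gums bleeding".' and "Last week you said that flossing helps prevent gum disease." respectively. A lookup table trades coverage for guaranteed coherent output, mirroring the limitation described above.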

Participants
We recruited participants using university advertisement boards.We only selected participants who did not fully adhere to daily flossing (similarly to previous intervention studies [37]), and all responses were completed remotely and asynchronously.Participants were paid S$2 for the first week's session, S$2 for the second, and S$3 for completing the third and final week.Weeks 1 and 2 took on average ∼3 minutes, and Week 3 ∼5 minutes.
169 participants (mean age 22.7; 64% female) completed all 3 weeks of the study, with 7 participants completing weeks 1 and 2 only, and 4 participants completing week 1 only. We only include data from participants who completed all 3 weeks (other participants were paid for their completed time but excluded from analysis), resulting in 55 None, 58 Verbatim, and 56 Paraphrase participants.

Procedure
Each week, participants were contacted via email and followed the procedure: (1) Follow Qualtrics link to individual chatbot session.
Participants were invited to weeks 2 and 3 seven days after completing the previous week's session, and were given three days to complete these sessions.Responses were controlled so that only desktop or laptop devices could be used.
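The invitation timing above (invite seven days after the previous session was completed, then allow three days to respond) can be expressed as simple date arithmetic. This is a minimal sketch assuming the windows were measured in 24-hour periods from the completion timestamp rather than in calendar days; the text does not specify this, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Timing rules from the procedure: invite 7 days after the previous
# session was completed, then allow 3 days to complete the new one.
INVITE_DELAY = timedelta(days=7)
RESPONSE_WINDOW = timedelta(days=3)


def session_window(previous_completed_at: datetime):
    """Return (invite_time, deadline) for the next weekly session."""
    invite = previous_completed_at + INVITE_DELAY
    return invite, invite + RESPONSE_WINDOW


def completed_in_window(previous_completed_at: datetime, completed_at: datetime) -> bool:
    """True if the next session was completed within its 3-day window."""
    invite, deadline = session_window(previous_completed_at)
    return invite <= completed_at <= deadline
```

With an illustrative completion time of 1 March at 10:00, the next invitation would go out on 8 March at 10:00 and close on 11 March at 10:00. Anchoring each window to the participant's own completion time keeps sessions roughly a week apart even when participants start on different days.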

Weekly Measures:
At the end of each week's chatbot session, participants rated their experience on 7-point Likert scales (Strongly Disagree to Strongly Agree), and were asked "Do you personally agree or disagree that..." for the following measures: Interest to continue chatbot usage: "I would want to continue using the chatbot" [72]; Chatbot engagement: "The chatbot seemed engaged in our discussion", "I felt the chatbot was NOT paying attention to what I said" [66]; Chatbot intelligence: "The chatbot was intelligent", "The chatbot was competent" [18,19].

Privacy concerns, intrusiveness, and risks:
To investigate whether chatbot referencing format impacts privacy-related measures, participants responded to the following 7-point Likert scale questions (at the end of Week 3 only). For privacy concerns (referring to concerns that inhibit users from sharing information [74]), the measures were: "I was concerned that the chatbot was collecting too much personal information about me", "I was concerned about submitting my information to the chatbot". For privacy intrusiveness (referring to the unwelcome general encroachment into another's presence or activities [74]), the measures were: "I feel that as a result of this interaction, information about me is out there that, if used, will invade my privacy", "I feel that as a result of this interaction, my privacy has been invaded". For privacy risks (referring to the uncertainty arising from the possibility of an adverse consequence [74]), the measures were: "Personal information was inappropriately used by the chatbot", "Providing the chatbot with my personal information involved many unexpected problems".

USER STUDY RESULTS
We fit a linear model to each dependent variable collected in the final week, with Chatbot Referencing Format as the fixed effect, and performed post-hoc Student's t-tests to identify pairwise differences. We excluded the Likert scale ratings of 8 participants (4 None, 1 Verbatim, 3 Paraphrase) who gave conflicting responses for chatbot engagement (e.g., rating both engagement items as Strongly Agree). This left us with 161 responses. See Figure 3 for summary results. In addition, we analysed user responses (response length before and after removing stop words) but found no differences between conditions. We now discuss individual findings and their significance.
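The analysis above can be sketched in plain Python. With a single categorical fixed effect, the linear model reduces to a one-way ANOVA. Two parts of this sketch are assumptions rather than reported details: the exclusion rule in `is_conflicting` (the text gives only "both rated as Strongly Agree" as an example, so we treat both engagement items at the same scale extreme as conflicting), and the post-hoc tests using the pooled residual variance from the fitted model (as in Fisher's LSD) rather than each pair's own variance. The data in the usage note are toy values, not study results.

```python
from math import sqrt
from statistics import mean


def is_conflicting(engaged, not_attentive):
    """Hypothetical exclusion rule: the two engagement items (1-7 Likert),
    "seemed engaged" and "NOT paying attention", rated at the same extreme."""
    return engaged == not_attentive and engaged in (1, 7)


def one_way_model(groups):
    """Fit rating ~ condition with one categorical fixed effect.

    `groups` maps condition -> list of ratings. Returns the F statistic,
    its two degrees of freedom, and the residual mean square (MSE).
    """
    means = {g: mean(v) for g, v in groups.items()}
    values = [x for v in groups.values() for x in v]
    grand = mean(values)
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    mse = ss_within / df_within
    return (ss_between / df_between) / mse, df_between, df_within, mse


def pairwise_t(groups, a, b, mse):
    """Post-hoc Student's t for condition a vs b, using the model's pooled
    residual variance (an assumption; see the lead-in)."""
    xa, xb = groups[a], groups[b]
    se = sqrt(mse * (1 / len(xa) + 1 / len(xb)))
    return (mean(xa) - mean(xb)) / se
```

On toy ratings `{"None": [3, 4, 5], "Verbatim": [5, 6, 7], "Paraphrase": [6, 6, 6]}`, `one_way_model` gives F = 6.0 with (2, 6) degrees of freedom, and `pairwise_t` for None vs Verbatim gives t = -3.0. A two-sided p-value for each t would then come from the t distribution with `df_within` degrees of freedom (for example via `scipy.stats.t.sf`).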

General Chatbot Perceptions
Measures related to RQ1 are described below. Chatbot referencing format had no significant effect on a user's desire to continue using the chatbot. However, participants found the Verbatim and Paraphrase conditions to be more engaging than None. Specifically, for positively perceived engagement, both Paraphrase (p = 0.0022) and Verbatim (p = 0.0444) were rated more favourably than None. While Paraphrase scored higher than Verbatim, this difference was not statistically significant. Similarly, for negatively perceived engagement, both Paraphrase (p = 0.0125) and Verbatim (p = 0.0159) were rated more favourably than None. These results indicate that explicitly referencing a user's previous week's utterances positively impacts the user's feelings of chatbot engagement, while not explicitly referencing them negatively impacts a user's feelings of being listened to.
Participants also found the Verbatim and Paraphrase chatbots to be more intelligent. For perceived intelligence, both Paraphrase (p = 0.0093) and Verbatim (p = 0.0301) were rated more favourably than None. For perceived competence, Paraphrase was rated more favourably than None (p = 0.0327). These two results indicate that explicitly referencing a user's previous utterances makes a chatbot appear more intelligent and, in the case of paraphrased references, more competent.

Privacy Perceptions
Measures related to RQ2 are described below. The Verbatim chatbot generated more privacy concerns than None on one of the measures. Specifically, for "I was concerned that the chatbot was collecting too much personal information about me", Verbatim scored higher than None (p = 0.0227).
For measures of privacy intrusiveness, there were weakly significant differences that could suggest participants found the Verbatim or Paraphrase conditions more intrusive than None.
For the measure "I feel that as a result of this interaction, information about me is out there that, if used, will invade my privacy", Verbatim scored highest (worst) and was weakly different from None (p = 0.0716). For the measure "I feel that as a result of this interaction, my privacy has been invaded", both Verbatim (p = 0.0677) and Paraphrase (p = 0.0866) scored higher than, and were weakly different from, None.
For both measures of privacy risk, while Verbatim and Paraphrase trended above None, there were no statistically significant differences between conditions.
These results indicate that explicitly referencing a user's previous utterances may raise privacy concerns, and that this may be further exacerbated if utterances are referenced verbatim.In particular, Verbatim participants were more concerned that the chatbot was collecting too much information about themselves.This may indicate that directly quoting a user's utterances made users more conscious of their data being collected, and therefore increased privacy concerns.
However, it is important to note that all privacy measures averaged below 4 ("Neither Agree Nor Disagree"), reflecting that feelings of privacy violation were still low among participants. This may reflect the domain of the chatbot (dental flossing), which some participants may not find to be a very sensitive topic (discussed further in Section 5.3.1).

SEMI-STRUCTURED INTERVIEWS
Our quantitative results found Verbatim raised more privacy concerns than None, and there were also trending (but weakly-significant) results to indicate that participants found Verbatim and Paraphrase potentially more intrusive than None (see Section 4.2).To gain further insights as to why people may perceive the chatbot referencing formats differently, we conducted semi-structured interviews.

Participants
We recruited 5 participants per condition (N = 15; mean age 20.9; 9 female) for remote interviews, all of whom had completed the full 3 weeks of the study.Interviews lasted between 20 and 30 minutes, and participants were reimbursed S$5 for their time.

Procedure
First, participants were instructed that there are no right or wrong answers, and consent was sought to record the interview.Participants then discussed their experience taking part in the study and responded to questions pertaining to perceived effect on dental flossing, privacy violations, chatbot intelligence, chatbot warmth and the participant's perception of their assigned condition.
After these questions, the interviewer concluded by revealing and describing the 3 experiment conditions.Participants were then asked to think-aloud [14], and rank their preference for the conditions while explaining their opinion and reasoning.See supplementary material for the interview guide used.

Findings
We will now discuss the findings from our semi-structured interviews.We discuss privacy concerns raised by participants (split between those related and not related to referencing format), chatbot intelligence, recall assistance, and chatbot naturalness.Finally, we discuss the last section of the interview where participants saw all 3 conditions and explained their referencing format preferences.

5.3.1 Privacy Concerns (Unrelated to Reference Format):
Similarly to previous findings, the perceived sensitivity of the domain varied among participants and affected their hesitancy in sharing information [18,45]. Some participants without privacy concerns described dental flossing as a non-sensitive domain, meaning they were not hesitant to share their information: "this topic isn't something that's very sensitive, so I wasn't particularly concerned about it." -P5(Verbatim) In contrast, some participants were concerned about sharing their dental flossing behaviour as they saw it as sensitive information. P11(Verbatim) raised this concern, and was also hesitant to share their information because of uncertainty about who would read their messages: "dental flossing is [...] a more private, embarrassing.. umm.. thing. So I think differently sometimes, like whether telling the chatbot like how often I floss or whether I managed to achieve my goals [...] it does feel a little scary because I'm not very sure who exactly is seeing the information." -P11(Verbatim) Following on from this, participants described feeling embarrassed when sharing health behaviour that they perceived as insufficient: "I was a bit embarrassed, because they asked me, uh, how many times did I floss over the week, and then I was like '0'" -P4(Paraphrase) Furthering this, P12(Paraphrase) felt embarrassed when their flossing frequency was referenced: "I was like... slightly embarrassed <laughs> about how I-I never floss at all. [...] it made me health-aware about how I wasn't really flossing at all [...] I wouldn't really say that made me feel uncomfortable. Just like a little bit embarrassed. A little bit self-aware." -P12(Paraphrase) Similarly to previous findings on socially desirable responding [60,62], one participant described how they considered lying to the chatbot about their flossing behaviour: "I felt guilty for like flossing my teeth like once a week. And I was like 'try to lie to them', but then I was like: 'OK, never mind I won't lie to them'" -P2(None) The expectation of data storage and the perceived sensitivity of the task also affected feelings of privacy invasion, with P1(Paraphrase) equating the task and data storage to writing a diary for themselves: "it's normal for it to store information. [...] It's equivalent as to you writing a diary, so it wasn't really something that was particularly invasive to me." -P1(Paraphrase)

5.3.2 Privacy Concerns (Related to Reference Format):
When discussing privacy concerns, several participants expressed surprise that the chatbot referenced what they said in previous weeks: "at first when they repeated what I said the previous week, then I was like, 'oh shit, they record everything' but-but it's not that big of a deal, I guess. Like it's alright, it's just dental hygiene" -P4(Paraphrase) Some of this surprise was attributed to participants' (lack of) expectations of chatbot abilities, with interviewees describing their concerns subsiding after the initial exposure to chatbot referencing.
However, some Verbatim participants were negatively surprised. P11(Verbatim) found it "unnerving" that Verbatim remembered what they said, and found sharing their flossing habits embarrassing: "it was like a little unnerving because the chatbot remembered what I said previously. [...] I didn't expect it to be that smart <laughs>, so it was a little startling but, because-because we're talking about something like dental flossing so, I guess it was a little embarrassing at first." -P11(Verbatim) Conversely, P1(Paraphrase) described how the referencing format did not raise feelings of privacy intrusion as they expected their data to be stored: "I think that the chatbot just took in [...] whatever I put in from the last time around, and [...] data storage is [...] a normal thing of a chatbot for me, so not really is anything that felt very intrusive" -P1(Paraphrase) P9(Verbatim) described appreciating that their utterances were left unchanged, as any "processing" would have raised privacy concerns: "I would prefer this over if they were to process my message [...] Rather that they just feedback what I have said so my perception would be: '[...] they have just stored my data and then given it back to me' as compared to them [...] processing the background and feeding something else, which I think would have raised a bit more of a privacy issue for me." -P9(Verbatim) When asked what made them hesitant to share information, some None participants described how None's non-explicit referencing format made them doubt the chatbot's engagement and therefore feel hesitant to share information: "I think the only thing that was uncomfortable was that the chatbot [...] didn't really seem to engage in the conversation. [...] he made a reference to its own question. He didn't make reference to my answer. [...] he just made the whole chat feel a bit disengaged. So like whatever answer I put down doesn't really matter to the chatbot anyway" -P7(None) This went further, with some None participants describing how they simplified their responses because they did not think the chatbot would understand them otherwise: "probably if anything [made me hesitant sharing information], it was maybe like how complex I structured my sentences. So I tried to keep my sentences like as simple as possible so that maybe the chatbot would be.. <pause> easier for the chatbot to recognise the sentence structures" -P6(None) On the other hand, some None participants reported no privacy concerns because their utterances were not explicitly referenced: "it was just a series of prompts that doesn't really consider any reference to my own, and so I don't really feel any breach of privacy or something" -P7(None) However, some participants thought less of Verbatim, with P5(Verbatim) stating: "it felt like a survey". Others disliked Verbatim due to its word-for-word repetition of their utterance: "it'd be good to somehow be able to paraphrase what I've said [...] so it wouldn't feel so obvious that it's just copying and pasting what I've said previously" -P5(Verbatim) By contrast, while None participants described the intent recognition as a feature of an intelligent chatbot, they also (due to None's referencing format) questioned the intelligence of the chatbot, with some doubting the chatbot's ability to understand them.

5.3.4 Referencing Format and Recall:
Participants described how the referencing in both Verbatim and Paraphrase helped them remember what they had written previously. Verbatim was preferred by some participants as a more precise reminder of their utterance. For example, P15(Verbatim) equated the referencing style to a lecture recap, and valued Verbatim's consistency: "Like in lectures and like videos where there's like a recap or review. [...] I wouldn't have remembered what I said to the robot, so it kept like a certain consistency of like the interview" -P15(Verbatim) Participants also appreciated Verbatim because they: distrusted a chatbot's ability to accurately paraphrase their words (and believed paraphrasing would lose nuance); wanted to know their exact utterance so that previous conversations were not repeated; considered that Verbatim would better distinguish their own utterances from the chatbot's; or desired to be held accountable to their prior utterances: "the retrieval by the chatbot to bring back exactly, especially word for word, what I said, kind of reminded me that 'ohh, I kind of agreed to this, to try this strategy' and yeah to see one week later I actually did carry it out" -P9(Verbatim) By contrast, the None participants found that referencing utterances at a high level negatively impacted recall: "the problem in very generic statements is that [. "it'd be good to somehow be able to paraphrase what I've said, or to do so without directly quoting? Yeah, so it wouldn't feel so obvious that it's just copying and pasting what I've said previously, yeah?" -P5(Verbatim) Some None participants described the condition as less natural, and suggested that explicitly referencing past utterances would make the chatbot more personable: "if they reference to my difficulties directly, you feel more... personal." -P7(None)

5.3.6 Comparing the 3 Referencing Formats:
At the end of the interview, we revealed the 3 referencing formats to participants, and asked them to think aloud and explain their preferences between formats. This reinforced some of the previous qualitative findings, and also elicited opinions from participants about their non-assigned conditions. When ranking their preference for referencing format, all interviewees put None as their last choice, 5 interviewees put Verbatim as their first choice, and 10 put Paraphrase as their first choice. Some user feedback mirrored that discussed in Sections 5.3.1 to 5.3.5, with users describing Verbatim as "creepy", "scary" and "guilt-tripping" them, or stating that they appreciated its fidelity to their original utterance; Paraphrase as more natural and human-like; and None as unengaging. Interestingly, some participants who chose Verbatim as their first preference described seeing chatbots as a tool, and valued their own word over that of a robot. In contrast, those who favoured Paraphrase described seeing a chatbot as a conversational partner that they wished to be more human-like.

DISCUSSION
Here we discuss the implications of our study. We aimed to investigate the impact of the format a chatbot uses when referencing a user's utterances from a previous chatting session. By comparing high-level non-explicit references, verbatim references, and paraphrased references, we investigated effects on both positive user perceptions and privacy-related perceptions. Our findings provide some empirical evidence that users perceive Verbatim and Paraphrase as more engaging and intelligent. However (in support of the Personalisation Privacy Paradox [3]), there is some evidence that Verbatim and Paraphrase raised privacy concerns among users.
Although we did not find measurable differences in response quality between conditions, results indicated that people receiving non-explicit or verbatim references may be hesitant to share personal information. Specifically, Verbatim participants were more concerned about the quantity of personal information being collected, and in interviews Verbatim participants described the referencing style as "unnerving" and "creepy". Some None participants were hesitant to provide complex utterances, as they doubted that the chatbot could understand them. These findings could reflect users' expectations before interacting with the chatbot [30,36,56,70]. To alleviate these concerns, clearer consent could be sought and privacy practices explained [40,54,65] before different referencing formats are used, and the chatbot's abilities could be advertised more clearly to avoid user disappointment [35].
Interviewees placed chatbots on a spectrum from conversation partner to tool. This suggests that those who view chatbots as conversation partners may prefer paraphrased references, whereas those who view chatbots as tools may prefer a chatbot that references them verbatim. Similarly, users who trust their own words more than a chatbot's (or who do not believe chatbots have intelligence or emotions) may prefer a verbatim format. Future work could extend this by investigating the role of personality in preferences for referencing format. For example, users who are more extroverted or agreeable may prefer a more conversational (paraphrased) format, while users who are more introverted or conscientious may prefer a more direct and factual (verbatim) format.
Our findings also indicate that the appropriate reference format depends on context. For example, if the user's utterance is akin to a "contract" with themselves (such as a goal for a healthy behaviour), they may want to see their utterance in its entirety in order to solidify their commitment. Similarly, if users are meant to revisit and build on previous utterances (such as in creativity tasks or goal-setting), they may prefer their words to remain unchanged. Equally, certain use cases (such as legal settings) may require chatbots to be more conservative in their use of paraphrasing, or to provide verbatim quotes alongside the chatbot's paraphrase (akin to mixed quotation in the linguistics literature [13,71]).
This implies that chatbots could, in some cases, mix paraphrased and verbatim reference formats depending on the content of the user's utterance. In the case of dental flossing, the chatbot could paraphrase when referencing a user's previous behaviour (flossing frequency), but quote the user's own words when referencing the behaviour strategy they devised in the previous chatting session.
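The mixed strategy above can be sketched as a simple rule that picks a referencing format for each stored utterance. This is a minimal illustration under stated assumptions, not the study's implementation: the `StoredUtterance` record, its `kind` labels ("behaviour" vs "strategy"), the pre-written `paraphrase` field and the template wording are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RefFormat(Enum):
    NONE = "none"              # high-level, non-explicit reference
    VERBATIM = "verbatim"      # quote the user's exact words
    PARAPHRASE = "paraphrase"  # restate the user's words


@dataclass
class StoredUtterance:
    # Hypothetical record of something the user said in the previous session.
    kind: str        # assumed labels: "behaviour" or "strategy"
    text: str        # the user's original words
    paraphrase: str  # assumed to be pre-written or produced elsewhere


def choose_format(utt: StoredUtterance) -> RefFormat:
    """Quote commitments (strategies) verbatim; paraphrase everything else."""
    if utt.kind == "strategy":
        return RefFormat.VERBATIM
    return RefFormat.PARAPHRASE


def render_reference(utt: StoredUtterance, fmt: RefFormat) -> str:
    # Illustrative templates only; the wording is not taken from the study.
    if fmt is RefFormat.VERBATIM:
        return f'Last week you said: "{utt.text}"'
    if fmt is RefFormat.PARAPHRASE:
        return f"Last week you mentioned that {utt.paraphrase}."
    return "Last week we talked about your flossing."


if __name__ == "__main__":
    history = [
        StoredUtterance("behaviour", "I floss maybe twice a week",
                        "you floss about twice a week"),
        StoredUtterance("strategy", "I'll keep floss next to my toothbrush",
                        "you planned to keep floss somewhere visible"),
    ]
    for utt in history:
        print(render_reference(utt, choose_format(utt)))
```

Keeping the format decision separate from the rendering makes it easy to swap in a different rule (for example, one based on user preference or on the sensitivity of the utterance) without touching the dialogue templates.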
The findings also have implications for the design of chatbot interfaces. If chatbots are designed to reference user utterances explicitly (e.g., with verbatim quotes), designers need to be transparent with users, give them control over their data, and ensure that their privacy is protected. Similarly, if paraphrased references are used, the chatbot needs to retain the meaning of the user's original utterance so that users do not feel their words have been distorted.

LIMITATIONS AND FUTURE WORK
The user study ran over 3 weeks with one chatting session per week, which may not have been long enough to encourage health behaviour change among participants. Furthermore, our findings may not generalise to other chatbot referencing formats [22,79], to references involving data of different sensitivity and intimacy [26], to other conversation domains, or to other input modalities.
Further work could investigate referencing formats across different modalities. For example, while a voice user interface (VUI) could also reference users verbatim or via paraphrasing, verbatim references could add the dimension of using the voice of either an agent or the user themselves [27]. Such voice playback could raise additional concerns among users. Alternative referencing formats (such as summarisation styles [20,22,79] or mixed quotations [13,71]) could also be investigated. The choice among these could depend on factors such as the length, quantity, temporal spacing and content of utterances. For example, showing a long utterance verbatim in its entirety may prove unwieldy and add to user burden [21,64].
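One way to explore the length concern is to quote short utterances in full and, for longer ones, fall back to a mixed quotation that embeds a short phrase copied exactly from the original inside a paraphrase. The sketch below is hypothetical: the word-count threshold `max_words` (20), the caller-supplied `paraphrase` and `key_phrase`, and the template wording are assumptions, not values or methods tested in the study.

```python
def reference_with_fallback(text: str, paraphrase: str, key_phrase: str = "",
                            max_words: int = 20) -> str:
    """Reference a previous utterance, trading verbatim fidelity against length.

    max_words is a hypothetical threshold, not a value tested in the study.
    """
    if len(text.split()) <= max_words:
        # Short utterance: quote it in full.
        return f'You said: "{text}"'
    if key_phrase and key_phrase in text:
        # Long utterance: mixed quotation, i.e. a paraphrase that embeds
        # a phrase copied exactly from the original.
        return f'You mentioned that {paraphrase} (in your words, "{key_phrase}").'
    # Only present text as a quotation if it appears verbatim in the original.
    return f"You mentioned that {paraphrase}."


if __name__ == "__main__":
    long_answer = ("Honestly most nights I am too tired after work and I just brush "
                   "and go to bed, but I think I could floss after dinner instead "
                   "while watching TV")
    print(reference_with_fallback("I forgot to floss twice", "you forgot twice"))
    print(reference_with_fallback(long_answer,
                                  "you are often too tired at bedtime",
                                  key_phrase="floss after dinner"))
```

The substring check is a deliberately conservative design choice: anything shown inside quotation marks is guaranteed to match the user's original words, which speaks to the fidelity concerns some Verbatim participants raised about paraphrasing.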

CONCLUSION
This study investigated how the format a chatbot uses to reference user utterances from a previous chatting session affects a user's positive perceptions (chatbot intelligence and engagement) and privacy-related perceptions. Our findings suggest that when a chatbot references previous user utterances, whether verbatim or through paraphrase, users perceive it as more intelligent and engaging. However, referencing user utterances can also raise privacy concerns among users. Our semi-structured interviews explored why participants had these concerns. We discussed the implications of our findings for chatbot designers and researchers and provided recommendations for choosing a referencing format.

Figure 1: Extracts from week 3 of the study showing the 3 levels of chatbot referencing format. Grey bubbles are chatbot utterances, and teal bubbles are user utterances (as seen by participants). Differences in referencing format are circled in red.

Figure 2: User utterances and their potential None, Verbatim and Paraphrase chatbot responses across all 3 weeks. Grey bubbles are chatbot utterances, and teal bubbles are user utterances. Red arrows show where a user utterance would be referenced (by Verbatim and Paraphrase) in the following week.

Figure 3: Outcome measures by question asked in the final week of the study. Significance at p < 0.05 is indicated by *, and p < 0.10 by +.

5.3.3 Perceived Intelligence and Engagement: Interviewees generally viewed the Verbatim and Paraphrase chatbots as intelligent: "I was like pretty pleasantly surprised that it like remembered my answers from previous weeks. Yeah, It made me think the chatbot was like a little bit more intelligent." -P12(Paraphrase) Similarly, Verbatim and Paraphrase participants found that references to their previous utterances made the chatbot seem more engaged.
"[...] I kind of like forgot what I've written, and then when they tried to resume conversation, I had no idea what I said." -P2(None)

Naturalness of Referencing Format: Participants described Paraphrase as feeling natural and human-like. For example, P3(Paraphrase) appreciated that the chatbot did not copy previous utterances word-for-word, which made it feel more engaging: "how the bot referenced it feels very natural. [...] it didn't copy what I said verbatim. So like it felt as if like a friend was just like, 'Oh yeah, I remember you said something about this like last time we met' so it felt quite natural, and [...] I also really like the fact that they did remember [...] because then it made me feel like 'OK, at least the bot is listening to what I say. I'm not like shouting into the abyss'"