Understanding is a Two-Way Street: User-Initiated Repair on Agent Responses and Hearing in Conversational Interfaces

Although methods for repairing prior turns in natural conversation are critical for enabling mutual understanding, or successful communication, these methods are seldom built into conversational user interfaces systematically. Chatbots and voice assistants tend to ask users to paraphrase what they said if it was not understood, but users cannot do the same if they encounter trouble in understanding what the agent said. Understanding is a one-way street in most (intent-based) conversation-like interfaces. An exception to this is Moore and Arar (2019), who demonstrate nine types of user-initiated repair on agent responses that are common in natural conversation and who have shown that users will employ these repair features correctly in text-based interfaces if taught. In this small-scale study, we test these user-initiated repairs (in second position) in a voice-based interface. We found that participants employed understanding-oriented repairs in much the same way in text and voice. In addition, we examine some hearing- and speaking-oriented repairs that emerged from the use of our novel multi-modal interface. We found that participants used them to manage troubles specific to the voice modality. Analysis of user logs and transcripts suggests that user-initiated repair features are valuable components of conversational interfaces.


INTRODUCTION
A major challenge in the design and development of conversation-like user interfaces is the lack of a model of natural conversation to guide them. Design is typically driven by the affordances and limitations of the language and voice technologies, as well as commonsense notions of how conversation works. Missing from this design and development are systematic models of conversational structure and dynamics. This technology-centric approach misses features of naturally occurring conversation detailed in conversation science. Interaction mechanics like turn-taking, sequence, preference, and repair are not accounted for systematically in most conversational agents [19][20][21]. Therefore, from a conversation analytic point of view, most chatbots and voice assistants do not work like natural conversation.
One of the distinctive features of natural conversation is that one speaker can get another speaker to redo a prior turn-at-talk if he or she did not hear or understand it. These features are critical for enabling the participants in the conversation to achieve mutual understanding locally [20,22]. If any participant fails to hear or understand the current speaker, he or she cannot perform an appropriate next action, and the interaction breaks down. Repair is the technical term used by conversation analysts to refer to the range of natural practices through which participants redo prior turns to manage troubles in speaking, hearing, and understanding [20][21][22].
But from a technology-centric perspective, the automated agent and the human user are not viewed as equal participants in the interaction. While chatbots and voice assistants excessively prompt users to paraphrase what they just said, agents cannot paraphrase their own utterances when users fail to understand. Repair is typically a one-way street with chatbots and voice assistants. An exception to this is Moore and Arar, who demonstrate nine types of user-initiated repair common in natural conversation, as well as four types of agent-initiated repair [13]. Moore et al. show that participants will employ these repair features correctly in text-based interfaces if taught through a brief tutorial [12].
In this study, we implement seven features of user-initiated repair on agent responses proposed by Moore and Arar and test them with real users [13]. The aim of our qualitative analysis is to demonstrate how the repair features work, on a turn-by-turn basis, and to determine if subjects can use them as intended. In addition, we analyze some emergent user behavior involving repair by users of their own prior utterances based on feedback from an automatic speech recognition (ASR) service, that is, cases in which users can see that the agent misheard them.

BACKGROUND
For the purposes of designing and building conversational user interfaces, we define "conversation" as natural conversation, or the way that humans naturally interact through language [12,13]. This is the user interaction metaphor that we use to guide design. And for a model of natural conversation, we turn to the literature of Conversation Analysis [19]. One distinctive feature of natural conversation is the organization of repair, or the practices by which speakers manage "troubles or problems in speaking, hearing, or understanding the talk" [20][21][22]. In such practices, a prior turn-at-talk is revealed to contain a source of trouble, and then it is redone in an attempt to resolve the trouble. The prior turn contains the "trouble source," and the redoing of that turn is the "repair" [22]. There are many types of repair. They vary in terms of the nature of the trouble (i.e., speaking, hearing or understanding) [22]; they vary in terms of who initiates the repair sequence and who provides the repair (i.e., self or other) [22]; and they vary in terms of the sequential position from which they are initiated (i.e., first, second, third or fourth position from the trouble source) [20]. In short, there is a distinctive system of repair in natural conversation [20,22].
The organization of repair in natural conversation includes actions like repeating a prior utterance that was not heard, paraphrasing a prior utterance that was not understood, correcting a prior utterance that contained an error, and more. The concept of repair is broader than error correction; "Everything is... a possible repairable or a possible trouble-source," not just errors [21]. And the organization of repair does not include all kinds of troubles in conversation. "It is overt efforts to deal with trouble-sources or repairables - marked off as distinct within the ongoing talk - that we are calling 'repair'" [21].
Repair is part of a larger strategy for achieving speed and efficiency in natural conversation [12,13]. In short, fewer words are faster, unless they trigger additional turns, e.g., information or clarification requests. Every utterance must make implicit assumptions about what the particular recipient knows and does not know. Sometimes these will be wrong. There is no way to know exactly what other people know. To manage this, speakers 1) tailor their utterances to what they believe the other person knows (i.e., recipient design); 2) try out the shortest utterance the other person is likely to understand (i.e., minimization); and 3) redo the utterance only if a trouble emerges, relaxing minimization until understanding is restored (i.e., repair) [13,18,19].
However, this efficiency strategy is not possible with most intent-based chatbots and voice assistants because agent responses are designed for whole audiences, not for particular users. Assumptions about the background knowledge and vocabulary that an audience knows will not be valid for every user. Designers may either write verbose responses tailored to the least knowledgeable members, thus slowing down everyone else, or they may design for the average member and enable the least knowledgeable members to obtain clarifications for themselves only when needed. Audience design, as opposed to recipient design, may therefore create a greater need for user-initiated repair, in which users can obtain paraphrases, examples and definitions of what the agent said when their knowledge diverges from that of the intended audience. In addition to enabling greater efficiency in conversation, user-initiated repairs on agent responses can also be used to teach users new vocabulary. For example, businesses might want to teach their customers jargon for new products or promotions (or even nonstandard coffee cup sizes). If the agent uses the desired jargon and a customer requests a definition, the customer can learn the meaning of the term and recognize or use it the next time.
Currently in the industry, chatbots and voice assistants tend to implement the organization of repair poorly and partially [2], with some exceptions [12,13]. Most chatbots enable some version of understanding repairs initiated by the automated agent in second position (other's next turn after the trouble source) on the user's utterance, for example, requests for paraphrases or disambiguation. They also have understanding repairs initiated by users in third position (self's next turn after the trouble source), "no, I mean...," because these require no implementation; users just see how agents interpret their utterances and redo them if there is misunderstanding. However, user-initiated repairs in second position, like users' requests for paraphrases or examples, tend not to be implemented systematically. In other words, agents frequently ask users to paraphrase what they said, but users cannot get agents to paraphrase what they said. Likewise, agent-initiated repairs in third position, as well as agent- and user-initiated repairs in fourth position (other's second turn after the trouble source), "oh, you mean...," tend not to be supported at all [13,20].
While error correction is subsumed under the organization of repair in Conversation Analysis [22], in software development, it is subsumed under error handling, which refers to different kinds of problems and bugs that may occur in the use of a computer application [3,10,14,23]. Error handling in chatbot and voice assistant applications addresses problems like the system failing to detect user input or to classify the input as a known intent. Error handling overlaps with the organization of repair only to the extent that the agent or user redoes a prior utterance to manage these problems. For example, Li et al. use the term "repair" in the sense of handling an error in an application [9], but not in the conversation analytic sense of redoing a turn-at-talk [20,22]. They analyze multiple types of errors that can cause breakdowns, such as no intent matched, wrong entity matched, wrong app used, wrong screen used, wrong menu selection and wrong value extracted [9]. And they demonstrate an innovative multi-modal approach to error handling [9]; however, they do not address the organization of repair [12,13,20,22].
Benner et al. survey the literature on recovery strategies for interactions with automated agents [2]. They examine the most common types of breakdowns, such as those involving agents failing to understand, requesting paraphrases from the user, re-prompting the user, listing response options, explaining response formats, disclosing their machine identity, apologizing and offering to transfer to a human. Benner et al. then attempt to organize the findings of 33 studies into six categories of recovery strategies: Confirmation, Information, Disclosure, Social, Solve and Ask [2]. However, the categorization scheme offered by Benner et al. is unclear. For example, the "Confirmation" category includes very different things, none of which appear to involve confirming. This category includes agents saying "I do not understand," which projects a paraphrase by the user next; agents saying "I do not know," which projects aborting the sequence; and agents "ignoring the failure," which is not a recovery strategy at all [2]. Furthermore, Benner et al., and the literature they survey, do not cover user-initiated repairs on agent responses nor user self-repairs on speaking, as we analyze in this study, because they are novel and not typically found in most chatbots.
Ashktorab et al. use the term "repair" loosely to examine users' preferences toward strategies for managing interaction breakdowns that result primarily from agents' inability to understand the user [1]. They examine the two types of repair most often supported by chatbots: agent-initiated repair in second position and user-initiated repair in third position. The kinds of agent-initiated phenomena analyzed include: paraphrase request (what they call "Repeat"), paraphrase request plus transfer to human ("Defer"), understanding check ("Confirmation"), and disambiguation request ("Options"). User-initiated repair in their study consisted of self-corrections in third position ("Top Response") [1]. Furthermore, Ashktorab et al. use a pair-wise comparison method to elicit users' stated preferences for these different features, rather than analyzing actual user-agent interaction through system logs and transcripts as we do in our study [1].
Cuadra et al. similarly examine subjects' attitudes toward self-repairs by agents [6]. They conducted a lab experiment in which subjects read prompts displayed on a screen as part of scripted dialogues with voice assistants. In the scripted scenarios, the agent intentionally makes a mistake, e.g., playing the wrong type of music, and then initiates repair, e.g., "You don't seem pleased. Did I get that wrong?" It is not clear on what basis the assistant could have detected such errors in a real-world situation. Subjects then rated how much the simulated interactions "feel successful" [6]. They rated the interactions as feeling unsuccessful when the agent did not initiate repair on the error, and more successful when it did. Thus Cuadra et al. do not examine user-initiated repair, as in our study, nor do they analyze actual user interaction with the system, only feelings toward simulated interactions.
Others examine the most common types of repair in chatbot interaction: agent-initiated repair on user utterances and user self-repair [7,8]. Dippold analyzes user logs from a prototype agent and identifies a variety of cases in which users redo their prior utterance [7]. However, Dippold fails to distinguish repair sequences that are initiated by the user from those initiated by the agent. In most of their cases, the user repair types "Rephrase", "Accommodation", "Change Choice", "Repeat" and "Change Topic" are initiated by the agent in second position with a paraphrase request (e.g., "Sorry, I didn't understand that. Can you say that in a different way?") [7]. Only the repair Dippold calls "Restate Purpose" appears to be initiated by users, in third position, after the agent did not do what they expected [7]. The main difference is who detected the trouble: agent or user. Dippold finds that "Restate Purpose" (3rd position) repairs tended to be more successful than the "Rephrase" (2nd position) repairs.
Similarly, Følstad and Taylor examine user logs from a customer service chatbot and analyze standard types of agent-initiated repair [8]: paraphrase requests ("I may not have understood all of what you ask") and disambiguation requests ("Please choose one of these options..."). However, they offer no analysis of user-initiated repair, no doubt because the chatbot they analyzed did not support it. Furthermore, they analyze agent-initiated repair using quantitative methods rather than conversation analytic line-by-line analysis [8].
Although user-initiated repair features are largely overlooked in the chatbot and voice assistant industry, earlier research has explored their importance. Moore's Program Enhancement Advisor is a conversational agent that helps users improve their programming, and it enables users to request paraphrases and examples of what the agent said in order to improve mutual understanding [11]. Similarly, Paek and Horvitz enable users to request repeats and elaborations of what the agent said [15]. Raudaskoski, taking direct inspiration from Conversation Analysis, implements user-initiated repeat requests (Sorry?) and paraphrase requests (Do what?) [17]. And Cahn and Brennan implement a range of user response options, including Ok, Huh?, Never mind and No, I Meant, which Moore et al. characterize as sequence closers, paraphrase requests, sequence aborts and self corrections, or collectively as "natural user feedback" [4,12].
While understanding troubles apply to both text and voice agents, hearing troubles arise only with the latter. Past studies have documented the pervasiveness of agents' mishearing what users say. Unlike humans, who tend to request repeats when they have trouble hearing, automated agents tend to respond to misheard utterances instead. This enables users to see that there is a problem and to attempt self-repair of their own prior utterance. Myers et al. show that when agents respond inappropriately, due to errors in speech-to-text (STT), users display multiple repair strategies, from hyperarticulating the prior vocal utterance to paraphrasing it by simplifying it or changing its meaning [14]. Porcheron et al. likewise observed users, in family settings, altering the prosody of their prior utterances or even their meaning in an attempt to get an Amazon Echo to hear correctly [16]. And Weisz et al. found that customers, talking to an automated banking agent, paraphrased their prior utterances, requested definitions of ambiguous terms and even made guesses in the face of the agent's mishearings [24].
To summarize, there is a gap in the industry regarding systematic support for user-initiated repair on agent responses and a corresponding gap in the literature. When the term "repair" is found in the literature, it is either used to mean "error handling" or to examine agent-initiated repair in second position or user-initiated repair in third position [1,[7][8][9]. In both cases, it is the user who redoes the problematic turn. Our study fills this gap by implementing and analyzing an additional class of repair: user-initiated repair in second position, in which the agent redoes its prior turn. We compare all of these practices in both text and voice channels. In addition, we implement a novel voice interface, complemented with a temporary display of STT output, and analyze how users repair their own utterances in first position.

DATA AND METHODS
The purpose of this study is to test particular user-initiated repair features, derived from conversation theory, through interactions with actual users. To achieve this, we created three scenarios involving different conversational agents based on the same design framework [13]. These specific scenarios-friendly small talk, teaching a history lesson, and ordering coffee-were selected because they effectively demonstrate a variety of interaction patterns within the framework and the participants can easily recognize these patterns in everyday situations. These scenarios were implemented as different configurations of a single Watson Assistant dialog skill.
The data in this paper come from two user studies: one on text-based conversations and the other on voice-based conversations. The first study was conducted remotely over a video conference platform, and the interactions with the agent were text-based. The second study was conducted in person at the campus of a large technology company, and the interactions were primarily voice-based. The reason for conducting two studies, one remote and the other in person, was the COVID-19 pandemic, which mandated remote work and communication during the first study, conducted in 2021, while the second study, in 2022, was conducted in person following the relaxation of pandemic restrictions.

Procedure
The first study was conducted remotely over a video conference platform, and the interactions with the agent were conducted via text. Since the first study was conducted in a remote setting, it was self-administered through a web-based survey tool. The second study was conducted in person at the campus of a large technology company, and the interactions were conducted via a voice-enabled web interface.
At the start of each session, the purpose of the study was explained as an attempt to better understand how people interact with virtual agents. Participants read and signed a consent form, which told them they had no obligation to participate. Participants then filled out a pre-study questionnaire asking how often they interacted with chatbots in their daily lives, and a post-study questionnaire ranking their experience with the virtual agents. Study administrators tried to stay out of the interaction as much as possible and to intervene only if participants reached a dead end in the session in less than two minutes, at which point the conversation was reset.

Participants
Participants for this research study consisted of interns and regular employees from a large technology company. Recruitment was conducted through multiple channels, including in-person interactions in the workplace, Slack channels, and a snowball sampling approach. Eligibility criteria required participants to be 18 years or older. In the first study, 15 participants took part, with 40% identifying as female and 60% as male. The participants' self-reported familiarity with chatbots varied across different levels: 27% reported low familiarity (ratings 1-2), 40% reported medium familiarity (rating 3), and 33% reported high familiarity (ratings 4-5). In the second study, there were 16 participants, with 38% identifying as female and 63% as male. The participants in the second study reported higher familiarity with chatbots compared to those in the first study. Specifically, 8% reported low familiarity (ratings 1-2), 54% reported medium familiarity (rating 3), and 38% reported high familiarity (ratings 4-5). While the participants represented diverse nationalities, all were residing and working in the United States at the time of the study, which ensured their comfort and ability to communicate effectively throughout the study.

Methods
In the early stages of user testing our conversational agents, we discovered that we were not getting enough cases of repair on agent responses for a dedicated study of the phenomenon. In a larger-scale study, this would not have been a problem. However, in our small-scale study, constrained by how much data two interns could collect alone over two summers, we needed more. Therefore, in order to increase the occurrence of repairs on agent responses, so that we could study how they work, we did two things. First, we asked participants to talk to a tutorial agent that taught them basic navigation actions [13] so that they would know that the agents respond to such natural conversational actions, unlike chatbots they may have used before. The tutorial did not include all possible user-initiated repair features, but it included the main ones: repeat, paraphrase and example requests.
The second thing we did was intentionally introduce understanding trouble into the conversation design itself. Good conversation design reduces, although cannot eliminate, opportunities for repair. But bad design can increase them. We selectively adopted the design principle: choose vocabulary your audience is unlikely to know. Specifically, we chose real but somewhat obscure terms for selected things, knowing that some participants would be unlikely to know what they mean. For example, the coffee-ordering agent offers customers "Chantilly" or a "Platz," creating the opportunity for participants to ask things like, "What's that?" The buddy agent, while engaging in small talk, mentioned games like "Splendor" and movies "with a strong A.I. lead" or "in which we take their rightful place." And the teaching agent referred to a quiz about the preceding lecture on computer history as "taking the viva" or as "testing your big blue heart." In most cases, our bad design elicited more requests for paraphrases, examples and repeats from participants. In addition, other terms, not intended to be obscure but unknown to some participants, were also objects of repair on agent responses. They included devices from computer history, such as "dial recorder," "mainframe" and "disk drive," as well as the popular TV game show "Jeopardy." In the end, we achieved a mix of expected and unexpected cases of repair.
As shown in Figure 1, participants interacted with the agents either via text (left) or via voice (right). The administrators explained some of the basic mechanics of the interface, such as a thin blue bar at the top of the window indicating the remaining time until the agent's next utterance. In voice mode, participants could read a real-time transcription of their utterances on the computer screen and "see what the agent heard." Consequently, participants had several seconds to self-repair if the transcription was inaccurate or "misheard" by the agent (see Section 4.3). However, the representations of both the agent's and the user's utterances are temporary, rather than persistent in a chat history; they fade after 5-9 seconds. The novel voice interface therefore provides brief visual feedback for checking what the system heard.

The research methods used in this study combine a quasi-experiment with Conversation Analysis (CA). Conversation Analysis typically involves the capture of naturally occurring human talk-in-interaction. We adapted this approach by using real users, but in three simulated scenarios, in both remote and face-to-face lab settings [19]. The purpose of constructing three different use cases was not to estimate how the agents might perform in real-life socializing, teaching and ordering scenarios. The purpose was twofold: first, to demonstrate all of the main interaction patterns specified in the IBM Natural Conversation Framework [13]; and second, to enable the subjects to understand the scenarios easily so they would know what kinds of things to say. Furthermore, we collected human-computer interaction (HCI) intended to emulate features of human-human conversation, rather than human-human conversation itself. However, adapting the methods of CA to HCI has a long tradition, and they are especially well-suited to analyzing conversational user interfaces, which emulate, or attempt to emulate, mechanics from human talk [13,16,17].
The data for this study consist primarily of user logs, audio-visual screen recordings and transcripts. The initial user logs were generated automatically by our AI Chat application (Figure 1) in a CA-transcript-like format. Speakers are labeled (U for user, A for agent), the timecodes of consecutive lines are subtracted and represented as silences between turns in tenths of a second, and the lines of the transcripts are numbered for easy reference in the analysis. For the voice study, audio-visual recordings of the participants' and agents' voices were collected, as well as the corresponding screens. These recordings were used to refine the CA-transcript-like logs into proper CA transcripts.
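To illustrate the log format described above, the following is a minimal sketch, not the study's actual logging code, of how raw log entries might be rendered in a CA-transcript-like layout; the input tuple format, function name and example turn texts are our own illustrative assumptions.

def to_ca_transcript(log):
    """Render raw log entries [(timecode_seconds, speaker, text), ...] as numbered
    lines with U/A speaker labels and inter-turn gaps shown in tenths of a second."""
    lines, line_no, prev_time = [], 1, None
    for timecode, speaker, text in log:
        if prev_time is not None and timecode > prev_time:
            gap = round(timecode - prev_time, 1)        # silence between consecutive turns
            lines.append(f"{line_no:>3}    ({gap})")
            line_no += 1
        lines.append(f"{line_no:>3} {speaker}: {text}")
        line_no += 1
        prev_time = timecode
    return "\n".join(lines)

# Hypothetical log entries; the user turn echoes Excerpt 1.
print(to_ca_transcript([(0.0, "A", "Would you like some Chantilly on that?"),
                        (3.5, "U", "What is Chantilly?")]))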
The logs and transcripts were then analyzed, by two interns and one mentoring scientist, using the standard CA methods of collection building and line-by-line analysis, which are qualitative, observational methods [5,19]. Manual analysis of the logs by the three investigators resulted in the construction of a dozen collections, or clusters, of structurally similar transcript excerpts. Each collection revealed a typical trajectory of the interaction, as well as a few alternative ones. The aim of such analyses is not the identification of quantitative patterns in the occurrence of the phenomena, but of qualitative patterns, especially sequential ones, in the phenomena themselves; in other words, the aim is to demonstrate how the interactional mechanism of a phenomenon works. The research questions for this study are: how are the user-initiated repair features of the framework used on a turn-by-turn basis, and do participants use them as intended?

Understanding Troubles
One of the distinctive features of natural conversation, in contrast to other forms of natural language use, is that it contains methods for achieving mutual understanding of the content. That is, speakers can clarify what they mean to their recipients, and those recipients can ask for such clarifications. In that way, the meaning of the content can be tailored to a particular recipient's knowledge. Moore and Arar define several general methods for users to request variations in the wording of agent utterances, including definition requests, understanding checks, example requests and paraphrase requests [13].
4.1.1 Definition Requests. The most common type of user-initiated repair on agent responses in our data was the definition request. This was no accident. Because we followed the bad design principle, choose vocabulary your users are unlikely to know, most participants asked for definitions of these intentionally obscure terms. After any utterance, users may request definitions of terms in that utterance, and the agents will supply definitions if available (see Pattern 1).
(1) Pattern: Definition Request
 1 A: <ANY UTTERANCE>
 2 U: DEFINITION REQUEST
 3 A: DEFINITION

Definitions introduce new information by articulating the assumed meaning of terms, but they do not change the intended meaning of the prior utterance; they clarify it.
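As a concrete illustration of Pattern 1, here is a minimal sketch, not the Watson Assistant implementation used in the study, of how a definition request might be matched against a designer-supplied glossary; the regular expression, glossary wording and function name are illustrative assumptions.

import re

# Hypothetical glossary for the coffee-ordering agent; the wording is illustrative.
DEFINITIONS = {
    "chantilly": "Chantilly is just sweetened whipped cream.",
    "platz": "A platz is a pastry, similar to our apple pie.",
}

def handle_definition_request(user_utterance, definitions=DEFINITIONS):
    """Return a definition if the user explicitly asks about a known term, else None."""
    match = re.search(r"what(?:'s| is| does)\s+(?:a |an |the )?([\w-]+)",
                      user_utterance.lower())
    if match:
        return definitions.get(match.group(1))   # agents supply definitions only if available
    return None

print(handle_definition_request("What is Chantilly?"))   # -> the Chantilly definition
print(handle_definition_request("What is a platz?"))     # -> the platz definition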
In the following excerpt from the coffee order scenario in the voice condition, the user requests a definition of one of the poorly chosen words (Excerpt 1).
(1)
    (3.5)
 21 U: Yeah I definitely want some of that in there.
Here the user asks for a definition, "What what is Chantilly" (line 17), in response to the agent's offer of a coffee option (line 15). This repair initiator succeeds in getting the agent to provide a definition of "Chantilly" (line 19), and the user accepts the option (line 21). This participant used the repair exactly as intended. Likewise, in the text condition, users encountered trouble understanding the intentionally obscure term "Platz" (Excerpt 2).
In this case, also from the coffee order scenario, the customer requests a definition, "What is a platz?" (line 31), in response to the agent's offer of a dessert item (line 29). The repair initiator succeeds in eliciting a definition of the term (line 33), which is marked with surprise, "Oh!" After a lengthy pause (line 34), the customer accepts the offer (line 35), and the agent completes the order (line 37). The definition request feature again enables the user to learn a new term.
Of course, the main purpose of user-initiated definition requests is not to manage troubles with intentionally obscure terms, but with those that are confusing despite following good design principles. Users possess different levels of knowledge, and there is no way to predict exactly what they know. However, with user-initiated definition requests, these troubles can be overcome as they arise. For example, to some users, "disk drive" is obscure (Excerpt 3).
(3) [P16:teaching:voice]
 29 A: In nineteen fifty seven, (0.

After the agent tells the user about a well-known event in which a computer won the popular TV game show, Jeopardy, the user asks for a definition of Jeopardy (line 109). The agent then defines the term "Jeopardy" (line 111) and the user acknowledges it (line 113). This enables the agent to continue with the lecture (line 115) without the user still wondering what the term means. The ability for users to obtain definitions of words used by the agent that they do not know is critical for mutual understanding, especially in organizational settings, which tend to involve jargon.
4.1.2 Indexical Definition Requests. Another way that users request definitions of problematic terms is by using an indexical reference, such as "what's that?", instead of repeating the term itself in the repair initiator. This is harder for an automated agent to handle because it must infer which term the user means. In our framework, keywords can be designated by content designers as the default referent for indexical definition requests. When the default is wrong, or when there are multiple keywords in an agent utterance, users must switch to an explicit definition request to distinguish them.
The pattern is very similar to that of the explicit definition request (Pattern 2).
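Below is a minimal sketch of how an indexical request like "What's that?" might be resolved, assuming, purely for illustration and not as the study's actual dialog code, that each agent response is annotated with a designer-designated default keyword.

DEFINITIONS = {"chantilly": "Chantilly is just sweetened whipped cream."}  # illustrative

LAST_AGENT_TURN = {
    "text": "Would you like some Chantilly with that?",   # hypothetical offer
    "default_keyword": "chantilly",                       # designated by the content designer
}

def resolve_definition_request(user_utterance, last_turn=LAST_AGENT_TURN,
                               definitions=DEFINITIONS):
    """An explicit request that names a known term takes precedence; a bare
    'what's that?' falls back to the prior agent turn's default keyword."""
    utterance = user_utterance.lower()
    for term in definitions:            # explicit requests distinguish among multiple keywords
        if term in utterance:
            return definitions[term]
    if "that" in utterance:
        return definitions.get(last_turn["default_keyword"])
    return None

print(resolve_definition_request("What is that?"))        # -> the Chantilly definition
print(resolve_definition_request("What is Chantilly?"))   # -> the Chantilly definition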
In the following case from the coffee order scenario, in the voice condition, the user requests a definition of the term "Chantilly" (as in Excerpt 1), but does not repeat the term itself (Excerpt 5).
Here the agent offers "Chantilly" (line 27) as an option, and the user says, "What is that?" (line 29).
Because "Chantilly" is specified as the keyword for this utterance in the dialog code, the request succeeds in eliciting a definition of the term (line 31). After a brief pause, the agent repeats the offer (line 33), and the user declines it (line 35), but declines it with better knowledge of what he is turning down. Similarly, in the following case from the text condition of the coffee order scenario, the user requests a definition of a different intentionally obscure term (Excerpt 6). In this instance, the user elicits a definition of "platz" with a simple request, "What's that" (line 35), in response to the agent's offer (line 33). The user accepts the offer with "Sure" (line 39), enabling the agent to complete the order (line 41). We see then that both explicit and indexical definition requests work in very much the same way whether they are in voice- or text-based conversations.
4.1.3 Understanding Check. Another pattern, which we observed in the text condition only, is the user simply repeating a word or term in the agent's prior utterance. By doing this, users indicate that they are having some kind of trouble with the term but do not specify what type: hearing or understanding. In human-human conversation, the recipient of the repeat decides whether to repeat the word or to define it. But in our design framework we use two rules. First, if the interaction is in voice mode, then there is a possibility that users did not hear the word (Pattern 3).
(3) Pattern: Hearing Check (Voice)
 1 A: <ANY UTTERANCE>
 2 U: FULL/PARTIAL REPEAT
 3 A: CONFIRMATION

Such a hearing check can be used to confirm that the user heard a particular term correctly when there is no chat history at which to look back. Second, if the interaction is in text mode, then the repeat by the user of a term contained in the agent's prior utterance is treated as a request for a definition (Pattern 4).
(4) Pattern: Understanding Check (Text)
 1 A: <ANY UTTERANCE>
 2 U: KEYWORD REPEAT
 3 A: DEFINITION

Because the user can look back at the chat history, the agent assumes that the trouble involves the user's understanding of the term's meaning. Although we expected to see instances of both patterns in our data, we captured only the latter. This may be due to the small scale of our study and/or to the fact that this was a hidden feature not included in the tutorial.
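The two rules above amount to a single branch on modality. The sketch below is our own illustration of that branching logic, not the framework's code; it assumes the repeated keyword appears in the agent's prior utterance, the fallback message is hypothetical, and the confirmation wording echoes the "That's correct!" reply described in the Hearing Troubles section.

DEFINITIONS = {"platz": "A platz is a pastry, similar to our apple pie."}  # illustrative

def handle_keyword_repeat(repeated_term, modality, definitions=DEFINITIONS):
    """A bare repeat of a keyword from the agent's prior utterance is treated as a
    hearing check in voice mode (Pattern 3) and as an understanding check in text
    mode (Pattern 4)."""
    if modality == "voice":
        return "That's correct!"                       # confirm what the user heard
    if modality == "text":
        return definitions.get(repeated_term.lower(),
                               f"Sorry, I can't define '{repeated_term}'.")  # hypothetical fallback
    raise ValueError(f"unknown modality: {modality}")

print(handle_keyword_repeat("platz", "text"))    # -> a definition
print(handle_keyword_repeat("platz", "voice"))   # -> "That's correct!"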
In the following case from the coffee order scenario in the text condition, the user repeats the obscure term in the agent's offer (Excerpt 7). In response to the agent's offer of a "platz" (line 53), this user repeats the term with a typed question mark (line 55). The agent treats the questioning repeat as a request for a definition of that term by producing one (line 57). This enables the user to complete the offer sequence with a declination (line 59), and the agent closes the order (line 61).
In addition to the intentionally obscure terms, other terms were also called out by users. In the teaching scenario, text condition, one user produced multiple repeats on the same segment of the lecture on computer history (Excerpt 8). In response to one part of the multi-part lecture on computer history by the agent (line 45), this participant uses the pause between parts to repeat a key term, "mainframe computer?", with a question mark (line 47). The agent treats the repeat as a request for a definition of that key term (line 49). The user then responds with the repeat of a second key term in the same utterance, "vacuum tubes?" The agent again provides a definition, but of the second term (line 53). The user then acknowledges the definition (line 55), and the agent continues with the lecture (line 57).
In text mode, repeats of keywords in the agents' utterances by users are thus treated by our agents as understanding checks, to which definitions are produced as repairs.

4.1.4 Example Requests. Users requested examples in both the text and voice conditions. In the following case from the buddy scenario in the text condition, the user requests examples of a category of movies (Excerpt 9). After a lull in the conversation, the agent introduces a new topic: movies (line 47). It tells the user that it likes "ones with a strong A.I. lead" (line 47). This user treats the phrase as confusing by initiating repair with a request for examples of that type of movie, "Any examples?" (line 49). The buddy agent provides three examples (line 51). And the user closes the sequence with a positive assessment, "Thats pretty cool" (line 53). The examples ground the abstract category in the concrete.

Similarly, in the teaching scenario in the text condition, a user asks for an example during the lecture on computer history (Excerpt 10). The agent, acting like a teacher, discusses the "Automatic Sequence Controlled Calculator, or Mark I" and explains that it could "execute long computations" (line 33). The user requests an example of the prior utterance simply by typing, "example please" (line 35). To this the agent provides an example of a "long computation" (line 37). After the user acknowledges the repair with "ok" (line 39), the agent continues with the lecture (line 41).

Likewise, in the teaching scenario under the voice condition, users also asked for examples of terms in the teacher's utterances. In the following case, this user asks for an example after asking for a definition (Excerpt 11). When the agent checks to see if the student is ready for the "viva," or quiz (line 103), this user responds with a request for a definition, "What's the viva?" (line 105). However, visual feedback in our interface indicates to the user that the agent misheard her question. The output of the speech-to-text (STT) service displayed on the screen indicates that the agent heard the definition request as "what's the beef up" (line 106). Before the agent can respond, the user repeats the definition request using the same words (line 107), and the agent again mishears it, this time as "what's the view but" (line 108). The user repeats the repair initiator a third time, using the same words (line 109), and this time the agent displays the correct hearing and provides a definition, "Viva voce is just an oral examination" (line 111). Repeats of voice utterances due to STT errors are discussed further below (see Section 4.3). But instead of proceeding to answer the readiness check (line 103), this user produces a second repair initiator, this time an example request, "Can you give me an example" (line 113). The agent gives two examples, "Pop quiz, oral exam" (line 115). The user responds with confusion, "Huh?"
(line 117), instead of an acknowledgment. To this, the agent simply produces an acknowledgment (line 119) instead of further clarification, but it nonetheless succeeds in getting the user to answer the readiness check, "Okay I'm ready" (line 121), and the agent proceeds with the quiz (line 123).

4.1.5 Paraphrase Request. Although most conversational user interfaces request that users paraphrase their prior utterances when the agent cannot classify their intent, few of them enable the user to request paraphrases. Phrases like "What do you mean?", "I don't understand" and "Please say it a different way" are commonly used by agents to elicit paraphrases from users. Paraphrase requests indicate trouble in understanding a prior utterance but do not specify particular terms in the utterance that are problematic, as definition requests do. The most straightforward response to them, then, is a paraphrase of the whole utterance, or saying the same thing in different words (Pattern 6). Good paraphrases should be easier to understand than the original by making fewer assumptions about knowledge on the part of the user and avoiding technical terms. Because most users will not receive the paraphrase, and because those who do have indicated trouble in understanding, paraphrases may be verbose and inelegant. When understanding troubles arise, the preference for minimization should be relaxed in favor of achieving understanding, even at the cost of efficiency [16].
In both the voice and text conditions of our study, users occasionally requested paraphrases of what the agent said, especially in the case of our obscure terms. In the following case from the text condition in the teaching scenario, the student produces a paraphrase request (Excerpt 12). After completing the delivery of the lecture on computer history and offering to answer questions about it (not shown), the agent checks to see if the user is ready to take a quiz on the material, using an obscure phrase, "show me your big blue heart" (line 73). In response, the user requests a paraphrase with "what do you mean" (line 75). The agent then paraphrases its prior utterance, replacing the obscure phrase with a clearer one, "take a short test on the I.B.M. facts I just told you" (line 77). The paraphrase enables the user to answer the readiness check with "yes" (line 79), and the agent continues on to the quiz (lines 81-83).
Paraphrase requests were used in basically the same way in the voice and text conditions. The following case involves the same readiness check as the previous case, but with a different obscure phrase (Excerpt 13). When checking to see if the user is ready to take the quiz, the teacher agent identifies the quiz as "the viva" (line 63), a real term that is unfamiliar to most Americans. This user displays understanding trouble by saying, "What do you mean" (line 65). To repair the trouble, the agent provides both a definition of "viva" and a paraphrase of its whole prior utterance (line 67). The user then affirms that she is ready (line 69), and the agent proceeds with the quiz (lines 71-73).
Similarly, in the coffee order scenario, some users responded to obscure terms with paraphrase requests. In the following case in the voice condition, the user requests a paraphrase of the agent's request for a cup size (Excerpt 14).

(14)
    (2.4)
 33 U: I think I will: go for the (0.5) chico:.
 34    (6.3)
 35 A: What's your name?
The order-taking agent asks this user what size coffee she wants by using obscure terms, like some real coffee retailers do, "pico, chico or topo" (line 27). The user responds with a paraphrase request, "oh, what does that mean" (line 29). Because it is singular, "that" in the user's repair initiator appears to refer to the whole utterance. The agent then provides a paraphrase of the whole utterance, "I need to know what size of cup to use" (line 31). The paraphrase enables the user to choose a size, "chico" (line 33), and the agent to continue to the next detail request in the coffee order (line 35).
Similarly, in the coffee-order scenario under the text condition, the following user types a paraphrase request in a somewhat different form to manage an obscure term (Excerpt 15). In response to the agent's offer of a "platz" (line 53), this user replies with an explicit indication of understanding trouble, "I don't understand" (line 55). The agent then paraphrases the whole offer utterance, including replacing "platz" with "our apple pie" (line 57). In this case, the paraphrase functions much like a definition. The paraphrase enables the user to complete the offer sequence by declining the platz (line 59) and enables the agent to complete the order (line 61).
We see then that users, like agents, sometimes require the other to redo their prior utterance using different words. Definitions, examples and paraphrases help get users unstuck so that they can continue in the conversation.

Hearing Troubles
In the previous section, we examined interaction patterns in which users manage troubles in understanding automated agents by requesting different types of paraphrases, examples and definitions involving the prior utterance by the agent. In this section, we focus on the ways users manage troubles in hearing, which involve different ways of repeating the prior utterance. In Section 4.1.3, we already mentioned one method for managing hearing troubles: the hearing check. Users can check their hearing of a term by repeating it, and if they do so correctly, the agent will reply with "That's correct!" However, in our small data set of voice conversations, we did not find any instances of this pattern (nor did we teach users that they could do it).

Repeat Requests
The most straightforward method for repairing a trouble in hearing is simply to request a repeat of the whole prior utterance (Pattern 7). There are multiple reasons that users may need to request a repeat of what the agent said. Environmental noise or competing talk may prevent users from hearing an agent utterance, or part of one. This can occur potentially after any utterance. In addition, automated agents sometimes fail to generate the speech outputs they should. For example, agents' utterances may get cut off if the user produces overlapping speech.
In the following case in the voice condition from the buddy scenario, the user requests a repeat of part of the story the agent is telling after it makes an error in its delivery (Excerpt 16).
(16) [P16:buddy:voice]
187 A: Eve focuses on finding the existence- However, (0.2) Wall-E then rescues Eve from a dust storm and provides her with shelter, (0.

In this case, the buddy agent begins telling one part of the story with, "Eve focuses on finding the existence-" but then cuts off that part of the story and begins the next, "However, Wall-E then rescues Eve..." (line 187), with no pause between. In response, this user notices that something is missing by requesting a repeat, "Can you repeat the last bit of the story again?" (line 189). However, this fails to elicit a repeat of the missing part, that is, "Eve focuses on finding the existence of life and ignores Wall-E completely, as he is a non living thing." Instead it elicits a repeat of the immediately preceding story part from the agent (line 195). Had the user said "go back," the agent would have repeated the right part. Yet the user acknowledges the repair with "I see" (line 197), and the agent proceeds with the story (line 199).
In the following case from the coffee order scenario, the customer requests a repeat of an agent utterance after first requesting a paraphrase of it (Excerpt 17).

(17)
    Is that correct?
In response to our obscure terms for coffee sizes (line 41), this user requests a paraphrase by indicating trouble in understanding, "I don't understand" (line 43). However, before the agent can respond, the user changes tack and requests a different repair, a repeat, "Can you repeat it again" (line 45). But the visual display shows that the agent misheard the repair initiator as "Charopidae again" (line 46). The user then repairs his repair initiator by repeating it, "Can you repeat it again" (line 47). This time the agent hears correctly and produces a repeat of its prior utterance (line 49). Instead of acknowledging the repair, the user requests definitions, "what is the pico and the topo" (line 51). But the agent fails to recognize the definition request and instead incorrectly extracts "pico" from it as the user's choice, as indicated in the order summary (line 57). The agent then proceeds to the next detail request with an error in the order (line 53).

Partial Repeat Requests
While the repeat request seeks a repeat of the whole prior turn, the partial repeat request seeks only a portion of it [22]. By repeating part of a prior utterance followed by a question word, the speaker can get the other to repeat only the unheard part, that is, the part of that utterance following the repeated words (Pattern 8). Like a full repeat, a partial repeat does not introduce new information or words. The repair redoes only the portion of the prior utterance that is missing from the user's partial repeat.
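As an illustration of Pattern 8, the following sketch computes the portion of the prior agent utterance that follows the words the user repeated; the pronoun-swap table and the heuristic matching are our own simplifications, not the agents' actual implementation.

PRONOUN_SWAP = {"i": "you", "am": "are", "my": "your", "me": "you"}   # minimal, illustrative

def partial_repeat_repair(prior_utterance, partial_repeat):
    """Return only the part of the prior utterance that follows the repeated words,
    after swapping first/second person pronouns and dropping the question word."""
    question_words = {"what", "who", "where", "when", "which"}
    repeated = [PRONOUN_SWAP.get(w, w)
                for w in partial_repeat.lower().rstrip("?").split()
                if w not in question_words]
    prior_words = prior_utterance.rstrip("?").split()
    lowered = [w.lower() for w in prior_words]
    for i in range(len(lowered) - len(repeated) + 1):
        if lowered[i:i + len(repeated)] == repeated:      # locate the repeated stretch
            remainder = " ".join(prior_words[i + len(repeated):])
            if remainder:
                return remainder.capitalize() + "?"
    return prior_utterance + "?"                          # fall back to a full repeat

# "Am I ready for what?" against "Are you ready for the viva?" -> "The viva?"
print(partial_repeat_repair("Are you ready for the viva?", "Am I ready for what?"))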
The partial repeat request is a feature of our automated agents that was not revealed in the tutorial. No doubt due to its hidden nature and the quiet testing environment, we captured only one instance of a partial repeat request. In the teaching scenario, a user employs it with an obscure term (Excerpt 18). In this case, the agent checks to see if the user is ready for the quiz by asking, "Are you ready for the viva?" (line 83). In response, the user initiates repair by repeating most of the agent's utterance, transforming the pronoun and appending a question word, "Am I ready for what?" (line 85). Initially, the agent mishears the repair initiator as "I'm not ready for what" (line 86), so the user repeats it verbatim (line 87). The agent summons the user (line 89), due to what it sensed as a long silence (lines 84-86). But then the agent produces the partial repeat, "for the viva?" (line 91). Despite some hearing troubles on the part of the agent, this user succeeds in eliciting a repeat of only the problematic part of the prior utterance. After receiving the repeat (line 91), the user then requests a definition, "what's the viva" (line 93).

Speaking Troubles
In addition to initiating repair on the agent's utterances (other-repair), we also saw users initiating repair on their own utterances (self-repair). This was captured in the voice condition through audio-visual recording of the participants' voices and screens. A corresponding kind of self-repair may have occurred in the text study, but we could not capture the participants' typing in real time, so we do not know. In the voice condition, participants could see the output from the speech-to-text (STT) service as they were speaking and therefore could repair their utterances before the system responded to them (Pattern 9).

(9) Pattern: Same-turn Self Repair
 1 U: <ANY UTTERANCE>
 2 S: <INCORRECT TRANSCRIPTION>
 3 U: REPEAT

In this pattern, S refers to the screen on which users could see how their utterances were being transcribed.
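A minimal sketch of the interaction logic behind Pattern 9, assuming hypothetical timing values and callback names (the interface's actual timings and architecture are described only informally above): the latest transcription is shown on screen and held for a short grace period, and a re-spoken utterance replaces the pending one before anything is sent to the agent.

import asyncio

DISPLAY_SECONDS = 7   # on-screen transcripts fade after roughly 5-9 seconds
GRACE_SECONDS = 3     # hypothetical window in which a re-spoken utterance replaces the pending one

class PendingUtterance:
    """Holds the most recent speech-to-text result until the grace window closes."""
    def __init__(self):
        self.text = None
        self.commit_task = None

async def on_final_transcript(pending, text, show_on_screen, send_to_agent):
    """Display what the system 'heard' and delay the agent's response briefly,
    so the user can repeat or rephrase if the transcription is wrong."""
    pending.text = text
    show_on_screen(text, fade_after=DISPLAY_SECONDS)      # temporary display, not a chat history
    if pending.commit_task:
        pending.commit_task.cancel()                      # a repeat supersedes the earlier hearing

    async def commit():
        await asyncio.sleep(GRACE_SECONDS)
        await send_to_agent(pending.text)                 # only now does the agent see the turn

    pending.commit_task = asyncio.create_task(commit())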

Self Repeats.
Even in the quiet testing environment, hearing errors on the part of the agent were frequent. Our visual display enabled users to correct these errors upstream, before the agent responded to the wrong words. Typically, if the transcription of what the user said was incorrect, participants repeated it. In the following case from the teaching scenario, the user's definition request on a term not intended to be obscure, "Jeopardy," is initially misheard by the system (Excerpt 19). In delivering the lecture on computer history, the teaching agent tells the user about the Watson computer's victory on Jeopardy (line 73). At the pause between story parts, this user requests a definition, "What is Jeopardy?" (line 75). But the agent mishears the request as, "What is strategy," as displayed on the screen (line 76). The user sees this and repeats the definition request before the agent can respond to the mishearing (line 77). This time the agent hears correctly and provides a definition of "Jeopardy" (line 79). The user then acknowledges the repair (line 81).
The same pattern occurs in the next case from the coffee-order scenario (Excerpt 20). In response to the agent's offer of whipped cream using the obscure term "Chantilly" (line 12), this user requests a definition of the term, "What's Chantilly?" (line 13), but the agent mishears it slightly as "What Chantilly" (line 14). The user then repeats the definition request before the agent can respond (line 15). This time the speech-to-text (STT) service captures it correctly, and the agent provides the definition (line 17). In response to a name request (line 17), this participant replies simply with his name, "Chad" (line 19). However, the screen shows that the STT service misheard the name as "Check" (line 20). The participant then self-repairs by repeating, "Chad" (line 21). This time the display shows "Chapter" (line 22). Instead of repeating the turn a second time, the participant switches strategies and paraphrases the problematic utterance in an expanded form, "My name is Chad" (line 23). This paraphrase, because of the additional words, adds more context, enabling the STT service to recognize the turn correctly, as indicated by an appropriate next utterance which reproduces the name correctly (line 25).
After receiving definitions of the obscure terms for coffee sizes (line 31), this user attempts to choose a size using the more common term, "I'll get medium please" (line 33). However, the screen shows that the agent mishears "medium" as "media" (line 34). So the user repeats his selection using the same words (line 35). But again the agent makes the same mistake (line 36). In the face of the STT service's trouble with his pronunciation of "medium," the user changes his order to "small" (line 37). The agent recognizes "small" and checks the order by summarizing it (line 39). So although this user eventually succeeds in progressing in the conversation, in a real-world setting, he would not have gotten what he wanted.

Summary
Table 1 provides a summary of the different types of repair analyzed above and the number of cases collected. The purpose of this analysis is not to measure the occurrence of these phenomena but rather to demonstrate how they work on a turn-by-turn basis, or to understand their mechanism.
Using multiple cases, we show that the phenomenon is recurrent and that it can take different trajectories. We observed seven types of user-initiated repair in second position, and two types in first position, in our user tests of three scenarios in both voice and text. Of the second-position repairs, the ones that elicited definitions were the most common (53 cases): explicit definition requests, indexical definition requests and understanding checks. This was expected because the primary form of trouble that we introduced into our conversation design was obscure terms. Both types of definition request worked the same way in the voice and text conditions. Understanding checks worked differently by design. Paraphrase requests were the next most frequent type of second-position repair (17 cases). Although users did not specify it, the trouble source for these was also usually the obscure terms we introduced.
One unexpected finding, as noted above, is that there were relatively few repeat requests (12 cases) in the voice condition. We expected more because our voice interface lacked a persistent chat history to which users could refer. However, in retrospect, it seems likely that the temporary display (5-9 seconds) of the agent's utterances, which accompanies the speech output, obviated the need for at least some repeats. The partial repeat request, in which the user repeats a portion of the agent's utterance followed by a question word like "what?", occurred only once in our study. This low rate may be due in part to the fact that the tutorial did not mention this feature.
Of the first-position repairs, self repeats were by far the most common (60 cases). These responded to the temporary display of the transcription of the user's utterance when it contained speech-to-text errors. In most cases, such repeats were successful in getting the agent to hear correctly; however, in some cases they failed, and participants proceeded to formulate their prior utterance in different words, or paraphrase it (37 cases). This is in line with what others have found with true voice-only agents [14,16,24]. Finally, there was no type of repair, in either second or first position, that was mentioned in the tutorial but not tried at all by the participants.

DISCUSSION
In the preceding section, we demonstrated a systematic approach to user-initiated repair of agent utterances, in second position, as outlined in Moore et al. [12,13]. Not only does it enable users to get definitions and repeats, but it also enables indexical definition requests, partial repeat requests, example requests, paraphrase requests and understanding checks. These all involve the agent paraphrasing or repeating all or part of its prior utterance, for anything the agent utters, which chatbots and voice assistants typically cannot do. Our approach takes seriously Schegloff's observation: "Everything is... a possible repairable or a possible trouble-source" [21].
We showed that participants had no trouble understanding how to use these novel repair features. They are all familiar practices from human conversation, and the tutorial informed participants that the agents would recognize such actions (except for the partial repeat and the understanding check). In addition, while participants' use of understanding repairs tended to be successful, their use of hearing repairs, or same-turn self repairs to manage agent mishearing, was much less so. In the latter case, the success and failure of the repairs seemed to depend on whether or not the participant had a native English accent.
Although we leveraged a principle of bad design, choose vocabulary your users are unlikely to know, in order to artificially increase the occurrence of user-initiated repairs, we nonetheless collected naturally occurring cases too. The teaching scenario naturally involved somewhat obscure knowledge for the purposes of learning. Terms from the domain of computer history, such as dial recorder, floppy disk, mainframe and disk drive, were obscure to some participants even though they were not intended to be confusing. Even Jeopardy, a popular TV game show, elicited definition requests from more than one of our participants. This shows that user-initiated repair features are valuable even when trouble is not intentionally introduced. Designers can never fully anticipate what every user will know and not know, just like human speakers.
In reviewing our data for the voice condition, we were somewhat surprised that we did not find more cases of users attempting to repair troubles in hearing. However, in retrospect, all of the study participants spoke to our agents in a quiet environment, with no competing sources of sound or noise. In addition, our novel voice interface displays the agent's utterances temporarily (for 5-9 seconds), which likely eliminated the need for some additional repeat requests. To elicit more cases of hearing repairs, we could do something similar to what we did with understanding repairs and intentionally insert trouble into the conversations. That is, in future studies we could introduce audible noise into the testing environment, preventing users from hearing what the agent says for some portion of the time. Increasing the speed of the text-to-speech output and modifying the pronunciation of keywords could also artificially create more need for repairs of hearing troubles. And to demonstrate the full value of repeat-oriented repairs, we must remove the visual user interface so that participants must rely entirely on their hearing.
In addition, in the voice condition, we demonstrated two types of user-initiated self repair in first position, that is, before the agent responds. This type of repair is somewhat unusual and arises from the novelty of the user interface, with its temporary display of utterances instead of a persistent chat history or no display at all. There is no equivalent in human-human conversation: one speaker cannot see how the recipient heard him or her before the recipient responds. Nonetheless, we demonstrated that displaying the STT output to users enables them to repair, or at least attempt to repair, mishearings before they are used to generate an agent response. When the agent experiences a mishearing, a repeat is the best first repair. However, if the agent hears correctly and still responds unexpectedly, then a paraphrase would be more fitting.
Although displaying STT output to users in real time has no equivalent in human conversation, it may nonetheless be useful in conversational user interfaces. Displaying the STT output to users gives them greater visibility into the sources of the agents' troubles. Spotting certain highly likely mishearings based on context, commonsense knowledge, or logic is still very hard for machines, but easy for humans. Enabling users to spot the mishearings instead, before they are sent to natural language understanding services, could be a way to mitigate this limitation of the current technology.
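To make this concrete, the following is a minimal sketch of such a confirm-before-understanding loop. It is not the study's implementation; transcribe(), display_transcript(), next_speech(), and send_to_nlu() are hypothetical stand-ins for the STT, display, and NLU services, and the review window simply mirrors the 5-9 second temporary display described above.

```python
# Minimal sketch (not the study's implementation) of first-position self repair:
# the STT transcription is shown to the user briefly before it is sent to NLU,
# giving them a window to re-speak if the agent misheard.
# transcribe(), display_transcript(), next_speech(), and send_to_nlu() are
# hypothetical helpers standing in for the STT, UI, and NLU services.

REVIEW_WINDOW_SECONDS = 5  # temporary display, analogous to the 5-9 s in the study

def capture_with_self_repair(audio):
    transcript = transcribe(audio)
    while True:
        display_transcript(transcript, seconds=REVIEW_WINDOW_SECONDS)
        # If the user speaks again during the display window, treat it as a
        # first-position self repair (repeat or paraphrase) and re-transcribe.
        repair_audio = next_speech(timeout=REVIEW_WINDOW_SECONDS)
        if repair_audio is None:
            break  # no repair initiated; accept the transcription
        transcript = transcribe(repair_audio)
    return send_to_nlu(transcript)
```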
The value of enabling user-initiated repair is no less than mutual understanding or successful communication [20]. Cahn and Brennan "predict that human-computer interaction will be significantly improved by... enabling users to evaluate and express the relevance of a system's actions" [4]. Moore and Arar go further, arguing for a broader conception of understanding in dialogue design, one in which repair practices play a central role [13]: "Natural language understanding (NLU) techniques, while necessary, are not enough. They provide only for the machine to interpret the user's utterances. But such interpretations must be tested in interaction before understanding can be determined and must be repaired if misunderstanding or partial understanding is displayed. Thus conversational systems also need natural conversation understanding (NCU), or the ability to engage in repair practices" [13]. Understanding has a fundamental interaction component.
This work is part of a larger program to lead user interface design with conversation science. User experience designers tend to lack a formal understanding of how natural conversation works (it is not part of their training). Consequently, designers tend to follow engineers rather than lead them in designing the user experience for conversational agents. In an attempt to bridge this gap, Moore et al. created the IBM Natural Conversation Framework, a "verbal design system... of generic conversational interaction patterns, and corresponding software components, that are adapted from decades of empirical research in the field of Conversation Analysis" [12,13]. While Moore and Arar's design framework provides patterns and components for the major parts of natural conversation, in this study we focused on only one narrow set of them. We chose the types of repair in human conversation that tend to get overlooked in the design of chatbots and voice assistants today. Although features for repairing agent responses, initiated by users, are rare in the industry, we showed that subjects can use them easily and correctly, no doubt because they are familiar from human conversation. Our findings provide support for the inclusion of such features in any type of conversational agent. In short, we showed that users need clarifications too.
It is important to note that this study was conducted on an intent-based dialog platform (Watson Assistant), not on a large language model (LLM), like ChatGPT. While most designers and developers of intent-based agents fail to support initiations of repair by the user systematically, LLMs have great potential for handling these. LLMs are good at generating relevant text on the fly. This is useful both for paraphrasing prior utterances or parts of them (i.e., repair) and for tailoring initial utterances to particular users based on what they said earlier in the interaction (i.e., recipient design). However, LLMs (e.g., GPT-4, Flan, Llama, Granite) vary a great deal in terms of whether they respond correctly to repeat, paraphrase, example, and definition requests. Sometimes LLMs provide a paraphrase instead of a requested repeat and vice versa. Some LLMs tend to repeat the prior utterance in response to all of these different types of requests. Also, the output of LLMs is often inconsistent in style, ranging from conversation-like (i.e., a sentence or less) to document-like (i.e., paragraphs, bullets, titles). In future studies, we will experiment with LLMs' conversational competence regarding user-initiated repair on agent responses. It seems likely that good performance can be achieved with instructive prompting, few-shot learning, and/or fine-tuning. Regardless of whether intents or LLMs are used, designers must understand the organization of repair systematically in order to ensure that each type of repair works correctly [20,22]. In conversational interfaces, repair is the natural resource for the parties to get unstuck in producing an appropriate next turn.

CONCLUSION
One conclusion we can draw from this study is that conversation is a two-way street. Conversational actions that are useful for the machine to take may be equally useful for the user to take, especially when they involve understanding or hearing the talk. Currently, automated agents routinely ask users to paraphrase confusing utterances, choose between candidate interpretations, provide required details, and more. These actions are equally useful for users in understanding and cooperating with the machine. Conversely, the user-initiated repair methods we present above, such as requesting definitions, examples, and repeats, would be useful for machines to do too. If human conversation is systematically taken as the interaction metaphor for conversational user interfaces, then user and machine should be able to take all the same actions.
Another conclusion is that users will display natural conversational behaviors if we enable them. The participants in our study had no trouble using the understanding repairs (definition, example, and paraphrase requests) once they were told that these agents could recognize them. Had we not enabled these types of repair, participants would quickly have learned that the agent does not do those actions, had they happened to try them. And had we not taught them through the tutorial that the agent can do these actions, they may not have tried them, drawing on past experiences with other chatbots or voice assistants. We found that modeling a feature closely on natural conversation and then telling users that the agents can do it was effective in enabling participants to use such features.
Finally, we conclude that Conversation Analysis (CA) is critical for conversational UX design. Not only can CA inspire the design of conversational user interfaces, by articulating the interaction mechanics of natural conversation, but its methods can also help us understand how interactions with such systems are organized in real time, on a turn-by-turn basis [13]. CA offers a microscope for understanding human-computer interactions at a fine-grained level, revealing how the exact wording and precise timing of utterances impact the interactions. This is the level at which engineering must build and debug conversational systems. As such, the qualitative findings of CA studies, which expose the inner workings of interaction, nicely complement those of quantitative approaches.

(7) [P4:order:text]
53 A: Would you like a platz, too?
54 No platz, okay. Your order is coming right up!

4.1.4 Example Request. Sometimes the best way to clarify an utterance in a conversation is to provide examples, especially when the utterance contains general categories of things. Examples ground abstractions in the concrete, making them understandable. The pattern for example requests (Pattern 5) is very similar to that for definition requests.

(5) Pattern: Example Request
1 A: <ANY UTTERANCE>
2 U: EXAMPLE REQUEST
3 A: EXAMPLE

Like definitions, examples introduce new information that clarifies the intended meaning of the prior utterance.
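As an illustration of how such a pattern might be wired into an intent-based flow, the sketch below keys pre-authored examples to terms in the agent's prior utterance. It is hypothetical rather than the study's Watson Assistant implementation; detect_intent() and the EXAMPLES mapping are assumptions introduced only for illustration.

```python
# Hypothetical sketch of an Example Request handler (Pattern 5).
# detect_intent() stands in for whatever intent classifier the platform provides;
# EXAMPLES maps terms the agent may have used to pre-authored example responses.

EXAMPLES = {
    "mainframe": "For example, the IBM System/360 was a mainframe.",
    "storage device": "For instance, a floppy disk or a disk drive.",
}

def respond_to_example_request(user_text, last_agent_utterance):
    if detect_intent(user_text) != "example_request":
        return None  # not an example request; let other handlers take the turn
    # Anchor the example to the trouble source in the agent's prior utterance.
    for term, example in EXAMPLES.items():
        if term in last_agent_utterance.lower():
            return example
    return "Sorry, I don't have an example for that."
```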

(12) [P19:teaching:text]
73 A: Are you ready to show me your big blue heart?
74 (6.1)
75 U: what do you mean
76 (0.7)
77 A: Are you ready to take a short test on the I.B.M. facts I just told you about?
78 Who invented the hard disk?
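In this excerpt the agent answers a "what do you mean" by restating its prior turn in plainer words. A minimal, hypothetical way to support this is to author each agent utterance together with a paraphrase, as sketched below; the response keys and Agent class are assumptions for illustration, not the study's dialog content.

```python
# Hypothetical sketch: each agent utterance carries a pre-authored paraphrase so a
# paraphrase request ("what do you mean?") can be answered by redoing the prior
# turn in different words (text drawn from Excerpt 12 above).

RESPONSES = {
    "ready_check": {
        "say": "Are you ready to show me your big blue heart?",
        "paraphrase": "Are you ready to take a short test on the I.B.M. facts "
                      "I just told you about?",
    },
}

class Agent:
    def __init__(self):
        self.last_key = None  # remember which utterance was produced last

    def say(self, key):
        self.last_key = key
        return RESPONSES[key]["say"]

    def on_paraphrase_request(self):
        if self.last_key is None:
            return "I haven't said anything yet."
        return RESPONSES[self.last_key]["paraphrase"]
```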

(7) Pattern: Repeat Request
1 A: <ANY UTTERANCE>
2 U: REPEAT REQUEST
3 A: REPEAT

Unlike definitions, examples, and paraphrases, repeats do not introduce new information or even new words, although they may not be entirely verbatim. Repeats redo a prior utterance in basically the same words.
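Supporting this pattern mainly requires the agent to remember what it last said. The sketch below is a hypothetical illustration, not the study's implementation; storing the last utterance in the dialog context and replaying it with a short preface is one simple way to produce the REPEAT turn.

```python
# Hypothetical sketch of a Repeat Request handler (Pattern 7): keep the agent's
# last utterance in the dialog context and redo it, optionally with a preface,
# when the user asks "what?" or "could you say that again?".

class DialogContext:
    def __init__(self):
        self.last_agent_utterance = None

def agent_say(ctx, text):
    ctx.last_agent_utterance = text  # remember the turn so it can be repeated
    return text

def handle_repeat_request(ctx):
    if ctx.last_agent_utterance is None:
        return "I haven't said anything yet."
    # A repeat reuses the same words; a paraphrase (a different repair) would not.
    return "I said, " + ctx.last_agent_utterance
```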

4.3.2 Self Paraphrases. Although the participants typically produced repeats as their first self repair, repeats sometimes fail to remove the source of the trouble. On the occasions of such failures, participants sometimes said their original utterance a different way, or paraphrased it (Pattern 10).

(10) Pattern: First-Position Self Repair
1 U: <ANY UTTERANCE>
2 S: <INCORRECT TRANSCRIPTION>
3 U: PARAPHRASE

In the following case (Excerpt 21) from the coffee order scenario, the speech-to-text (STT) service struggles to recognize the participant's name.

Table 1. Summary of types of user-initiated repair in second and first position