My Bad! Repairing Intelligent Voice Assistant Errors Improves Interaction

Published: 22 April 2021

Abstract

One key technique people use in conversation and collaboration is conversational repair. Self-repair is the recognition and attempted correction of one's own mistakes. We investigate how the self-repair of errors by intelligent voice assistants affects user interaction. In a controlled human-participant study (N = 101), participants asked Amazon Alexa to perform four tasks, and we manipulated whether Alexa would "make a mistake" understanding the participant (for example, playing heavy metal in response to a request for relaxing music) and whether Alexa would perform a correction (for example, stating, "You don't seem pleased. Did I get that wrong?"). We measured the impact of self-repair on the participant's perception of the interaction in four conditions: correction (mistakes made and repair performed), undercorrection (mistakes made, no repair performed), overcorrection (no mistakes made, but repair performed), and control (no mistakes made, and no repair performed). Subsequently, we conducted free-response interviews with each participant about their interactions. This study finds that self-repair greatly improves people's assessment of an intelligent voice assistant if a mistake has been made, but can degrade assessment if no correction is needed. However, we find that the positive impact of self-repair in the wake of an error outweighs the negative impact of overcorrection. In addition, participants who recently experienced an error saw increased value in self-repair as a feature, regardless of whether they experienced a repair themselves.
