Abstract
In "smart speaker" digital assistant systems such as Google Home, there is no visual user interface, so people must learn about the system's capabilities and limitations by experimenting with different questions and commands. However, many new users give up quickly and limit their use to a few simple tasks. This is a problem for both the user and the system: users who stop trying new things cannot learn about new features and functionality, and the system receives less data on which to base future improvements. Symbiosis (a mutually beneficial relationship) between people and AI systems like digital assistants is an important aspect of developing systems that are partners to humans and not just tools. To better understand the requirements for symbiosis, we investigated the relationship between the types of digital assistant responses and users' subsequent questions, focusing on identifying interactions that discouraged users when speaking with a digital assistant. We conducted a user study with 20 participants who completed a series of information-seeking tasks using the Google Home, and analyzed the transcripts using a method based on applied conversation analysis. We found that the most common response from the Google Home, a version of "Sorry, I'm not sure how to help", provided no feedback for participants to build on when forming their next question. In contrast, responses that provided somewhat strange but tangentially related answers were actually more helpful for conversational grounding, which extended the interaction. We discuss the connection between grounding and symbiosis, and present recommendations for requirements for forming partnerships with digital assistants.
Supplemental Material
1. Screening Questionnaire, 2. Participant Instructions, 3. Post-task Questions, 4. Information About Task Counterbalancing
Index Terms
The Role of Conversational Grounding in Supporting Symbiosis Between People and Digital Assistants