ProtoChat: Supporting the Conversation Design Process with Crowd Feedback

Published: 05 January 2021

Abstract

As in the design of graphical user interfaces, conversation designers often follow an iterative process: defining a conversation flow, testing it with users, reviewing user data, and improving the design. Existing chatbot prototyping tools make it possible to iterate on a conversation design, but two challenges remain: recruiting participants on demand and collecting structured feedback on specific conversational components. These limitations hinder designers from running rapid iterations and making informed design decisions. We posit that involving a crowd in the conversation design process can address these challenges, and we introduce ProtoChat, a crowd-powered chatbot design tool built to support the iterative process of conversation design. ProtoChat makes it easy to recruit crowd workers to test the current conversation from within the design tool. Its crowd-testing tool lets crowd workers provide concrete, practical feedback and suggest improvements on specific parts of the conversation. From the data collected during crowd-testing, ProtoChat generates multiple types of visualizations that help designers analyze and revise their design. In a three-day study with eight designers, we found that ProtoChat enabled an iterative process for designing a chatbot. Designers improved their designs not only by modifying the conversation itself, but also by adjusting the chatbot's persona and drawing out implications for UI design. The crowd responses helped designers explore user needs, contexts, and diverse response formats. With ProtoChat, designers can collect concrete evidence from the crowd and use it to decide how to iteratively improve their conversation design.

