ABSTRACT

Successful collaboration relies on the coordination and alignment of communicative cues. In this paper, we present mechanisms of bidirectional gaze, the coordinated production and detection of gaze cues, by which a virtual character can coordinate its gaze with that of its human user. We implement these mechanisms in a hybrid stochastic/heuristic model synthesized from data collected in human-human interactions. In three laboratory studies in which a virtual character instructs participants in a sandwich-making task, we demonstrate how bidirectional gaze can improve error rate, completion time, and the agent's ability to produce quick, effective nonverbal references. The first study paired an on-screen agent with participants wearing eye-tracking glasses. The second study demonstrates that these positive outcomes can be achieved using head-pose estimation in place of full eye tracking, and the third demonstrates that the effects also transfer to virtual-reality interactions.
Looking Coordinated

Sean Andrist and Bilge Mutlu