
Improving Social Awareness Through DANTE: Deep Affinity Network for Clustering Conversational Interactants

Published: 29 May 2020

Abstract

We propose a data-driven approach to detect conversational groups by identifying spatial arrangements typical of these focused social encounters. Our approach uses a novel Deep Affinity Network (DANTE) to predict the likelihood that two individuals in a scene are part of the same conversational group, considering their social context. The predicted pair-wise affinities are then used in a graph clustering framework to identify both small (e.g., dyads) and large groups. The results from our evaluation on multiple, established benchmarks suggest that combining powerful deep learning methods with classical clustering techniques can improve the detection of conversational groups in comparison to prior approaches. Finally, we demonstrate the practicality of our approach in a human-robot interaction scenario. Our efforts show that our work advances group detection not only in theory, but also in practice.
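The two-stage idea described above (pairwise affinities first, graph clustering second) can be illustrated with a minimal sketch. This is not the DANTE network or the authors' clustering method: the affinity values are made up, the function name `cluster_by_affinity` is hypothetical, and the thresholded connected-components step is a simple stand-in for the graph clustering framework, which the abstract does not specify.

```python
from typing import Dict, List


def cluster_by_affinity(affinity: List[List[float]],
                        threshold: float = 0.5) -> List[List[int]]:
    """Group people whose symmetrised pairwise affinity exceeds `threshold`.

    Stand-in for the graph clustering step: connected components of the
    thresholded affinity graph, computed with a small union-find.
    """
    n = len(affinity)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Predicted affinities need not be symmetric, so average both directions.
            score = 0.5 * (affinity[i][j] + affinity[j][i])
            if score > threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up affinities for four people (in the paper these would come from a trained model).
affinity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.3, 1.0, 0.7],
    [0.2, 0.1, 0.9, 1.0],
]
groups = cluster_by_affinity(affinity)
print(groups)  # [[0, 1], [2, 3]]
```

With these illustrative values, people 0 and 1 form one dyad and people 2 and 3 form another. Raising the threshold splits groups apart, which is one reason a fixed cutoff is only a rough approximation of a dedicated graph clustering method.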


