research-article
Open Access

Interactive augmented reality storytelling guided by scene semantics

Published: 22 July 2022

Abstract

We present a novel interactive augmented reality (AR) storytelling approach guided by indoor scene semantics. Our approach automatically populates real-world environments with virtual content to deliver AR stories that match both the story plot and the scene semantics. During storytelling, the player can participate as a character in the story, and the behaviors of the virtual characters and the placement of the virtual items adapt to the player's actions. An input raw story is represented as a sequence of events containing high-level descriptions of the characters' states; this sequence is converted into a graph representation whose low-level spatial details are supplemented automatically. A hierarchical story sampling approach then uses optimization to sample realistic character behaviors that fit the story context, and an animator, which estimates and prioritizes the player's actions, animates the virtual characters to tell the story in AR. Through experiments and a user study, we validated the effectiveness of our approach for AR storytelling in different environments.
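The abstract describes the pipeline only at a high level. As a rough, flat (non-hierarchical) illustration of two of its ideas, turning an event sequence into a graph and filling in spatial details by optimization, the sketch below links each character's consecutive events and then uses simulated annealing with a Metropolis acceptance rule to choose which scene object instance grounds each event, minimizing how far characters walk. Everything here is a hypothetical stand-in: the `Event` fields, the `StoryGraph` structure, the distance-only cost, and the choice of simulated annealing are assumptions for illustration, not the paper's formulation, which is hierarchical and also adapts to the player's actions.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Event:
    """One high-level story event (hypothetical fields, for illustration only)."""
    character: str
    action: str
    target: str  # semantic label of the scene object the action involves


@dataclass
class StoryGraph:
    nodes: list = field(default_factory=list)  # events in story order
    edges: list = field(default_factory=list)  # (i, j): event j follows event i for the same character


def build_graph(events):
    """Link each character's consecutive events so their temporal order is preserved."""
    graph = StoryGraph(nodes=list(events))
    last_event = {}
    for i, event in enumerate(events):
        if event.character in last_event:
            graph.edges.append((last_event[event.character], i))
        last_event[event.character] = i
    return graph


def path_cost(graph, assignment, scene):
    """Total distance characters walk between the objects grounding consecutive events."""
    total = 0.0
    for i, j in graph.edges:
        x1, y1 = scene[assignment[i]][1]
        x2, y2 = scene[assignment[j]][1]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def sample_assignment(graph, scene, iters=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over which object instance grounds each event (hypothetical helper)."""
    if not graph.nodes:
        return [], 0.0
    rng = random.Random(seed)
    # Candidate objects per event: scene instances whose label matches the event target.
    candidates = [
        [oid for oid, (label, _) in scene.items() if label == event.target]
        for event in graph.nodes
    ]
    if any(not c for c in candidates):
        raise ValueError("an event has no matching object in the scene")
    current = [rng.choice(c) for c in candidates]
    current_cost = path_cost(graph, current, scene)
    best, best_cost = list(current), current_cost
    temperature = t0
    for _ in range(iters):
        # Propose re-grounding a single event on another matching object.
        i = rng.randrange(len(current))
        proposal = list(current)
        proposal[i] = rng.choice(candidates[i])
        new_cost = path_cost(graph, proposal, scene)
        # Metropolis rule: accept improvements; accept worse moves with probability exp(-delta / T).
        if new_cost <= current_cost or rng.random() < math.exp((current_cost - new_cost) / temperature):
            current, current_cost = proposal, new_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        temperature *= cooling  # geometric cooling schedule
    return best, best_cost


if __name__ == "__main__":
    # Toy scene: object id -> (semantic label, 2D floor position in meters).
    scene = {
        "sofa_1": ("sofa", (0.0, 0.0)),
        "sofa_2": ("sofa", (8.0, 0.0)),
        "tv_1": ("tv", (1.0, 0.0)),
    }
    story = [Event("Alice", "sit", "sofa"), Event("Alice", "watch", "tv")]
    graph = build_graph(story)
    print(sample_assignment(graph, scene))  # (['sofa_1', 'tv_1'], 1.0)
```

In this toy scene the sampler grounds the "sit" event on `sofa_1`, the sofa next to the TV, for a walking cost of 1.0. A fuller cost would weigh more than distance alone; the annealing loop itself stays the same when extra terms are added to `path_cost`.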


Supplemental Material

091-151-supp-video.mp4

supplemental material

3528223.3530061.mp4



      Published in

      ACM Transactions on Graphics, Volume 41, Issue 4
      July 2022
      1978 pages
      ISSN: 0730-0301
      EISSN: 1557-7368
      DOI: 10.1145/3528223

      Copyright © 2022 Owner/Author

      This work is licensed under a Creative Commons Attribution 4.0 International License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
