Abstract
We present a novel interactive augmented reality (AR) storytelling approach guided by indoor scene semantics. Our approach automatically populates real-world environments with virtual content to deliver AR stories that match both the story plot and the scene semantics. During storytelling, a player can participate as a character in the story, and the behaviors of the virtual characters and the placement of the virtual items adapt to the player's actions. A raw input story is represented as a sequence of events containing high-level descriptions of the characters' states, and is converted into a graph representation with automatically supplemented low-level spatial details. Our hierarchical story sampling approach uses optimization to sample realistic character behaviors that fit the story context, and an animator, which estimates and prioritizes the player's actions, animates the virtual characters to tell the story in AR. Through experiments and a user study, we validated the effectiveness of our approach for AR storytelling in different environments.
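To make the event-sequence-to-graph step concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical `Event` schema, a hypothetical `AFFORDANCES` table mapping character states to scene object labels, and a hypothetical `build_story_graph` helper. It links events in story order and attaches, for each character state, the scene objects that could host it, as a stand-in for the "low-level spatial details" mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Event:
    """One story event: a high-level state per character (hypothetical schema)."""
    name: str
    states: dict[str, str]  # character -> high-level state, e.g. "sit"


@dataclass
class StoryGraph:
    """Events as nodes, story order as directed edges (hypothetical structure)."""
    nodes: dict[str, dict] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)


# Assumed affordance table: which scene object labels can host a given state.
AFFORDANCES = {
    "sit": {"chair", "sofa"},
    "stand": {"floor"},
    "read": {"desk", "sofa"},
}


def build_story_graph(events: list[Event], scene_labels: list[str]) -> StoryGraph:
    """Link events in story order and attach candidate scene objects per state."""
    graph = StoryGraph()
    available = set(scene_labels)
    for i, event in enumerate(events):
        # Keep only the objects that both afford the state and exist in the scene.
        candidates = {
            character: sorted(AFFORDANCES.get(state, set()) & available)
            for character, state in event.states.items()
        }
        graph.nodes[event.name] = {"states": event.states, "candidates": candidates}
        if i > 0:
            graph.edges.append((events[i - 1].name, event.name))
    return graph


story = [
    Event("greet", {"player": "stand", "guide": "stand"}),
    Event("rest", {"player": "sit", "guide": "read"}),
]
graph = build_story_graph(story, scene_labels=["sofa", "floor", "table"])
print(graph.edges)
print(graph.nodes["rest"]["candidates"])
```

In this toy version the candidate lists are only a filter. Choosing among candidates so that behaviors fit the story context would be the job of the optimization-based sampling step, which this sketch does not attempt to reproduce.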
Index Terms
Interactive augmented reality storytelling guided by scene semantics