research-article
Open Access

Egocentric scene reconstruction from an omnidirectional video

Published: 22 July 2022

Abstract

Omnidirectional videos capture environmental scenes effectively, but they have rarely been used for geometry reconstruction. In this work, we propose an egocentric 3D reconstruction method that can acquire scene geometry with high accuracy from a short egocentric omnidirectional video. To this end, we first estimate per-frame depth using a spherical disparity network. We then fuse the per-frame depth estimates into a novel spherical binoctree data structure that is specifically designed to tolerate spherical depth estimation errors. By adaptively subdividing the spherical space into binary tree and octree nodes that represent spherical frustums, the spherical binoctree enables egocentric surface geometry reconstruction for environmental scenes while assigning high-resolution nodes to closely observed surfaces. This makes it possible to reconstruct an entire scene from a short video captured along a small camera trajectory. Experimental results validate the effectiveness and accuracy of our approach for reconstructing the 3D geometry of environmental scenes from short egocentric omnidirectional video inputs. We further demonstrate various applications using a conventional omnidirectional camera, including novel-view synthesis, object insertion, and relighting of scenes using reconstructed 3D models with texture.
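
To make the adaptive subdivision idea in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. The class `FrustumNode`, the helper `insert`, and in particular the rule used to choose between a binary split and an octree split (comparing a node's radial extent with its angular arc length) are assumptions introduced here for illustration only; the abstract does not specify them.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Interval = Tuple[float, float]


def halves(interval: Interval) -> List[Interval]:
    """Split an interval at its midpoint."""
    lo, hi = interval
    mid = 0.5 * (lo + hi)
    return [(lo, mid), (mid, hi)]


@dataclass
class FrustumNode:
    """A spherical frustum: ranges of azimuth, polar angle, and radius."""
    theta: Interval  # azimuth range in radians
    phi: Interval    # polar-angle range in radians
    r: Interval      # radial range (distance from the capture center)
    children: List["FrustumNode"] = field(default_factory=list)

    def radial_extent(self) -> float:
        return self.r[1] - self.r[0]

    def angular_extent(self) -> float:
        # Arc length of the wider angular side at the node's mid radius.
        # Simplification: ignores the sin(phi) shrinkage of azimuth arcs.
        r_mid = 0.5 * (self.r[0] + self.r[1])
        return r_mid * max(self.theta[1] - self.theta[0],
                           self.phi[1] - self.phi[0])

    def contains(self, theta: float, phi: float, r: float) -> bool:
        return (self.theta[0] <= theta <= self.theta[1]
                and self.phi[0] <= phi <= self.phi[1]
                and self.r[0] <= r <= self.r[1])

    def split(self) -> List["FrustumNode"]:
        # ASSUMED rule: a radially elongated node gets a binary split along
        # the radius; otherwise all three axes are halved (octree split).
        if self.radial_extent() > 2.0 * self.angular_extent():
            self.children = [FrustumNode(self.theta, self.phi, r)
                             for r in halves(self.r)]
        else:
            self.children = [FrustumNode(t, p, r)
                             for t in halves(self.theta)
                             for p in halves(self.phi)
                             for r in halves(self.r)]
        return self.children


def insert(node: FrustumNode, theta: float, phi: float, r: float,
           max_level: int) -> FrustumNode:
    """Refine the tree around one depth sample and return its leaf node."""
    for _ in range(max_level):
        if not node.children:
            node.split()
        node = next(c for c in node.children if c.contains(theta, phi, r))
    return node


root = FrustumNode((0.0, 2.0 * math.pi), (0.0, math.pi), (0.1, 100.0))
leaf = insert(root, theta=1.0, phi=1.0, r=2.0, max_level=4)
```

In this sketch, only the nodes along the path to a depth sample are subdivided, so regions with observed surfaces end up with smaller nodes than empty space. How the actual method chooses split types, and how it accumulates and weights depth values in the nodes, is described in the paper itself rather than in this abstract.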

Supplemental Material

3528223.3530074.mp4 (presentation)

100-205-supp-video.mp4 (supplemental material)

Published in

ACM Transactions on Graphics, Volume 41, Issue 4
July 2022
1978 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3528223

Copyright © 2022 Owner/Author

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States
