Abstract
Omnidirectional videos capture environmental scenes effectively, but they have rarely been used for geometry reconstruction. In this work, we propose an egocentric 3D reconstruction method that acquires scene geometry with high accuracy from a short egocentric omnidirectional video. To this end, we first estimate per-frame depth using a spherical disparity network. We then fuse the per-frame depth estimates into a novel spherical binoctree data structure that is specifically designed to tolerate spherical depth estimation errors. By adaptively subdividing spherical space into binary-tree and octree nodes that represent spherical frustums, the spherical binoctree enables egocentric surface geometry reconstruction for environmental scenes while assigning high-resolution nodes to closely observed surfaces. This allows an entire scene to be reconstructed from a short video captured along a small camera trajectory. Experimental results validate the effectiveness and accuracy of our approach for reconstructing the 3D geometry of environmental scenes from short egocentric omnidirectional video inputs. We further demonstrate various applications using a conventional omnidirectional camera, including novel-view synthesis, object insertion, and relighting of scenes using reconstructed 3D models with texture.
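The adaptive subdivision of spherical frustums mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the split rule (halve only the radial range when a node is radially elongated, otherwise split all three spherical axes), the distance-proportional stopping size `rel_tol`, and the names `Node`, `split`, and `insert` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """A spherical frustum: radial, polar (theta), and azimuth (phi) ranges."""
    r0: float
    r1: float
    t0: float
    t1: float
    p0: float
    p1: float
    children: list = field(default_factory=list)

def to_spherical(x, y, z):
    """Convert a non-origin Cartesian point to (r, theta, phi)."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r)                  # polar angle in [0, pi]
    phi = math.atan2(y, x) % (2.0 * math.pi)  # azimuth in [0, 2*pi)
    return r, theta, phi

def extents(node):
    """Radial length and a simple upper bound on the angular arc length."""
    rm = 0.5 * (node.r0 + node.r1)
    radial = node.r1 - node.r0
    arc = rm * max(node.t1 - node.t0, node.p1 - node.p0)
    return radial, arc

def split(node):
    """Assumed rule: binary split along r if radially elongated, else octree split."""
    rm = 0.5 * (node.r0 + node.r1)
    radial, arc = extents(node)
    if radial > arc:
        node.children = [
            Node(node.r0, rm, node.t0, node.t1, node.p0, node.p1),
            Node(rm, node.r1, node.t0, node.t1, node.p0, node.p1),
        ]
        return
    tm = 0.5 * (node.t0 + node.t1)
    pm = 0.5 * (node.p0 + node.p1)
    node.children = [
        Node(ra, rb, ta, tb, pa, pb)
        for ra, rb in ((node.r0, rm), (rm, node.r1))
        for ta, tb in ((node.t0, tm), (tm, node.t1))
        for pa, pb in ((node.p0, pm), (pm, node.p1))
    ]

def child_containing(node, r, theta, phi):
    """Pick the child whose ranges contain the sample (midpoint comparisons)."""
    ri = int(r >= 0.5 * (node.r0 + node.r1))
    if len(node.children) == 2:
        return node.children[ri]
    ti = int(theta >= 0.5 * (node.t0 + node.t1))
    pi_ = int(phi >= 0.5 * (node.p0 + node.p1))
    return node.children[ri * 4 + ti * 2 + pi_]

def insert(root, point, rel_tol=0.05, max_depth=32):
    """Refine toward a sample until the node is small relative to its distance.

    Returns the leaf node and its depth. Because the target size scales with
    r, samples close to the camera end up in deeper, finer nodes.
    """
    r, theta, phi = to_spherical(*point)
    node, depth = root, 0
    while depth < max_depth:
        radial, arc = extents(node)
        if max(radial, arc) <= rel_tol * r:
            break
        if not node.children:
            split(node)
        node = child_containing(node, r, theta, phi)
        depth += 1
    return node, depth

# Hypothetical scene bounds: 0.1 to 100 units around the camera center.
root = Node(0.1, 100.0, 0.0, math.pi, 0.0, 2.0 * math.pi)
_, near_depth = insert(root, (0.5, 0.0, 0.0))
_, far_depth = insert(root, (50.0, 0.0, 0.0))
```

Under these assumptions, the sample at distance 0.5 lands in a deeper node than the one at distance 50, which mirrors the abstract's point that closely observed surfaces receive higher-resolution nodes. A real fusion step would also accumulate depth evidence per node, which this sketch omits.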
Index Terms
Egocentric scene reconstruction from an omnidirectional video