Abstract
We present an algorithm that enables casual 3D photography. Given a set of input photos captured with a hand-held cell phone or DSLR camera, our algorithm reconstructs a 3D photo: a central panoramic, textured, normal-mapped, multi-layered geometric mesh representation. 3D photos can be stored compactly and are optimized for rendering from viewpoints near the capture viewpoints. They can be rendered using a standard rasterization pipeline to produce perspective views with motion parallax. When viewed in VR, 3D photos provide geometrically consistent views for both eyes. Our geometric representation also supports interaction with the scene through geometry-aware 3D effects, such as adding new objects to the scene and applying artistic lighting.
Our 3D photo reconstruction algorithm starts with a standard structure-from-motion and multi-view stereo reconstruction of the scene. The dense stereo reconstruction is made robust to imperfect capture conditions by a novel near-envelope cost volume prior that discards erroneous near-depth hypotheses. We propose a novel parallax-tolerant stitching algorithm that warps the depth maps into the central panorama and stitches two color-and-depth panoramas, one for the front and one for the back scene surfaces. The two panoramas are fused into a single non-redundant, well-connected geometric mesh. We provide videos demonstrating users interactively viewing and manipulating our 3D photos.
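To make the idea of a cost volume prior that discards near depth hypotheses concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the near envelope is computed or how the prior enters the stereo cost, so the sketch simply takes a per-pixel lower bound on depth as given and adds a penalty to every hypothesis closer than that bound. The function names `apply_near_envelope` and `winner_take_all`, the `penalty` parameter, and the toy data are all hypothetical.

```python
import math


def apply_near_envelope(cost_volume, depths, near_envelope, penalty=math.inf):
    """Penalize depth hypotheses that lie in front of the near envelope.

    cost_volume[k][y][x] is the matching cost of hypothesis depths[k] at
    pixel (y, x). near_envelope[y][x] is an assumed, externally supplied
    lower bound on the true depth at that pixel.
    """
    penalized = []
    for k, depth in enumerate(depths):
        plane = []
        for y, row in enumerate(cost_volume[k]):
            plane.append([
                cost + penalty if depth < near_envelope[y][x] else cost
                for x, cost in enumerate(row)
            ])
        penalized.append(plane)
    return penalized


def winner_take_all(cost_volume, depths):
    """Pick the lowest-cost depth hypothesis independently at each pixel."""
    height = len(cost_volume[0])
    width = len(cost_volume[0][0])
    depth_map = []
    for y in range(height):
        row = []
        for x in range(width):
            best_k = min(range(len(depths)),
                         key=lambda k: cost_volume[k][y][x])
            row.append(depths[best_k])
        depth_map.append(row)
    return depth_map


# Toy example: three depth hypotheses, a 1x2 image.
depths = [1.0, 2.0, 4.0]
cost_volume = [
    [[0.1, 0.5]],  # costs for depth 1.0
    [[0.3, 0.2]],  # costs for depth 2.0
    [[0.4, 0.9]],  # costs for depth 4.0
]
near_envelope = [[2.0, 1.0]]  # assumed lower bound on depth per pixel

raw_depth = winner_take_all(cost_volume, depths)
robust_depth = winner_take_all(
    apply_near_envelope(cost_volume, depths, near_envelope), depths)
```

In this toy case the first pixel's cheapest hypothesis (depth 1.0) lies in front of its envelope (2.0), so it is rejected and the next-best hypothesis at or beyond the envelope is chosen instead; the second pixel is unaffected. A finite `penalty` would turn the hard rejection into a soft preference.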