Abstract
We present an end-to-end system for real-time environment capture, 3D reconstruction, and stereoscopic view synthesis on a mobile VR headset. Our solution allows the user to use the cameras on their VR headset as their eyes to see and interact with the real world while still wearing their headset, a feature often referred to as Passthrough. The central challenge when building such a system is the choice and implementation of algorithms under the strict compute, power, and performance constraints imposed by the target user experience and mobile platform. A key contribution of this paper is a complete description of a corresponding system that performs temporally stable passthrough rendering at 72 Hz with only 200 mW power consumption on a mobile Snapdragon 835 platform. Our algorithmic contributions for enabling this performance include the computation of a coarse 3D scene proxy on the embedded video encoding hardware, followed by a depth densification and filtering step, and finally stereoscopic texturing and spatio-temporal up-sampling. We provide a detailed discussion and evaluation of the challenges we encountered, as well as algorithm and performance trade-offs in terms of compute and resulting passthrough quality. The described system is available to users as the Passthrough+ feature on Oculus Quest. We believe that by publishing the underlying system and methods, we provide valuable insights to the community on how to design and implement real-time environment sensing and rendering on heavily resource-constrained hardware.
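To make the quoted constraints concrete, the following minimal sketch converts the two figures stated in the abstract (72 Hz and 200 mW) into a per-frame time and energy budget. It assumes the 200 mW figure is an average sustained across every frame; the function names are illustrative and are not taken from the paper.

```python
# Back-of-the-envelope budget implied by the figures quoted in the abstract.
# Assumption: the 200 mW figure is an average sustained over every frame.

REFRESH_HZ = 72.0   # passthrough rendering rate stated in the abstract
POWER_W = 0.200     # power consumption stated in the abstract (200 mW)


def frame_budget_ms(refresh_hz: float) -> float:
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / refresh_hz


def energy_per_frame_mj(power_w: float, refresh_hz: float) -> float:
    """Average energy spent per frame, in millijoules."""
    return 1000.0 * power_w / refresh_hz


if __name__ == "__main__":
    print(f"frame budget: {frame_budget_ms(REFRESH_HZ):.2f} ms")
    print(f"energy per frame: {energy_per_frame_mj(POWER_W, REFRESH_HZ):.2f} mJ")
```

Under that assumption, every stage named in the abstract (scene proxy computation, depth densification and filtering, texturing, and up-sampling) has to fit within roughly 13.9 ms and 2.8 mJ per displayed frame.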
Supplemental Material
Supplemental movie, appendix, image, and software files for Passthrough+: Real-time Stereoscopic View Synthesis for Mobile Mixed Reality