Research Article

Passthrough+: Real-time Stereoscopic View Synthesis for Mobile Mixed Reality

Published: 04 May 2020

Abstract

We present an end-to-end system for real-time environment capture, 3D reconstruction, and stereoscopic view synthesis on a mobile VR headset. Our solution allows the user to use the cameras on their VR headset as their eyes to see and interact with the real world while still wearing their headset, a feature often referred to as Passthrough. The central challenge when building such a system is the choice and implementation of algorithms under the strict compute, power, and performance constraints imposed by the target user experience and mobile platform. A key contribution of this paper is a complete description of a corresponding system that performs temporally stable passthrough rendering at 72 Hz with only 200 mW power consumption on a mobile Snapdragon 835 platform. Our algorithmic contributions for enabling this performance include the computation of a coarse 3D scene proxy on the embedded video encoding hardware, followed by a depth densification and filtering step, and finally stereoscopic texturing and spatio-temporal up-sampling. We provide a detailed discussion and evaluation of the challenges we encountered, as well as algorithm and performance trade-offs in terms of compute and resulting passthrough quality. The described system is available to users as the Passthrough+ feature on Oculus Quest. We believe that by publishing the underlying system and methods, we provide valuable insights to the community on how to design and implement real-time environment sensing and rendering on heavily resource-constrained hardware.
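The pipeline outlined in the abstract (coarse stereo correspondences, depth from triangulation, then densification of the sparse result) can be illustrated in greatly simplified form. The sketch below is an assumption-laden stand-in, not the paper's method: it triangulates depth from horizontal disparities of a rectified stereo pair via the standard relation depth = f·B/d, and fills holes with naive iterative neighbor averaging, whereas the actual system derives coarse correspondences from the hardware video encoder's motion vectors and uses a dedicated densification and filtering step. All function names and parameters here are illustrative.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Triangulate depth from horizontal disparity (rectified stereo):
    depth = focal * baseline / disparity. Near-zero disparities are
    treated as invalid and map to depth 0."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > eps
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

def densify_depth(sparse_depth, iterations=50):
    """Fill holes (zero entries) by repeatedly averaging the valid
    4-neighbors of each unknown pixel -- a crude placeholder for the
    paper's densification/filtering stage. Known pixels are kept fixed."""
    d = np.asarray(sparse_depth, dtype=np.float64).copy()
    known = d > 0
    for _ in range(iterations):
        padded = np.pad(d, 1)
        # Sum and count of positive 4-neighbors for every pixel.
        neigh = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:])
        count = ((padded[:-2, 1:-1] > 0).astype(np.float64) +
                 (padded[2:, 1:-1] > 0) +
                 (padded[1:-1, :-2] > 0) +
                 (padded[1:-1, 2:] > 0))
        update = ~known & (count > 0)
        d[update] = neigh[update] / count[update]
    return d
```

For example, with a focal length of 100 px and a 64 mm baseline, a 2 px disparity triangulates to 3.2 m; a single known depth sample then propagates outward to fill the surrounding hole region. The real system additionally reprojects the densified depth into both eye views for stereoscopic texturing.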


