research-article
Open Access

Stereo magnification: learning view synthesis using multiplane images

Published: 30 July 2018

Abstract

The view synthesis problem, generating novel views of a scene from known imagery, has garnered recent attention due in part to compelling applications in virtual and augmented reality. In this paper, we explore an intriguing scenario for view synthesis: extrapolating views from imagery captured by narrow-baseline stereo cameras, including VR cameras and now-widespread dual-lens camera phones. We call this problem stereo magnification, and propose a learning framework that leverages a new layered representation that we call multiplane images (MPIs). Our method also uses a massive new data source for learning view extrapolation: online videos on YouTube. Using data mined from such videos, we train a deep network that predicts an MPI from an input stereo image pair. This inferred MPI can then be used to synthesize a range of novel views of the scene, including views that extrapolate significantly beyond the input baseline. We show that our method compares favorably with several recent view synthesis methods, and demonstrate applications in magnifying narrow-baseline stereo images.
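To make the layered representation more concrete, the following is a minimal sketch of how a stack of RGBA planes could be flattened into one image using the standard "over" compositing operator, applied from the farthest plane to the nearest. It assumes the planes have already been warped into the target camera, a step that is omitted here. The names `RGBA`, `Layer`, and `composite_back_to_front` are illustrative and are not taken from the paper or its code.

```python
from typing import List, Tuple

RGBA = Tuple[float, float, float, float]  # colour and alpha, each in [0, 1]
RGB = Tuple[float, float, float]
Layer = List[List[RGBA]]                  # one plane: rows of RGBA pixels


def composite_back_to_front(layers: List[Layer]) -> List[List[RGB]]:
    """Blend RGBA planes with the 'over' operator, farthest plane first.

    Assumption: layers[0] is the farthest plane, layers[-1] the nearest,
    and all planes share the same height and width.
    """
    height, width = len(layers[0]), len(layers[0][0])
    # Start from a black background.
    out: List[List[RGB]] = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]
    for layer in layers:
        for y in range(height):
            for x in range(width):
                r, g, b, a = layer[y][x]
                prev_r, prev_g, prev_b = out[y][x]
                # 'Over': the nearer pixel covers what lies behind it
                # in proportion to its alpha.
                out[y][x] = (
                    a * r + (1.0 - a) * prev_r,
                    a * g + (1.0 - a) * prev_g,
                    a * b + (1.0 - a) * prev_b,
                )
    return out


if __name__ == "__main__":
    # A 1x2 example: an opaque red far plane and a near plane that is
    # half-transparent green on the left and opaque blue on the right.
    far = [[(1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 1.0)]]
    near = [[(0.0, 1.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0)]]
    print(composite_back_to_front([far, near]))
```

In this toy example the left pixel shows the far plane through the half-transparent near plane, while the right pixel is fully covered by the near plane. Rendering a novel view would additionally require reprojecting each plane into the new camera before compositing.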

Supplemental Material

a65-zhou.mp4
065-276.mp4

          Published in

            ACM Transactions on Graphics, Volume 37, Issue 4
            August 2018
            1670 pages
            ISSN: 0730-0301
            EISSN: 1557-7368
            DOI: 10.1145/3197517

            Copyright © 2018 Owner/Author

            Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

            Publisher

            Association for Computing Machinery

            New York, NY, United States
