LFGAN: 4D Light Field Synthesis from a Single RGB Image

Published: 17 February 2020

Abstract

We present a deep neural network called the light field generative adversarial network (LFGAN) that synthesizes a 4D light field from a single 2D RGB image. We generate light fields using a single image super-resolution (SISR) technique based on two important observations. First, the small baseline gives rise to high similarity between the full light field image and each sub-aperture view. Second, the occlusion edge at any spatial coordinate of a sub-aperture view has the same orientation as the occlusion edge at the corresponding angular patch, implying that occlusion information in the angular domain can be inferred from local sub-aperture information. We employ the Wasserstein GAN with gradient penalty (WGAN-GP) to learn the color and geometry information from light field datasets. The network can generate a plausible 4D light field comprising 8×8 angular views from a single sub-aperture 2D image. We propose new loss terms, namely the epipolar plane image (EPI) and brightness regularization (BRI) losses, as well as a novel multi-stage training framework that feeds the loss terms at different stages of training to generate superior light fields. The EPI loss encourages the network to learn the geometric features of light fields, and the BRI loss preserves brightness consistency across different sub-aperture views. Two datasets have been used to evaluate our method: in addition to an existing light field dataset capturing scenes of flowers and plants, we have built a large dataset of toy animals consisting of 2,100 light fields captured with a plenoptic camera. We have performed comprehensive ablation studies to evaluate the effects of the individual loss terms and the multi-stage training strategy, and have compared LFGAN to other state-of-the-art techniques. Qualitative and quantitative evaluation demonstrates that LFGAN effectively estimates complex occlusions and geometry in challenging scenes and outperforms existing techniques.
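Two of the components named in the abstract are standard enough to sketch. The WGAN-GP gradient penalty (Gulrajani et al.) pushes the critic's gradient norm toward 1 at points interpolated between real and generated samples, and an EPI is a 2D slice of the 4D light field obtained by fixing one spatial and one angular coordinate, so that line slopes in the slice encode scene depth. The PyTorch sketch below is illustrative only: the tensor layout (batch, u, v, s, t, channels), the toy critic, and the plain L1 comparison of EPI slices are assumptions, not the paper's exact formulation.

```python
import torch

def gradient_penalty(critic, real, fake):
    """Standard WGAN-GP penalty: push the critic's gradient norm
    toward 1 at points interpolated between real and fake samples."""
    b = real.size(0)
    # One interpolation weight per sample, broadcast over all other dims.
    eps = torch.rand(b, *([1] * (real.dim() - 1)))
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    score = critic(interp)
    grads = torch.autograd.grad(
        outputs=score, inputs=interp,
        grad_outputs=torch.ones_like(score),
        create_graph=True, retain_graph=True)[0]
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

def horizontal_epis(lf):
    """Slice a 4D light field, stored as (batch, U, V, S, T, C), into
    horizontal EPIs: for each fixed (v, t), the (u, s) slice is a 2D
    image whose slanted-line structure encodes scene depth."""
    b, U, V, S, T, C = lf.shape
    return lf.permute(0, 2, 4, 1, 3, 5).reshape(b * V * T, U, S, C)

def epi_loss(pred, truth):
    # Hypothetical L1 comparison of EPI slices; the paper's exact
    # EPI loss may differ.
    return (horizontal_epis(pred) - horizontal_epis(truth)).abs().mean()

if __name__ == "__main__":
    real = torch.rand(2, 8, 8, 32, 32, 3)  # toy 8x8-view light fields
    fake = torch.rand(2, 8, 8, 32, 32, 3)
    toy_critic = lambda x: x.flatten(1).sum(dim=1)  # stand-in critic
    print(gradient_penalty(toy_critic, real, fake).item())
    print(epi_loss(fake, real).item())
```

The multi-stage training framework described in the abstract would then enable and weight such terms at different stages of training; the schedule itself is specific to the paper and is not reproduced here.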

