Abstract
We present a deep neural network, the light field generative adversarial network (LFGAN), that synthesizes a 4D light field from a single 2D RGB image. We generate light fields using a single image super-resolution (SISR) technique, motivated by two key observations. First, the small baseline between sub-aperture views gives rise to high similarity between the full light field image and each individual view. Second, the occlusion edge at any spatial coordinate of a sub-aperture view has the same orientation as the occlusion edge at the corresponding angular patch, implying that occlusion information in the angular domain can be inferred from local sub-aperture information. We employ the Wasserstein GAN with gradient penalty (WGAN-GP) to learn color and geometry information from light field datasets. The network generates a plausible 4D light field comprising 8×8 angular views from a single sub-aperture 2D image. We propose two new loss terms, an epipolar plane image (EPI) loss and a brightness regularization (BRI) loss, together with a novel multi-stage training framework that introduces the loss terms at different stages of training to generate superior light fields. The EPI loss encourages the network to learn the geometric features of light fields, and the BRI loss preserves brightness consistency across sub-aperture views. Two datasets were used to evaluate our method: in addition to an existing light field dataset capturing scenes of flowers and plants, we built a large dataset of toy animals consisting of 2,100 light fields captured with a plenoptic camera. We performed comprehensive ablation studies to evaluate the effects of the individual loss terms and the multi-stage training strategy, and compared LFGAN against other state-of-the-art techniques. Qualitative and quantitative evaluations demonstrate that LFGAN effectively estimates complex occlusions and geometry in challenging scenes and outperforms existing techniques.
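The abstract does not spell out the EPI construction or the EPI loss. As an illustrative sketch only, assuming the 4D light field is stored as a (V, U, H, W, C) array of 8×8 sub-aperture views, a horizontal EPI can be extracted by fixing a vertical angular index and a spatial row; an L1 penalty on the EPI's angular-axis gradients (a hypothetical stand-in for the paper's actual EPI loss) would then penalize mismatched line slopes, which encode scene depth:

```python
import numpy as np

def horizontal_epi(lf: np.ndarray, v: int, y: int) -> np.ndarray:
    """Extract a horizontal epipolar plane image (EPI).

    lf has shape (V, U, H, W, C): a V x U grid of sub-aperture views
    (8 x 8 in the paper), each an H x W image with C channels. Fixing
    the vertical angular index v and the spatial row y, and stacking
    row y of each view along the horizontal angular axis u, yields a
    (U, W, C) slice in which each scene point traces a line whose
    slope is proportional to its disparity (inverse depth).
    """
    return lf[v, :, y, :, :]

def epi_gradient_l1(lf_pred: np.ndarray, lf_true: np.ndarray) -> float:
    """Hypothetical EPI-style loss: L1 distance between finite
    differences taken along the horizontal angular axis. Angular
    gradients of the EPIs capture the line slopes, so penalizing
    their mismatch pushes the predicted light field toward the
    correct geometry. (The paper's exact formulation may differ.)
    """
    grad_pred = np.diff(lf_pred, axis=1)  # change between adjacent views
    grad_true = np.diff(lf_true, axis=1)
    return float(np.mean(np.abs(grad_pred - grad_true)))
```

Fixing the horizontal angular index and a spatial column instead gives the analogous vertical EPIs; a full loss would typically average over both directions and all slices.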
Supplemental Material
Supplemental movie, appendix, image, and software files for "LFGAN: 4D Light Field Synthesis from a Single RGB Image" are available for download.