Abstract
View synthesis (VS) for light field images is very time-consuming because of the large number of pixels involved and the intensive computation required, which can rule it out for practical real-time three-dimensional systems. In this article, we propose an acceleration approach for deep learning-based light field view synthesis that significantly reduces computation by combining compact-resolution (CR) representation, super-resolution (SR) techniques, and light-weight neural networks. The proposed architecture cascades three neural networks: a CR network that generates a compact representation of the original input views, a VS network that synthesizes new views from the down-scaled compact views, and an SR network that reconstructs high-quality views at full resolution. All three networks are trained jointly with the integrated losses of the CR, VS, and SR networks. Moreover, because deep neural networks are redundant, we apply an efficient light-weight strategy that prunes filters to simplify the networks and accelerate inference. Experimental results demonstrate that the proposed method greatly reduces processing time and is much more computationally efficient while maintaining competitive image quality.
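To make the source of the savings concrete, the following is a minimal sketch (not the authors' implementation) that counts multiply-accumulate operations (MACs) for a plain stack of convolution layers. It illustrates two effects named in the abstract: running the synthesis network on down-scaled views, and pruning filters. The layer widths, image sizes, 2x scale factor, and 50% pruning ratio are all hypothetical values chosen for illustration, and the helper names `conv_macs` and `network_macs` are assumptions. The sketch also ignores the cost of the CR and SR networks, which a real pipeline would have to pay.

```python
def conv_macs(height, width, in_ch, out_ch, kernel=3):
    """MACs for one stride-1, same-padded convolution layer."""
    return height * width * in_ch * out_ch * kernel * kernel


def network_macs(height, width, channels, kernel=3):
    """Total MACs for a plain stack of convolution layers.

    `channels` lists the channel count at each stage,
    e.g. [12, 64, 64, 3] describes three layers.
    """
    return sum(
        conv_macs(height, width, c_in, c_out, kernel)
        for c_in, c_out in zip(channels, channels[1:])
    )


# Hypothetical numbers, for illustration only.
full = network_macs(512, 512, [12, 64, 64, 3])     # full-resolution input
compact = network_macs(256, 256, [12, 64, 64, 3])  # 2x down-scaled input
pruned = network_macs(256, 256, [12, 32, 32, 3])   # half the hidden filters pruned

print(full / compact)    # 4.0: cost scales with the number of pixels
print(compact / pruned)  # additional saving from filter pruning
```

Under these assumptions, halving each spatial dimension cuts the convolution cost by a factor of four, and pruning filters compounds the saving because each removed filter also removes an input channel from the following layer.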
A Fast View Synthesis Implementation Method for Light Field Applications