Abstract
Performance is a critical challenge in mobile image processing. Given a reference imaging pipeline, or even human-adjusted pairs of images, we seek to reproduce the enhancements and enable real-time evaluation. For this, we introduce a new neural network architecture inspired by bilateral grid processing and local affine color transforms. Using pairs of input/output images, we train a convolutional neural network to predict the coefficients of a locally-affine model in bilateral space. Our architecture learns to make local, global, and content-dependent decisions to approximate the desired image transformation. At runtime, the neural network consumes a low-resolution version of the input image, produces a set of affine transformations in bilateral space, upsamples those transformations in an edge-preserving fashion using a new slicing node, and then applies those upsampled transformations to the full-resolution image. Our algorithm processes high-resolution images on a smartphone in milliseconds, provides a real-time viewfinder at 1080p resolution, and matches the quality of state-of-the-art approximation techniques on a large class of image operators. Unlike previous work, our model is trained off-line from data and therefore does not require access to the original operator at runtime. This allows our model to learn complex, scene-dependent transformations for which no reference implementation is available, such as the photographic edits of a human retoucher.
Supplemental Material
References
- Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http://tensorflow.org/Google Scholar
- Andrew Adams, Jongmin Baek, and Myers Abraham Davis. 2010. Fast High-Dimensional Filtering Using the Permutohedral Lattice. Computer Graphics Forum (2010).Google Scholar
- Mathieu Aubry, Sylvain Paris, Samuel W Hasinoff, Jan Kautz, and Frédo Durand. 2014. Fast local laplacian filters: Theory and applications. ACM TOG (2014). Google Scholar
Digital Library
- Jonathan T Barron, Andrew Adams, YiChang Shih, and Carlos Hernández. 2015. Fast bilateral-space stereo for synthetic defocus. CVPR (2015).Google Scholar
- Jonathan T Barron and Ben Poole. 2016. The Fast Bilateral Solver. ECCV (2016).Google Scholar
- Adrien Bousseau, Sylvain Paris, and Frédo Durand. 2009. User-assisted intrinsic images. ACM TOG (2009). Google Scholar
Digital Library
- Vladimir Bychkovsky, Sylvain Paris, Eric Chan, and Frédo Durand. 2011. Learning Photographic Global Tonal Adjustment with a Database of Input / Output Image Pairs. CVPR (2011). Google Scholar
Digital Library
- Jiawen Chen, Andrew Adams, Neal Wadhwa, and Samuel W Hasinoff. 2016. Bilateral guided upsampling. ACM TOG (2016). Google Scholar
Digital Library
- Jiawen Chen, Sylvain Paris, and Frédo Durand. 2007. Real-time edge-aware image processing with the bilateral grid. ACM TOG (2007). Google Scholar
Digital Library
- Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. 2014. Learning a deep convolutional network for image super-resolution. ECCV (2014).Google Scholar
- David Eigen, Christian Puhrsch, and Rob Fergus. 2014. Depth map prediction from a single image using a multi-scale deep network. NIPS (2014). Google Scholar
Digital Library
- Zeev Farbman, Raanan Fattal, and Dani Lischinski. 2011. Convolution pyramids. ACM TOG (2011). Google Scholar
Digital Library
- Michaël Gharbi, Gaurav Chaurasia, Sylvain Paris, and Frédo Durand. 2016. Deep Joint Demosaicking and Denoising. ACM TOG (2016). Google Scholar
Digital Library
- Michaël Gharbi, YiChang Shih, Gaurav Chaurasia, Jonathan Ragan-Kelley, Sylvain Paris, and Frédo Durand. 2015. Transform Recipes for Efficient Cloud Photo Enhancement. ACM TOG (2015). Google Scholar
Digital Library
- Samuel W Hasinoff, Dillon Sharlet, Ryan Geiss, Andrew Adams, Jonathan T Barron, Florian Kainz, Jiawen Chen, and Marc Levoy. 2016. Burst photography for high dynamic range and low-light imaging on mobile cameras. ACM TOG (2016). Google Scholar
Digital Library
- Kaiming He and Jian Sun. 2015. Fast Guided Filter. CoRR (2015).Google Scholar
- Kaiming He, Jian Sun, and Xiaoou Tang. 2013. Guided image filtering. TPAMI (2013).Google Scholar
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. CoRR (2015). Google Scholar
Digital Library
- James Hegarty, John Brunhaver, Zachary DeVito, Jonathan Ragan-Kelley, Noy Cohen, Steven Bell, Artem Vasilyev, Mark Horowitz, and Pat Hanrahan. 2014. Darkroom: compiling high-level image processing code into hardware pipelines. ACM TOG (2014). Google Scholar
Digital Library
- Sung Ju Hwang, Ashish Kapoor, and Sing Bing Kang. 2012. Context-based automatic local image enhancement. ECCV (2012).Google Scholar
- Satoshi Iizuka, Edgar Simo-Serra, and Hiroshi Ishikawa. 2016. Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM TOG (2016). Google Scholar
Digital Library
- Eddy Ilg, Nikolaus Mayer, Tonmoy Saikia, Margret Keuper, Alexey Dosovitskiy, and Thomas Brox. 2016. Flownet 2.0: Evolution of optical flow estimation with deep networks. CoRR (2016).Google Scholar
- Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML (2015). Google Scholar
Digital Library
- Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. 2016. Image-to-Image Translation with Conditional Adversarial Networks. CoRR (2016).Google Scholar
- Max Jaderberg, Karen Simonyan, Andrew Zisserman, and others. 2015. Spatial transformer networks. In Advances in Neural Information Processing Systems. 2017--2025. Google Scholar
Digital Library
- Vidit Jain and Erik Learned-Miller. 2010. FDDB: A Benchmark for Face Detection in Unconstrained Settings. Technical Report UM-CS-2010--009. University of Massachusetts, Amherst.Google Scholar
- Varun Jampani, Martin Kiefel, and Peter V. Gehler. 2016. Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks. CVPR (2016).Google Scholar
- Liad Kaufman, Dani Lischinski, and Michael Werman. 2012. Content-Aware Automatic Photo Enhancement. Computer Graphics Forum (2012). Google Scholar
Digital Library
- Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. ICLR (2015).Google Scholar
- Johannes Kopf, Michael F Cohen, Dani Lischinski, and Matt Uyttendaele. 2007. Joint bilateral upsampling. ACM TOG (2007). Google Scholar
Digital Library
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet classification with deep convolutional neural networks. NIPS (2012). Google Scholar
Digital Library
- Anat Levin, Dani Lischinski, and Yair Weiss. 2008. A closed-form solution to natural image matting. TPAMI (2008). Google Scholar
Digital Library
- Sifei Liu, Jinshan Pan, and Ming-Hsuan Yang. 2016. Learning recursive filters for low-level vision via a hybrid neural network. ECCV (2016).Google Scholar
- Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. CVPR (2015).Google Scholar
- Ravi Teja Mullapudi, Andrew Adams, Dillon Sharlet, Jonathan Ragan-Kelley, and Kayvon Fatahalian. 2016. Automatically Scheduling Halide Image Processing Pipelines. ACM TOG (2016). Google Scholar
Digital Library
- Sylvain Paris and Frédo Durand. 2006. A fast approximation of the bilateral filter using a signal processing approach. ECCV (2006). Google Scholar
Digital Library
- Sylvain Paris, Samuel W Hasinoff, and Jan Kautz. 2011. Local Laplacian filters: edge-aware image processing with a Laplacian pyramid. ACM TOG (2011). Google Scholar
Digital Library
- Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, and Frédo Durand. 2012. Decoupling Algorithms from Schedules for Easy Optimization of Image Processing Pipelines. ACM TOG (2012). Google Scholar
Digital Library
- Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Google Scholar
Cross Ref
- Xiaoyong Shen, Xin Tao, Hongyun Gao, Chao Zhou, and Jiaya Jia. 2016. Deep Automatic Portrait Matting. ECCV (2016).Google Scholar
- Yichang Shih, Sylvain Paris, Frédo Durand, and William T Freeman. 2013. Data-driven hallucination of different times of day from a single outdoor photo. ACM TOG (2013). Google Scholar
Digital Library
- Carlo Tomasi and Roberto Manduchi. 1998. Bilateral filtering for gray and color images. ICCV (1998). Google Scholar
Digital Library
- Li Xu, Jimmy Ren, Qiong Yan, Renjie Liao, and Jiaya Jia. 2015. Deep Edge-Aware Filters. ICML (2015). Google Scholar
Digital Library
- Zhicheng Yan, Hao Zhang, Baoyuan Wang, Sylvain Paris, and Yizhou Yu. 2016. Automatic photo adjustment using deep neural networks. ACM TOG (2016). Google Scholar
Digital Library
- Fisher Yu and Vladlen Koltun. 2015. Multi-scale context aggregation by dilated convolutions. CoRR (2015).Google Scholar
- Lu Yuan and Jian Sun. 2011. High quality image reconstruction from raw and jpeg image pair. ICCV (2011). Google Scholar
Digital Library
- Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. 2016. Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising. CoRR (2016). Google Scholar
Digital Library
Index Terms
Deep bilateral learning for real-time image enhancement





Comments