
Dual-Lens HDR using Guided 3D Exposure CNN and Guided Denoising Transformer

Published: 16 March 2023

Abstract

We study the high dynamic range (HDR) imaging problem in dual-lens systems. Existing methods usually treat HDR imaging as an image fusion problem: the HDR result is estimated by fusing an aligned short-exposure image with a long-exposure image. However, this fusion pipeline depends heavily on image alignment, which is difficult to achieve perfectly. We instead reformulate dual-lens HDR imaging as the disentangled enhancement of the short-exposure image, namely exposure correction followed by denoising, guided by the long-exposure image. In the guided exposure correction module, we exploit the guidance image and a 3D color transformation to build a guided 3D exposure CNN (GEC) that produces a rough HDR result from the short-exposure image. In the guided denoising module, we use a cross-attention mechanism to build a guided denoising transformer (GDT) that directly takes the long-exposure image as guidance to denoise the rough HDR result in a pyramidal fashion. Both modules bypass the difficult image alignment step. Experimental results demonstrate the superiority of our method over state-of-the-art approaches.
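The cross-attention guidance idea in the abstract can be made concrete with a minimal NumPy sketch. This is an illustration of generic cross-attention, not the authors' GDT: queries come from features of the noisy rough HDR result, while keys and values come from features of the clean long-exposure guidance image, so each noisy token aggregates guidance content by similarity rather than by explicit pixel alignment. Learned projection matrices, multi-head structure, and the pyramid levels are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def guided_cross_attention(noisy_feat, guide_feat):
    """Cross-attention with queries from the noisy (short-exposure) features
    and keys/values from the long-exposure guidance features.

    noisy_feat: (N, C) array of noisy-image tokens.
    guide_feat: (M, C) array of guidance-image tokens.
    Returns: (N, C) guidance content aggregated per noisy token.
    Learned Q/K/V projections are omitted (identity) in this sketch.
    """
    d_k = noisy_feat.shape[-1]
    q, k, v = noisy_feat, guide_feat, guide_feat
    attn = softmax(q @ k.T / np.sqrt(d_k), axis=-1)  # (N, M), rows sum to 1
    return attn @ v

rng = np.random.default_rng(0)
noisy = rng.standard_normal((16, 8))   # 16 tokens, 8 channels
guide = rng.standard_normal((16, 8))
out = guided_cross_attention(noisy, guide)
print(out.shape)  # (16, 8)
```

Because similarity is computed between all token pairs, corresponding content in the two views can be matched even when it sits at different spatial positions, which is how an attention-based design can sidestep explicit disparity alignment.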



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 5
  September 2023, 262 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3585398
  Editor: Abdulmotaleb El Saddik


Publisher

Association for Computing Machinery, New York, NY, United States

          Publication History

          • Published: 16 March 2023
          • Online AM: 6 January 2023
          • Accepted: 28 December 2022
          • Revised: 27 December 2022
          • Received: 11 August 2022
