research-article

Real-time Image Enhancement with Attention Aggregation

Published: 17 February 2023

Abstract

Image enhancement has stimulated significant research over the past years owing to its great application potential in video conferencing scenarios. Nevertheless, most existing image enhancement approaches still struggle to find a good tradeoff that reduces the computational cost as much as possible while maintaining plausible result quality. Recently, curve-based mapping methods have been proposed and have shown great potential for real-time, high-quality image enhancement at arbitrary resolutions. In this article, we take advantage of the curve-based mapping representation and focus on further improving enhancement quality and robustness while minimizing additional computational cost. Specifically, we (1) carefully re-formulate the curve function to improve learning stability, and (2) aggregate semantic attention into the curve regression process, which overcomes a major shortcoming of curve-based methods: their tendency to generate flat, low-contrast results. The semantic attention is jointly learned under supervision from class activation mapping of pre-trained feature extractors, thus reducing the manual annotation cost of semantic labels. Experiments show that our proposed method significantly improves curve-based methods both qualitatively and quantitatively, achieves visually plausible results compared with other deep neural network-based enhancement methods, and maintains a very low computational cost, i.e., taking 18.7 ms for a 360p image on a single P40 GPU. Extensive experiments demonstrate that our method is also capable of video enhancement tasks.
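The curve-based mapping representation the abstract builds on can be illustrated with a generic iterative quadratic adjustment curve (in the style of Zero-DCE-like methods). The sketch below is a hypothetical minimal example of this family of curves, not the authors' reformulated curve function or their attention-aggregated network; the function name and parameters are illustrative assumptions.

```python
import numpy as np

def curve_enhance(image, alpha, iterations=4):
    """Iteratively apply a quadratic enhancement curve to an image.

    Each step computes LE(x) = x + alpha * x * (1 - x), a curve that
    brightens mid-tones while pinning 0 and 1 in place. `image` holds
    values in [0, 1]; `alpha` may be a scalar or a per-pixel array
    (curve-based networks typically regress `alpha` per pixel).
    """
    x = np.clip(image, 0.0, 1.0)
    for _ in range(iterations):
        x = x + alpha * x * (1.0 - x)
    return np.clip(x, 0.0, 1.0)

# Usage: brighten a uniformly dark synthetic image with a scalar curve
# parameter. Output values rise above the 0.2 input but stay within [0, 1].
dark = np.full((4, 4, 3), 0.2)
bright = curve_enhance(dark, alpha=0.8)
```

Because the mapping is a closed-form per-pixel function, it runs at any resolution once the curve parameters are predicted at low resolution, which is what makes this family attractive for the real-time setting the abstract targets.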


• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 2s
  April 2023, 545 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3572861
  • Editor:
  • Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      • Published: 17 February 2023
      • Online AM: 26 September 2022
      • Accepted: 10 September 2022
      • Revised: 4 August 2022
      • Received: 5 September 2021
