research-article

CUR Transformer: A Convolutional Unbiased Regional Transformer for Image Denoising

Published: 25 February 2023

Abstract

Image denoising is a fundamental problem in computer vision and multimedia computation. Non-local filters are effective for image denoising, but existing deep learning methods with non-local computation structures are mostly designed for high-level tasks and usually adopt global self-attention. For image denoising, such methods incur high computational complexity and spend much redundant computation on uncorrelated pixels. To address this problem and combine the advantages of non-local filtering and deep learning, we propose the Convolutional Unbiased Regional (CUR) transformer. Based on the prior that the pixels similar to a given pixel are usually spatially close to it, our insights are that (1) we partition the image into non-overlapping windows and perform regional self-attention within each window to reduce the search range of each pixel, and (2) we encourage pixels in different windows to communicate with each other. Accordingly, the CUR transformer is a cascade of convolutional regional self-attention (CRSA) blocks with U-style shortcut connections. In each CRSA block, we use convolutional layers to extract the query (Q), key (K), and value (V) features of the input feature map, partition the Q, K, and V features into non-overlapping local windows, and perform regional self-attention within each window to produce the block's output feature. Across different CRSA blocks, we perform an unbiased window partition by changing the partition positions of the windows. Experimental results show that the CUR transformer significantly outperforms state-of-the-art methods on four low-level vision tasks: real and synthetic image denoising, JPEG compression artifact reduction, and low-light image enhancement.
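The regional self-attention with shifted partitions described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it omits the convolutional Q/K/V extraction, multi-head attention, and the U-style shortcut connections, assumes the feature-map height and width are divisible by the window size, and approximates the unbiased window partition by rolling the feature map so that successive blocks use differently positioned window grids.

```python
import numpy as np

def window_partition(x, win, offset):
    """Split an (H, W, C) feature map into non-overlapping win x win windows.
    `offset` shifts the partition grid (a simplified stand-in for the paper's
    unbiased window partition across CRSA blocks)."""
    x = np.roll(x, shift=(-offset, -offset), axis=(0, 1))
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    # -> (num_windows, win*win, C): each row of tokens attends only within its window
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def regional_self_attention(q, k, v, win=4, offset=0):
    """Softmax self-attention restricted to each local window."""
    C = q.shape[-1]
    qw, kw, vw = (window_partition(t, win, offset) for t in (q, k, v))
    scores = qw @ kw.transpose(0, 2, 1) / np.sqrt(C)   # (nW, win^2, win^2)
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)           # softmax over each window
    return attn @ vw                                   # (nW, win^2, C)

# Toy example: an 8x8 feature map with 3 channels.
rng = np.random.default_rng(0)
q = k = v = rng.standard_normal((8, 8, 3))
out0 = regional_self_attention(q, k, v, win=4, offset=0)  # one block's grid
out1 = regional_self_attention(q, k, v, win=4, offset=2)  # next block, shifted grid
print(out0.shape)  # (4, 16, 3)
```

Because each of the four 4x4 windows attends only to its own 16 tokens, the attention matrices are 16x16 instead of 64x64, which is the source of the complexity reduction over global self-attention; shifting the grid between blocks lets pixels near a window border reach neighbors that the previous partition separated.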



• Published in

  ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 3
  May 2023, 514 pages
  ISSN: 1551-6857
  EISSN: 1551-6865
  DOI: 10.1145/3582886
  • Editor: Abdulmotaleb El Saddik

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

Publication History

• Published: 25 February 2023
• Online AM: 11 October 2022
• Accepted: 26 September 2022
• Revised: 23 September 2022
• Received: 13 January 2022

Published in TOMM Volume 19, Issue 3

